Tesla AI and behaviour cloning: what’s really happening?

Edit (April 5, 2019): When I wrote this post, I was confused about the term “behaviour cloning” or “behavioural cloning”. I mistakenly thought the term was synonymous with end-to-end learning. While some examples of behavioural cloning are also examples of end-to-end learning, behavioural cloning doesn’t have to be end-to-end learning.

Behavioural cloning falls under the umbrella of imitation learning, a family of machine learning techniques wherein a neural network attempts to learn from a human demonstrator. Behavioural cloning is distinct from other forms of imitation learning in that it “treats IL as a supervised learning problem”. That is, it learns to map states to actions the same way a neural network competing in the ImageNet challenge learns to map images to labels. (The definition I quoted is from this paper.)
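
To make the supervised framing concrete, here is a minimal, purely illustrative sketch of behavioural cloning. The “state” features (lane offset, road curvature), the linear policy, and all the numbers are invented for the example; a real system would use a deep network and far richer inputs, but the training loop is ordinary supervised regression either way.

```python
# Minimal sketch of behavioural cloning as plain supervised learning.
# All features and numbers are made up for illustration: the "state" is
# (lane_offset, road_curvature) and the "action" is a steering angle.

# Demonstrations: (state, action) pairs recorded from a human driver.
demos = [
    ((0.0, 0.0), 0.0),
    ((0.5, 0.0), -0.25),   # drifted right -> steer left
    ((-0.5, 0.0), 0.25),   # drifted left -> steer right
    ((0.0, 0.2), 0.2),     # road curves -> follow the curve
]

# A linear "policy": action = w1*offset + w2*curvature + b.
w = [0.0, 0.0]
b = 0.0

def predict(state):
    return w[0] * state[0] + w[1] * state[1] + b

# Ordinary gradient descent on squared error, exactly as in any
# supervised regression problem -- nothing driving-specific here.
lr = 0.1
for _ in range(2000):
    for state, action in demos:
        err = predict(state) - action
        w[0] -= lr * err * state[0]
        w[1] -= lr * err * state[1]
        b -= lr * err

# The cloned policy now imitates the demonstrator on a familiar state.
print(round(predict((0.5, 0.0)), 2))  # close to -0.25
```

The point is that the human’s recorded action plays exactly the role a label plays in image classification.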

What follows is the original post from December 1, 2018.

When The Information recently published two articles on Tesla and autonomy, the strangest thing to come out of that reporting was this bit under the subheading “Behavior Cloning”:

Tesla’s cars collect so much camera and other sensor data as they drive around, even when Autopilot isn’t turned on, that the Autopilot team can examine what traditional human driving looks like in various driving scenarios and mimic it, said the person familiar with the system. It uses this information as an additional factor to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object. Such an approach has its limits, of course: behavior cloning, as the method is sometimes called… But Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations. “You don’t need anything else” to teach the system how to drive autonomously, said a person who has been involved with the team. They envision a future in which humans won’t need to write code to tell the car what to do when it encounters a particular scenario; it will know what to do on its own.

As I understand it, when software engineers who work on self-driving cars use the term “behaviour cloning”, this means the same thing as “end to end learning”, i.e. the entire system is just one big neural network that takes sensor data as its input and outputs steering, acceleration, and braking.

What’s not made clear in the article is the difference between end to end learning and neural networks in general. If you use neural networks, but not end to end learning, that’s still a situation where humans don’t need to write code for specific scenarios.

Amnon Shashua has a really good talk on end to end learning vs. the “semantic abstraction” approach to using neural networks:

As Amnon says, if Tesla were using end to end learning, it would not need to label images. The only “labelling” that occurs is the human driver’s actions: the steering angle, accelerator pushes, and brake pedal pushes. The sensor data is the input, and the one big neural network tries to learn how to map that sensor data onto the human driver’s actions. Since we know Tesla is labelling images, we know Tesla can’t be using end to end learning. Since “end to end learning” and “behaviour cloning” are synonymous, we know Tesla can’t be using behaviour cloning.

So, what did Amir Efrati at The Information hear from his sources that led him to report that Tesla is using “behaviour cloning”? Amir writes that how humans drive is used “to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object.” What this makes me think is that perhaps Tesla is working on a neural network for path planning (or motion planning) and/or control. Perhaps a path planning neural network and/or control neural network is being trained not with sensor data as input, but with the metadata outputted by the perception neural networks. The Tesla drivers’ behaviour — steering, acceleration, brake — “labels” the metadata in the same way that, in end to end learning, the human driver’s behaviour “labels” the sensor data.

This approach would solve the combinatorial explosion problem of end to end learning (described by Amnon in the video above) by decomposing perception and action. Perception tasks and action tasks would be handled independently by separate neural networks that are trained independently.
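
A sketch of that decomposition, with every interface invented for illustration: a perception stage reduces raw sensor data to compact metadata, and a separate planning stage maps that metadata to controls. Each stage would be its own independently trained neural network; here they are stand-in functions so the data flow is concrete.

```python
# Hypothetical decomposed pipeline: perception -> metadata -> planner.
# All field names and rules here are invented stand-ins.

def perception(camera_frame):
    """Stand-in for the perception network(s): pixels -> scene metadata."""
    # A real system would run detectors/segmenters here; we pretend the
    # frame has already been reduced to a small structured description.
    return {
        "lane_offset_m": camera_frame["true_offset"],
        "lead_vehicle_dist_m": camera_frame["true_lead"],
    }

def planner(metadata):
    """Stand-in for the planning/control network: metadata -> actions.
    This is the stage that could be trained on human driving, with the
    driver's steering/brake/accelerator as the supervised labels."""
    steer = -0.5 * metadata["lane_offset_m"]            # re-centre in lane
    brake = 1.0 if metadata["lead_vehicle_dist_m"] < 10 else 0.0
    return {"steer": steer, "brake": brake}

frame = {"true_offset": 0.4, "true_lead": 8.0}
actions = planner(perception(frame))
print(actions)
```

Because the planner only ever sees the metadata, the two stages can be trained, debugged, and improved independently.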

Using human drivers’ actions as the supervisory signal/training signal for path planning and/or control actually makes sense to me (whereas end to end learning does not). What are the alternatives?

a) Use a hand-coded algorithm. While this may be effective, we have lots of examples where fluid neural networks outperform brittle hand-crafted rules.

b) Use simulation. Until recently, I didn’t appreciate how much trouble we have simulating the everyday physics of the real world. From OpenAI:

Learning methods for robotic manipulation face a dilemma. Simulated robots can easily provide enough data to train complex policies, but most manipulation problems can’t be modeled accurately enough for those policies to transfer to real robots. Even modeling what happens when two objects touch — the most basic problem in manipulation — is an active area of research with no widely accepted solution. Training directly on physical robots allows the policy to learn from real-world physics, but today’s algorithms would require years of experience to solve a problem like object reorientation.

While simulation may be a part of the development and training process for path planning and/or control, it probably can’t be the whole process.

If Tesla were to train a neural network using the behaviour of Tesla drivers — and use human review to remove examples of bad driving — then it would avoid hand-coded algorithms’ brittleness and simulations’ lack of verisimilitude. I think (but I’m not sure) it would then be possible to use reinforcement learning or supervised learning to improve on this. Tesla could put the path planning and/or control neural network into cars running Enhanced Autopilot and more advanced future features, and then use disengagements, aborts, crashes, and bug reports to identify failures. These failures would then become part of the training signal.

If my conjecture is correct, I can see how this would be an extremely fast way to solve path planning and/or control. I can also see how it’s an approach that Tesla is uniquely suited to pursue, given a fleet of HW2 cars that is driving something like 400 million miles a month (300,000 cars x 1,380 miles per month). Based on (admittedly scant) anecdotal evidence, each HW2 car might be uploading an average of 30 MB+ per day.

I can’t help but think of Elon’s comments:

…I think no one is likely to achieve a generalized solution to self-driving before Tesla. I could be surprised, but… You know, I think we’ll get to full self-driving next year. As a generalized solution, I think. … Like we’re on track to do that next year. So I don’t know. I don’t think anyone else is on track to do it next year. … I would say, unless they’re keeping it incredibly secret, which is unlikely, I don’t think any of the car companies are likely to be a serious competitor.

Behavior cloning isn’t a term I’ve seen before, but I can imagine how it would differ from end-to-end. In ETE you train a network on actual input and have it generate the actual, final output to control the system. Its advantage is that it’s very simple to implement and doesn’t require manual labeling. The goal of ETE is to make an NN that actually controls the system.

But there are many potential benefits that could come from predicting likely driver behavior that don’t involve controlling the vehicle directly. For instance, you could look at where a human tends to position a car in the lane in various situations - when there’s a barrier at the left edge of the lane it’s more to the right - when a truck on the right is wandering out of its lane a driver might move a bit to the left. That prediction could be used as one of the inputs to bias lane positioning.

I’m not sure what specific term (if any) applies to the hypothetical approach I described of using a neural network with 1) metadata outputted by perception neural networks as the input and 2) human drivers’ actuation of the steering wheel, accelerator, and brake as the desired output.

I think the term “imitation learning” or “learning from demonstration” is an umbrella term that includes behaviour cloning, but also includes other approaches I’m just hearing about for the first time:

  • inverse reinforcement learning
  • apprenticeship learning
  • max entropy inverse reinforcement learning
  • guided cost learning
  • generative adversarial imitation learning

I’m confused by the terminology since different people seem to use it differently. :thinking:

Perhaps behaviour cloning isn’t synonymous with end to end learning, but whenever I encounter the term “behaviour cloning” in an autonomous car context, it always seems to be framed as a way to directly map pixels onto actuators.

Here’s an example where “imitation learning” is defined the same way as I would define end to end learning: “The goal is to learn a function f that maps from sensor readings xt to actions.” This is a terminological morass… :confused:

This idea is consistent with one thing that Amir wrote (my emphasis):

…the Autopilot team can examine what traditional human driving looks like in various driving scenarios and mimic it, said the person familiar with the system. It uses this information as an additional factor to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object.

But then what he writes shortly afterward suggests a different conclusion:

But Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations. “You don’t need anything else” to teach the system how to drive autonomously, said a person who has been involved with the team.

Maybe the discrepancy is between what Tesla engineers are actually working on now (“an additional factor”) vs. what they believe is ultimately possible (“directly predict the correct steering, braking, and acceleration”).

There isn’t really a standardized way to refer to most of these things because they are too many and too varied right now. There are a lot of ideas about how to generate useful input data and a lot of ideas about how to apply the output product. Relatively few combinations have broadly recognized labels. “end to end” is pretty simple because it is one of the extremes, but there are many more hybrid methods which are mainly described in their specifics. As some approaches become dominant they’ll get labels. Right now we’re in exploration mode - downselection and optimization come later.

I think labeling is still needed due to the underlying problem being solved. For example, training an NN’s behavior at a freeway off-ramp would be difficult, since some drivers take the ramp and some don’t.

Labeling could also speed training since we know the car shouldn’t cross lane lines unless changing lanes or avoiding obstacles. Unlabeled training data would have instances of drivers changing lanes and would thus need metadata to justify that action and gate the training. Even if all drivers used the turn signal, the NN would still need to link that one input to the behavior on its own without a training nudge.
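
The gating idea above can be sketched in a few lines. All field names are invented for illustration; the point is simply that a lane-line crossing only enters the training set when accompanying metadata justifies it.

```python
# Sketch of gating training data with metadata. A lane-line crossing is
# acceptable training data only when the driver signalled a lane change
# or was avoiding an obstacle. All field names are hypothetical.

samples = [
    {"crossed_lane_line": False, "turn_signal": False, "obstacle": False},
    {"crossed_lane_line": True,  "turn_signal": True,  "obstacle": False},
    {"crossed_lane_line": True,  "turn_signal": False, "obstacle": True},
    {"crossed_lane_line": True,  "turn_signal": False, "obstacle": False},  # unjustified
]

def justified(s):
    return (not s["crossed_lane_line"]) or s["turn_signal"] or s["obstacle"]

train_set = [s for s in samples if justified(s)]
print(len(train_set))  # 3 of the 4 samples survive the gate
```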

Apparently there are people out there trying to make the case for end-to-end in self-driving. Uber did a presentation at NeurIPS yesterday arguing that decomposing the NN into stages to enable more conventional engineering methods invariably reduces performance and extends development time. I expect that we’ll see systems become more end-to-end-ish over time as various obstacles to it decline.

Still, it seems to be too early right now to seriously try end-to-end for a real product, especially something as high stakes as driving a car. So for the time being labeling data is going to continue to be an important part of engineering these systems.

Are you at NeurIPS? Anywhere I can find out more about the talk?

Ah no - somebody I know tweeted a couple of slides from an Uber presentation.

Oh, cool. Do you mind sharing the tweets of those slides? I am very curious…

Follow up here from Waymo engineers:

I found this article today; it appears to be somewhat related (if not, please move it), but either way it’s a very good read from the Waymo point of view.

Definitely related: this is the blog post that summarizes the paper Amir tweeted. Thank you! Thank you also @thenonconsensus for sharing Amir’s tweet.

Amir quotes his own previous tweet that Tesla is using “behaviour cloning” (or perhaps imitation learning) for path planning specifically. This helps clear up some confusion.

Amir’s tweet is still a bit confusing because end to end learning (as demonstrated in Nvidia’s BB8 prototype) means pixels to actuators. Tesla is not doing that.

Woah. Aha moment! This tidbit from the Waymo blog post:

In order to drive by imitating an expert, we created a deep recurrent neural network (RNN) named ChauffeurNet that is trained to emit a driving trajectory by observing a mid-level representation of the scene as an input. A mid-level representation does not directly use raw sensor data, thereby factoring out the perception task, and allows us to combine real and simulated data for easier transfer learning.

By using metadata a.k.a. mid-level representations, you can train action tasks (like path planning) with simulation, without worrying about tainting your neural networks that do perception tasks with synthetic sensor data.
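
One way to see why this works: once both sources are reduced to the same mid-level schema, the planner never knows whether a sample came from real sensors or from a simulator. A toy sketch, with all field names invented:

```python
# Sketch of mixing real and simulated data at the mid-level, as in the
# ChauffeurNet quote above. Field names are hypothetical placeholders.

real_sample = {"source": "fleet", "lane_offset_m": 0.3, "lead_dist_m": 22.0}
sim_sample  = {"source": "sim",   "lane_offset_m": -0.1, "lead_dist_m": 8.0}

def to_training_example(sample):
    # Strip provenance: the planner trains on the mid-level fields only,
    # so synthetic data never contaminates the perception networks.
    return (sample["lane_offset_m"], sample["lead_dist_m"])

batch = [to_training_example(s) for s in (real_sample, sim_sample)]
print(batch)  # [(0.3, 22.0), (-0.1, 8.0)]
```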

Lots of good stuff!

High-level question I’m asking myself about simulation: why can’t we do AlphaGo for path planning?

A partial answer from the blog post (my emphasis):

This work demonstrates one way of using synthetic data. Beyond our approach, extensive simulations of highly interactive or rare situations may be performed, accompanied by a tuning of the driving policy using reinforcement learning (RL). However, doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.

This reminds me of a paper that Oliver Cameron (CEO of Voyage) tweeted about:

In theory, Tesla could also leverage production fleet data for this purpose… :thinking:

Useful tweet thread from Oliver Cameron explaining Waymo’s paper:

(open tweet to see the rest of the thread)

My tweets, doing some back-of-the-envelope math:

An important difference between Waymo and Tesla. ChauffeurNet was trained on less than 100,000 miles of human driving (60 days * 24 hours * 65 mph = 93,600 miles). HW2 Teslas drive something like 250 million miles per month (30 miles per day * 30 days * 300,000 vehicles = 270 million).

We don’t know how many (if any!) of those ~250 million miles/month are logged and uploaded to Tesla. Anecdotal evidence suggests 30 MB+ per HW2 car per day is uploaded. If the metadata (i.e. mid-level perception network output representations) is 1 MB per mile, it could be ~100%.

Based on data from Tesla, there is a crash or crash-like event every 2.06 million miles — if we assume Autopilot is 10% of miles. That’s 121 events per 250 million miles.

There’s no reason Tesla can’t use simulation also, but there are plenty of real world perturbations to use.

Suppose Tesla can collect 10 billion miles of path planning metadata from HW2 drivers. That’s 100,000x more than ChauffeurNet.

Actually, since a more realistic estimate for ChauffeurNet is 50,000 miles (assuming an average speed of 35 mph instead of 65 mph), it’s 200,000x.
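
The back-of-the-envelope numbers from the tweets above, made explicit (same assumptions as stated there):

```python
# ChauffeurNet's training data, per the estimates in the tweets:
chauffeurnet_miles_fast = 60 * 24 * 65   # 60 days of driving at 65 mph
chauffeurnet_miles_slow = 60 * 24 * 35   # more realistic 35 mph average
print(chauffeurnet_miles_fast)           # 93,600
print(chauffeurnet_miles_slow)           # ~50,000

# Tesla HW2 fleet miles per month:
fleet_miles = 30 * 30 * 300_000          # 30 mi/day * 30 days * 300k cars
print(fleet_miles)                       # 270,000,000

# Crash-like events, assuming one per 2.06 million miles:
print(round(250e6 / 2.06e6))             # ~121 per 250M miles

# Scale advantage if Tesla collected 10 billion miles of metadata:
print(round(10e9 / chauffeurnet_miles_slow))  # roughly 200,000x
```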

Caveat: Tesla has to solve perception before the metadata will be fully reliable.

ChauffeurNet uses supervised learning. I wonder if reinforcement learning could be used at some point.

Waymo proposes this idea in their blog post:

we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.

Suppose Tesla uses a ChauffeurNet-like approach to simulating how Tesla drivers drive — without filtering out or training against all the bad stuff that human drivers actually do. The idea here is to get a realistic simulation of how humans drive, good and bad. Tesla populates its simulator with Tesla drivers. The ego car (i.e. the car Tesla wants to train to be superhuman) then drives around this simulated world filled with synthetic Tesla drivers. It uses reinforcement learning to minimize its rate of crashes and near-crashes.

This is an AlphaGo-ish approach. First, use supervised learning to copy how humans behave. Second, use reinforcement learning and self-play (i.e. simulation) to improve on that.
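
A toy illustration of that two-phase recipe, with a deliberately tiny invented world (states, actions, rewards, and the “human” are all made up): phase one clones a human policy by supervised majority vote, including a bad habit, and phase two uses a simulator’s reward signal to improve on it.

```python
import random

# Phase 1: clone a "human" policy (supervised). Phase 2: improve it
# against a simulator (RL-style). Everything here is a toy invention.

random.seed(0)
STATES = [-2, -1, 0, 1, 2]          # lane-offset buckets; 0 = centred
ACTIONS = [-1, 0, 1]                # steer left / straight / right

def human(s):
    # The human corrects toward centre, but has a bad habit: when
    # already centred, they drift right 80% of the time.
    if s > 0:
        return -1
    if s < 0:
        return 1
    return 1 if random.random() < 0.8 else 0

# Phase 1: behavioural cloning by majority vote per state.
counts = {s: {a: 0 for a in ACTIONS} for s in STATES}
for _ in range(1000):
    s = random.choice(STATES)
    counts[s][human(s)] += 1
cloned = {s: max(ACTIONS, key=lambda a: counts[s][a]) for s in STATES}

# Phase 2: improvement against a simulator. The reward penalizes being
# off-centre, so the drift habit scores worse than staying put.
def reward(s, a):
    s2 = max(-2, min(2, s + a))
    return -abs(s2)

improved = {s: max(ACTIONS, key=lambda a: reward(s, a)) for s in STATES}

print(cloned[0], improved[0])  # cloning keeps the drift habit; RL removes it
```

Supervised learning faithfully copies the demonstrator, bad habits included; the reward signal is what lets the policy become better than its teacher.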

In the case of Tesla’s driving AI, an intermediate step (before reinforcement learning) would be to do what Waymo did with ChauffeurNet and use supervised learning to train against all the labelled examples of crashes, near-crashes, or other undesirable perturbations.

Let me propose, then, a possible Tesla Master Plan to master path planning:

  1. Solve perception.

  2. Collect 10 billion miles of path planning data from HW2 cars to learn how human drivers do path planning. (It’s possible this data could also be collected about surrounding vehicles, not just the Teslas themselves.)

  3. Use supervised learning to, like Waymo did with ChauffeurNet, train against examples of bad driving scenarios.

  4. Populate a simulated world with naturally behaving synthetic human drivers. Use reinforcement learning to improve path planning over many billions or even trillions of miles of simulated driving.

  5. Surpass human performance.

Interesting tweet about the Waymo paper from a deep learning engineer:

I think step 5 needs to be GOTO 1.

One of the strongest uses I see for something like ChauffeurNet isn’t necessarily driving; it’s seeing when ChauffeurNet fails. Inevitably the net will fail, and you can start to bin failures into categories. Some of those categories are solvable through further training, but some will require a return to the fundamentals (perception): the telltale is when the expert driver is reacting to some detail in the real world that doesn’t exist in the mid-level data set. For instance, if drivers are reacting to blinkers, you need to Solve Perception in regard to adding blinker metadata for every vehicle. If a driver sometimes departs the roadway to go around a stopped vehicle but sometimes doesn’t, you have a good data set of “departing roadway” examples to start adding metadata for road surface type: “dirt, gravel, requires human intervention (uneven terrain with rocks)”.

And of course there will need to be ‘divine’ intervention where commandments are handed down from on high like “Thou shalt not back down the shoulder to take an exit you missed, no matter how much time it saves you.”

Woah. This feels like a very deep insight: we don’t know a priori what self-driving cars need to perceive.

If this sounds counterintuitive to anyone, think about this: we don’t know how humans drive. We just do it. What we think we know about how humans drive — beyond the explicit knowledge we learn from driver’s ed — is mostly a posthoc reconstruction of our implicit knowledge. For all we know, we might be wrong in many parts of that reconstruction.

Or consider that, in general, neural networks are good at doing things that we have no idea how to tell them to do. We assume — or I assume — that we know how to tell a robotic system to drive. But why? Maybe we don’t know how to tell a robot to drive any more than we know how to tell a robot to walk, or to see. Maybe driving involves an array of subtasks that are cognitively impenetrable and opaque to introspection.

im.thatoneguy, I don’t know who you are or what your background is, but it seems like you have really good instincts because you proposed months ago that Tesla could just upload mid-level representations instead of sensor data. When I said above:

I think it was your post on TMC that had planted the seed in my mind. It’s pretty cool that your hunch has turned into a Waymo research paper and some reporting that suggests Tesla might actually be trying this approach.

What you said about using path planning failures to notice perception failures jibes with what Karpathy said in this talk about Tesla’s “data engine”:

Perhaps the development process is a loop. Get far enough with perception to deploy a path planning feature (e.g. Navigate on Autopilot), then notice failures with that feature and identify them as either failures in perception or path planning, and then go back and work on perception some more or work on path planning some more. At the same time, keep working on new perception features (e.g. stop sign recognition) to enable new path planning features (e.g. automatic stopping for stop signs). Repeat the loop with those features.
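
The loop described above can be sketched as a triage pipeline. Every function and field here is a hypothetical stand-in; in reality the triage step is human review of disengagements, aborts, crashes, and bug reports.

```python
# Sketch of the "data engine" loop: deploy a feature, triage field
# failures into perception vs. planning buckets, and route each bucket
# back into training for the matching subsystem. All names invented.

failures = [
    {"id": 1, "cause": "perception"},   # e.g. missed a stop sign
    {"id": 2, "cause": "planning"},     # e.g. awkward lane-change path
    {"id": 3, "cause": "perception"},
]

def triage(failure):
    # Stand-in for human review deciding which subsystem failed.
    return failure["cause"]

buckets = {"perception": [], "planning": []}
for f in failures:
    buckets[triage(f)].append(f)

# Each bucket becomes new training signal for its subsystem, and the
# loop repeats with the retrained networks redeployed to the fleet.
print(len(buckets["perception"]), len(buckets["planning"]))  # 2 1
```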

I think the way I have been thinking about autonomous car development may be wrong because I have been thinking that we know what we need to solve. We know what all the parts of the problem are, we can solve those parts independently, and when we put all the parts together, that will be a complete solution. But this overlooks the fact that we have no idea why features will fail. The behaviour of the overall system is emergent from complex interactions within the system and with the environment, and it’s often unexpected.

Neural networks are black boxes, and even hand-coded software, which is in theory transparent and deterministic, often fails in ways we don’t expect.

If you try to build something without testing it in wild and varied conditions as quickly as possible, you run the risk that your posthoc reconstruction of what needs to be solved will diverge more and more over time from what actually needs to be solved.

My mental model has largely been “feed neural networks lots and lots of data and eventually they might solve the problem”. But this implies you already know a priori the problem that needs to be solved. And that knowledge of what needs to be solved comes from a posthoc reconstruction which is fallible. You need to test your whole system in the wild as early as possible to narrow the gap between your posthoc reconstruction and real driving.

To use an analogy, it won’t do to move closer and closer to hitting a target. You also have to keep checking whether that’s the right target to hit. You can’t just keep making progress on solving a problem. You have to make sure that’s the right problem to solve.

This is a made-up example just to illustrate the point. I can’t think of a real example, and I think the point I’m making is that real examples are hard to think of because they’re gaps between our explicit knowledge via posthoc reconstruction and how humans really drive using implicit knowledge.

Say that figuring out speed limits was a really hard problem for self-driving car engineers. And say that engineers thought this was a vital problem to solve because human drivers follow speed limits.

But say that, in reality, it turned out that human drivers completely ignore speed limits and just follow the natural flow of traffic, which emerges organically. (There might be a grain of truth in this; it’s inspired by a theory I read but only half-remember and can’t find now. I think some people argue it’s safer to increase speed limits because driving is safest when the traffic flows at an organic speed.)

You wouldn’t notice that until you deployed your self-driving car and found that it was getting into trouble because it was going a different speed than all the other vehicles (either driving too fast or too slow). You would be operating on a false theory about how driving is done, and you might put a lot of work into developing a solution to the speed limit problem before finally deploying and realizing that you solved the wrong problem. Not only is the solution you built unnecessary, it’s also insufficient.

To get a self-driving car working in the real world, you need to solve it feature by feature, and test the smallest possible features (atomic features?) as quickly as possible in the real world with the whole system running. If you don’t, you might solve problems that don’t need to be solved (like detecting speed limits, in the made-up example), and you might not solve problems that need to be solved (like how to follow the flow of traffic).

This is a whole new way of thinking for me that I’m not used to. I will have to think about this more and revisit some of my old assumptions.

It’s a super exciting conceptual revelation. What’s particularly interesting to me here on a meta level is that you can derive an engineering approach from epistemology, i.e. thinking carefully about what you know and how you know it, about how human knowledge is created (especially with regard to complex systems), what humans can and can’t know in different contexts (e.g. you can’t predict the discovery of a failure mode without making that discovery), and the difference between human competence and human comprehension (implicit knowledge and explicit knowledge).

Epistemology, either explicit or implicit (or a combination of both), is arguably behind the success of science and engineering as approaches and cultures of solving problems. I’m always excited when really abstract, dreamy concepts unexpectedly collide with nitty gritty technical concepts. It’s a reminder that thinking dreamy thoughts isn’t a waste of time and actually impacts the physical world in big ways.