Tesla AI and behaviour cloning: what’s really happening?

I think labeling is still needed because of the nature of the underlying problem being solved. For example, training NN behavior at a freeway off-ramp would be difficult, since some drivers take the ramp and some don't.

Labeling could also speed up training, since we know the car shouldn't cross lane lines unless it's changing lanes or avoiding an obstacle. Unlabeled training data would include instances of drivers changing lanes and would thus need metadata to justify that action and gate the training. Even if all drivers used the turn signal, the NN would still need to link that one input to the behavior on its own, without a training nudge.
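
To make the "gate the training" idea concrete, here's a minimal sketch of filtering logged frames using metadata. Every field name is invented for illustration; there's no claim this is how Tesla actually does it.

```python
# Hypothetical sketch of "gating" training data with metadata. Every field name
# here is invented for illustration.

def select_training_frames(frames):
    """Keep only frames whose behaviour we actually want the network to imitate."""
    selected = []
    for f in frames:
        crossing_lane_line = f["crossed_lane_line"]    # assumed perception output
        signalled_lane_change = f["turn_signal_on"]    # assumed vehicle metadata
        avoiding_obstacle = f["obstacle_in_lane"]      # assumed perception output

        # A lane-line crossing is acceptable training data only if the metadata
        # justifies it (a signalled lane change or obstacle avoidance).
        if crossing_lane_line and not (signalled_lane_change or avoiding_obstacle):
            continue
        selected.append(f)
    return selected
```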


Apparently there are people out there trying to make the case for end-to-end in self-driving. Uber did a presentation at NeurIPS yesterday arguing that decomposing the NN into stages to enable more conventional engineering methods invariably reduces performance and extends development time. I expect that we'll see systems become more end-to-end-ish over time as the various obstacles to it decline.

Still, it seems to be too early right now to seriously try end-to-end for a real product, especially something as high stakes as driving a car. So for the time being labeling data is going to continue to be an important part of engineering these systems.


Are you at NeurIPS? Anywhere I can find out more about the talk?

Ah no - somebody I know tweeted a couple of slides from an Uber presentation.


Oh, cool. Do you mind sharing the tweets of those slides? I am very curious…

Follow up here from Waymo engineers:


I found this article today and it appears to be somewhat related (if not, please move it), but either way it's a very good read from the Waymo point of view.


Definitely related: this is the blog post that summarizes the paper Amir tweeted. Thank you! Thank you also @thenonconsensus for sharing Amir’s tweet.

Amir quotes his own previous tweet that Tesla is using “behaviour cloning” (or perhaps imitation learning) for path planning specifically. This helps clear up some confusion.

Amir's tweet is still a bit confusing because end-to-end learning (as demonstrated in Nvidia's BB8 prototype) means pixels to actuators. Tesla is not doing that.

Woah. Aha moment! This tidbit from the Waymo blog post:

In order to drive by imitating an expert, we created a deep recurrent neural network (RNN) named ChauffeurNet that is trained to emit a driving trajectory by observing a mid-level representation of the scene as an input. A mid-level representation does not directly use raw sensor data, thereby factoring out the perception task, and allows us to combine real and simulated data for easier transfer learning.

By using metadata, a.k.a. mid-level representations, you can train action tasks (like path planning) in simulation without worrying about tainting the neural networks that do perception tasks with synthetic sensor data.
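
To illustrate what that could look like (this is just a toy, not Waymo's actual ChauffeurNet architecture), here is a sketch of a network that consumes a rendered mid-level scene raster and emits a short trajectory. The channel count, layer sizes, and horizon are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class ToyMidLevelPlanner(nn.Module):
    """Toy sketch only: rendered mid-level scene raster in, future (x, y) waypoints out."""

    def __init__(self, in_channels=8, hidden=128, horizon=10):
        super().__init__()
        self.horizon = horizon
        # Small CNN over the rendered mid-level scene (roadmap, other agents,
        # past ego poses, etc. as separate channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # RNN that unrolls the trajectory one waypoint at a time.
        self.rnn = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # (x, y) offset per step

    def forward(self, raster):
        feat = self.encoder(raster)                           # (B, 64)
        steps = feat.unsqueeze(1).repeat(1, self.horizon, 1)  # same context each step
        out, _ = self.rnn(steps)                              # (B, horizon, hidden)
        return self.head(out)                                 # (B, horizon, 2)

raster = torch.zeros(1, 8, 128, 128)        # fake top-down scene raster
waypoints = ToyMidLevelPlanner()(raster)    # shape (1, 10, 2)
```

The key property is the input type: because the network sees a rendered mid-level scene rather than raw sensor data, the same raster format can be produced from real logs or from a simulator.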


Lots of good stuff!


High-level question I’m asking myself about simulation: why can’t we do AlphaGo for path planning?

A partial answer from the blog post (my emphasis):

This work demonstrates one way of using synthetic data. Beyond our approach, extensive simulations of highly interactive or rare situations may be performed, accompanied by a tuning of the driving policy using reinforcement learning (RL). However, doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.

This reminds me of a paper that Oliver Cameron (CEO of Voyage) tweeted about:

In theory, Tesla could also leverage production fleet data for this purpose… :thinking:

Useful tweet thread from Oliver Cameron explaining Waymo’s paper:

(open tweet to see the rest of the thread)

My tweets, doing some back-of-the-envelope math:

An important difference between Waymo and Tesla. ChauffeurNet was trained on less than 100,000 miles of human driving (60 days * 24 hours * 65 mph = 93,600 miles). HW2 Teslas drive something like 250 million miles per month (30 miles per day * 30 days * 300,000 vehicles = 270 million).

We don’t know how many (if any!) of those ~250 million miles/month are logged and uploaded to Tesla. Anecdotal evidence suggests 30 MB+ per HW2 car per day is uploaded. If the metadata (i.e. mid-level perception network output representations) is 1 MB per mile, it could be ~100%.

Based on data from Tesla, there is a crash or crash-like event every 2.06 million miles — if we assume Autopilot is 10% of miles. That’s 121 events per 250 million miles.

There’s no reason Tesla can’t use simulation also, but there are plenty of real world perturbations to use.

Suppose Tesla can collect 10 billion miles of path planning metadata from HW2 drivers. That’s 100,000x more than ChauffeurNet.

Actually, since a more realistic estimate for ChauffeurNet is 50,000 miles (assuming an average speed of 35 mph instead of 65 mph), it’s 200,000x.

Caveat: Tesla has to solve perception before the metadata will be fully reliable.
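
A quick Python scratchpad for the arithmetic above (same assumptions as the tweets, nothing new):

```python
# Scratchpad for the back-of-the-envelope numbers above.
chauffeurnet_miles_high = 60 * 24 * 65           # 93,600 miles at a 65 mph average
chauffeurnet_miles_low = 60 * 24 * 35            # 50,400 miles at a 35 mph average

tesla_fleet_miles_per_month = 30 * 30 * 300_000  # 270,000,000 miles

tesla_dataset_miles = 10_000_000_000             # hypothetical 10 billion miles
print(tesla_dataset_miles / chauffeurnet_miles_high)  # ~107,000x
print(tesla_dataset_miles / chauffeurnet_miles_low)   # ~198,000x

# Crash or crash-like events, assuming one per 2.06 million miles
print(250_000_000 / 2_060_000)                        # ~121 per 250 million miles
```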

ChauffeurNet uses supervised learning. I wonder if reinforcement learning could be used at some point.

Waymo proposes this idea in their blog post:

we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.

Suppose Tesla uses a ChauffeurNet-like approach to simulating how Tesla drivers drive — without filtering out or training against all the bad stuff that human drivers actually do. The idea here is to get a realistic simulation of how humans drive, good and bad. Tesla populates its simulator with Tesla drivers. The ego car (i.e. the car Tesla wants to train to be superhuman) then drives around this simulated world filled with synthetic Tesla drivers. It uses reinforcement learning to minimize its rate of crashes and near-crashes.

This is an AlphaGo-ish approach. First, use supervised learning to copy how humans behave. Second, use reinforcement learning and self-play (i.e. simulation) to improve on that.

In the case of Tesla’s driving AI, an intermediate step (before reinforcement learning) would be to do what Waymo did with ChauffeurNet and use supervised learning to train against all the labelled examples of crashes, near-crashes, or other undesirable perturbations.
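
Here's a very rough sketch of that two-phase recipe on a made-up toy problem, just to show the shape of the pipeline (behaviour cloning first, then RL fine-tuning in simulation). The environment, reward, and all sizes are invented and not meant to be realistic:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Phase 1: behaviour cloning on logged (state, human action) pairs.
states = torch.randn(1024, 4)         # stand-in for mid-level representations
human_actions = torch.randn(1024, 1)  # stand-in for logged human controls
for _ in range(200):
    loss = ((policy(states) - human_actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: RL fine-tuning in a simulator populated with other agents.
def rollout(policy, steps=50):
    """Fake simulator: reward penalises deviation of the first state variable."""
    s = torch.zeros(4)
    log_probs, rewards = [], []
    for _ in range(steps):
        dist = torch.distributions.Normal(policy(s), 0.1)
        a = dist.sample()
        log_probs.append(dist.log_prob(a).sum())
        s = s + 0.1 * torch.cat([a, torch.randn(3)])  # fake dynamics + other agents
        rewards.append(-s[0].abs())                   # fake "keep your gap" reward
    return torch.stack(log_probs), torch.stack(rewards)

for _ in range(100):
    log_probs, rewards = rollout(policy)
    episode_return = rewards.sum().detach()
    loss = -(log_probs * episode_return).mean()       # REINFORCE on episode return
    opt.zero_grad()
    loss.backward()
    opt.step()
```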

Let me propose, then, a possible Tesla Master Plan to master path planning:

  1. Solve perception.

  2. Collect 10 billion miles of path planning data from HW2 cars to learn how human drivers do path planning. (It’s possible this data could also be collected about surrounding vehicles, not just the Teslas themselves.)

  3. Use supervised learning to, like Waymo did with ChauffeurNet, train against examples of bad driving scenarios.

  4. Populate a simulated world with naturally behaving synthetic human drivers. Use reinforcement learning to improve path planning over many billions or even trillions of miles of simulated driving.

  5. Surpass human performance.


Interesting tweet about the Waymo paper from a deep learning engineer:

I think step 5 needs to be GOTO 1.

One of the strongest uses I see for something like ChauffeurNet isn't necessarily driving; it's seeing when ChauffeurNet fails. Inevitably the net will fail, and you can start to bin failures into categories. Some of those categories are solvable through further training, but some will require a return to the fundamentals (perception): cases where the expert driver is reacting to some detail in the real world that doesn't exist in the mid-level data set. For instance, if drivers are reacting to blinkers, you need to Solve Perception by adding blinker metadata for every vehicle. If a driver sometimes departs the roadway to go around a stopped vehicle but sometimes doesn't, you have a good data set of "departing roadway" examples, and you can start adding metadata for road surface type: "dirt, gravel, requires human intervention (uneven terrain with rocks)".

And of course there will need to be ‘divine’ intervention where commandments are handed down from on high like “Thou shalt not back down the shoulder to take an exit you missed, no matter how much time it saves you.”


Woah. This feels like a very deep insight: we don’t know a priori what self-driving cars need to perceive.

If this sounds counterintuitive to anyone, think about this: we don't know how humans drive. We just do it. What we think we know about how humans drive — beyond the explicit knowledge we learn from driver's ed — is mostly a post-hoc reconstruction of our implicit knowledge. For all we know, we might be wrong in many parts of that reconstruction.

Or consider that, in general, neural networks are good at doing things that we have no idea how to tell them to do. We assume — or I assume — that we know how to tell a robotic system to drive. But why? Maybe we don't know how to tell a robot to drive any more than we know how to tell a robot to walk, or to see. Maybe driving involves an array of subtasks that are cognitively impenetrable and opaque to introspection.

im.thatoneguy, I don’t know who you are or what your background is, but it seems like you have really good instincts because you proposed months ago that Tesla could just upload mid-level representations instead of sensor data. When I said above:

I think it was your post on TMC that had planted the seed in my mind. It’s pretty cool that your hunch has turned into a Waymo research paper and some reporting that suggests Tesla might actually be trying this approach.

What you said about using path planning failures to notice perception failures jibes with what Karpathy said in this talk about Tesla's "data engine":

Perhaps the development process is a loop. Get far enough with perception to deploy a path planning feature (e.g. Navigate on Autopilot), then notice failures with that feature and identify them as either failures in perception or path planning, and then go back and work on perception some more or work on path planning some more. At the same time, keep working on new perception features (e.g. stop sign recognition) to enable new path planning features (e.g. automatic stopping for stop signs). Repeat the loop with those features.
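
As a sketch, that loop might look something like this; the categories and queues are invented, and only the structure is the point:

```python
# Hypothetical sketch of the deploy -> triage -> retrain loop.

def triage(failure_case):
    """Decide whether a logged failure looks like perception or path planning."""
    if failure_case["perception_output_wrong"]:  # e.g. missed or misclassified object
        return "perception"
    return "path_planning"

def data_engine_iteration(failure_cases, perception_queue, planning_queue):
    for case in failure_cases:
        if triage(case) == "perception":
            perception_queue.append(case)  # back to labelling / new metadata
        else:
            planning_queue.append(case)    # new imitation or planning training example
    # ...retrain the relevant networks on the updated queues, redeploy the feature,
    # collect new failures, and repeat.
```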

I think the way I have been thinking about autonomous car development may be wrong, because I have been thinking that we know what we need to solve. We know what all the parts of the problem are, we can solve those parts independently, and when we put all the parts together, that will be a complete solution. But this overlooks the fact that we have no idea why features will fail. The behaviour of the overall system is emergent from complex interactions within the system and with the environment, and it's often unexpected.

Neural networks are black boxes, and even hand-coded software which is in theory transparent and deterministic often fails in ways we don’t expect.

If you try to build something without testing it in wild and varied conditions as quickly as possible, you run the risk that your post-hoc reconstruction of what needs to be solved will diverge more and more over time from what actually needs to be solved.

My mental model has largely been "feed neural networks lots and lots of data and eventually they might solve the problem". But this implies you already know a priori the problem that needs to be solved. And that knowledge of what needs to be solved comes from a post-hoc reconstruction, which is fallible. You need to test your whole system in the wild as early as possible to narrow the gap between your post-hoc reconstruction and real driving.

To use an analogy, it won’t do to move closer and closer to hitting a target. You also have to keep checking whether that’s the right target to hit. You can’t just keep making progress on solving a problem. You have to make sure that’s the right problem to solve.

This is a made-up example just to illustrate the point. I can't think of a real example, and I think that's the point: real examples are hard to think of precisely because they sit in the gap between our explicit knowledge (the post-hoc reconstruction) and how humans really drive using implicit knowledge.

Say that figuring out speed limits was a really hard problem for self-driving car engineers. And say that engineers thought this was a vital problem to solve because human drivers follow speed limits.

But say that, in reality, it turned out that human drivers completely ignore speed limits and just follow the natural flow of traffic, which emerges organically. (There might be a grain of truth in this; it’s inspired by a theory I read but only half-remember and can’t find now. I think some people argue it’s safer to increase speed limits because driving is safest when the traffic flows at an organic speed.)

You wouldn’t notice that until you deployed your self-driving car and found that it was getting into trouble because it was going a different speed than all the other vehicles (either driving too fast or too slow). You would be operating on a false theory about how driving is done, and you might put a lot of work into developing a solution to the speed limit problem before finally deploying and realizing that you solved the wrong problem. Not only is the solution you built unnecessary, it’s also insufficient.

To get a self-driving car working in the real world, you need to solve it feature by feature, and test the smallest possible features (atomic features?) as quickly as possible in the real world with the whole system running. If you don’t, you might solve problems that don’t need to be solved (like detecting speed limits, in the made-up example), and you might not solve problems that need to be solved (like how to follow the flow of traffic).

This is a whole new way of thinking for me that I’m not used to. I will have to think about this more and revisit some of my old assumptions.

It’s a super exciting conceptual revelation. What’s particularly interesting to me here on a meta level is that you can derive an engineering approach from epistemology, i.e. thinking carefully about what you know and how you know it, about how human knowledge is created (especially with regard to complex systems), what humans can and can’t know in different contexts (e.g. you can’t predict the discovery of a failure mode without making that discovery), and the difference between human competence and human comprehension (implicit knowledge and explicit knowledge).

Epistemology, either explicit or implicit (or a combination of both), is arguably behind the success of science and engineering as approaches and cultures of solving problems. I’m always excited when really abstract, dreamy concepts unexpectedly collide with nitty gritty technical concepts. It’s a reminder that thinking dreamy thoughts isn’t a waste of time and actually impacts the physical world in big ways.

What I saw was just smartphone snapshots of two slides touting the 'pros' and 'cons' of end-to-end. The slides were clearly promoting the idea that end-to-end had some advantages. I didn't save copies and they seem to have fallen off my Twitter stream now.

If your comment about AlphaGo is 'why won't the same method work?', then the main issue is probably that AlphaGo has the advantage of a perfect model of the environment, which means there's no noise in its feedback signal. Additionally, AlphaGo's model is extremely compact, so it doesn't need to be particularly sample efficient. Developing path planning with RL will need an approach that is more noise-resistant and more sample-efficient than AlphaGo needed to be.

Working from the latent space of a perception system that is trained on labeled data (what Waymo seems to be doing) helps with the sample efficiency issue but doesn’t resolve the noise issue.

This is not to say that RL cannot be made to work for path planning. I think it’s likely that all of these issues will be overcome in time. But you probably can’t do it naively today. Tomorrow? Who knows.


That’s very helpful, thanks. What do you think is the source of feedback signal noise in our environment models for autonomous driving? Is this a perception problem, or something else?

Based on what I've heard and read recently, the actual physics of the world isn't hard to model in a driving context; the hard part is modelling the behaviour of agents in the environment.

Pieter Abbeel suggests using inverse reinforcement learning to derive a reward function from observation of human driving. This is an interesting idea to me because a company like Tesla (or whoever else in the future might have a large enough production fleet with the right hardware) could, in theory:

  • Upload 10 billion miles of mid-level representations data
  • Use inverse reinforcement learning to derive a reward function
  • Use reinforcement learning in simulation to search for a policy that optimizes for that reward function

The derived reward function would — presumably — include a tacit model of how agents in the world behave, and how to interact with them.
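
As a toy illustration of the inverse-RL step (heavily simplified, with all features and data made up), the classic feature-matching recipe nudges a linear reward in the direction of the difference between the expert's feature expectations and the current policy's:

```python
# Toy sketch of inverse RL: recover a linear reward r(s) = w . phi(s) by
# matching expert feature expectations.
import numpy as np

rng = np.random.default_rng(0)
n_features = 5  # e.g. headway, lane offset, speed delta, jerk, ...

def feature_expectations(trajectories):
    """Average per-step feature vector over a set of trajectories."""
    return np.mean([traj.mean(axis=0) for traj in trajectories], axis=0)

# Pretend these came from logged human driving (mid-level representations
# reduced to a handful of features per timestep).
expert_trajs = [rng.normal(loc=1.0, size=(100, n_features)) for _ in range(50)]

w = np.zeros(n_features)
for _ in range(100):
    # In a real system this step would be "roll out the current policy in the
    # simulator under reward w"; here the policy rollouts are faked with noise.
    policy_trajs = [rng.normal(loc=0.0, size=(100, n_features)) for _ in range(50)]
    grad = feature_expectations(expert_trajs) - feature_expectations(policy_trajs)
    w += 0.1 * grad  # push the reward toward explaining the expert behaviour

print("learned reward weights:", w)
```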

I wonder how far you could get with a randomization approach. That is, populate a simulator with randomized variants of the ego car. Or run multiple simulations in parallel each with a different, randomized variant populating the roads.

If you can’t accurately simulate the physics relevant to your problem, you can just train a neural network on many random variants of the physics (like OpenAI did with their robot hand). Maybe if you can’t accurately simulate the behaviour of other agents in the environment, you can just train a neural network on many random variants of behaviour.

In a self-driving car context, this is analogous to — though different from — self-play. As the ego car gets better at driving, the randomized other cars will generally, on average, get better at driving too. Hopefully this means the end product isn't a car that is impractically cautious for the real world. The need for caution will decrease as the surrounding drivers get better. And if you need bad drivers for your simulator, you have plenty of older versions of the ego car you can use.
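
A sketch of what randomising the other drivers could look like; the parameter names and ranges are invented:

```python
# Randomise the behaviour of the other simulated drivers, in the spirit of
# domain randomisation.
import random
from dataclasses import dataclass

@dataclass
class DriverParams:
    desired_speed_mph: float       # how fast this agent wants to go
    min_headway_s: float           # how closely it will follow
    reaction_time_s: float         # delay before it responds to events
    lane_change_aggression: float  # 0 = timid, 1 = aggressive

def sample_driver() -> DriverParams:
    return DriverParams(
        desired_speed_mph=random.uniform(45, 85),
        min_headway_s=random.uniform(0.5, 3.0),
        reaction_time_s=random.uniform(0.3, 2.0),
        lane_change_aggression=random.random(),
    )

def populate_episode(n_agents):
    """Each training episode gets a freshly randomised population of drivers,
    so the ego policy can't overfit to one style of traffic."""
    return [sample_driver() for _ in range(n_agents)]

traffic = populate_episode(n_agents=40)
```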

A potential problem I can see is that the simulated cars might develop behaviours unlike those of real human drivers, even if the end result is good driving by human standards. So the ego car might not know how to predict the behaviour of real cars in the real world. The design space/possibility space of driver behaviour might be too large for randomization to be an effective workaround.


I say ‘noise’ just to differentiate it from signal. Strictly speaking it is information which correlates poorly with your objective function. When you are simulating a Go game there’s almost nothing uncorrelated in the simulation, but the driving problem involves absorbing very high dimensional and highly abstracted data which is largely uncorrelated with the objective of driving. The car doesn’t care what kind of trees are on the side of the road, whether the fence is plastic or wood, if the clouds are cirrus or cumulus. The car also doesn’t care about the vast majority of the actions of other road users - it only cares about the ones that plausibly affect its future options. This is all on top of the fact that the other information is stochastic. To an RL agent all of that stuff hides the stuff it really needs to pay attention to. Training an RL agent on ‘noisy’ data presents kinds of problems that AlphaGo didn’t need to consider and those problems will need addressing.

I’m personally sanguine about the potential for RL to make contributions to the driving problem. I think that eventually even ‘end to end’ can probably be made to work - with enough computation, some new techniques, and a lot of refinement. It’s surprising how far you can get with simple approaches. It’s almost as if the universe was designed with this kind of problem solving in mind.

I’m also a fan of physics simulations making contributions. Humans are actually really bad at physics compared to even simple computers. Having accurate physical simulation integrated into driving agents is going to be a big advantage.
