Imitation learning and reinforcement learning vs. hand coding


In his first public lecture of 2019, MIT’s Lex Fridman says that beyond perception, there is very little machine learning in autonomous cars today. But that is starting to change:

  • Tesla is reportedly using imitation learning for autonomous driving.

  • Mobileye is openly using reinforcement learning for autonomous driving.

  • Waymo says it may incorporate components of an imitation learning system into its autonomous driving software.

  • Anthony Levandowski — a famous self-driving car engineer who formerly worked at Waymo, Otto, and Uber ATG, and who competed in the DARPA Grand Challenge — recently announced a new startup called Pronto and said it will use “end-to-end deep learning”. That could mean imitation learning or reinforcement learning. Or both.

An honourable mention goes to Wayve, a small UK startup founded by Cambridge machine learning researchers. Wayve’s CEO says: “Rather than hand-engineering our solution with heavily rule-based systems, we aim to build data-driven machine learning at every layer of our system, which would learn from experience and not simply be given if-else statements.” Wayve’s website mentions reinforcement learning.

I feel pessimistic about the hand-coded, rule-based approach to driving. (That is, the non-perceptual, action-related parts of driving.) There are a few reasons:

  1. It hasn’t seemed to work all that well for much of anything — not computer vision, not getting bipedal robots to walk or open doors, not board games, not video games… Is there an example of it achieving human-level performance on any complex task?

  2. A beaver can build a dam, but it has no idea how to tell you to build a dam. Similarly, many tasks that humans perform easily, effortlessly, mindlessly — we don’t actually know how we do them, and we don’t know how to tell a robot to do them. We might need a better scientific understanding of how humans drive before we can get robots to do it. The introspection of software engineers might not do the trick.

  3. If we are stuck with hand coding robots, I worry that engineers will continue to gradually chip away at the problem, inching ahead year by year, making only as much progress this year as they made last year. There seems to be a wide chasm between today’s robot drivers and human drivers. Crossing that chasm inch by inch seems like it would take quite a while. To get across it in a few years, progress needs to move a lot faster.

In sum, I worry that hand coding will only make slow linear progress, and it may at some point hit a ceiling where engineers just don’t know how to solve the next problems.

By contrast, machine learning has shown us a few examples of fast exponential progress, where it went from subhuman performance to superhuman performance in a few years. ImageNet, AlphaGo, maybe Dota.

If all these companies try various machine learning approaches to driving for a few years, and they don’t get any traction… I will feel pretty pessimistic about self-driving cars. If that happens, I think I might feel that the problem can’t be solved with the current machine learning paradigm, and that hand coding is unlikely to solve it either. So self-driving cars would be indefinitely on hold. Instead of being an engineering problem, self-driving cars would become (in my eyes) a science problem.

That’s a bleak place to be. Scientific progress in AI has happened in fits and starts. Prior to 2012, there was a long period of stagnation.

So, as a fan of self-driving cars — or the idea of self-driving cars — I am watching imitation learning and reinforcement learning because I think one or both of those techniques could be the key to all of it.


Something I’m thinking about with regard to imitation learning: is it possible to predict human driving behaviour (i.e. steering and pedal output) purely from the perceptual cues found in mid-level representations?

When humans drive, we use theory of mind and high-level reasoning. For example, if I see a Domino’s delivery car stop on a residential street, I know that the driver wants to deliver a pizza, so they’re going to get out of their car and go up to a building.

It’s impossible for neural networks to obtain this kind of knowledge solely from mid-level representation data. They won’t understand the concept of pizza delivery.

The question is whether the perceptual cues are enough. For example, the network might find that when any car stops on a street, human drivers tend to nudge cautiously around it. If a person gets out of a stopped car, human drivers might tend to stop until the person is clear. It might be enough just to respond directly to the perceptual cues, and not to a high-level understanding of the situation.
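Framed this way, imitation learning is just supervised learning: logged pairs of (mid-level state, human control output) become training data. Here is a minimal, dependency-free sketch of that framing — the feature layout, dimensions, and synthetic data are all illustrative assumptions of mine, not anything from an actual self-driving stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mid-level representation: a fixed-length feature vector, e.g.
# [lane curvature, lateral offset, ego speed] + 5 nearby objects x
# (rel_x, rel_y, rel_vx, rel_vy, is_stopped) -> 3 + 5*5 = 28 features.
N_FEATURES = 28
N_OUTPUTS = 2  # steering angle; pedal (throttle/brake as one signed value)

def init_mlp(n_in, n_hidden, n_out, rng):
    """Tiny two-layer MLP; a stand-in for whatever a real system would use."""
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def bc_loss_grad(params, x, y):
    """Behavioural-cloning MSE loss and its gradients (backprop by hand)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    err = (h @ params["W2"] + params["b2"]) - y
    loss = np.mean(err ** 2)
    d_pred = 2 * err / err.size
    grads = {"W2": h.T @ d_pred, "b2": d_pred.sum(0)}
    d_h = (d_pred @ params["W2"].T) * (1 - h ** 2)
    grads["W1"] = x.T @ d_h
    grads["b1"] = d_h.sum(0)
    return loss, grads

# Synthetic stand-in for logged human driving. In reality these pairs would
# come from recorded (mid-level state, human steering/pedal) data.
params = init_mlp(N_FEATURES, 32, N_OUTPUTS, rng)
X = rng.normal(size=(256, N_FEATURES))
Y = X @ (rng.normal(size=(N_FEATURES, N_OUTPUTS)) * 0.3)

loss0, _ = bc_loss_grad(params, X, Y)
for step in range(500):
    loss, grads = bc_loss_grad(params, X, Y)
    for k in params:
        params[k] -= 0.05 * grads[k]

print(f"imitation loss: {loss0:.4f} -> {loss:.4f}")
```

Whether this works in practice is exactly the open question above: the model can only imitate whatever correlations between cues and controls exist in the data.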

I’m trying to think of situations where a high-level understanding is necessary to drive correctly; where perceptual cues on their own don’t provide enough information to predict what a human would do. Can anyone think of any?


:bulb: Aha! Around 35 minutes into this Mobileye video, Amnon Shashua makes the excellent point that an autonomous car can infer what to do from the behaviour of other road users. If other cars are going around a stopped truck, an autonomous car can infer that it should go around the truck too, rather than waiting behind it.

This is a smart way of working around autonomous cars’ lack of sophisticated world understanding and their reliance on direct perceptual cues.
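A toy sketch of the cue Shashua describes — the data schema and thresholds here are invented illustrations, not Mobileye’s method. The point is simply that “what other drivers did” can itself be treated as a perceptual input, with no need to understand *why* the truck is stopped:

```python
from dataclasses import dataclass

@dataclass
class ObservedVehicle:
    """What perception reports about one nearby vehicle that approached
    the same stopped obstacle (hypothetical schema)."""
    went_around: bool   # did it lane-change around the obstacle?
    waited_s: float     # how long did it wait behind the obstacle first?

def should_go_around(observations, min_samples=3, threshold=0.6):
    """Decide whether to overtake a stopped vehicle from social evidence.

    Instead of reasoning about deliveries or breakdowns, count how many
    other drivers went around. Thresholds are arbitrary illustrative choices.
    """
    if len(observations) < min_samples:
        return False  # not enough social evidence; wait cautiously
    going_around = sum(v.went_around for v in observations)
    return going_around / len(observations) >= threshold

# Example: four of five observed cars pulled around the stopped truck.
obs = [ObservedVehicle(True, 1.2), ObservedVehicle(True, 0.8),
       ObservedVehicle(False, 30.0), ObservedVehicle(True, 2.1),
       ObservedVehicle(True, 1.5)]
print(should_go_around(obs))  # prints True (4/5 = 0.8 >= 0.6)
```

In a learned system this fraction wouldn’t be a hand-coded rule; it would be one more feature the network could pick up on.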


Humans can use high-level situational understanding to drive, but we almost never do. When was the last time you confronted a driving situation where you had to stop and think about the right way to proceed? It happens to beginners, but almost never to veteran drivers. Once you become experienced, virtually all of driving becomes reflexive, to the point where you don’t even remember doing it afterwards.

It may be the case that machine-trained algorithms will struggle with situations where humans can rely on their high-level understanding. But those situations are rare, and can be dealt with slowly and carefully if needed. The thing we care about most is safety, and safety decisions are most acute in the split-second decision making that happens at high speed. Humans are not using their high-level understanding in those situations, because it isn’t available at sub-second response times.

This still leaves open the bigger problem of having a vehicle behave in a human-enough fashion that it doesn’t confuse or anger other drivers. I think the jury is still out on that one.


I hope this is true, but I’m trying to think of counterexamples in order to challenge the thesis I’m developing about imitation learning. One example I read about involved Waymo: the road widened to add a dedicated bus lane, and the bus ahead of the Waymo van moved into it. The Waymo van then hesitated for a few seconds until the safety driver took over.

A similar example would be a bus pulling over to pick up passengers. At a glance, a human can absorb this kind of information — the bus is stopping to pick people up, so it’s probably not going to move for a bit. Humans are responding to an understanding of the bus driver’s goals and plans, not simply to a visual cue.

Maybe a neural network could learn to infer when a bus is stopping to pick up passengers based on pedestrians on the sidewalk, or even recognizing bus stops. Or maybe it could develop more general habits for deciding when to go around a stopped vehicle. Just because humans are responding to high-level understandings of job obligations, etc. doesn’t necessarily mean that human driving behaviour can’t be predicted or emulated based on perceptual information alone.

As long as there is either a) a distinct perceptual cue to go on, or b) a general habit for dealing with that kind of situation that will work, then there is no worry. But if there are common situations where neither condition is satisfied, that’s a worry.