Mobileye: Deep Reinforcement Learning For Driving Policy

There is a lot of math in this talk, and I didn’t understand any of it, so I skipped over those parts. But I did understand some of the high-level concepts, like semantic abstraction of the driving task into discrete subtasks, and doing self-play in simulation.

I think Mobileye is using imitation learning and human driving data to bootstrap reinforcement learning. That’s interesting because of Waymo’s recent paper suggesting the same.

This talk also helped me to appreciate the perspective that path planning/driving policy systems might always be hybrid. There might always be some hand-coded hard constraints around the neural network, even if the neural network determines 99% of driving policy/path planning decisions.
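To make the hybrid idea concrete, here is a minimal sketch (my own illustration, not Mobileye's actual system) of how a hand-coded safety layer can wrap a learned policy: the network proposes an action, and hard-coded constraints get the final say. All names and thresholds here are hypothetical.

```python
def learned_policy(state):
    # Stand-in for a neural network: proposes a target acceleration (m/s^2).
    # This toy version just tries to close the gap to the lead vehicle.
    return 2.0 if state["gap_m"] > 30 else -1.0

def safety_layer(state, proposed_accel):
    """Hand-coded hard constraints that override the learned policy."""
    MIN_GAP_M = 10.0   # hypothetical minimum following distance
    MAX_BRAKE = -6.0   # hypothetical braking limit (m/s^2)
    MAX_ACCEL = 3.0    # hypothetical acceleration limit (m/s^2)
    if state["gap_m"] < MIN_GAP_M:
        return MAX_BRAKE  # hard constraint: brake, regardless of the network
    # Otherwise, clamp the network's proposal to the allowed range.
    return max(MAX_BRAKE, min(MAX_ACCEL, proposed_accel))

state = {"gap_m": 8.0, "speed_mps": 25.0}
action = safety_layer(state, learned_policy(state))
```

The network can make 99% of the decisions, but the constraint code guarantees the other 1% never violates a safety rule.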


A few slides (tap to enlarge):

Amnon Shashua goes over many of the same points in this video, with no math (from 33:00 to 44:00):

In the Q&A session, Shashua says the target year for full autonomy is 2021, but that in 2021 full autonomy will still require a human safety driver behind the wheel. That's something I had never heard him say before.


I believe the demo at 33:20 in this video shows Mobileye’s reinforcement learning system in action:

It’s an impressive demo. This has flown under my radar, but Mobileye is already testing reinforcement learning-based path planning in the real world.

It seems to me (as a casual, outside observer with limited knowledge of the technology) that path planning is currently the primary bottleneck to progress on autonomous driving. Hand-coded elements of path planning systems require humans to introspect and theorize about the tacit knowledge that enables them to drive a car, and then to attempt to operationalize that tacit knowledge in a programming language. This is an inherently daunting problem, and it requires a slow iterate-and-test cycle: software engineers tweak the code, send out safety drivers, wait, collect reports, tweak the code again, and repeat.

Trying to solve path planning using hand coding might be:

  1. intractable

Or, even if it isn’t, it might be:

  2. excruciatingly slow

Tacit knowledge is not something that, historically, humans have had great success in coding into robots. For example, hand coding has mostly failed to get robot hands to manipulate objects like human hands do. Recent progress has been made using machine learning, and it’s still an unsolved problem for many applications, such as packing Amazon boxes in warehouses or general assembly of Model 3s at the Tesla factory.

The tacit knowledge is instantiated in our brains, but that doesn’t mean we can articulate it explicitly in a natural language (like English) or a programming language. By analogy, our bodies embody all kinds of tacit knowledge about cell division, immune response, protein folding, and so on, yet for thousands of years we could not articulate anything explicit about those processes.

Having a well-working human body doesn’t automatically impart scientific knowledge about biological systems, and having a well-working human brain doesn’t inherently impart scientific knowledge about cognitive systems. Decades or centuries of scientific work might remain before we can articulate an explicit, step-by-step explanatory theory of how humans drive that is detailed enough that it could be implemented in a robot. Software engineers at Google might not be equipped to solve this problem through introspection, intuition, folk knowledge, and on-the-fly theorizing.

But machine learning provides a hope. We don’t have a complete scientific understanding of human vision, but deep supervised learning has allowed us to solve some vision problems without fully understanding either human vision or computer vision. It therefore makes sense to try machine learning on other problems where human success relies on tacit knowledge and where scientific knowledge is lacking.

Since we don’t fully understand reinforcement learning, and since we also don’t fully understand the intricacies of human path planning, we can’t say in advance whether reinforcement learning will be capable of human-level path planning. We can only try it and hope that eventually it’s as successful as supervised learning has been for vision.

Mobileye’s work is encouraging because it is an early, limited proof of concept that reinforcement learning can do path planning well, in simulation and in the real world. With approaches to path planning that rely on large hand-coded elements, I worry that self-driving cars might never come to fruition, or might take decades to become competent drivers. With machine learned path planning systems, there is the hope of rapid progress and sudden breakthroughs that enable the technology to be commercialized at scale in the near future.

I think it’s pretty remarkable that Mobileye is using reinforcement learning so openly, with such confidence, and with such good results so far (at least the ones they’ve shared).

Mobileye believes that the driving environment is too complex for hand-crafted rule-based decision making. Instead, it adopts machine learning to “learn” the decision-making process through exposure to data.

Mobileye’s approach to this challenge is to employ what are called reinforcement learning algorithms, trained through deep networks. This requires training the vehicle system in increasingly complex simulations by rewarding good behavior and punishing bad behavior.
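To show what "rewarding good behavior and punishing bad behavior" means mechanically, here is a toy sketch of my own: tabular Q-learning on a one-dimensional lane where reaching the goal cell earns +1 and driving off the road earns -1. Mobileye's actual system uses deep networks and far richer simulations; this miniature only illustrates the reward-driven learning loop.

```python
import random

random.seed(0)
N = 5                       # cells 0..4; cell 4 is the goal, below 0 is off-road
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(s, a):
    """Environment: returns (next_state, reward, done)."""
    s2 = s + a
    if s2 < 0:
        return 0, -1.0, True      # punished: drove off the road
    if s2 == N - 1:
        return s2, +1.0, True     # rewarded: reached the goal
    return s2, 0.0, False

for episode in range(200):
    s, done = 1, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.choice((-1, +1)) if random.random() < eps else \
            max((-1, +1), key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, -1)], Q[(s2, +1)])
        # Q-learning update: nudge the value estimate toward reward + discounted future.
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy for each interior cell.
policy = {s: max((-1, +1), key=lambda x: Q[(s, x)]) for s in range(1, N - 1)}
```

After training, the greedy policy moves toward the goal from every interior cell: the reward and punishment signals alone, with no hand-coded driving rules, shaped the behavior.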