I wrote a detailed post on the Tesla Motors Club forum essentially resummarizing the same points I’ve made here and elsewhere. Here it is, reproduced in full:
Algorithms, compute, data: a mental model for thinking about autonomous vehicle competition
Autonomous driving can be split into two kinds of task:
Perception: object detection, depth mapping, semantic segmentation of driveable roadway, etc. In short, computer vision tasks that are done with supervised learning. (When people say “deep learning”, they’re typically referring to deep supervised learning.)
Action: driving policy (i.e. making high-level decisions like whether to overtake a slow car) and path planning (i.e. the exact trajectory and speed of a car).
Supervised learning is universally used for camera-based perception. Historically, action has been handled by traditional, hand-coded software, not neural networks or machine learning. But increasingly it seems like engineers are talking about using machine learning to solve the action part of the problem.
Types of machine learning
As far as I know, there are just two machine learning approaches to action:
One form of imitation learning, called “behavioural cloning”, is just supervised learning applied to action. A neural network learns to map actions taken by humans to the environmental cues that prompt those actions (e.g. maps stopping to stop signs).
Another form of imitation learning is inverse reinforcement learning, which attempts to derive a reward function (i.e. a points system used in reinforcement learning) from human actions.
Reinforcement learning is essentially trial and error over a massive number of iterations, with an agent taking actions that increase reward and avoiding actions that decrease reward.
What follows is strictly my own opinion.
Types of competitive advantage
We can boil down competitive advantage in machine learning into three main things: algorithms, compute, and data. Andrej Karpathy, among other experts, have identified these three things as fundamental inputs to AI progress:
I broadly like to think about four separate factors that hold back AI:
- Compute (the obvious one: Moore’s Law, GPUs, ASICs),
- Data (in a nice form, not just out there somewhere on the internet - e.g. ImageNet),
- Algorithms (research and ideas, e.g. backprop, CNN, LSTM), and
- Infrastructure (software under you - Linux, TCP/IP, Git, ROS, PR2, AWS, AMT, TensorFlow, etc.).”
(In discussing competition between autonomous vehicle companies, I’ll ignore infrastructure since it doesn’t seem relevant — these things seem to be either open source or commoditized.)
If imitation learning turns out to be the solution to action for autonomous vehicles, then I believe algorithms and training data will be what gives companies a competitive advantage.
Similar to supervised learning for perception, the performance of imitation learning will be a function of the neural network architecture and the training dataset.
If reinforcement learning turns out to be the solution, then I believe algorithms and possibly compute will be what gives companies a competitive advantage.
Reinforcement learning (RL) will most likely take place in simulation. It will depend on RL algorithms that can learn to drive in a simulator, and transfer that learning to the real world. Since simulation and training is computationally intensive, computing resources may confer a competitive advantage.
Competitive advantage in imitation learning
In a scenario where imitation learning turns out to be the right solution, I predict Tesla will be in the best competitive position.
As I understand it, the operative metric for training data is the number of unique examples for each semantic class. In image classification, a semantic class would be an image type like images of great white sharks. In imitation learning, a semantic class would seem to be (this is my own guess) pairings of actions and environmental cues. If some environmental cues are incredibly rare — such as the conditions that lead to a deadly accident on average every 100 million miles — then it would take an incredibly large amount of data to collect a significant number of examples for each semantic class.
Drago Anguelov, Waymo’s head of research, recently said that Waymo doesn’t have the ability to collect data on the rare corner cases that form the “long tail” of human driving behaviour:
This suggests the scale of data needed to capture long tail events is not millions of miles, but perhaps billions of miles. For example, to capture 1,000 examples of deadly crashes, you would need 100 billion miles of data.
When Tesla has about 1.1 million HW3 cars on the road, the HW3 fleet will be driving 1 billion miles a month. If Tesla reaches 1.1 million HW3 cars in 2020 and builds 750,000 cars a year from 2020 on, the HW3 fleet will reach 100 billion miles cumulatively by the end of 2023.
As far as I know, no other company yet has the ability to collect the data Tesla can collect at the scale of billions of miles.
Even if other companies have superior algorithms for imitation learning, it doesn’t matter if they have no long tail data to train them on.
Competitive advantage in reinforcement learning
In a scenario where reinforcement learning turns out to be the right solution, I predict Waymo will be in the best competitive position.
DeepMind and Google are under the Alphabet umbrella with Waymo. Taken together, DeepMind and Google seem to have the highest number of world-class RL researchers and engineers of any company. This makes me think that, assuming Waymo can fully tap into this expertise, it is the best positioned to develop and apply RL algorithms for autonomous driving.
Google also has massive compute, which could turn out to be an advantage.
Competitive advantage in a hybrid approach
What if imitation learning is required to bootstrap reinforcement learning? This is an idea suggested by two researchers at Waymo:
…extensive simulations of highly interactive or rare situations may be performed, accompanied by a tuning of the driving policy using reinforcement learning (RL). However, doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.
In this case, it depends on how much training data is needed for imitation learning to bootstrap RL. If it’s on the scale of millions of miles, then Waymo is in the best competitive position. If it’s on the scale of billions of miles, then Tesla is in the best competitive position.
Given this framing — imitation learning vs. reinforcement learning, and competitive advantage as algorithms, compute, and data — the favourite to win in either scenario seems straightforward. The fundamentally uncertain part is what the solution to driving policy and path planning is going to be: imitation learning, reinforcement learning, a hybrid, or none of the above.
It’s possible that Tesla will lose its favourite position if another company gains or disclosed the ability to collect the same kind of data at a greater scale. Mobileye, for instance, collects some data at a large scale, but the details aren’t clear. The stated aim is compiling HD maps and not imitation learning, although that doesn’t necessarily preclude the data from being used for both purposes. A forward-thinking car manufacturer could copy Tesla’s approach and equip its cars with the hardware, software, and wifi connectivity to upload data for imitation learning.
To date, it feels to me like there has been little in the way of principled theories of autonomous vehicle competition. The discourse seems to largely revolve around demos, which convey basically no information about a system’s reliability over millions of miles, and the rate of safety-critical disengagements in California, which is a misleading metric.
Some common mistakes in analysis (a few of which I’ve made):
Conflating performance in a demo with a system’s reliability over millions of miles.
Taking California DMV disengagements data at face value (a mistake I made too).
Making apples-to-oranges comparisons between secretive prototype autonomous vehicles and production ADAS features used by thousands of people.
Treating time spent on development as a competitive advantage either a) on the assumption that hand-coded software is the solution, or b) without being specific about how much time has been spent on IL or RL algorithms versus hand-coded stuff that was thrown out.
Talking about data without being specific about what kind of data it is, how exactly it’s useful for machine learning, where the supervisory signal comes from (e.g. human labelling or human driver input), the quality, diversity, and rarity of the data (and why that matters), and the cost of collecting data vs. the cost of labelling it (if applicable).
Related to the above: conflating testing a vehicle and training a neural network.
That’s why it’s important to me that the above mental model doesn’t rely on:
California DMV disengagements data.
Performance of ADAS features.
Brute time (i.e. time, without a deeper theory about why time matters).
Training data bottlenecked by human labelling, such as raw sensor data used to train perception neural networks.
Miles of testing in autonomous mode.
On reinforcement learning:
It seems to me that the essential difference between applying RL to driving versus other activities like Dota is the challenge of representing the driving task in simulation. How do you make sure what the agent learns is driving, and not a video game that closely resembles driving, but not closely enough?
With supervised learning, at least you don’t have that problem, because the data comes from the real world. The problem you do have is getting the neural network to learn all the correct mappings between action and state.
If you capture state-action pairs from long tail scenarios, in theory, the neural network should be able to learn the correct mappings.
If you don’t capture data from long tail scenarios, and you rely on interacting with other virtual agents in simulation, then I don’t see how your agent learns to handle those long tail scenarios. How do you generate them in simulation without observing them in the world? And if you don’t generate them in simulation, how does your agent learn to handle them?
What are the odds that policies learned outside of the long tail, in the chunky middle, will generalize to the long tail?