This is by far the most insight I’ve ever seen anyone give into what Waymo is doing:
Normally these talks are bland because the speakers avoid going into any depth or detail about what they’re doing with their technology. This talk is not like that!
An overarching theme of the talk is this:
Drago Anguelov (the lead of Waymo’s research team) talks about Waymo’s experiments with supervised imitation learning/behavioural cloning — a technique that applies supervised learning to state-action pairs from human driving. He also talks about Waymo’s trajectory optimization agent, which uses inverse reinforcement learning, another form of imitation learning.
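To make "supervised learning on state-action pairs" concrete, here is a toy sketch of behavioural cloning. It is not Waymo's method: the linear policy, the two-feature state (lateral offset and heading error), the steering action, and the synthetic "expert" data are all assumptions made up for illustration. A real system would use a neural network and logged human driving.

```python
import numpy as np


def fit_behavioural_cloning(states, actions, l2=1e-6):
    """Fit a linear policy (action = state @ W) to expert pairs via ridge regression."""
    n_features = states.shape[1]
    gram = states.T @ states + l2 * np.eye(n_features)
    return np.linalg.solve(gram, states.T @ actions)


def policy(weights, state):
    """Predict the action the expert would take in the given state."""
    return state @ weights


# Hypothetical demonstrations: state = [lateral offset, heading error],
# action = steering command. The "expert" is a made-up linear controller.
rng = np.random.default_rng(0)
expert_weights = np.array([[-0.5], [-1.2]])
states = rng.normal(size=(500, 2))
actions = states @ expert_weights + 0.01 * rng.normal(size=(500, 1))

# Supervised learning on (state, action) pairs: this is the whole technique.
learned_weights = fit_behavioural_cloning(states, actions)
steer = policy(learned_weights, np.array([0.4, -0.1]))
```

The learned policy only ever sees states the expert visited, which is one reason pure behavioural cloning can struggle once the car drifts into situations the demonstrations never covered.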
Overall, Drago seems to believe imitation learning will be necessary in order to solve autonomy:
He says (at 49:05):
Learning from demonstration is key. You can encode some simple models by hand but ultimately the task of modelling agent behaviour is complex and it’s much better learned.
To restate what Drago wrote in the slide above (as I understand it), using imitation learning to train a neural network that emulates human driving behaviour is important because:
1. You can use it to predict what other vehicles on the road will do.
2. You can use it to bridge the “reality gap” between vehicles in simulation and vehicles in reality.
3. Your self-driving car can copy human driving behaviours, and thereby learn new driving tasks that can’t be hand-coded with the same efficacy.
Drago doesn’t explicitly talk about this, but (2) could be used to enable reinforcement learning in simulation.
Waymo’s blog post on ChauffeurNet does mention it:
…doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL.