Waymo’s imitation learning network ChauffeurNet: test results

As discussed here, Waymo used supervised imitation learning to train a neural network called ChauffeurNet. Here are the results of Waymo’s simulated tests (from the paper). It sounds like ChauffeurNet drove correctly 100% of the time when tested on a few simple types of situation, since the paper gives no numbers and just says:

Here, we present results from experiments using the various models in the closed-loop simulation setup. We first evaluated all the models on simple situations such as stopping for stop-signs and red traffic lights, and lane following along straight and curved roads by creating 20 scenarios for each situation, and found that all the models worked well in these simple cases. Therefore, we will focus below on specific complex situations that highlight the differences between these models.

On the three more complex types of situation, ChauffeurNet still drove correctly 85%+ of the time in its final iteration:

The 10% crash rate for nudging around a parked car is better than it first appears:

Note that in this scenario, we generate several variations by changing the starting speed of the agent relative to the parked car. This creates situations of increasing difficulty, where the agent approaches the parked car at very high relative speed and thus does not have enough time to nudge around the car given the dynamic constraints. A 10% collision rate for M4 is thus not a measure of the absolute performance of the model since we do not have a perfect driver which could have performed well at all the scenarios here.

This is also true for slowing down for a slow car:

For the variation with the largest relative speed, there isn’t enough time for most models to stop the agent in time, thus leading to a collision.
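To make the point concrete: the reported rate aggregates over scenario variations of increasing difficulty, so a single unavoidable failure at the hardest variation produces a nonzero collision rate even for a strong model. Here is a minimal, hypothetical sketch of how such a per-variation tally might work; the scenario speeds and outcomes below are illustrative, not from the paper.

```python
# Hypothetical sketch: tallying a collision rate across scenario variations,
# in the spirit of the paper's nudging-around-a-parked-car test.
# The speeds and outcomes are made up for illustration.

def collision_rate(outcomes):
    """Fraction of scenario variations that ended in a collision."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == "collision") / len(outcomes)

# Each entry: (relative speed toward the parked car in m/s, run outcome).
runs = [
    (1.0, "nudged"), (2.0, "nudged"), (3.0, "nudged"), (4.0, "nudged"),
    (5.0, "nudged"), (6.0, "nudged"), (7.0, "nudged"), (8.0, "nudged"),
    (9.0, "nudged"), (10.0, "collision"),  # too fast to nudge in time
]
rate = collision_rate([outcome for _, outcome in runs])
print(f"collision rate: {rate:.0%}")  # -> collision rate: 10%
```

Only the hardest variation fails, yet the headline number is 10% — which is why the authors caution that it is not a measure of absolute performance.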

It would be interesting to put humans in driving simulators (similar to the flight simulators that pilots train on) and see how humans perform in the same situations. How close is ChauffeurNet to human performance in these situations? I wish every paper on autonomous vehicles included a human benchmark section.

Waymo ends the paper by saying this:

…the model is not yet fully competitive with motion planning approaches but we feel that this is a good step forward for machine learned driving models. There is room for improvement: comparing to end-to-end approaches, and investigating alternatives to imitation dropout are among them. But most importantly, we believe that augmenting the expert demonstrations with a thorough exploration of rare and difficult scenarios in simulation, perhaps within a reinforcement learning framework, will be the key to improving the performance of these models especially for highly interactive scenarios.


I like the comment on testing humans in similar scenarios. That would be really interesting.

I haven’t had time to read this paper yet — maybe this weekend. It’s near the top of the stack.


Agreed, that would be a really interesting data point to add: a benchmark vs. humans.

Mayank Bansal, one of the ChauffeurNet co-authors, gave a talk on ChauffeurNet at Google I/O 2019 yesterday. Here’s the video:

Helpful slides that explain the difference between a traditional self-driving car system, a mid-to-mid system, and an end-to-end system.