Kyle Vogt (Cruise) Keynote - MIT AI Conference 2019

Key points:

  • We need machine learning for planning. It’s too difficult to write if-then statements for all driving behaviours.

  • Vertical integration with a car manufacturer (like GM) is necessary. Aftermarket kits won’t work.

  • “The reason we want lots of data and lots of driving is to try to maximize the entropy and diversity of the datasets we have.”

  • Automatic labelling (auto-labelling) can draw on various forms of signal from humans, e.g. whether safety drivers intervene or not, or the observed behaviour of other vehicles on the road, which is treated as correct.

  • Training AVs in simulation is a promising area of R&D.


Why scale of training data matters, according to Kyle Vogt (13:45):

“The reason we want lots of data and lots of driving is to try to maximize the entropy and diversity of the datasets we have.”

As I understand it, entropy is essentially the average surprisingness or unpredictability of the data points in a dataset. Or, to put it another way, the informativeness of a typical data point; the amount of novel information it contains.
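To make that concrete, here is a minimal sketch of Shannon entropy computed over the scenario tags of a set of driving clips. The tags and the two example datasets are made up for illustration; nothing here comes from the talk or from any company's actual pipeline.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    probabilities = [count / total for count in counts.values()]
    # H = sum over classes of p * log2(1 / p)
    return sum(p * math.log2(1 / p) for p in probabilities)


# Hypothetical scenario tags for logged driving clips (illustrative only).
repetitive = ["highway"] * 8
diverse = ["highway", "merge", "cyclist", "construction"] * 2

entropy_repetitive = shannon_entropy(repetitive)  # one class only: 0 bits
entropy_diverse = shannon_entropy(diverse)        # four equally likely classes: 2 bits
```

Eight clips of the same scenario carry zero bits of entropy, while eight clips spread evenly over four scenarios carry two bits. That is one way to read the claim that more driving is valuable mainly insofar as it makes the dataset more varied.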

Kyle Vogt also says some interesting stuff on automatic labelling or auto-labelling (22:27):

“…basically, what I mean is you take the human labelling step out of the loop. … There’s a lot of things you can infer from the way a vehicle drives. If it didn’t make any mistakes, then you can sort of implicitly assume a lot of things were correct about the way that vehicle drove. … When the AVs are basically driving correctly and the people in the car are saying ‘you did a good job’, that, to me, is a very rich source of information.”
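One way to picture this kind of implicit labelling is the sketch below. It is purely my own illustration: the `Segment` fields, the label names, and the rules are hypothetical and are not a description of how Cruise actually does it.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A short stretch of logged driving (hypothetical structure)."""
    segment_id: str
    safety_driver_intervened: bool
    rider_said_good_job: bool


def auto_label(segment):
    """Assign a weak label from human signals, with no manual annotation step."""
    if segment.safety_driver_intervened:
        # An intervention suggests something about the driving was wrong.
        return "negative"
    if segment.rider_said_good_job:
        # No mistakes plus positive feedback: implicitly treat the driving as correct.
        return "positive"
    # No clear signal either way, so leave the segment unlabelled.
    return "unlabelled"


segments = [
    Segment("a", safety_driver_intervened=True, rider_said_good_job=False),
    Segment("b", safety_driver_intervened=False, rider_said_good_job=True),
    Segment("c", safety_driver_intervened=False, rider_said_good_job=False),
]
labels = {s.segment_id: auto_label(s) for s in segments}
```

The point is only that the label comes for free from what the humans in the car did or said, rather than from someone annotating the clip afterwards.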

Kyle Vogt’s statements about dataset entropy/diversity and automatic labelling seem applicable to Tesla.

For video clips that are labelled by humans, the benefit of Tesla’s fleet driving ~700 million miles a month is the entropy, diversity, and rarity of the training examples that can be automatically flagged by various signals. Those signals include deep learning-based queries (e.g. look for bikes mounted on vehicles), novelty detection, uncertainty estimation, human interventions, and disagreements between human driving and the Autopilot planner. In other words, when a combination of human signals and machine signals triggers uploads, a higher quantity of driving leads to a higher quality of dataset.
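The signals listed above could be combined roughly as in the sketch below. Everything in it is an assumption made for illustration: the `ClipSignals` fields, the score ranges, and the thresholds are hypothetical, and this is not Tesla's actual trigger logic.

```python
from dataclasses import dataclass


@dataclass
class ClipSignals:
    """Per-clip signals computed on the car (hypothetical fields)."""
    matches_query: bool          # e.g. a detector fired for "bike mounted on vehicle"
    novelty_score: float         # assumed range 0..1, higher means more unusual
    model_uncertainty: float     # assumed range 0..1, higher means less confident
    human_intervened: bool       # driver took over from the system
    planner_disagreement: float  # assumed distance between human and planned path, in metres


def upload_reasons(signals, novelty_threshold=0.8, uncertainty_threshold=0.5,
                   disagreement_threshold=1.0):
    """Return the list of triggers that fired for a clip (thresholds are made up)."""
    reasons = []
    if signals.matches_query:
        reasons.append("query_match")
    if signals.novelty_score > novelty_threshold:
        reasons.append("novelty")
    if signals.model_uncertainty > uncertainty_threshold:
        reasons.append("uncertainty")
    if signals.human_intervened:
        reasons.append("intervention")
    if signals.planner_disagreement > disagreement_threshold:
        reasons.append("planner_disagreement")
    return reasons


def should_upload(signals):
    """Upload a clip if at least one trigger fired."""
    return len(upload_reasons(signals)) > 0


routine_clip = ClipSignals(False, 0.1, 0.1, False, 0.2)
takeover_clip = ClipSignals(False, 0.9, 0.2, True, 0.3)
```

With a filter like this, most routine miles are never uploaded, so a bigger fleet mainly increases the number of rare clips that get flagged rather than the volume of footage humans have to label.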

With automatic labelling, Tesla can leverage a vast amount of data for 1) weakly supervised learning for computer vision (this paper gives an example of one way this might work), 2) self-supervised (or unsupervised) learning for prediction, and 3) imitation learning (and possibly reinforcement learning) for planning.

There may also be some potential for self-supervised learning for computer vision, but I don’t yet really understand how that would work.

I interpret Kyle Vogt as agreeing, in principle, with the idea that more real world driving data is better and that human labour requirements don’t negate the usefulness of more data.

Some folks have argued that Tesla’s ~100-1000x quantity of real world miles relative to competitors is useless: more data is only valuable if you pay people to label it, and it’s just too expensive for Tesla to label much more data than anyone else. Kyle Vogt seems to disagree, in principle, with that argument.