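The lecture talks about the problem that DAgger was designed to address: a policy trained by behavioural cloning drifts into states its training data never covered, and its errors compound from there. I'll call this the DAgger problem. It seems like it should be a big problem for self-driving cars or any task with an astronomically large, continuous action space and a long time horizon. But DeepMind used supervised learning/behavioural cloning for StarCraft, and it seems to have overcome the DAgger problem largely by collecting a massive and varied dataset. Alex Irpan says: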
If you have a very large dataset, from a wide variety of experts of varying skill levels (like, say, a corpus of StarCraft games from anyone who’s ever played the game), then it’s possible that your data already has enough variation to let your agent learn how to recover from several of the incorrect decisions it could make.
Generative adversarial imitation learning (GAIL) has been proposed as a way to make imitation learning work for self-driving cars over longer time horizons than behavioural cloning. So that makes two potential ways of solving the DAgger problem: a large, varied dataset and GAIL.
The Stanford professor mentions a third: when the vehicle gets into trouble, an expert can take control and show it the correct action, and that correction becomes training data. The problem is the scale of human labour required. This makes me think of Tesla owners taking over from the vehicle’s software (Autopilot/Full Self-Driving). In effect, Tesla has engineered a clever way to get 500,000 human “labellers”.
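
To make the "expert takes over when the vehicle gets into trouble" idea concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration and not from the lecture: a 1-D "lane offset" state, a hypothetical expert that steers back toward the centre with gain -0.5, a linear learner, and an intervention threshold of 1.0. It is closer to an intervention-based variant of DAgger than to DAgger as originally described, and it is nothing like how a real driving stack would be trained.

```python
import random

def expert_action(x):
    # Hypothetical expert: steer back toward the lane centre (x = 0).
    return -0.5 * x

def fit_gain(data):
    # Least-squares fit of a linear policy a = w * x to (state, action) pairs.
    num = sum(x * a for x, a in data)
    den = sum(x * x for x, _ in data)
    return num / den if den > 0 else 0.0

def run_with_interventions(w, steps=50, limit=1.0, seed=0):
    # The learner drives. Whenever it drifts past `limit`, the expert takes
    # over for that step and its action is logged as a new training label.
    rng = random.Random(seed)
    x, logged = 0.0, []
    for _ in range(steps):
        if abs(x) > limit:
            a = expert_action(x)
            logged.append((x, a))
        else:
            a = w * x
        x = x + a + rng.uniform(-0.3, 0.3)  # simple noisy dynamics (assumed)
    return logged

def train(rounds=5):
    w = 0.5  # deliberately bad initial policy: steers away from the centre
    data = []
    for r in range(rounds):
        # Aggregate the expert's corrections across rounds, then refit.
        data += run_with_interventions(w, seed=r)
        w = fit_gain(data)
    return w, len(data)

if __name__ == "__main__":
    w, n = train()
    print(f"learned gain: {w:.3f} from {n} expert corrections")
```

The point of the toy is that labels are only collected in the states the learner actually drifts into, which is exactly where behavioural cloning on clean expert data has nothing to say. Once the learner stops drifting past the threshold, interventions (and new labels) stop, which loosely mirrors the hope that takeovers become rarer as the software improves.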