In 2020, Tesla will likely deliver its one millionth HW2/HW3 vehicle. Here’s some math:
1 million Tesla vehicles * 15,300 minutes of driving per year per Tesla / 525,600 minutes per year ≈ 29,100 years of driving per year
I may be mistaken, but I think DeepMind trained 300 versions of the AlphaStar agent for up to 200 years each, for a combined total of up to 60,000 years. From the DeepMind blog:
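The conversion above can be checked with a short sketch. The vehicle count and the 15,300 minutes per vehicle per year are the figures from this post; the 365-day year and the helper name `fleet_years_per_year` are assumptions made only for illustration.

```python
# Convert fleet-wide driving time into "years of driving per calendar year".
# Inputs are the post's own figures; fleet_years_per_year is a hypothetical
# helper name, and a 365-day year is assumed.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def fleet_years_per_year(vehicles: int, minutes_per_vehicle_per_year: float) -> float:
    """Return the fleet's combined driving time per calendar year, in years."""
    total_minutes = vehicles * minutes_per_vehicle_per_year
    return total_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    years = fleet_years_per_year(1_000_000, 15_300)
    print(f"{years:,.0f} years of driving per year")
```

The exact result is a little under 29,110, which the post rounds to 29,100. For scale, 15,300 minutes per year works out to roughly 42 minutes of driving per car per day.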
“During training, each agent experienced up to 200 years of real-time StarCraft play.”
I couldn’t find a total for OpenAI Five. The OpenAI blog says it trains on 180 years of games per day, but not how many days it trained for. Based on clues, it might have been 18 months, which at 30 days per month would make it 180 * 540 = 97,200 years.
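Both totals are simple products of guessed inputs, so it may help to see how sensitive they are. The sketch below reproduces the two estimates and shows how the OpenAI Five figure changes with the assumed training duration. Every input is an estimate from this post, not a confirmed number, and the 30-day month and function names are assumptions for illustration.

```python
# Rough totals of simulated experience, using this post's own estimates.
# None of these inputs are confirmed figures.

ALPHASTAR_AGENTS = 300           # the post's guess at the number of agent versions
ALPHASTAR_YEARS_PER_AGENT = 200  # "up to 200 years" per agent, so an upper bound

OPENAI_FIVE_YEARS_PER_DAY = 180  # figure quoted from the OpenAI blog
DAYS_PER_MONTH = 30              # assumption that reproduces the post's total


def alphastar_total_years() -> int:
    """Upper-bound estimate of AlphaStar's combined training experience."""
    return ALPHASTAR_AGENTS * ALPHASTAR_YEARS_PER_AGENT


def openai_five_total_years(months: int) -> int:
    """Estimated OpenAI Five experience for an assumed training duration."""
    return OPENAI_FIVE_YEARS_PER_DAY * DAYS_PER_MONTH * months


if __name__ == "__main__":
    print(f"AlphaStar: up to {alphastar_total_years():,} years")
    for months in (6, 12, 18):  # 18 is the post's guess; the others are for comparison
        print(f"OpenAI Five, {months} months: {openai_five_total_years(months):,} years")
```

Since the total scales linearly with the guessed duration, halving the 18-month guess would halve the 97,200-year estimate.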
As I understand it, there are three ways to approach the problem of sim2real transfer learning:
- more realistic simulation
- don’t use simulation, use the real world
- develop sim2real algorithms
Lately I’ve been speculating that Tesla will use imitation learning to make a more realistic simulation. If you use imitation learning to make “smart agents” — virtual cars with human-like behaviour — you might be able to do deep RL in sim.
Today I realized that Tesla may have another option. Oriol Vinyals, one of the creators of AlphaStar, said:
“Driving a car is harder [than StarCraft or Dota]. The lack of (perfect) simulators doesn’t allow training for as much time as would be needed for Deep RL to really shine.”
But a fleet of 1 million+ cars would allow for training time in the real world at the same sort of scale that AlphaStar and OpenAI Five reached in simulation. At roughly 29,100 years of driving per year, the fleet would match the 60,000-year AlphaStar estimate in about two years and the 97,200-year OpenAI Five estimate in a little over three.
The feature-complete version of Full Self-Driving that Elon wants to finish this year could serve as scaffolding for real-world RL at massive, simulation-level scale. Even if the scaffolding uses a hybrid of imitation learning and hand coding and has a high disengagement rate, that could be enough to get RL off the ground.
Safety would be a big concern. Mobileye has its Responsibility-Sensitive Safety (RSS) framework, a set of hard rules it wraps around RL like a protective shell. I saw that Facebook recently published something on safe RL for robots exploring in the real world, but I didn’t really understand it. You would want to prevent cars from exploring the action space in dumb, dangerous ways.
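The "protective shell" idea can be sketched as a wrapper that checks each action a learned policy proposes against a hard rule and substitutes a conservative fallback when the rule fails. This is a toy illustration, not Mobileye's RSS and not anything Tesla is known to use. The single rule here (keep enough gap to stop behind a stationary obstacle), the parameter values, and all names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class State:
    speed: float  # ego speed in m/s
    gap: float    # distance to the obstacle ahead in m


MAX_BRAKE = 6.0  # assumed maximum deceleration, m/s^2 (hypothetical value)
MAX_ACCEL = 3.0  # assumed maximum acceleration, m/s^2 (hypothetical value)
DT = 0.1         # assumed control step, s (hypothetical value)


def is_safe(state: State, accel: float) -> bool:
    """Toy hard rule: after applying accel for one step, the car must still
    be able to stop within the gap, treating the obstacle as stationary."""
    next_speed = max(0.0, state.speed + accel * DT)
    travelled = (state.speed + next_speed) / 2 * DT  # trapezoidal estimate
    stopping_distance = next_speed ** 2 / (2 * MAX_BRAKE)
    return travelled + stopping_distance <= state.gap


def exploring_policy(state: State) -> float:
    """Stand-in for an RL policy that explores with random accelerations."""
    return random.uniform(-MAX_BRAKE, MAX_ACCEL)


def shielded_action(state: State, policy=exploring_policy) -> float:
    """Return the policy's action if the rule allows it, else brake hard."""
    proposed = policy(state)
    if is_safe(state, proposed):
        return proposed
    return -MAX_BRAKE  # conservative fallback


if __name__ == "__main__":
    for state in (State(speed=20.0, gap=100.0), State(speed=20.0, gap=30.0)):
        print(state, "->", round(shielded_action(state), 2))
```

The policy is still free to explore, but only inside the set of actions the rule accepts. A filter like this only vetoes individual actions. It does not guarantee safety if the car is already in a state where even full braking is not enough, which is part of why a real framework needs far more than one rule.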