Speculation on Tesla RL: simulation-level scale in the real world?


#1

In 2020, Tesla will likely deliver its one millionth HW2/HW3 vehicle. Here’s some math:

1 million Tesla vehicles * 15,300 minutes of driving per year per Tesla = 29,100 years of driving per year

I may be mistaken, but I think DeepMind trained 300 versions of the AlphaStar agent for up to 200 years each, for a combined total of up to 60,000 years. From the blog:

During training, each agent experienced up to 200 years of real-time StarCraft play.

I couldn’t find a number for OpenAI Five. The OpenAI blog says it trained for 180 years of gameplay per day, but not how many days it trained. Based on clues, it might have been around 18 months (roughly 540 days), which would make it roughly 97,200 years.
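To sanity-check the comparison, here is the back-of-the-envelope arithmetic in one place. The 15,300 minutes of driving per car per year and the 540 days of OpenAI Five training are my assumptions from above, not official figures:

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

# Tesla fleet: 1M HW2/HW3 cars, ~15,300 minutes of driving per car per year (assumed)
fleet_years = 1_000_000 * 15_300 / MINUTES_PER_YEAR
print(f"Tesla fleet: {fleet_years:,.0f} years of real-world driving per year")  # ~29,100

# AlphaStar: ~300 agents, up to 200 years of StarCraft each (my reading of the blog)
alphastar_years = 300 * 200
print(f"AlphaStar:   {alphastar_years:,} years of simulated play")  # 60,000

# OpenAI Five: 180 years of Dota per day, guessing ~18 months (~540 days) of training
five_years = 180 * 540
print(f"OpenAI Five: {five_years:,} years of simulated play")  # 97,200
```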

As I understand it, there are three ways to approach the problem of sim2real transfer learning:

  • more realistic simulation
  • don’t use simulation, use the real world
  • develop sim2real algorithms

Lately I’ve been speculating that Tesla will use imitation learning to make a more realistic simulation. If you use imitation learning to make “smart agents” — virtual cars with human-like behaviour — you might be able to do deep RL in sim.
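By imitation learning I mean something like behavioural cloning: fit a network to predict the human driver’s action from the logged state, then use those learned policies to drive the traffic agents inside the simulator. Here is a minimal sketch of that idea, with made-up state/action dimensions and random stand-in data; nothing here reflects Tesla’s actual stack:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: each logged sample is a small state vector
# (ego speed, distances/velocities of nearby cars, etc.) and the human
# driver's control output (steering, accel/brake).
STATE_DIM, ACTION_DIM = 32, 2

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for real fleet logs: random (state, human_action) pairs.
states = torch.randn(10_000, STATE_DIM)
human_actions = torch.randn(10_000, ACTION_DIM)

for epoch in range(5):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, human_actions)  # imitate the human action
    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained policy would then control the "smart agent" cars in the simulator,
# so an RL agent can train against human-like traffic.
```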

Today I realized that Tesla may have another option. Oriol Vinyals, one of the creators of AlphaStar, said:

Driving a car is harder [than StarCraft or Dota]. The lack of (perfect) simulators doesn’t allow training for as much time as would be needed for Deep RL to really shine.

But a fleet of 1 million+ cars would allow for training time in the real world at the same sort of scale that AlphaStar and OpenAI Five did in simulation.

The feature-complete version of Full Self-Driving that Elon wants to finish this year could serve as scaffolding for real-world RL at massive, simulation-level scale. Even if that scaffolding uses a hybrid of imitation learning and hand-coding, and has a high disengagement rate, it could be enough to get RL off the ground.

Safety would be a big concern. Mobileye has its RSS framework, a set of hard rules it wraps around RL like a protective shell. I saw that Facebook recently published something on safe RL for robots exploring in the real world, but I didn’t really understand it. You would want to prevent cars from exploring the action space in dumb, dangerous ways.
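The “protective shell” idea, as I picture it, is just a hard filter between the learned policy and the actuators: the RL policy proposes an action, and a hand-coded checker overrides it whenever it would break a safety rule. A toy sketch follows; the rule, thresholds, and names are made up for illustration, this is not RSS itself:

```python
from dataclasses import dataclass
import random

@dataclass
class State:
    gap_to_lead_m: float   # distance to the car ahead
    ego_speed_mps: float

@dataclass
class Action:
    accel: float           # m/s^2, negative = braking
    steer: float

def is_safe(action: Action, state: State) -> bool:
    # Hand-coded hard rule (made-up example): never command acceleration
    # when the gap to the lead car is small.
    return not (action.accel > 0 and state.gap_to_lead_m < 10.0)

def fallback(state: State) -> Action:
    # Conservative default: gentle braking, no steering input.
    return Action(accel=-1.0, steer=0.0)

def shielded_policy(rl_policy, state: State) -> Action:
    action = rl_policy(state)  # possibly-exploratory action from the learner
    return action if is_safe(action, state) else fallback(state)

# Example: a randomly exploring "policy" still can't accelerate into a close gap.
random_policy = lambda s: Action(accel=random.uniform(-3, 3), steer=0.0)
print(shielded_policy(random_policy, State(gap_to_lead_m=5.0, ego_speed_mps=20.0)))
```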


#2

Could it really start at (and eventually get to) a point that is safe enough? It seems a bit dangerous and could do a lot of harm - not just to individuals, but also to Tesla’s reputation.


#3

Good question. I would like to hear an expert weigh in on this. There are a lot of pieces I don’t really understand.

I think as long as you have random exploration of the action space in the real world, there is an inherent risk that the random action the car takes will be less safe than what it would have done otherwise. Mobileye would claim that as long as a system stays within the hard constraints of RSS, it can never cause a crash, but who knows if that’s true.
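For reference, the core RSS rule for following distance is a closed-form formula. Roughly, in my paraphrase of the Mobileye paper: the minimum safe gap accounts for the rear car possibly accelerating for one response time and then braking at its minimum rate, while the front car brakes at its maximum rate. The parameter values below are illustrative, not the ones Mobileye uses:

```python
def rss_min_gap(v_rear, v_front, rho=1.0, a_max=3.0, b_min=4.0, b_max=8.0):
    """Minimum safe following distance per the RSS longitudinal rule
    (my reading of the paper; parameter values are illustrative).
    v_rear, v_front in m/s; rho = response time in s; accelerations in m/s^2."""
    d = (v_rear * rho
         + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(0.0, d)

# Both cars at 25 m/s (~90 km/h): the rear car must keep roughly this many metres.
print(round(rss_min_gap(25.0, 25.0), 1))  # ~85 m with these conservative parameters
```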