Tesla Autopilot big rewrite to 4D perception and annotation | George Hotz and Lex Fridman

I think the plan, even before the rewrite started, was for Tesla to go end to end with Dojo. But "4D" FSD is going to be the data mining engine: the more FSD understands the world, the more specific and automatic the upload triggers can be for feeding the end-to-end training dataset.

Nearly every FSD disengagement will, by definition, be an interesting edge case. If Tesla reviewed every AP disengagement today, they would be buried in useless data.
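A trigger pipeline of the kind described above could be sketched as a simple filter: only disengagements that the perception stack itself flags as anomalous get queued for upload, while routine driver takeovers are dropped. All names and fields here are hypothetical illustrations, not Tesla's actual telemetry API.

```python
from dataclasses import dataclass

@dataclass
class Disengagement:
    """Hypothetical record of one Autopilot disengagement event."""
    clip_id: str
    perception_confidence: float  # the network's own confidence just before handover
    driver_initiated: bool        # casual takeovers are usually uninteresting

def should_upload(event: Disengagement, conf_threshold: float = 0.6) -> bool:
    """Upload only clips where the network was confused, not where the
    driver simply took over for convenience."""
    if event.driver_initiated and event.perception_confidence >= conf_threshold:
        return False  # routine takeover: the model saw nothing unusual
    return True       # low confidence (or forced handover) = likely edge case

# Usage: filter a batch of events down to the interesting ones.
events = [
    Disengagement("clip_a", 0.95, driver_initiated=True),   # boring takeover
    Disengagement("clip_b", 0.30, driver_initiated=True),   # model was confused
    Disengagement("clip_c", 0.80, driver_initiated=False),  # forced handover
]
queue = [e.clip_id for e in events if should_upload(e)]
```

The better the perception stack, the more meaningful `perception_confidence` becomes, which is exactly the "more specific and more automatic" trigger dynamic.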

I predict the next fundamental "rewrite" will be both end to end and an AutoML approach that uses the working hand-tuned FSD network as its starting point. If you have a neural network that works, whether it counts as "end to end" is partly a semantic distinction. You may not have trained it end to end, and you may have hand-crafted the architecture instead of letting AutoML learn it. But stripped of any authorship credit, a neural network is just a neural network. In principle, an AutoML solution trained end to end could arrive, by random chance, at the exact same network as Tesla's hand-crafted FSD AP.
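A warm-started AutoML loop of the kind predicted here can be sketched in a few lines: begin the search at the existing hand-tuned architecture rather than from scratch, and keep a mutation only when it scores better. The architecture encoding, the scoring function, and the toy numbers are all invented placeholders, not anything Tesla has published.

```python
import random

def automl_search(seed_arch, score, mutate, steps=200, rng=None):
    """Hill-climbing architecture search warm-started from seed_arch.
    Starting at a known-good network means the result can only match
    or beat the hand-tuned baseline under this score."""
    rng = rng or random.Random(0)
    best_arch, best_score = seed_arch, score(seed_arch)
    for _ in range(steps):
        candidate = mutate(best_arch, rng)
        s = score(candidate)
        if s > best_score:
            best_arch, best_score = candidate, s
    return best_arch, best_score

# Toy stand-ins: an "architecture" is a tuple of layer widths, and the
# fake score rewards getting close to a hidden optimum.
TARGET = (64, 128, 64)
def score(arch):
    return -sum(abs(a - t) for a, t in zip(arch, TARGET))

def mutate(arch, rng):
    i = rng.randrange(len(arch))
    out = list(arch)
    out[i] = max(1, out[i] + rng.choice([-16, -8, 8, 16]))
    return tuple(out)

hand_tuned = (48, 112, 80)  # pretend this is the working hand-tuned network
found, found_score = automl_search(hand_tuned, score, mutate)
```

The design point is the seed: a search seeded with the working network never has to rediscover what hand tuning already learned, it only has to improve on it.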

Tesla has also been very carefully selecting clips to train different aspects of FSD. By definition, there has to be an end-to-end solution that can achieve parity with Tesla's AP using those same clips. How do you know your dataset has enough stop sign samples, in enough different environments, for end-to-end learning? Well, if you are Tesla, you can say with 100% confidence: "we included the exact same frames that trained 4D FSD's stop sign detector, so there are enough good samples to achieve at least 99% accuracy, if the learner can figure out how to use them. And if it needs help, here are some pretrained chunks related to various driving tasks to use as starting blocks to build from."
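The "pretrained chunks as starting blocks" idea is ordinary transfer learning: freeze modules already trained on the curated per-task clips and let the end-to-end learner train new components on top of them. A minimal sketch, with made-up module names and a toy weight-update rule standing in for real backprop:

```python
class Module:
    """Tiny stand-in for a neural net block with freezable weights."""
    def __init__(self, name, weights, frozen=False):
        self.name, self.weights, self.frozen = name, weights, frozen

    def update(self, grad, lr=0.1):
        if self.frozen:
            return  # pretrained chunk: weights stay fixed during end-to-end training
        self.weights = [w - lr * g for w, g in zip(self.weights, grad)]

# Pretrained chunks from the supervised per-task training (hypothetical names).
stop_sign_head = Module("stop_sign_detector", [0.7, -0.2], frozen=True)
lane_head      = Module("lane_geometry",      [0.1,  0.4], frozen=True)
# A fresh planner trained end to end on top of the frozen chunks.
planner        = Module("planner", [0.0, 0.0])

frozen_before = list(stop_sign_head.weights)
for module in (stop_sign_head, lane_head, planner):
    module.update(grad=[1.0, 1.0])  # one fake end-to-end gradient step
```

After the step, only the planner has moved; the pretrained detectors are unchanged, which is what makes them reliable building blocks for the end-to-end model.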