Tesla Autonomy Day: watch the full event

The event streamed live on Monday, April 22.

Demo video released after the event:

Wow, what a presentation. Curious to hear your guys’ thoughts. A few of mine:

  • A couple of topics that have been discussed here and raised by @strangecosmos and others were brought up, which was great to hear. I even recognized some of the images they showed from the papers on using vision for depth that @im.thatoneguy shared. A key topic in Andrej’s talk was their ability to automate labelling; he even brought up self-supervision! I also recognized the discussion on simulation and edge-case scenarios.
  • The Dojo (sp?) computer must be the end-to-end learning effort that Amir’s article discussed. It will be interesting to hear more about that later…
  • Shadow mode used extensively to test new features safely before deployment
  • The chip discussion got super technical, but the results and the power consumption were impressive. I was messaging with @strangecosmos just a couple of days ago about power consumption, since AI industry people at CES earlier this year saw it as a key issue with today’s Nvidia chips.
  • The self-driving video seemed impressive. Note the car on the side of the road at 0:40 that didn’t trip it up. I wonder how representative it is, though, and to what extent it was planned. I’m going to look at the research reports tonight/tomorrow to see how institutional investors reacted to the demos. A few reactions are out already.
  • I came away pretty blown away by it all and a bit more convinced that Tesla could solve autonomy, at least until Elon started talking at the end and throwing out next-year time frames. It made me nervous that Tesla wasn’t just thinking that autonomy would be here next year, but that they were betting the company on it: emphasizing lower-capacity battery packs, taking cars back at the end of their leases, etc. I’m sure it’s more flexible than this, and that they can still potentially offer customers the option to purchase the vehicle at the end of the lease. But it still made me uncomfortable to hear him talk about that stuff. Can’t they just wait a year until FSD is more fully proven?
  • Did I hear correctly that a passenger getting into one will have to sit in the driver’s seat and be ready to take over, initially?
2 Likes

The SR+ was a tangent. Yes, it’s good for autonomous taxi fleets, but Tesla is extremely cell-constrained and missed production goals as a result. They are in a position where they can bend steel faster than they can make battery cells, so of course they want to make 50% more vehicles with their current battery capacity.

The car by the side of the road actually concerned me, since it was treated as a parked car and didn’t show up on the display. Eventually they need to anticipate a door opening, and to confirm the system actually sees it.

I have a giant shadow mode question that has bugged me all day, and I have a wild theory.

They said that fleet data helps find optimal values. We know it’s not running in the background, right? So presumably they aren’t running multiple passes on cached data while parked. They aren’t uploading video, or we would see massively more data uploaded, right? We aren’t even seeing high-level extractions, so they aren’t running it in their data center. It would be silly to rotate hyperparameters on one car. So maybe they are doing A/B testing at fleet scale. That would be extremely Silicon Valley. What if every car has a slightly randomized seed on the weights in its build and they are “learning” as a cluster?

That would explain why so little data is uploaded. All each car would upload is its accuracy value, and then, aggregated across the fleet, you could estimate the gradients.

Shadow mode would be THE code base. There would be no exposed debug variables. And it would be easy to miss a single value being stored and updated.

In aggregate they could identify the… ahem… Gradient Descent… based on a database of VINs/hyperparameters and the score values returned by each vehicle, then push a new set of hyperparameters in the next round.
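
To make the theory concrete, here is a minimal sketch of the kind of fleet-scale, evolution-strategies-style scheme I have in mind. Every name, seed choice, and constant below is my own illustration; nothing here is something Tesla has described.

```python
import zlib
import numpy as np

def noise_for_vin(vin: str, n_params: int) -> np.ndarray:
    """Each car derives a deterministic pseudo-random perturbation direction
    from its VIN, so the server can reproduce it without any extra upload."""
    rng = np.random.default_rng(zlib.crc32(vin.encode()))
    return rng.standard_normal(n_params)

def car_weights(base_weights: np.ndarray, vin: str, sigma: float = 0.01) -> np.ndarray:
    """What each car would run: the shared weights plus its own tiny,
    VIN-seeded perturbation."""
    return base_weights + sigma * noise_for_vin(vin, base_weights.size)

def server_update(base_weights: np.ndarray, reports: dict,
                  sigma: float = 0.01, lr: float = 0.1) -> np.ndarray:
    """Server side: each car reports only a scalar score (e.g. its shadow-mode
    accuracy). Aggregated across the fleet, that is enough to estimate a
    gradient, evolution-strategies style, and push new weights next round."""
    grad = np.zeros_like(base_weights)
    for vin, score in reports.items():          # reports: {VIN: score}
        grad += score * noise_for_vin(vin, base_weights.size)
    grad /= len(reports) * sigma
    return base_weights + lr * grad             # ascend the fleet-averaged score
```

The appeal is that each car would only need to upload a single number per round, which would fit the tiny data volumes we observe.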

EDIT: Green says every version of the firmware he sees is identical so nope.

1 Like

Amir Efrati didn’t actually use the term “end-to-end” in his two Tesla autonomy articles at The Information. I just double-checked, and the term doesn’t appear anywhere in those articles. I may have contributed to that impression because I initially confused behavioural cloning/supervised imitation learning with end-to-end learning.

On Twitter, Amir did use the term “end to end” to describe Waymo’s neural network ChauffeurNet, which uses behavioural cloning. But I think this may be incorrect because Drago Anguelov, Waymo’s head of research, describes ChauffeurNet as a “mid-to-mid” model.

End-to-end: the neural network is trained to predict steering, brake, and acceleration directly from pixels.

Mid-to-mid: a perception neural network outputs a representation of the environment around the car. A planning/policy neural network takes that representation and produces actions for the car to take. It then transmits those actions to the control software, which executes them.

Tesla’s use of imitation learning — confirmed by Andrej Karpathy today — is an example of a mid-to-mid system (I believe). Similarly, Mobileye’s use of imitation learning-initialized reinforcement learning is an example of a mid-to-mid system; Mobileye’s CEO Amnon Shashua has publicly argued that end-to-end learning isn’t practical. Yet Mobileye is using a neural network to drive the car.
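
For concreteness, here is a toy sketch of the structural difference in PyTorch. This is purely my own illustration; the layer sizes, names, and outputs are made up and have nothing to do with Tesla’s, Waymo’s, or Mobileye’s actual networks.

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """End-to-end: pixels in, control commands out, trained as one network."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 3)          # steering, brake, acceleration

    def forward(self, pixels):                # pixels: (B, 3, H, W)
        return self.head(self.backbone(pixels))

class MidToMidDriver(nn.Module):
    """Mid-to-mid: a perception net produces an intermediate scene
    representation; a separate planner maps that representation to a short
    trajectory, which downstream control software (not shown) executes."""
    def __init__(self, scene_dim: int = 64, horizon: int = 10):
        super().__init__()
        self.horizon = horizon
        self.perception = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, scene_dim),
        )
        self.planner = nn.Linear(scene_dim, horizon * 2)   # (x, y) waypoints

    def forward(self, pixels):
        scene = self.perception(pixels)       # the "mid" representation
        return self.planner(scene).view(-1, self.horizon, 2)
```

The practical difference is where the training signal attaches: an end-to-end model is supervised (or rewarded) directly on the controls, while a mid-to-mid model lets the perception and planning pieces be trained and debugged separately.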

So, then… what is the Dojo computer? Here is what Elon said:

The car is an inference-optimized computer. We do have a major program at Tesla — which we don’t have enough time to talk about today — called Dojo. That’s a super powerful training computer. The goal of Dojo will be to be able to take in vast amounts of data — at a video level — and do unsupervised massive training of vast amounts of video with the Dojo computer. But that’s for another day

By the way, it might be an acronym — DOJO — but the reason the computer is called “Dojo” must be because that’s where the neural network trains. Haha. Clever.

Edit: See @thenonconsensus’s reply below.

Now, Elon didn’t give us enough information to ascertain much about Dojo. Is it 1) a backend, data centre computer like Google’s TPUs? Or is it 2) a computer that is intended to go inside cars?

Also, if it’s intended for training on video data, does that mean it will be used for A) unsupervised learning for computer vision or B) end-to-end learning?

1A is the most conservative interpretation. 2B is the most radical interpretation.

If Tesla is going with an NN training computer inside the car for end-to-end learning, my interpretation would be that this is an experimental, long-term, next-gen approach. FSD Gen 1 is mid-to-mid, then FSD Gen 2 is end-to-end reinforcement learning using a fleet of millions of robotaxis (bootstrapped with FSD Gen 1). An NN training computer would allow for decentralized training and eliminate the need to upload video to Tesla. I don’t know exactly how much data it would take to transmit the results of the training — the adjustments to the neural network weights — but I believe it’s a small amount, less than a short video clip I would guess.
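
As a rough back-of-envelope (my numbers, assuming a hypothetical 25-million-parameter on-car network; nothing here is from Tesla), the update really can be small once it is compressed the way federated-learning systems usually compress it:

```python
params = 25_000_000                 # hypothetical on-car network size

# Naive: ship every weight delta as fp16 (2 bytes each).
naive_mb = params * 2 / 1e6                          # ~50 MB

# With standard tricks (keep only the top 1% of deltas, 8-bit values,
# 4-byte index per kept value), the update shrinks a lot.
kept = int(params * 0.01)
compressed_mb = kept * (1 + 4) / 1e6                 # ~1.25 MB

print(f"naive fp16 update:       ~{naive_mb:.0f} MB")
print(f"sparsified + quantized:  ~{compressed_mb:.1f} MB")
# For comparison, a 30-second 1080p clip at ~8 Mbit/s is roughly 30 MB.
```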

This is a futuristic idea that is consistent with Tesla’s philosophy of constant innovation. Tesla said today they’re already working on Hardware 4/FSD Computer 2. Working on an end-to-end learning system to replace Tesla’s mid-to-mid learning system, with fleet training starting a few years from now — that intuitively feels like something Tesla would do.

But it might just be a replacement for the GPUs Tesla uses on the backend to train neural networks, optimized for video, and perhaps yielding the same sort of cost savings/price-performance improvement as the inference computer they’re currently shipping.

1 Like

Some exciting highlights from Andrej Karpathy’s talk and the Q&A afterward:

  • Confirmation that Tesla is using imitation learning for path planning (e.g. on cloverleafs) and for lane changes. Philosophically, Andrej seems on board with imitation learning and he seems skeptical that driving behaviours can be successfully coded by hand. Sounds like currently Tesla may be using a mix of imitation learning and hand coding.

  • We learned that Tesla is using radar-supervised learning and self-supervised learning for camera-based depth perception. We also saw a cool demo of the 3D information Tesla can extract from video using multi-camera stereo vision. (A rough sketch of the radar-supervised idea follows this list.)

  • Karpathy talked about automatic labelling of examples where a car ahead of you cuts into your lane. This sounds to me like unsupervised or self-supervised learning of prediction. Prediction as in predicting what other road users are going to do.

  • We learned that Tesla uses deep learning to trigger sensor data uploads. For example, Tesla will train a neural network to look for instances of bikes mounted on the backs of cars; then, whenever the neural network running in a car thinks it sees an example of that, the car will save a snapshot and upload it to Tesla when it connects to wifi. This is so obvious in hindsight, but it never occurred to me that Tesla might be doing this. I liked Elon’s description: “it’s a massive compression of real world data.” (This trigger pattern is also sketched after this list.)

  • Karpathy briefly, offhandedly mentioned using Tesla’s driving simulator to do some training. I would love to hear more about this. In particular, was it training for computer vision using synthetic video data? Or was it training for path planning/driving policy, maybe using reinforcement learning?
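
Two of the bullets above are easy to sketch. First, radar-supervised depth: as I understand it, sparse radar returns projected into the camera frame give you “free” depth labels for a handful of pixels per frame, and the network only gets a training signal at those pixels. This is my own illustration, not Tesla’s code:

```python
import torch

def radar_supervised_depth_loss(pred_depth: torch.Tensor,
                                radar_depth: torch.Tensor,
                                radar_mask: torch.Tensor) -> torch.Tensor:
    """L1 loss on only the (few) pixels where a projected radar return
    provides a depth value; radar_mask is 1.0 at those pixels, 0.0 elsewhere."""
    diff = (pred_depth - radar_depth).abs() * radar_mask
    return diff.sum() / radar_mask.sum().clamp(min=1)
```

Second, the learned upload triggers: a small on-car network scores each frame for some pattern of interest (the “bike on the back of a car” example), and frames above a threshold get queued for upload over wifi. Names and the threshold are made up; trigger_net is assumed to output a single logit:

```python
def maybe_queue_snapshot(frame: torch.Tensor, trigger_net: torch.nn.Module,
                         upload_queue: list, threshold: float = 0.9) -> None:
    """If the trigger network fires on this camera frame, queue a snapshot
    for upload the next time the car is on wifi."""
    with torch.no_grad():
        score = torch.sigmoid(trigger_net(frame.unsqueeze(0))).item()
    if score > threshold:
        upload_queue.append({"frame": frame.cpu(), "score": score})
```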

I think the only way I could have been more thrilled is if Karpathy had mentioned T-REX and bootstrapping into reinforcement learning. :stuck_out_tongue:

2 Likes

Now, Elon didn’t give us enough information to ascertain much about Dojo. Is it 1) a backend, data centre computer like Google’s TPUs? Or is it 2) a computer that is intended to go inside cars?

Also, if it’s intended for training on video data, does that mean it will be used for A) unsupervised learning for computer vision or B) end-to-end learning?

He also said this during the Q&A:

Over time, I would expect that it moves really to just training against video, video in, car steering and pedals out… that’s what we’re gonna use the dojo system for.

Link to the quote.

It really seemed like he meant 2B, in my mind, based on the context and where they are currently. But that’s just my guess.

The SR+ was a tangent. Yes, it’s good for autonomous taxi fleets, but Tesla is extremely cell-constrained and missed production goals as a result. They are in a position where they can bend steel faster than they can make battery cells, so of course they want to make 50% more vehicles with their current battery capacity.

I would disagree with this; depending on what you view as the end goal, it’s not necessarily better for them to make more SR+ vehicles with lower battery capacity than to make fewer LR vehicles. The margin difference between the variants is large enough that Tesla emphasized making the higher-margin variants first during the production ramp (to maximize cash flow). Some estimates have placed SR+ gross margin at just ~5-6%, vs. the high-teens to low-20s gross margins they were reporting when they were solely making Performance and LR variants.

But if your goal is to produce as many autonomous vehicles as possible, rather than to maximize cash flow, then I suppose you do want to go that route… That’s what makes me nervous, though: the former route (emphasizing gross and operating profit over the number of vehicles on the road) is the safer route financially, and it assumes that full autonomy isn’t just around the corner.
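
A quick back-of-envelope on the margin point, using illustrative 2019-era US prices that I’m assuming, not figures from Tesla or the presentation:

```python
# Illustrative only: rough prices plus the gross-margin ranges quoted above.
sr_plus_price, sr_plus_margin = 39_500, 0.055   # ~5-6% gross margin estimate
lr_price, lr_margin = 50_000, 0.20              # high-teens to low-20s range

# Same hypothetical cell supply: ~50% more SR+ cars than LR cars.
lr_units, sr_plus_units = 10_000, 15_000

print(f"LR gross profit:  ${lr_units * lr_price * lr_margin / 1e6:.0f}M")
print(f"SR+ gross profit: ${sr_plus_units * sr_plus_price * sr_plus_margin / 1e6:.0f}M")
# Roughly $100M vs. ~$33M: more cars on the road, but much less cash generated.
```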

1 Like

Wow. I haven’t actually finished watching the whole event yet (brain overload). So I didn’t hear that quote.

So, from this quote, that definitely makes me think the Dojo training computer is for end-to-end learning with training occurring in the car. I would assume it’s end-to-end reinforcement learning, since it sounds like Tesla is planning to launch robotaxis before Dojo is ready.

From the rest of Elon’s answer, it sounds like the path planning/driving policy block is still largely hand-coded, and Tesla is incorporating more and more imitation learning into that block over time.

Another surprising thing from the event: Elon said Tesla had HD maps, but then decided it was a bad idea and canned the whole operation.

The question that always puzzled me with HD maps was: where is the source of redundancy? If HD maps are the equivalent of a visual memory, then they don’t make sense. You want to drive based on real-time vision, not visual memory. If real-time vision and visual memory disagree, you should always go with real-time vision, because things change. So there is never a situation where an HD map disagreeing with real-time vision should change the vision system’s judgment or the driving system’s action. There is no redundancy.

An argument that convinced me for a while is that you can use HD maps in an emergency where you have total sensor failure: HD maps could allow you to pull over. But given the hardware redundancy in a self-driving car, this would be a truly rare scenario. Plus, now that I think more about it, this wouldn’t work. The HD map wouldn’t tell you whether there are road users — pedestrians, cyclists, or vehicles — in your path, so you can’t safely pull over with HD maps alone. The best thing to do would be to simply turn on the hazards and come to a quick stop.

1 Like

These guys had a chance to talk to Andrej Karpathy and Stuart Bowers after the event:

The most buck wild thing they said was that the full self-driving software demoed at the event was developed in 3 months (?!?!). At 7:45:

And allegedly that was only 3 months of development and testing. … 3 months of neural network training on, like, city driving.

At 14:10, Matt Joyce (CEO of Toom Dips) said that he talked to Karpathy and got the impression that Karpathy’s belief is that to make computer vision work well enough for self-driving cars, you need fleet learning on Tesla’s scale.

At 24:30, Matt talks about seeing the Tesla Network app.

My 3000-word essay on Autonomy Day:

Would appreciate constructive feedback.

1 Like

Just finished scanning analysts’ thoughts on the event and the demo rides.

  • Overall, most were impressed with the demo and the presentation
  • Numerous people experienced disengagements (one person saw two), with a different reason for each: one was the car not recognizing cones in a parking lot, another was the car getting confused at an intersection and not seeming to know whether to make a right-hand turn
  • Most also noted that the ride was a bit rough: the car took turns tightly and made some aggressive maneuvers that were not comfortable
  • It handled some complex situations well. Cars were summoned from the parking lot (with no driver) and then went from surface roads to the highway. The car was hesitant while passing some cars on the side of the road, but overall got past them
  • Outside of the demo, many were skeptical of a camera-only (no-LIDAR) approach, noting that almost all AV experts believe LIDAR is necessary for sensor redundancy (one bank interviewed 20+ people and noted that over 90% thought LIDAR necessary). One cited experts who have said that it is especially important in edge-case scenarios.
  • Many also cited Nvidia’s correction of Tesla’s chip comparison (I think only one noted that Pegasus consumes significantly more power and costs more)
  • Almost all believe the timeline is too aggressive and that, at a minimum, regulatory hurdles will get in the way

A few interesting additional sources for research.

One is that Nvidia had an analyst day that went over their new tools that they’ll be offering to auto companies (worth listening to).

Another is that Aptiv gave some demos at CES, and several people compared Tesla’s demo to those (the general view was that Tesla’s was in line with, or slightly better than, Aptiv’s).

1 Like

Karpathy (2:24:00):

If I was to summarize my entire talk in one slide, it would be this. … We see a lot of things coming from the fleet. And we see them at some rate — like, a really good rate compared to all of our competitors. So, the rate of progress at which you can actually address these problems, iterate on the software, and really feed the neural networks with the right data — that rate of progress is really just proportional to how often you encounter these situations in the wild. And we encounter them significantly more frequently than anyone else, which is why we’re going to do extremely well.

If the city driving software Tesla demoed really is the product of just 3 months of neural network training and software development, then the rate of progress does seem very fast.

1 Like

New, better distillation of what we know about Tesla’s collection and labelling of training data post-Autonomy Day:

1 Like

Positive comments from folks at DeepMind and OpenAI:

1 Like

I published a Waymo vs. Tesla article recently that covers five arguments I’ve heard over and over:

If you’re reading this, you’ve most likely heard these arguments too:

  • “Self-driving cars need lidar.”

  • “Waymo is years ahead of Tesla.”

  • “Google and DeepMind are the world leaders in machine learning, so Waymo is the leader in self-driving cars.”

  • “Waymo has the lowest rate of disengagements by safety drivers.”

  • “Waymo is already operating a self-driving taxi service.”

Not too much new on self-driving from Tesla’s shareholder meeting. Elon comments on it a bit at 1:37:35.

Facebook AI researcher and co-creator of PyTorch:

1 Like