Wouldn’t a much better candidate for this be Baidu’s Apollo? Or Mobileye?
How many vehicles are running OpenPilot? In 2017, George mentioned on Danny in the Valley that there were “hundreds” of OpenPilot users.
Apollo’s software is open source and Apollo has lots of big partners like Toyota, Hyundai, and Ford. Mobileye’s software is closed source but it also has big partners like BMW and it ships millions of units of its EyeQ systems. Where is the room for a small startup like comma ai to be the Android of autonomy?
This sounds like wishful thinking to me. I’d like to see the distribution. To be fair, Hotz said it was just a theory. Unfortunately it has the kind of sound-bite character and intuitive-ishness that means a lot of people will believe it without examining it.
I don’t recall the source, but I believe Hotz has said recently that comma ai has one or two thousand regular drivers.
It does seem like Mobileye - or even Waymo for that matter - can gather enough training data to do what comma is doing now but comma is pursuing the Tesla strategy and the others are not. If comma is successful in pushing a proper consumer product after they get to 1.0 then they have the opportunity to scale their installed base to tens or hundreds of thousands very quickly - something that can’t be done by someone who isn’t building cars or by anyone who does build cars but uses the standard auto industry model-and-option business structure. Tesla eats the cost of the hardware on all the cars that don’t buy AP, and probably barely breaks even on the cars that do buy AP. They haven’t been able to recognize revenue for FSD yet. Putting the hardware on all the cars they ship has been a big and very expensive gamble that I believe mainstream vehicle makers are unlikely to do unless and until they see it as an existential risk.
I think comma is a really interesting concept that has some unique strengths. I think the jury is very much out on whether they will succeed. I’d give them maybe a 10% chance right now because I don’t see how they are going to be able to do a consumer product without cooperation from vehicle makers. I don’t believe it’s impossible, but I don’t see a way to do it given the limited time I’ve spent on the problem.
I can’t recall if I pointed this out before, but the value of a training database doesn’t grow very quickly with the size of randomly collected data: it grows something like the log of the size of the training data. Value grows much more quickly with the fraction of your data that is curated to contain an optimal training distribution, i.e. seeded with a disproportionately large fraction of high-value, low-probability samples. Tesla doesn’t use its fleet as a dumb collection system; they use the fleet as a distributed content curation and collection system. They curate the data at the collection point, and they use the power of those hundreds of thousands of GPUs (soon to be NPUs) to perform the curation. The system only provides the data that Tesla needs. To pull off this trick you need not only the capability to record in the field and send that data back to the mothership, but also the ability to perform the curation functions in the field. This aspect of what they do doesn’t seem to be appreciated by anyone I’ve talked to or in anything I’ve read on the topic.
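To make the "curate at the collection point" idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: `should_upload`, `toy_trigger`, the `rarity` score, and the 0.9 threshold are made up, and the trigger is a stand-in for whatever network would actually run in the car. It is not a description of Tesla's system.

```python
import random


def should_upload(clip, trigger, threshold=0.9):
    """Return True if the on-device trigger scores the clip as worth uploading."""
    return trigger(clip) >= threshold


def toy_trigger(clip):
    # Stand-in for an on-device neural network. Here a clip is just a dict
    # carrying a made-up "rarity" score between 0 and 1.
    return clip["rarity"]


random.seed(0)  # fixed seed so the toy run is repeatable

# Pretend the car recorded 10,000 clips with random rarity scores.
clips = [{"id": i, "rarity": random.random()} for i in range(10_000)]

# Only clips the trigger flags as rare ever leave the car.
uploaded = [c for c in clips if should_upload(c, toy_trigger)]

print(f"uploaded {len(uploaded)} of {len(clips)} clips")
```

The point of the sketch is only that the filtering happens before upload, so the mothership receives a skewed, high-value sample rather than a random one.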
Does Mobileye have that, or are they just collecting at random? If they can do the curation in the field then it seems like Mobileye might be positioned to create a high value training database fairly quickly.
I would believe it at the very least for lane keeping. The chances you’ll stop paying attention and drift out of your lane should follow a somewhat random distribution.
Just from a statistical standpoint, 99.99999999% of the time you aren’t crashing or causing an accident. I suspect that holds true for other mistakes as well. Mistakes are usually random, but the median behavior should be generally safe.
“What do you do when approaching a red light?”
99 driver actions: Stop.
1 driver action: Drive at constant speed.
2 driver actions: Drive at constant speed and suddenly stop past the crosswalk.
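A toy version of that tally in Python, using the made-up counts from the red-light example. Treating the most common action as the "median" driver is my simplification, not something from the interview.

```python
from collections import Counter

# Hypothetical tally of what drivers did when approaching a red light,
# using the made-up counts from the example.
observed_actions = Counter({
    "stop": 99,
    "drive at constant speed": 1,
    "drive at constant speed, then stop past the crosswalk": 2,
})

# The most common action stands in for the "median" driver here.
typical_action, count = observed_actions.most_common(1)[0]
share = count / sum(observed_actions.values())

print(f"{typical_action}: {share:.1%} of observed actions")
```

If mistakes really are scattered and rare like this, imitating the most common action recovers the safe behavior. If the mistakes are systematic, it won't, which is why I'd like to see the real distribution.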
I just saw on comma.ai that they have “4,500+” users and “10,000,000+” miles. If those are all active daily users, that’s a lot more than I thought. (Edit: At 1:33:05 in the interview, George Hotz says comma ai has 700 daily active users and 1000 weekly active users.)
That’s my worry too. Comma ai’s approach doesn’t seem scalable to hundreds of thousands of cars without partnering with the car manufacturers. Cruise was originally pursuing comma ai’s current approach, but Kyle Vogt says he sold Cruise to GM because he saw the necessity of vertical integration. Vogt says there is a crazy amount of work that goes into testing that the different physical components of a vehicle work together safely. He says a startup doesn’t have the resources to do the kind of intense testing that car manufacturers do.
As I understand, comma ai is currently operating its business within a legal loophole. In 2016, comma ai received a letter from NHTSA telling them to stop selling their partial autonomy product. Now the hardware is marketed as a “dev kit”. It comes with the disclaimer:
EON Devkit does not come with any software capable of controlling your car. You can install open source software separately.
It’s a sort of wink-wink-nudge-nudge business model that seems like it’s just waiting for regulators to descend on it if it gets too big, if a crash happens, or if someone sues. I think regulators have the discretion to clamp down on this sort of thing, even if it exploits a loophole in the laws as written and gets away on a technicality. As I understand, regulators don’t just enforce the letter of the law but make new judgment calls all the time. So if they see this as a conscious and deliberate attempt to circumvent safety regulations (which it seems like), then they may squash it if it ever causes trouble or gets their attention. I’m not too knowledgeable about law or regulation, but that’s my current understanding of how it works.
I also don’t yet understand how comma ai will make money long-term if the software is truly free and open source. Maybe they will go closed source after version 1.0? (Which will also be a consumer product that NHTSA will have to be okay with…) EON Devkit seems like commodity hardware anyone could sell. The point of Android is to get smartphone users to use Google apps and web services like search, Gmail, Google Maps, and so on and to generate ad revenue for Google. What’s the equivalent for openpilot? (Edit: At 1:17:25, George says his long-term business plan is for comma ai to become a car insurance company.)
Baidu’s business plan with Apollo is apparently to open source the software but sell the hardware. Similar I guess to Google with Chrome OS and Chromebooks. (Sort of.) But Baidu 1) has the resources to design custom hardware, similar to Tesla, 2) has partnerships with major auto companies to integrate its tech into their vehicles, and 3) has the support of the Chinese government. Apollo/Baidu is acting as a Tier 1 supplier, similar to Mobileye.
I believe the free and open source nature of the software may be to get buy-in from the automotive partners. It would be attractive to companies like GM, Toyota, Ford, Volkswagen, et al. to know they can always buy or manufacture non-Apollo hardware and still run (or fork) the software if Baidu goes super villain and tries to indenture the auto OEMs.
I’m trying to get the word out now. I wrote about it a bit here.
Verygreen insisted Tesla wasn’t using any sophisticated triggers for data collection, but then on Autonomy Day Karpathy came out and said Tesla is using deep learning-based triggers. I’m inclined to believe Karpathy.
Watching the Autonomy Day presentation, I was kicking myself a bit because the idea of deep learning-based triggers is something I might have thought of if I had allowed myself to use my imagination a bit more. I don’t think just because you hack your Tesla you necessarily understand everything that’s going on under the hood. (Especially if Tesla knows you hacked your Tesla and is actively trying to hide things from you!) If you restrict your mind to things hackers have found, you will close yourself off to important possibilities.
I wish I knew more about what exactly Mobileye is collecting or is able to collect. Right now, the only thing it’s super obvious they do is run computer vision neural networks on the EyeQ4 chip in the car and upload abstracted representations for visual HD maps that are “less than 10KB/km on average”.
By the way, I honestly don’t understand Mobileye’s approach with visual HD maps. I’m not saying it’s wrong; I just don’t get it. How does old camera data, i.e. visual HD maps, provide redundancy to new camera data, i.e. real time sensing?
There is other stuff obliquely mentioned in some Mobileye patents and papers. Their definition of HD maps is more expansive than what you’d usually think. I think they may upload data on the path cars with EyeQ4 take and include those paths in the HD map. Not just the physical features of the environment like lane lines.
Mobileye is super into deep RL trained in simulation for the behaviour generation part of the self-driving car problem. But they mention in papers that the initial policy is derived from human driving data, so it sounds like they’re bootstrapping with imitation learning.
Mobileye could theoretically collect more data than Tesla given how many units of the EyeQ chips it ships. But I’m not sure it has the technical capability or the right partnerships with auto manufacturers to actually do that.
Speaking of, Mobileye’s 10 KB/kilometre figure (16 KB/mile, since 1 mile ≈ 1.6 km) is useful for thinking about Tesla and behaviour prediction or imitation learning. HD maps don’t include road users (i.e. vehicles, cyclists, pedestrians). But even if an abstracted representation of the scene including road users plus driver input (steering, braking, accelerating, signalling) is 2 MB/mile (125x higher than Mobileye’s figure for visual HD maps), then if the average Tesla driver goes 40 miles per day, you only need to upload 80 MB per day or 2.4 GB per month to get ALL of it. (Not that this would necessarily even be useful!)
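For anyone who wants to plug in their own numbers, here is the same back-of-the-envelope arithmetic as a short Python sketch. The 2 MB/mile and 40 miles/day inputs are the guesses from this post, not measured values, and the 30-day month and 1 GB = 1000 MB conversions are my assumptions.

```python
# Back-of-the-envelope check of the upload estimate.
KM_PER_MILE = 1.6            # rounded conversion factor
hd_map_kb_per_km = 10        # Mobileye's quoted average for visual HD maps
assumed_mb_per_mile = 2      # guessed size of scene + driver-input data
miles_per_day = 40           # assumed average daily driving
days_per_month = 30          # assumed month length

hd_map_kb_per_mile = hd_map_kb_per_km * KM_PER_MILE        # KB per mile of HD map
ratio = assumed_mb_per_mile * 1000 / hd_map_kb_per_mile    # guess vs. HD map figure
mb_per_day = assumed_mb_per_mile * miles_per_day           # daily upload in MB
gb_per_month = mb_per_day * days_per_month / 1000          # monthly upload in GB

print(f"HD map: {hd_map_kb_per_mile:.0f} KB/mile")
print(f"Assumed representation is {ratio:.0f}x larger")
print(f"Upload: {mb_per_day} MB/day, {gb_per_month:.1f} GB/month")
```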
Anecdotally (example 1, example 2, example 3), we’ve seen Tesla owners upload data in this ballpark. Of course, we don’t know what’s being uploaded. Sensor data like video clips would be much chunkier uploads.
Based on my calculations for uploading raw video, Tesla’s Azure bill wouldn’t be out of control, either. A bottleneck I’m not sure about is the cost of training compute. I remember reading somewhere recently about how much it cost DeepMind to train AlphaStar or maybe how much it would have cost them to train it at Google Cloud pricing, but I can’t find the source now. I think it was said to be in the $10 to $50 million range.
Am I correct in thinking that (starting around 51:30) George Hotz is making an argument for end-to-end imitation learning when he rails against the idea of using a hand-designed or human understandable abstracted representation between perception and planning? He says:
The problem is that I don’t think you can hand code a feature vector. Like you have some list of: here’s my list of cars in the scene, here’s my list of pedestrians in the scene. This isn’t what humans are doing.
It sounds like he doesn’t believe the output of the perception system (the state vector or feature vector) should be anything other than whatever a neural network passes along to the next layer.
I loved Lex’s analogy that generating an abstracted representation is laying out the driving scene like a chessboard.
At 58:55 George says, “I want to build the AlphaGo of driving”. AlphaGo used end-to-end imitation learning and reinforcement learning.
Lex: “So, AlphaGo is really end-to-end. … Is that also kind of what you’re getting at with the perception and the planning?”
Lex: “That this whole problem — that the right way to do is really to learn the whole thing.”
George: “I’ll argue that not only is it the right way, it’s the only way that’s gonna exceed human performance. It’s certainly true for Go.”
Philosophically, I find George’s argument appealing. AlphaStar and OpenAI Five did imitation learning and reinforcement learning with supernatural vision: direct infallible knowledge of the universe. DeepMind and OpenAI hand-waved away the computer vision part of the problem by giving agents direct access to the game state. What Tesla is attempting to do with its bounding boxes and drivable roadway is to feed the vehicle agent a version of the game state. And then do imitation learning, heuristics, and maybe some reinforcement learning.
But DeepMind’s FTW got superhuman at Quake III going off of raw pixels from the monitor’s video feed. Maybe this is possible with cars.
This is why I originally thought Project Dojo was about end-to-end learning. I still think based on a comment Elon made at Autonomy Day that he envisions end-to-end learning as the eventual endpoint for Tesla, even if not in the first version of Tesla’s Level 5 software.
George Hotz seems like he’s really into the idea of doing training in simulation. That implies reinforcement learning. (Edit: Around 1:31:00, he says he wants to do reinforcement learning in the real world.) Two problems:
Where do you get the reward? Do you… hand code it? That seems to go against his philosophy expressed in the above discussion. Personally, I suspect you need some kind of reward learning from human demonstrations like D-REX and/or human feedback like driver interventions in a Level 2 situation.
Where do you get the training signal? That is, how do you ensure the reward meted out by the simulation is similar to the reward meted out in real life and therefore conducive to learning a policy that works well in the real world? If you use replay data from situations your real world cars have been in, do you a) keep all the other road users’ behaviour exactly the same, unrealistically ignoring the ego car like a ghost or b) animate the other road users with imitation learned models, in which case you have a bit of a chicken and egg problem?
P.S. Two other random things. I liked George’s point that we need 100% (or I would say ~100%) reliable driver monitoring once we have Level 2 or 3 software that needs a human to take over once per 1000 miles. I hope the selfie cam inside the Model 3 can be used for driver monitoring. Has anyone hacked the selfie cam to take video? Surely at least it can detect head pose…?
George said (something like) “I don’t really believe in mapping anymore”. I wish he had elaborated on this! It might just be for the same reason he doesn’t believe in human understandable abstracted representations — that’s what an HD map is.
I was pretty confused that Lex didn’t get what George was saying about the superiority of a vector rather than an API. That seems so trivially defensible and intuitive to me that Lex must have misunderstood the statement.
Vectors are superior in bandwidth, flexibility, and mutability. An API has the advantage of inspectability and compatibility with heuristic techniques. The former set of advantages easily trounces the latter in the long run, though the latter are useful for getting something working quickly and cheaply. APIs are used a lot now but will go away as the field matures.
It’s not the same, but it does have a bit of a relation. To train end-to-end you have to be able to propagate your error from the driving controls all the way back to the perception system. That usually means that you’ll be passing vectors between all the intervening subsystems in order to be able to efficiently back propagate. You can back propagate through an API, and there are systems that do this, but it constrains the system in ways that will degrade performance and inhibit training so it’s generally only used in certain uncommon situations.
It occurs to me that this statement is arguably incorrect because the first version of AlphaGo was superhuman but it combined Monte Carlo tree search with a neural network. AlphaGo Zero and AlphaZero also perform tree search in a fashion that is not amenable to implementation in a neural network. It is possible to run the NNs that are embedded in AlphaZero without the use of tree search and the result might still be superhuman, but I haven’t seen that data. The search-less version of AlphaGo was not superhuman.
I guess if you read the statement as “you cannot become superhuman without some component that is end-to-end trained” then it still stands up, but I disagree with it as being over-general since AlphaGo became superhuman without going truly end-to-end in its first incarnation.
For the FSD problem I do agree that integrating perception and planning (with a vector and not an API) is the swiftest and surest path to success but I do not believe that control needs to be done with an NN and I predict that the first L4 systems will probably not perform control with an NN. That part is too easy to do with a heuristic - it’ll be the last thing to be rolled into the end-to-end solution.
RL has been done with proxy rewards that seem general. For instance, seeking novelty seems to be a general reward signal that leads to good performance on a range of problems but which doesn’t require the developer to explicitly state the objective of a problem.
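As a toy illustration of what a novelty-style proxy reward can look like, here is a count-based bonus in Python. This is a deliberately simple stand-in that I picked for the sketch, not the method from any particular paper; `CountNoveltyReward` and the letter-named states are hypothetical.

```python
import math
from collections import defaultdict


class CountNoveltyReward:
    """Toy novelty bonus: states visited less often earn a larger reward."""

    def __init__(self):
        # Number of times each (hashable) state has been seen so far.
        self.visits = defaultdict(int)

    def __call__(self, state):
        self.visits[state] += 1
        # The bonus shrinks as 1 / sqrt(visit count), so repeats pay less.
        return 1.0 / math.sqrt(self.visits[state])


reward = CountNoveltyReward()
trajectory = ["A", "B", "A", "A", "C"]  # made-up sequence of visited states
rewards = [reward(state) for state in trajectory]

for state, r in zip(trajectory, rewards):
    print(f"state {state}: novelty reward {r:.3f}")
```

Note that nothing in the reward mentions the task itself. The agent is only paid for reaching states it hasn't seen much, which is the sense in which the signal is general.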
Personally I’ve become a bit allergic to this conviction, as it lacks support in the real world and seems to be mainly promoted by people who think that FSD accidents are worse than human-driven accidents. Lex himself did the most relevant study on this with respect to Tesla AP in particular, and his group did not find that driver attention declined when supervising an L2 system. I think the intuition that such a decline will inevitably happen, inevitably lead to accidents, and that the frequency of those accidents will be comparable to or greater than the accidents that are avoided through the use of the L2 system is wholly unsupported and largely refuted by the data that Tesla has already released. Certainly we have already seen horrific accidents where the driver was grossly negligent in supervising AP, but we don’t have a good way of seeing all the avoided accidents except through statistics. Though I would argue that reports of drivers falling asleep or passing out drunk while using AP should remind us of common events that frequently result in death but which did not here, in part thanks to the use of AP.
[New Yorker cartoon]
“My car was curious what the ocean is like.”
Lex’s paper found that there was a disengagement once every 10 miles on average, right? The question is whether this finding will hold when the car goes 7,000 miles with no need to intervene. If your car drives itself for 6 months with no problem, do you still stay as vigilant as when it needed your intervention once a day?
(Elon’s mental model is that Tesla drivers will just ride the exponential curve. By the time 6 months have passed, the car will go 1 million miles between failures. So it will be a non-issue.)
Discourse is magic! Forum software you can actually use on your phone.