Scale AI: Is Elon Wrong About LiDAR?

Interesting post that is worth a read. My personal thoughts are below.

It would be extremely dangerous to let a predictive system like a neural net (infamously cryptic to debug and prone to confusion) take control of a self-driving car directly (the end-to-end learning approach).

Ultimately this means leaning even more heavily on neural nets — with their unpredictable and extreme failure cases — for safety-critical systems.

It’s somewhat surprising to me that a company with “AI” in its name seems to take a philosophical anti-neural network stance (when it comes to safety-critical robotics applications). :thinking: But these are popular arguments.

My own gut feeling is that the best hope for self-driving cars is something closer to end-to-end learning than what Waymo currently seems to be deploying in its Waymo One minivans. If behaviour generation (a.k.a. planning, a.k.a. path planning and driving policy) relies on engineers manually tweaking hand-written code after watching simulations and real-world tests, that seems like a recipe for failure. Insisting that hand-tuned code was the best way to play Go, Dota, StarCraft, or Quake would seem quaint today. As far as I know, there is no track record of success with this approach in complex robotics problems or virtual agent problems comparable to driving. But I could be wrong!

With perception, there are all kinds of entities that a self-driving car has to see with super high accuracy but that either have near-zero depth or are made of light: the colour of traffic lights, turn signals and brake lights, painted road markings like lane lines and crosswalks, and the surfaces of road signs. Lidar can’t see these things, so we’re going to need to solve seeing them with cameras and neural networks whether we use lidar or not.

To solve computer vision for these entities, maybe we’ll need to employ a new approach like self-supervised learning or end-to-end learning. I would say we shouldn’t rule out new approaches until we’re confident these entities can be solved with conventional supervised learning and human labelling of images/videos. And if we’re confident these entities can be solved that way, why not other entities like vehicles, pedestrians, and cyclists? This isn’t a rhetorical question; I’m really asking if there is a good reason to think cameras + human labels + neural nets can solve lidar invisible entities but not lidar visible entities. Maybe there is. I’m not an expert.

Additionally, almost all self-driving stacks visualize the world in top-down perspective for planning purposes, so misjudging the width of a car (as we saw in the first example) can lead the planning system to incorrectly predict what maneuvers other cars on the road have the space to perform or even to propose a path that would lead to the AV side-swiping the other vehicle.

Failing to detect vehicles carries obvious risks. But the post also makes a claim about the importance of getting the exact size and location of vehicles right.

At parking speeds and distances, the human margin of error is something on the order of 10 centimetres. At highway speeds and distances, it’s probably many metres. In any case, you want to keep a certain amount of distance between yourself and other vehicles. So if an AV keeps a similar safety margin, a computer vision error only matters once it exceeds that margin.

This is robustness vs. exactness. You want a perception system that is robust to rare objects, diverse lighting and weather conditions, and other variations in visual conditions. But you only need so much exactness. It is better to have a system that detects vehicles 99.999% of the time with 20 cm of accuracy than a system that detects them 99.995% of the time with 1 cm of accuracy.
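To make that concrete, here is a toy back-of-envelope comparison of those two hypothetical systems. The 10 Hz perception rate and the 0.5 m safety margin are made-up illustrative numbers, not figures from the post:

```python
# Toy comparison of the two hypothetical systems above. The 10 Hz
# perception loop and the 0.5 m safety margin are made-up numbers
# used only for illustration.
FRAMES_PER_HOUR = 10 * 3600   # assume a 10 Hz perception loop
SAFETY_MARGIN_M = 0.5         # assume the planner keeps >= 0.5 m clearance anyway

for name, detection_rate, accuracy_m in [
    ("robust-but-coarse", 0.99999, 0.20),
    ("exact-but-fragile", 0.99995, 0.01),
]:
    misses_per_hour = (1.0 - detection_rate) * FRAMES_PER_HOUR
    within_margin = accuracy_m < SAFETY_MARGIN_M
    print(f"{name}: {misses_per_hour:.2f} missed detections/hour, "
          f"accuracy inside safety margin: {within_margin}")

# robust-but-coarse: 0.36 missed detections/hour, accuracy inside safety margin: True
# exact-but-fragile: 1.80 missed detections/hour, accuracy inside safety margin: True
```

With both accuracies comfortably inside the safety margin, the 5x difference in missed detections per hour is what actually matters.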

The more convincing argument for lidar, in my opinion, is that it helps with robustness, not that it helps with exactness. Timothy B. Lee recently pointed out on Twitter that Tesla could have gotten the benefits of large-scale fleet learning without the liability of promising “Full Self-Driving” by simply saying the new vehicle hardware is for advanced driver assistance. A company like Tesla or maybe even General Motors could use fleet learning to collect training data and run large-scale testing while equipping a separate small fleet of robotaxis with lidar. I personally think this is the best argument for lidar, since it combines the strengths of lidar and the fleet learning approach. (I don’t know whether it would be at all feasible for Tesla to retrofit its Hardware 2/Hardware 3 vehicles with lidar in some hypothetical future where lack of lidar is the only thing holding back the fleet from achieving superhuman full autonomy.)

A distinct but related argument from the quote above is that exactness in computer vision is important for behaviour prediction. I don’t know enough about this to comment. I think it’s an interesting argument that deserves further thought.

I often see statements like this and wonder whether they actually hold when you are compute-limited. I have no idea, but it doesn’t seem obvious to me that more is always better.

Maybe don’t build it that way. The stack approach they favour seems like it could easily waste a lot of effort on things that don’t matter to the ultimate actions of the vehicle. People often complain about nearby cars spinning on the screen when stopped at a light. Your car is stopped, with essentially no possible actions available; what the other cars are doing is irrelevant. How much effort (programmer and/or neural network capacity) should be devoted to something that has no impact on any decision the car could make? I don’t know. [Actual solution: just pull up a little closer to the person in front of you so that the nearby cars stop being shown. :slight_smile:]

It would sometimes be nice if Elon’s statements were less confident and more educational. Imagine if instead of “LiDAR is a fool’s errand,” he had said: “Cameras are available in quantities/prices that are reasonable for a mass produced car today (and three years ago). We are 100% certain that cameras/vision are required for self driving (traffic lights, for example). Given that cameras/vision are required (and available), we are going to see how far they can take us. The team thinks it is likely that they will get us to acceptable levels of capability/safety. There are no guarantees, however.” (or whatever it is he actually thinks)

(I have no experience in any of this, but find it interesting.)


Nothing that is, can’t be. We already know that vision (and hearing) is enough to produce human levels of driving ability, and that a well-trained, well-rested, and attentive human driver will outperform the average human by a large margin.

The only question, then, is whether we have the technology and knowledge to emulate what nature has already accomplished. Those who insist that LiDAR is needed are voicing a pessimistic view. Elon is convinced that it’s not the sensors that are lacking, but the ‘brain’ that interprets the visual images to control the vehicle.

That ‘brain’ is Autopilot’s FSD hardware and software that’s under development. The jury is out on whether it will do the job. It will most certainly move us closer to the objective, and, I believe, he’s pursuing the goal along the correct path.

If Elon’s optimism is misplaced, and those who are convinced that LiDAR is needed are correct, then the future of AV will take a far different course than what Elon envisions.

The case the article makes for LiDAR as a sensor has merit. The problem is that engineering, at its core, is economics, a game of tradeoffs and compromise. Any viable AV solution must take a holistic approach to its implementation, cost, deployment, operation, maintenance, and evolution. While LiDAR solves one aspect of the total AV solution, it comes at a significant cost.

LiDAR is a crutch, just as the headlights on a car are a crutch to overcome a human driver’s inability to see in the dark. We might instead have required that humans driving at night wear night-vision goggles. That’s kind of how I see LiDAR.

Humans can’t produce lidar levels of 3D precision. News at 11.

Who needs to know exactly how long a vehicle 100 meters ahead is? All that matters is how far away its rear bumper is.

Spoilers: neither lidar nor vision can tell the length of a vehicle directly ahead. That’s fine, though, because it doesn’t matter; it’s not a safety or even a comfort factor.

As vehicles get closer, the precision needed increases, but so does the ability of stereo to discern depth. At long range you mostly just need a very broad understanding.
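This is just the standard stereo geometry: for a fixed disparity-matching error, depth error grows roughly with the square of range. A minimal sketch, assuming a hypothetical pinhole rig (the focal length, baseline, and matching error below are made up):

```python
# Standard pinhole stereo depth-error relation: depth z = f*b/d, so a
# fixed disparity error delta_d produces a depth error that grows
# quadratically with range: delta_z ~= z**2 * delta_d / (f * b).
# Camera parameters below are made-up but plausible values.
F_PX = 1000.0       # focal length in pixels (hypothetical)
BASELINE_M = 0.3    # stereo baseline in metres (hypothetical)
DISP_ERR_PX = 0.25  # sub-pixel disparity matching error (hypothetical)

def depth_error_m(z_m: float) -> float:
    return z_m ** 2 * DISP_ERR_PX / (F_PX * BASELINE_M)

for z in (5, 20, 50, 100):
    print(f"at {z:4d} m: depth error ~ {depth_error_m(z):6.2f} m")

# at    5 m: ~ 0.02 m  (centimetre-level up close)
# at  100 m: ~ 8.33 m  (very coarse at long range)
```

So centimetre-level precision up close degrades to metres at 100 m, which lines up with the “broad understanding at long range” point.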


That’s not necessarily true. Lidar can see road markings because they have a much higher albedo, which can be measured. Lidar should really be thought of as a two-channel camera, not a one-channel depth camera: it captures a monochrome reflectance channel and a depth channel. This can actually be superior to a standard camera because you aren’t capturing the current lighting conditions, just the inherent reflectivity of an object. That works great for things like crosswalk paint, without any confusing sunlight.
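As a rough illustration of the idea, here is a minimal sketch of picking lane-marking candidates out of a point cloud by thresholding the intensity channel. The array layout, threshold, and ground-height band are all assumptions for illustration, not any particular sensor’s API:

```python
import numpy as np

# Minimal sketch of using lidar reflectance (intensity) to pick out
# high-albedo paint such as lane lines. `points` is assumed to be an
# (N, 4) array of x, y, z, intensity returns; the threshold and the
# ground-plane band are arbitrary illustrative values.
def lane_marking_candidates(points: np.ndarray,
                            intensity_thresh: float = 0.7,
                            ground_z_max: float = 0.1) -> np.ndarray:
    on_ground = points[:, 2] < ground_z_max   # keep returns near the road surface
    bright = points[:, 3] > intensity_thresh  # keep high-albedo returns (paint)
    return points[on_ground & bright]

# Usage: feed the candidates to a line fitter (e.g. RANSAC) to recover
# lane geometry with no dependence on ambient lighting.
```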

Also, of course, 3D makes it far easier to segment out regions than picking them out against the visual clutter behind them. Let’s play: find the stop light in this photo.

Autonomous driving is imo impossible without a 3D spatial understanding of the world. The question is not whether lidar is needed, but whether vision will be able to produce a reliable enough 3D volume quickly enough. Of course it can eventually, but how long is it going to take?

I can look at a pair of headphones and, in my head, spin them around from a single photo, ‘look’ at them from different angles, and recognize them in nearly any lighting condition, in different colors, folded, unfolded, or inside a package with a window. I can do that because human visual cognition doesn’t just recognize shapes; it ‘understands’ what a thing is. Take a sweater and ball it up: your brain will still recognize it as the exact same sweater. That’s our superpower. Lidar doesn’t necessarily give us the same thing, but it has its own superpower, which is to directly see in 3D. Yes, that’s a crutch, but it’s a crutch for a pretty hard vision problem.

If neural nets + vision solve driving, they’re going to do it using their own unique strengths, not by leveraging human strengths or lidar strengths. Judging and measuring neural nets by how well they match another technology’s performance in one area repeats the mistake computer scientists made in thinking that a computer’s impressive ability to do millions of multiplication problems per second would somehow translate into being a great chess player. Waymo has the best lidar in the industry… and yet…


Is this image gleaned purely from lidar? I’d read about lidar being used to detect lane lines based on surface reflectivity, but I thought it was a super unreliable, super low-accuracy thing.

I would think that the capabilities of LiDAR sensors, including range and resolution, depend heavily on price. The kind of LiDAR that one can economically justify equipping a robocar with will probably not have extraordinary capabilities.

Mobileye says:

While other sensors such as radar and LiDAR may provide redundancy for object detection – the camera is the only real-time sensor for driving path geometry and other static scene semantics (such as traffic signs, on-road markings, etc.).

I tried to find the origin of this image. I think it was originally part of the marketing materials for the Dynascan S250. It sure has been used in a lot of places. They apparently drove that road using their lidar and then generated this view (and the video below) from that data.


Even the Livox Mid-40 can do it, and it’s one of the cheapest, if not the cheapest, automotive-grade sensors.

I think one issue, besides being unable to read road signs, is that integration time is nowhere near an optical camera’s 60 fps.


Why does Mobileye say “the camera is the only real-time sensor for driving path geometry and other static scene semantics (such as traffic signs, on-road markings, etc.)”?

I think the emphasis needs to be on the word real-time. In the gif above you can see that the resolution is extremely low, even at 10 fps. Secondly, you can’t see in color, so is it a dashed yellow line or a dashed white line?

Traffic lights still require an optical sensor as well. They are emissive, not reflective, so they appear black to a lidar sensor. So of course we will need RGB cameras too. But lidar isn’t completely useless for many road markings, and it can definitely assist in guiding visual systems on where to focus.

But you can also project RGB onto lidar data pretty cheaply, computationally, if you want RGB road-line color classification.
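For what that might look like, here is a minimal sketch of the standard point-painting step: transform lidar points into the camera frame, project them with a pinhole model, and sample the image. `K` and `T_cam_lidar` stand in for hypothetical calibration outputs:

```python
import numpy as np

# Minimal sketch of "painting" lidar points with camera RGB: transform
# each point into the camera frame, project it with a pinhole intrinsic
# matrix K, and sample the image. T_cam_lidar (4x4 extrinsic) and K
# (3x3 intrinsic) are hypothetical calibration outputs; `image` is an
# (H, W, 3) RGB array.
def colorize_points(points_xyz, image, K, T_cam_lidar):
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])  # (N, 4) homogeneous coords
    cam = (T_cam_lidar @ homog.T).T[:, :3]            # lidar frame -> camera frame
    cam = cam[cam[:, 2] > 0]                          # drop points behind the camera
    uvw = (K @ cam.T).T                               # pinhole projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)       # perspective divide to pixels
    h, w = image.shape[:2]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    colors = image[uv[valid, 1], uv[valid, 0]]        # sample RGB for each point
    return cam[valid], colors
```

It’s one matrix multiply and an array index per point, which is why it’s cheap.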


Is the controversy over the use of LiDAR a matter of technical feasibility, economics, aesthetics, health and safety, politics, ego, something else, or all of the above? Does it matter who’s right?

I think it’s technical feasibility × economics. Quantity of training data → quantity of vehicles → cost of vehicles. Lidar is expensive and therefore makes vehicles expensive.

Companies like Tesla and General Motors could, in theory, collect training data from mass market vehicles and then operate separate robotaxi fleets with lidar. The reason Tesla and GM haven’t done this is up for debate.

Has anyone estimated the cost of equipping a fleet of cars with LiDAR? Has anyone estimated the cost of generating the necessary HD maps for the entire country? I’ve read that the price of LiDAR is coming down considerably, but I doubt that the cost of creating and maintaining the HD maps (that make LiDAR work for localization) is getting cheaper. Is using LiDAR as a supplementary sensor to computer vision, radar, and deep machine learning à la Tesla worth the added cost per car if it turns out to be the only way to achieve six-sigma-level driverless performance? It seems that most of the criticism I read is aimed at Tesla for not using LiDAR, not at Google, GM, et al. for using it. I’m trying to understand the thinking that arrives at the conclusion that LiDAR is necessary, not just easier, and I’m having a hard time of it. What does make sense is Tesla’s holistic approach to the problem. I don’t see that in anyone else’s development/deployment plan.

The discussion of lidar capabilities that we see in the lay press doesn’t accurately present either the capabilities or the promise of lidar, and it generally judges the relative merits on metrics which are not representative of how lidar is actually used today, or which are not actually valuable to the task of driving safely. This article is no exception. The general claims this article makes are wholly unsupported by the data that is presented, and the author suggesting otherwise gives me the impression that this article is constructed to mislead.

The criteria evaluated are not relevant to the task of driving and are not representative of how lidar is used in vehicles today or, AFAIK, how it is expected to be used in the future. Additionally, the evaluation criteria assume the economic availability of consumer automotive-grade lidar systems which don’t exist and which, AFAIK, are not on the roadmap of any commercial lidar maker. Planned commercial vehicle lidar systems don’t have 200 m of range with 64 planes and 360-degree coverage; they have 100 m of range, 4 planes, and a 120-degree FOV. The refresh rates of commercially planned lidar make them inappropriate for object detection and collision avoidance at normal driving speeds, which is why they are primarily used for localization today and not object detection. An image of a scene accumulated over several seconds from a high-performance research lidar on a stopped vehicle is not indicative of what we can expect from commercial deployments on vehicles used to ferry passengers.
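A quick back-of-envelope on the refresh-rate point, assuming the roughly 10 Hz scan rate typical of the units described above (the speeds are illustrative):

```python
# Back-of-envelope for why refresh rate matters at driving speeds.
# Assumes a 10 Hz scan rate, in line with the planned commercial units
# described above; the speeds are illustrative.
SCAN_HZ = 10.0

for kmh in (50, 100, 130):
    mps = kmh / 3.6
    gap_m = mps / SCAN_HZ  # distance the vehicle travels between scans
    closing_m = 2 * gap_m  # worst case: head-on traffic doubles the closing rate
    print(f"{kmh:3d} km/h: {gap_m:.1f} m between scans "
          f"({closing_m:.1f} m head-on)")

# 130 km/h: ~3.6 m of travel between consecutive scans (~7.2 m head-on)
```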

Even if a lidar with the needed physical capabilities existed, we don’t today have a way of combining it with other sensors that doesn’t increase the rate of occurrence of long-tail errors that can lead to catastrophic failures. It’s well appreciated by human drivers that vision has obscure failure modes which create subtle but severe problems in unusual circumstances, and which nonetheless need to be reliably handled. It is less well appreciated that lidar has similar but independent failure modes that make reliably interpreting the sensor output impossible to do perfectly. Transparent, translucent, and reflective optical surfaces are common on our cars, roadways, and structures, and each of them presents a challenge to reliable interpretation of the backscatter of a powerful lidar signal.

Addressing the thought experiment (what if we had inexpensive, nearly ideal lidar implementations combined with software that does not experience increased failure rates with increased complexity?) is not helpful for answering whether lidar is likely to be useful in the real world. This is just another version of the “trolley problem” phenomenon: a debate topic which is compelling to a lay audience but which at best adds noise and at worst is anti-signal, actually cancelling out useful debate in the pursuit of a meaningless argument.

If we want to have an informed debate about whether lidar adds value for object detection, we need a statistically significant amount of real-world data comparing vision alone to vision combined with commercially available lidar, specifically on the task of detecting objects that are relevant to the planning process for a driverless vehicle. There is no simple proxy that is going to substitute for this data, because the critical evaluation is not whether lidar makes the easy stuff easier, but whether it makes the really hard stuff easier without introducing separate problems that exacerbate the risk of accidents. You can’t answer the question of how a complex system in a complex environment will fail using rules of thumb and reasoning by analogy.
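To get a feel for how much data “statistically significant” implies here, a rough two-proportion sample-size sketch (the miss rates below are made-up placeholders, not measured values):

```python
from statistics import NormalDist

# Rough sample-size estimate for the comparison described above: how
# many object-detection events per arm would a standard two-proportion
# test need to distinguish two rare miss rates? The miss rates used in
# the example call are hypothetical placeholders.
def events_needed(p1: float, p2: float, alpha: float = 0.05,
                  power: float = 0.8) -> float:
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_b = NormalDist().inv_cdf(power)          # critical value for the power
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p1 - p2) ** 2

# e.g. vision-only misses 1 in 100k detections, vision+lidar 1 in 200k:
print(f"{events_needed(1e-5, 5e-6):.2e} events per arm")  # ~4.7e6
```

Millions of labelled detection events per arm, just to distinguish one rare miss rate from another half its size.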

The lack of a proper statistical analysis in the public sphere may not be a coincidence. There are numerous commercial organizations which could publish an objective analysis from their own records and which at least superficially appear to have an incentive to do so. The fact that nobody has done so suggests that publishing the data they actually have might not be helpful to their interests.


Punnett square I made:
