Mobileye’s camera-first approach to full autonomy


Mobileye’s CEO and CTO, Amnon Shashua, describes Mobileye’s approach to fully autonomous driving in a talk from May 2018:

What do we mean by “true redundancy”? We have to achieve a comprehensive, end-to-end — so end-to-end is the sensing, it is the mapping, localizing the vehicle inside the map, the planning, the action — everything is based just on cameras, and it’s one comprehensive solution.

Another comprehensive solution and independent is to detect all road users using radar and lidars.

Skip to 13:48 to hear his comments:

It’s very interesting to me that Mobileye’s position is that fully autonomous driving is achievable in the near term using just cameras, and that radar and lidar are only required for redundancy.

Aurora Innovation (a startup co-founded by Chris Urmson from Waymo, Sterling Anderson from Tesla, and Drew Bagnell from Uber ATG) has similar reasoning:

We believe it will ultimately be entirely possible to build a self-driving car that can get by on, for instance, cameras alone. However, getting autonomy out safely, quickly, and broadly means driving down errors as quickly as possible. Crudely speaking, if we have three independent modalities with epsilon miss-detection-rates and we combine them we can achieve an epsilon³ rate in perception. In practice, relatively orthogonal failure modes won’t achieve that level of benefit; however, an error every million miles can get boosted to an error every billion miles. It is extremely difficult to achieve this level of accuracy with a single modality alone.

Different sensor modalities have different strengths and weaknesses; thus, incorporating multiple modalities drives orders of magnitude improvements in the reliability of the system. Cameras suffer from difficulty in low-light and high dynamic range scenarios; radars suffer from limited resolution and artifacts due to multi-path and doppler ambiguity; lidars “see” obscurants.
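The epsilon³ arithmetic Aurora describes can be sketched in a few lines (the per-modality miss rate below is an illustrative number of my own, not Aurora’s):

```python
# Sketch of Aurora's redundancy arithmetic: in the idealized case, a road
# user is missed only if every independent modality misses it at once,
# so per-modality miss rates multiply.
def combined_miss_rate(per_modality_rates):
    """Combined miss rate for fully independent sensing modalities."""
    combined = 1.0
    for rate in per_modality_rates:
        combined *= rate
    return combined

# Three fully independent modalities, each with an epsilon miss rate:
eps = 1e-2
print(combined_miss_rate([eps, eps, eps]))  # epsilon cubed, ~1e-06 in the ideal case
```

As the quote notes, real failure modes are only “relatively orthogonal,” so the actual benefit falls short of the full epsilon³ ideal.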

Tesla is sometimes portrayed as iconoclastic for not using lidar (just cameras, radar, and ultrasonics), but Tesla’s position doesn’t seem too different from Mobileye’s or Aurora’s.

If the goal of autonomous vehicles is to reach a crash rate of once per 1 million miles, about half the crash rate of human drivers (once per 530,000 miles), then the level of redundancy Aurora is talking about isn’t needed.

Suppose two sensor modalities (cameras and radar) are enough to bring the perception error rate down to once per 10 million miles. That would be sufficient as long as all other types of errors had a combined rate of no more than 9 per 10 million miles (about once per 1.1 million miles). That would mean a total of 10 errors per 10 million miles, or one error per 1 million miles: about 2x safer than human drivers. This is a conservative figure, since not every error leads to a crash. A perception error rate of once per 1 billion miles isn’t necessary.

A perception error rate below once per 10 million miles would provide even more leeway for other types of errors, or reduce the overall error rate even more.
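A quick back-of-envelope check of the budget above (all figures come from the preceding paragraphs):

```python
# Error-budget arithmetic: 1/10M perception errors per mile plus 9/10M
# for everything else, compared against the human crash rate of 1/530k.
perception_errors_per_mile = 1 / 10_000_000   # cameras + radar perception
other_errors_per_mile = 9 / 10_000_000        # planning, control, etc.

total_errors_per_mile = perception_errors_per_mile + other_errors_per_mile
miles_per_error = 1 / total_errors_per_mile
print(round(miles_per_error))                 # ~1,000,000 miles per error

human_miles_per_crash = 530_000
print(round(miles_per_error / human_miles_per_crash, 2))  # ~1.89x safer
```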


More on this topic from Mobileye:

Current sensor setup: only cameras. Why?

During this initial phase, the fleet is powered only by cameras. In a 360-degree configuration, each vehicle uses 12 cameras, with eight cameras providing long-range surround view and four cameras utilized for parking. The goal in this phase is to prove that we can create a comprehensive end-to-end solution from processing only the camera data. We characterize an end-to-end AV solution as consisting of a surround view sensing state capable of detecting road users, drivable paths and the semantic meaning of traffic signs/lights; the real-time creation of HD-maps as well as the ability to localize the AV with centimeter-level accuracy; path planning (i.e., driving policy); and vehicle control. The sensing state is depicted in the videos above as a top-view rendering of the environment around the AV while in motion.

The camera-only phase is our strategy for achieving what we refer to as “true redundancy” of sensing. True redundancy refers to a sensing system consisting of multiple independently engineered sensing systems, each of which can support fully autonomous driving on its own. This is in contrast to fusing raw sensor data from multiple sources together early in the process, which in practice results in a single sensing system. True redundancy provides two major advantages: The amount of data required to validate the perception system is massively lower (square root of 1 billion hours vs. 1 billion hours) as depicted in the graphic below; in the case of a failure of one of the independent systems, the vehicle can continue operating safely in contrast to a vehicle with a low-level fused system that needs to cease driving immediately. A useful analogy to the fused system is a string of Christmas tree lights where the entire string fails when one bulb burns out.

[Image: jerusalem 2x1]
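The “square root” validation claim in the quote can be illustrated with simple arithmetic (the 1 billion hour target and the independence assumption are Mobileye’s; the calculation is mine):

```python
import math

# Mobileye's claim: with two independently engineered sensing systems,
# each only needs to be validated over ~sqrt(N) hours to support a
# combined failure rate of once per N hours, because both systems must
# fail simultaneously (assuming independence).
target_hours = 1_000_000_000               # 1 billion hours for the full system
per_system_hours = math.sqrt(target_hours)
print(round(per_system_hours))             # ~31,623 hours per independent system

# Check: two systems each failing at 1/sqrt(N) per hour fail together
# at (1/sqrt(N))**2 = 1/N per hour.
combined_failure_rate = (1 / per_system_hours) ** 2
print(round(1 / combined_failure_rate))    # ~1,000,000,000 hours between failures
```

This is the same multiplicative-independence argument Aurora makes, just run in reverse: instead of multiplying miss rates to get a lower combined rate, the validation burden per system shrinks to the square root of the target.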

The radar/lidar layer will be added in the coming weeks as a second phase of our development and then synergies among sensing modalities can be used for increasing the “comfort” of driving.

Full blog post: