Tesla Autopilot internship mentions reinforcement learning

Today I saw this Tesla job posting:

Autopilot Internship/Co-Op (Summer 2019)


As an Autopilot AI Scientist you will perform research and development to advance the state of the art in technologies enabling autonomous driving; research and develop algorithms for complex tasks like full scene understanding, scene prediction, planning and sequential decision making. Devise methods to use enormous quantities of lightly labelled data in addition to a diverse set of richly labelled data.


  • Individuals in this role are expected to be experts in identified research areas such as computer vision, artificial intelligence, machine learning, and applied mathematics, particularly including areas such as supervised learning, graphical models, reinforcement learning, optimal control.

  • Develop state-of-the-art algorithms in one or all of the following areas: full scene understanding, multi-modal data processing, learning for planning and decision making, etc.

  • Implement them to run with real-time performance in an autonomous vehicle production environment.

  • Creative ways to use complementary sensor data, and offline processing to create ground truth for training the algorithms.

Some hints in here on what Tesla might be developing, or at least experimenting with. Could “methods to use enormous quantities of lightly labelled data” be a reference to supervised imitation learning? Or perhaps a way to train perception neural networks using driver input data?
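To make the second guess concrete: logged steering angles alongside camera frames are "free" labels, so you could fit a policy by behavioral cloning. A minimal least-squares sketch (the shapes, features, and linear policy here are entirely my assumptions, nothing from the posting):

```python
import numpy as np

# Hypothetical behavioral cloning: predict the driver's steering input
# from perception features. A linear policy fit by least squares stands
# in for whatever network Tesla might actually use.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))   # stand-in for per-frame perception features
true_w = rng.normal(size=8)
steering = features @ true_w           # logged "driver input" acts as the label

# Fit the policy: solve min_w ||features @ w - steering||^2.
w_hat, *_ = np.linalg.lstsq(features, steering, rcond=None)
```

With noise-free synthetic labels the recovered weights match the generating ones exactly; the point is just that driver input turns unlabelled frames into a supervised dataset.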

nice catch

The interesting line for me is confirmation of this: “Creative ways to use complementary sensor data, and offline processing to create ground truth for training the algorithms.”

Most likely for training photogrammetry networks.

Can you explain what you mean?

Building a 3D point cloud from spatial disparity, i.e. reconstructing a lidar-like 3D scene but using triangulation to measure depth.
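The triangulation relationship itself is simple: depth = focal length x baseline / disparity. A minimal sketch (the focal length and baseline values are made up for illustration, not Tesla's camera geometry):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px=1000.0, baseline_m=0.3):
    """Convert a stereo disparity map (pixels) to metric depth (meters).

    Pixels with zero disparity have nothing to triangulate and map
    to infinity, which is exactly the textureless-surface failure mode.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# With these assumed intrinsics, a 30 px disparity means
# 1000 * 0.3 / 30 = 10 m; zero disparity means "unknown / infinite".
```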

It’s super slow with traditional methods, but neural nets have been shown to produce pretty good results in real time. E.g. a reconstruction in Agisoft PhotoScan Standard took my i7 6-core machine about 2 hours to compute.

So: process offline using classic approaches that can take an hour per frame, then extract the point cloud. Pro: you can use fleet data, since no cars have lidar. Con: it takes a ton of offline processing.
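A sketch of that offline-ground-truth pipeline (all function names here are hypothetical placeholders, not Tesla code): run the slow classic reconstruction once per clip offline, then use its output as the regression target for a fast network that only sees a single camera frame.

```python
import numpy as np

def slow_offline_depth(frame):
    """Stand-in for hours of classic photogrammetry on one clip.

    Here just a deterministic placeholder that derives a fake "depth
    map" from the frame; the real thing would be a full reconstruction.
    """
    return frame.mean(axis=-1) + 1.0

def training_pairs(frames):
    """Yield (input_frame, ground_truth_depth) pairs for supervised training."""
    for frame in frames:
        yield frame, slow_offline_depth(frame)

# Two toy 4x4 RGB frames become two supervised examples for a depth net.
frames = [np.ones((4, 4, 3)), np.zeros((4, 4, 3))]
pairs = list(training_pairs(frames))
```

The expensive step runs once per clip offline; the trained network then approximates it at inference time on the car.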

Sounds like they want to not just do photogrammetry but also perform sensor fusion to properly scale and hint. E.g. photogrammetry can’t see depth without texture to extract features from: a big white box truck alongside could be 1 ft away or at infinity, since there is nothing to triangulate. But ultrasonics will see it no problem, so filling in the gap gives you a full 3D scene.
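A toy version of that gap-filling step (the function and the one-range-per-region simplification are my assumptions): wherever stereo produced no valid depth, fall back to the ultrasonic reading for that sector.

```python
import numpy as np

def fuse_depth(stereo_depth, ultrasonic_range_m):
    """Fill invalid (inf/NaN) stereo depths with an ultrasonic range.

    Textureless regions (the white box truck) yield no triangulation,
    so those pixels arrive as inf/NaN and get the ultrasonic value.
    """
    fused = np.array(stereo_depth, dtype=float)
    invalid = ~np.isfinite(fused)
    fused[invalid] = ultrasonic_range_m
    return fused

stereo = np.array([5.0, np.inf, 7.0, np.nan])
fused = fuse_depth(stereo, ultrasonic_range_m=1.2)
# → [5.0, 1.2, 7.0, 1.2]
```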

Also, since they don’t have 360-degree stereo coverage, they are probably using the movement of the car to help with scene construction. The problem with this is that vehicles at nearly zero velocity relative to the car (e.g. a car you’re following in traffic) won’t move between frames, so you have to fill those regions in with either stereo photogrammetry or radar/ultrasonics. Radar is the most complementary choice for forward objects since it can identify which regions are in motion and which are static. So the neural net would rely on vehicle motion to construct stationary reference points like buildings and road lanes, but rely on radar/stereo disparity/ultrasonics/trained monocular depth estimation for moving regions.
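The static/moving split above could be as simple as a per-pixel select (all names and shapes here are assumed for illustration): use a radar-derived motion mask to choose structure-from-motion depth for static scenery and a moving-object depth estimate everywhere else.

```python
import numpy as np

def combine_depth(sfm_depth, moving_depth, radar_moving_mask):
    """Per-pixel select: SfM depth where static, other source where moving."""
    return np.where(radar_moving_mask, moving_depth, sfm_depth)

sfm = np.array([[10.0, 20.0], [30.0, 40.0]])      # structure-from-motion depth
mov = np.array([[8.0, 8.0], [8.0, 8.0]])          # stereo/radar/monocular estimate
mask = np.array([[False, True], [False, False]])  # radar flags pixel (0, 1) as moving
out = combine_depth(sfm, mov, mask)
# → [[10., 8.], [30., 40.]]
```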