Tesla Published Patent 'Generating Ground Truth For Machine Learning From Time Series Elements'

Normally I wouldn’t post this kind of thing, but Elon himself tweeted the link, so…

Barely anything in the article. Maybe something can be gleaned from the patent itself?


As one example, a series of images for a time period, such as 30 seconds, is used to determine the actual path of a vehicle lane line over the time period the vehicle travels. The vehicle lane line is determined by using the most accurate images of the vehicle lane over the time period. Different portions (or locations) of the lane line may be identified from different image data of the time series. As the vehicle travels in a lane alongside a lane line, more accurate data is captured for different portions of the lane line. In some examples, occluded portions of the lane line are revealed as the vehicle travels, for example, along a hidden curve or over a crest of a hill. The most accurate portions of the lane line from each image of the time series may be used to identify a lane line over the entire group of image data. Image data of the lane line in the distance is typically less detailed than image data of the lane line near the vehicle. By capturing a time series of image data as a vehicle travels along a lane, accurate image data and corresponding odometry data for all portions of the corresponding lane line are collected.
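The patent text describes the idea but gives no implementation. As a rough illustration of the aggregation step (a guess at the general technique, not Tesla's actual method), here's a minimal Python sketch. It assumes each frame provides lane-line points in the car's own coordinate frame plus a 2D odometry pose (position and heading), and it treats "most accurate" as simply "seen within a fixed range of the car". `Frame`, `to_world`, `aggregate_lane_line` and the 20 m cutoff are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical per-frame record: vehicle pose from odometry (world frame)
    # plus lane-line points detected in the vehicle's local frame.
    x: float        # vehicle position in world frame (m)
    y: float
    heading: float  # radians, counter-clockwise from the world x-axis
    points: list    # [(forward_m, left_m), ...] in vehicle frame


def to_world(frame, px, py):
    """Rotate and translate a vehicle-frame point into the world frame."""
    c, s = math.cos(frame.heading), math.sin(frame.heading)
    return (frame.x + c * px - s * py, frame.y + s * px + c * py)


def aggregate_lane_line(frames, max_range=20.0):
    """Merge lane-line points across a time series, keeping only the points
    each frame saw up close (assumed here to be the most accurate ones)."""
    merged = []
    for f in frames:
        for px, py in f.points:
            if math.hypot(px, py) <= max_range:
                merged.append(to_world(f, px, py))
    return merged


# Car drives straight along the world x-axis; lane line is 1.8 m to its left.
frames = [
    Frame(0.0, 0.0, 0.0, [(5.0, 1.8), (15.0, 1.8), (40.0, 1.8)]),  # 40 m point dropped
    Frame(10.0, 0.0, 0.0, [(5.0, 1.8), (30.0, 1.8)]),              # 30 m point dropped
]
print(aggregate_lane_line(frames))  # → [(5.0, 1.8), (15.0, 1.8), (15.0, 1.8)]
```

The far-away detections get thrown out and replaced by close-up views from later frames, which is the gist of the quoted paragraph. A real pipeline would presumably also fuse the overlapping observations (the duplicate point at 15 m here) by averaging or fitting a curve, and would work in 3D rather than 2D.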

In some embodiments, a three-dimensional representation of a feature, such as a lane line, is created from the group of time series elements that corresponds to the ground truth. This ground truth is then associated with a subset of the time series elements, such as a single image frame of the group of captured image data. For example, the first image of a group of images is associated with the ground truth for a lane line represented in three-dimensional space. Although the ground truth is determined based on the group of images, the selected first frame and the ground truth are used to create a training data. As an example, training data is created for predicting a three-dimensional representation of a vehicle lane using only a single image. In some embodiments, any element or a group of elements of a group of time series elements is associated with the ground truth and used to create training data. For example, the ground truth may be applied to an entire video sequence for creating training data. As another example, an intermediate element or the last element of a group of time series elements is associated with the ground truth and used to create training data.
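The second paragraph is basically about how the label gets paired with inputs. A minimal sketch of that pairing, assuming the ground truth has already been computed from the whole series; `make_training_examples` and its `select` modes are hypothetical names that just mirror the options the patent lists (first frame, an intermediate frame, the last frame, or the whole sequence):

```python
def make_training_examples(series, ground_truth, select="first"):
    """Pair a ground-truth label (derived from the whole time series) with
    element(s) of that series to form (input, label) training examples.

    select: "first", "intermediate", "last", or "sequence"
            ("sequence" uses the entire series as a single input).
    """
    if not series:
        return []
    if select == "first":
        inputs = [series[0]]
    elif select == "intermediate":
        inputs = [series[len(series) // 2]]
    elif select == "last":
        inputs = [series[-1]]
    elif select == "sequence":
        inputs = [tuple(series)]
    else:
        raise ValueError(f"unknown select mode: {select!r}")
    return [(x, ground_truth) for x in inputs]


frames = ["img_00", "img_01", "img_02"]
print(make_training_examples(frames, "lane_3d"))  # → [('img_00', 'lane_3d')]
```

The point of the "first frame" option is that the label was built with hindsight from all the frames, but the network is trained to predict it from a single image. One thing this glosses over: if the ground truth is stored in world coordinates, it would presumably need to be re-expressed relative to the selected frame's pose before it makes sense as a label for that one image.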

Just for the record, it is not a patent yet. It is a published patent APPLICATION. They may or may not receive the patent. Time will tell.

Not sure how “revolutionizing” this is. They are essentially “picking” a time clip from all the driving that human drivers do in Teslas, picking some random subset of the different modalities, and interpolating that over the entire clip. Then it probably generates some “ground truth” vector/representation in a feature space that their NNs can work with (or not; they could easily upload the raw data).
It isn’t that people haven’t thought of this; the question is how well it works in practice…
