How augmenting camera data can hurt vision performance in autonomous cars

Fascinating discovery from Matt Cooper at DeepScale:

Unlike data from COCO and other public datasets, the data collected by a self-driving car is incredibly consistent. Cars generally have a consistent pose with respect to other vehicles and road objects. Additionally, all images come from the same cameras, mounted at the same positions and angles. That means that all data collected by the same system has consistent camera properties, such as the extrinsics (the camera’s pose on the vehicle) and intrinsics (focal length and principal point). We can collect training data with the same sensor system that will be used in production, so a neural net in a self-driving car doesn’t have to worry about generalizing over these properties. Because of this, it can actually be beneficial to overfit to the specific camera properties of a system.
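
If the intrinsics/extrinsics jargon is new to you: together they define how 3D points project into pixels. A minimal NumPy sketch with made-up parameter values shows why a fixed camera means a fixed projection, one the network can safely overfit to:

```python
import numpy as np

# Intrinsics K: focal lengths and principal point (illustrative values
# for a 1280x720 camera, not from the article).
fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics [R | t]: the camera's rotation and translation relative to
# the vehicle. Identity rotation and a made-up mounting offset here.
R = np.eye(3)
t = np.array([[0.0], [-1.5], [0.0]])  # camera mounted 1.5 m up, say

# Project a 3D point (in vehicle coordinates) into pixel coordinates.
X = np.array([[2.0], [0.0], [10.0]])  # 10 m ahead, 2 m to the side
p = K @ (R @ X + t)
u, v = p[0, 0] / p[2, 0], p[1, 0] / p[2, 0]
print(u, v)  # with K, R, t fixed fleet-wide, this mapping never changes
```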

Self-driving car data can be so consistent that standard data augmentors, such as flip and crop, hurt performance more than they help. The intuition is simple: flipping training images doesn’t make sense because the cameras will always be at the same angle, and the car will always be on the right side of the road (assuming US driving laws). The car will almost never be on the left side of the road, and the cameras will never flip angles, so training on flipped data forces the network to overgeneralize to situations it will never see. Similarly, cropping has the effect of shifting and scaling the original image. Since the car’s cameras will always be in the same location with the same field of view, this shifting and scaling forces overgeneralization. Overgeneralization hurts performance because the network wastes its predictive capacity learning about irrelevant scenarios.
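
To make that concrete, here is a minimal sketch of the two pipelines using torchvision. The specific transforms and parameter values are my own illustrative choices, not the article’s:

```python
import torchvision.transforms as T

# COCO-style pipeline: geometric augmentations help a network generalize
# over viewpoint, which matters when images come from arbitrary cameras.
generic_pipeline = T.Compose([
    T.RandomHorizontalFlip(p=0.5),  # mirrored scenes are plausible in COCO
    T.RandomResizedCrop(224),       # shifts and rescales the view
    T.ToTensor(),
])

# Driving-data pipeline: flip and crop are dropped because the camera's
# pose and field of view are fixed in deployment. Photometric jitter is
# kept as one example of an augmentation that doesn't violate the
# fixed-camera assumption (an illustrative choice, not the article's
# recommendation).
driving_pipeline = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.3),
    T.ToTensor(),
])
```

Lighting varies on the road even when the geometry doesn’t, which is why a photometric augmentation survives the cut here while the geometric ones don’t.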

The whole article is great. It helped me learn about data augmentation, and the author explores some augmentation techniques that actually do help. Crisply and accessibly written.