Bias in Naturalistic Driving Datasets. The appeal of behavior cloning lies in its simplicity and theoretical scalability, as it can indeed learn by imitation from large offline-collected demonstrations (e.g., using driving logs from manually driven production vehicles). It is, however, susceptible to dataset biases like all learning methods. This is exacerbated in the case of imitation learning of driving policies, as most real-world driving consists of either a few simple behaviors or a heavy tail of complex reactions to rare events. Consequently, this can result in performance degrading as more data is collected, because the diversity of the dataset does not grow fast enough compared to the main mode of demonstrations. This phenomenon was not clearly measured before. Using our new NoCrash benchmark (section 4), we confirm it may happen in practice.
This underscores the importance of no longer collecting common state-action pairs after a certain point, and instead collecting only ones that are uncommon or rare. The more you water down your dataset with the same "few simple behaviors", the more biased your agent will be toward those behaviors, and therefore the worse it will be at "complex reactions to rare events".
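One simple way to act on this is to downsample the dominant mode at collection or curation time, keeping every rare example and only a fraction of the common ones. A minimal sketch, where `is_common` is a hypothetical predicate (in practice it might threshold steering angle and acceleration to detect "drive straight" segments):

```python
import random

def rebalance(samples, is_common, keep_prob=0.1, seed=0):
    """Keep all rare state-action pairs; keep only a fraction of common ones.

    samples   -- list of (state, action) pairs
    is_common -- hypothetical predicate marking the dominant mode
    keep_prob -- fraction of common samples to retain
    """
    rng = random.Random(seed)
    return [s for s in samples
            if not is_common(s) or rng.random() < keep_prob]

# Toy demonstration: 90% "drive straight" vs 10% rare maneuvers.
data = [("straight", 0.0)] * 900 + [("swerve", 0.8)] * 100
balanced = rebalance(data, lambda s: s[0] == "straight")
```

After rebalancing, the rare maneuvers make up a much larger share of the dataset, so the cloned policy sees them often enough to learn them.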
Another wonderful excerpt:
Causal Confusion. Related to dataset bias, end-to-end behavior cloning can suffer from causal confusion: spurious correlations cannot be distinguished from true causes in observed training demonstration patterns unless an explicit causal model or on-policy demonstrations are used. Our new NoCrash benchmark confirms the theoretical observation and toy experiments of prior work in realistic driving conditions. In particular, we identify a typical failure mode due to a subtle dataset bias: the inertia problem. When the ego vehicle is stopped (e.g., at a red traffic light), the probability it stays static is indeed overwhelming in the training data. This creates a spurious correlation between low speed and no acceleration, inducing excessive stopping and difficult restarting in the imitative policy. Although mediated perception approaches that explicitly model causal signals like traffic lights do not suffer from this theoretical limitation, they still under-perform end-to-end learning in unconstrained environments, because not all causes might be modeled (e.g., some potential obstacles) and errors at the perception layer (e.g., missed detections) are irrecoverable.
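The inertia problem is easy to reproduce with a toy counting model. In the sketch below (an illustrative simplification, not the paper's setup), a policy that conditions only on speed, the spurious cue, learns from expert logs that a stopped car should stay stopped, even though the true cause of stopping is the traffic light:

```python
from collections import Counter

# Toy expert log of (speed, action) pairs. While stopped at a red light
# the expert overwhelmingly stays static, so "speed == 0" correlates
# with "no acceleration" even though the light is the true cause.
log = [(0.0, "stay")] * 990 + [(0.0, "go")] * 10 \
    + [(8.0, "go")] * 900 + [(8.0, "stay")] * 100

def mle_policy(log):
    """Most-likely action given only speed (the spurious cue)."""
    counts = {}
    for speed, action in log:
        counts.setdefault(speed, Counter())[action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

policy = mle_policy(log)
# policy[0.0] is "stay": the speed-conditioned imitator never restarts
# from a stop, even after the light turns green.
```

No amount of extra data from the same distribution fixes this; the correlation only gets stronger, which is exactly why on-policy data or an explicit causal model is needed.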
This reminds me of what Waymo did with ChauffeurNet. From the paper (page 8):
4.2 Past Motion Dropout
During training, the model is provided the past motion history as one of the inputs (Fig. 1(g)). Since the past motion history during training is from an expert demonstration, the net can learn to “cheat” by just extrapolating from the past rather than finding the underlying causes of the behavior. During closed-loop inference, this breaks down because the past history is from the net’s own past predictions. For example, such a trained net may learn to only stop for a stop sign if it sees a deceleration in the past history, and will therefore never stop for a stop sign during closed-loop inference. To address this, we introduce a dropout on the past pose history, where for 50% of the examples, we keep only the current position (u0,v0) of the agent in the past agent poses channel of the input data. This forces the net to look at other cues in the environment to explain the future motion profile in the training example.
I think a cool name for this sort of thing would be counterfactual training.