A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions

Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is still much lower than human performance on distorted images. We additionally find that there is little correlation in errors between DNNs and human subjects. This could be an indication that the internal representation of images are different between DNNs and the human visual system. These comparisons with human performance could be used to guide future development of more robust DNNs.

This study benchmarked deep neural networks against humans on classifying images of 10 different dog breeds. Result:

We show that the performance of DNNs and human subjects is roughly the same on clean images, but for distorted images human subjects achieve much higher accuracy than DNNs.

Learning and memorizing 10 dog breeds seems like a particularly hard task for humans, so the result isn’t directly comparable to classifying images of vehicles, pedestrians, cyclists, traffic signs, etc. Still, it’s an interesting result. I would like to see more research like this.
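
For anyone who wants to poke at this kind of comparison themselves, here is a rough sketch of the two measurements the abstract talks about: accuracy on noise-distorted images, and how correlated two classifiers' errors are. This is not the paper's code or its exact metric. The function names are made up for illustration, the noise model (additive zero-mean Gaussian noise on 8-bit images) is an assumption, and the correlation used here is a plain Pearson correlation between per-image error indicators.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a uint8 image, clipping to [0, 255].

    Assumed noise model for illustration; not taken from the paper.
    """
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(predictions) == np.asarray(labels)))


def error_correlation(pred_a, pred_b, labels):
    """Pearson correlation between two per-image error indicators.

    Each indicator is 1 where that classifier got the image wrong and 0
    where it got it right. Returns NaN if either classifier's indicator is
    constant (all right or all wrong), since the correlation is undefined.
    """
    labels = np.asarray(labels)
    err_a = (np.asarray(pred_a) != labels).astype(float)
    err_b = (np.asarray(pred_b) != labels).astype(float)
    if err_a.std() == 0 or err_b.std() == 0:
        return float("nan")
    return float(np.corrcoef(err_a, err_b)[0, 1])
```

With per-image predictions from a DNN and from human subjects on the same distorted images, a value of `error_correlation` near zero would mean the two tend to fail on different images, which is the kind of pattern the abstract describes.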