Computer vision vs. human vision in visual question answering (VQA)

Alibaba's AliceMind model took first place in the VQA Challenge 2021, where it had to answer 1.1 million questions about 250,000 images. The model reached an accuracy of 81.26%, compared with 80.83% for human respondents.

In the Visual Question Answering (VQA) Challenge 2021, computer vision models analyze images and answer questions about them.
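
The article does not say how the accuracy figures were scored. As a rough illustration only, the VQA benchmark is commonly described as using a consensus metric: each question comes with several human answers, and a predicted answer earns min(number of matching human answers / 3, 1). The sketch below applies that simplified formula to made-up data. The function names and the toy answers are hypothetical, and the official evaluation script applies extra answer normalization and averaging that are omitted here.

```python
from collections import Counter


def vqa_consensus_score(predicted, human_answers):
    """Score one prediction with the simplified rule min(matches / 3, 1)."""
    # Light normalization only (assumption): trim whitespace, lowercase.
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions, references):
    """Average the per-question scores and return a percentage."""
    scores = [
        vqa_consensus_score(pred, refs)
        for pred, refs in zip(predictions, references)
    ]
    return 100.0 * sum(scores) / len(scores)


# Toy data (hypothetical): three questions, ten human answers each.
references = [
    ["red"] * 8 + ["maroon"] * 2,  # strong agreement on "red"
    ["2"] * 2 + ["3"] * 8,         # only two people answered "2"
    ["yes"] * 10,                  # everyone answered "yes"
]
predictions = ["red", "2", "no"]

accuracy = mean_accuracy(predictions, references)
print(f"accuracy: {accuracy:.2f}%")
```

Under this rule an answer given by only two of the ten annotators still earns partial credit (2/3), which is one reason a human respondent does not automatically score 100% on such a benchmark.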