DeepMind masters Capture the Flag in Quake III Arena

End-to-end reinforcement learning directly from pixels!

DeepMind tried eliminating superhuman reaction times and found the agents were still superhuman:

In a further study, we trained agents which have an inbuilt delay of a quarter of a second (267 ms) – that is, agents have a 267ms lag before observing the world – comparable with reported reaction times of human video game players. These response-delayed agents still outperformed human participants, with strong humans only winning 21% of the time.
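For anyone wondering what an "inbuilt delay" could look like mechanically, here is a minimal sketch, not DeepMind's implementation. It assumes the agent acts at a fixed rate and is only ever handed a frame from a few steps back. The names `DelayedObservations` and `steps_for_delay` are hypothetical, and the rate of 15 agent steps per second is an assumption made for illustration.

```python
from collections import deque


def steps_for_delay(delay_ms, steps_per_second):
    """Convert a delay in milliseconds to a whole number of agent steps."""
    return round(delay_ms * steps_per_second / 1000)


class DelayedObservations:
    """Hands the agent observations that are `delay_steps` steps old.

    Hypothetical helper for illustration only.
    """

    def __init__(self, delay_steps, initial_observation):
        # Pre-fill the buffer so the agent sees the initial frame
        # until real frames are old enough to be released.
        size = delay_steps + 1
        self._buffer = deque([initial_observation] * size, maxlen=size)

    def push(self, observation):
        """Record the newest frame and return the one the agent may see."""
        self._buffer.append(observation)  # oldest frame drops off the left
        return self._buffer[0]


# Assumed rate: 15 agent steps per second, so 267 ms is about 4 steps.
delay = steps_for_delay(267, 15)
buffer = DelayedObservations(delay, initial_observation="init")
seen = [buffer.push(frame) for frame in range(1, 8)]
print(delay, seen)
```

Under these assumptions the agent only sees frame 1 on its fifth step, so every decision is based on information that is four steps stale.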

The agents also stayed ahead when their aiming accuracy was handicapped. From the paper:

Another advantage exhibited by agents is their tagging accuracy, where FTW agents achieve 80% accuracy compared to humans’ 48%. By artificially reducing the FTW agents’ tagging accuracy to be similar to humans (without retraining them), agents’ win-rate was reduced, though still exceeded that of humans… Thus, while agents learn to make use of their potential for better tagging accuracy, this is only one factor contributing to their overall performance.

Video of gameplay:

Paper:

I had no idea this result was first announced 10 months ago!