Alex Kendall, Wayve CTO: Now is the Time for Reinforcement Learning on Real Robots

Deep learning has the potential to completely revolutionise mobile robotics. It understands vast quantities of data, simplifies engineering and creates representations which generalise beyond anything we can hand-engineer. We’ve seen this over and over again in other fields rich in data. However, today deep learning is typically only seen in isolated parts of mobile robotics systems, such as computer vision front-ends. Most leading robotics applications rely on hand-engineered control policies, such as Waymo’s self-driving car or Boston Dynamic’s Atlas.

For an interesting anecdote for reward design, we observed that when our car was trained with the reward to drive as far as possible without safety driver intervention, it learned to zig-zag down the road, as it didn’t leave the lane and cause intervention, but drove a greater distance by zig-zaging, therefore maximising the reward. This is a phenomenon known as reward hacking, where the agent earns reward using unintended behavior. For an excellent treatment of reward-hacking and other problems in AI safety, see Amodei et al. 2016.

I believe ideas like reward-learning, inverse reinforcement learning, preference learning and imitation learning are going to be very important to real-life robotics. Ultimately, the best reward will be learned from demonstration and feedback.

There is a huge opportunity to work on A.I. for robotics today. Hardware is cheaper, more accessible and reliable than ever before. I think mobile robotics is about to go through the revolution that computer vision, NLP and other data science fields have seen over the last five years.

Autonomous driving is the ideal application to work on. Here’s why; the action space is relatively simple. Unlike difficult strategy games like DOTA, driving does not require long term memory or strategy. At a basic level, the decision is either left, right, straight or stop. The counter point to this is that the input state space is very hard, but computer vision is making remarkable progress here.