So ETH has a really interesting new result that just came out a couple of days ago. They have a YouTube video summarizing it here: https://www.youtube.com/watch?v=aTDkYFZFWug&feature=youtu.be
This result addresses what is probably the most significant obstacle to using RL in complex control robotics applications today: the “sim 2 real” problem. RL has demonstrated that, at least in simulation, it can provide solutions to complex control problems that are dramatically better than conventional approaches. RL is so much better, in fact, that if the simulation results can be translated to real robots it will finally enable capabilities that have been long sought but not yet won. Among these capabilities are legged robots that can move around in the world with the grace and efficiency that we associate with living creatures.
But getting similar results with physical robots has proved elusive. RL today is trained in simulation because of sample efficiency issues combined with the cost advantages and flexibility of virtual training environments. But a control policy learned in simulation will only work on an identical physical robot in an identical environment unless a lot of effort goes into making the control policy robust against variation. That robustness costs a lot in terms of development time and computation, and it often results in policies that are too conservative to be efficient in the real world. And creating simulations that capture all the complex nonlinear dynamics of real robot bodies has been impossible for all but the simplest of robots.
What ETH has done here is to add an extra layer to the simulation process by training a small neural network to mimic the nonlinear dynamics of the most difficult control elements - the actuators in the robot body. This turns out to be easy to do and can be done with just a few minutes of data taken from a real actuator. Including that pre-trained network in the simulated model of the robot body allows the straightforward training of a control policy that can be translated to the real robot with no modifications and which works on the first try.
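To make the idea concrete, here is a minimal sketch of what training such an actuator network might look like. Everything here is an assumption for illustration: the input features (a short history of position errors and joint velocities), the history length, the network size, and the synthetic "actuator log" (a PD response plus nonlinear friction standing in for real measurements) are all hypothetical stand-ins, not ETH's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
HIST = 3  # timesteps of history per input channel (hypothetical choice)

# --- synthetic stand-in for a few minutes of real actuator logs ------------
# Inputs: short histories of position tracking error and joint velocity.
# Target: measured torque (faked here as a stiff PD response plus a
# smooth Coulomb-like friction term, to give the net something nonlinear).
N = 4096
err = rng.uniform(-0.5, 0.5, size=(N, HIST))   # position tracking errors
vel = rng.uniform(-3.0, 3.0, size=(N, HIST))   # joint velocities
torque = 40.0 * err[:, -1] + 1.0 * (err[:, -1] - err[:, -2]) \
         - 2.0 * np.tanh(5.0 * vel[:, -1])

# Standardize inputs and targets so plain gradient descent behaves well.
X = np.hstack([err, vel])
X = (X - X.mean(0)) / X.std(0)
y = (torque - torque.mean()) / torque.std()

# --- tiny MLP, trained by full-batch gradient descent on MSE ---------------
H = 32
W1 = rng.normal(0, 0.3, size=(X.shape[1], H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.3, size=(H, 1));          b2 = np.zeros(1)

losses, lr = [], 0.05
for step in range(3000):
    h = np.tanh(X @ W1 + b1)               # hidden layer
    pred = (h @ W2 + b2).ravel()           # predicted (normalized) torque
    resid = pred - y
    losses.append(float(np.mean(resid ** 2)))
    g_out = (2.0 / N) * resid[:, None]     # dL/d(pred)
    gW2 = h.T @ g_out;  gb2 = g_out.sum(0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    gW1 = X.T @ g_h;    gb1 = g_h.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"train MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The payoff of this step is that once the little network fits the logged data, it can be dropped into the rigid-body simulator in place of an idealized torque model, so the RL policy trains against actuator behavior much closer to the real hardware.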
This is a really important result. I won’t call it a breakthrough because the idea has been around for a while and ETH is just the first group to get it built and demonstrate it on a real robot. But the physical demonstration of the technique shows that ‘sim 2 real’ can be dealt with effectively and opens the door to dramatic advances in robot control over the next couple of years.
Not only does ETH show that they can transfer a pure RL policy straight into a robot body, they show that the resulting control system performs well beyond what state-of-the-art model-based controllers can achieve.
One thing that is nice about robot body control as a problem is that anybody can look at the resulting movement and judge it pretty well, because we are all familiar with how animals move, and animal movement is a really good benchmark. If you look at a dog-sized robot and see unnatural movement, it’s probably because the control algorithm is brittle or limited. With that in mind, please look at this video that compares the ETH RL policy to a state-of-the-art model-based controller: http://robotics.sciencemag.org/highwire/filestream/640686/field_highwire_adjunct_files/1/aau5872_Movie_S2.mp4
The difference is quite clear - the RL policy, despite having been learned on a simulated body in a simulated environment with zero hand tuning required, produces dramatically more natural movement than the hand-built, hand-tuned model-based controller.