Important new robotics work published in Science Robotics


So ETH has a really interesting new result that just came out a couple of days ago. They have a YouTube video summarizing it here:

This result addresses what is probably the most significant obstacle to using RL in complex robotic control applications today: the “sim 2 real” problem. RL has demonstrated that, at least in simulation, it can find solutions to complex control problems that are dramatically better than conventional approaches. In fact, RL is so much better that if the simulation results can be transferred to real robots, it will finally enable capabilities that have long been sought but not yet achieved. Among these capabilities are legged robots that move through the world with the grace and efficiency we associate with living creatures.

But getting similar results with physical robots has proved elusive. RL policies today are trained in simulation because of sample-efficiency issues, combined with the cost advantages and flexibility of virtual training environments. But a control policy learned in simulation will only work on an identical physical robot in an identical environment unless a lot of effort goes into making the policy robust against variation. That robustness costs a lot in development time and computation, and it often results in policies that are too conservative to be efficient in the real world. And creating simulations that capture all the complex nonlinear dynamics of real robot bodies has been impossible for all but the simplest robots.
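To make "robust against variation" concrete, here is a minimal sketch of domain randomization, the usual way this robustness is bought: every training episode runs in a world whose physical parameters are freshly sampled, and the controller is scored by its average performance across those worlds. To keep it tiny, RL training is replaced by a one-parameter search over a proportional gain on a single joint; the `JointParams` fields, the parameter ranges, and the 1-DoF joint model are all hypothetical and are not taken from ETH's or anyone else's code.

```python
import random
from dataclasses import dataclass


@dataclass
class JointParams:
    inertia: float       # kg*m^2
    damping: float       # N*m*s/rad, viscous friction in the joint
    torque_scale: float  # actuator strength relative to nominal


# Hypothetical randomization ranges around a nominal joint.
RANGES = {"inertia": (0.4, 0.6), "damping": (0.5, 1.5), "torque_scale": (0.8, 1.2)}


def sample_params(rng: random.Random) -> JointParams:
    # Draw a fresh set of physical parameters for one training episode.
    return JointParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()})


def rollout_cost(kp: float, p: JointParams, target: float = 1.0,
                 dt: float = 0.01, steps: int = 300) -> float:
    # Simulate a 1-DoF joint driven by a P-controller with gain kp and
    # return the time-integrated squared tracking error.
    q = v = 0.0
    cost = 0.0
    for _ in range(steps):
        tau = p.torque_scale * kp * (target - q) - p.damping * v
        v += tau / p.inertia * dt  # semi-implicit Euler integration
        q += v * dt
        cost += (target - q) ** 2 * dt
    return cost


def randomized_cost(kp: float, rng: random.Random, episodes: int = 50) -> float:
    # Average cost over many randomized "worlds": the objective a robust
    # controller is optimized for, instead of the cost in one nominal world.
    return sum(rollout_cost(kp, sample_params(rng)) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    candidates = [5.0, 10.0, 20.0, 40.0]
    best = min(candidates, key=lambda kp: randomized_cost(kp, random.Random(42)))
    print("gain chosen under randomization:", best)
```

Reusing the same seed for every candidate gain means each one is scored on the same sampled worlds, so the comparison is fair. The catch the paragraph describes is visible in the structure: the chosen controller has to cope with the whole parameter range, so it can be worse than one tuned for the single real robot.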

What ETH has done here is add an extra layer to the simulation process: they train a small neural network to mimic the nonlinear dynamics of the most difficult control elements, the actuators in the robot body. This turns out to be easy and needs just a few minutes of data recorded from a real actuator. Including that pre-trained network in the simulated model of the robot body allows straightforward training of a control policy that transfers to the real robot with no modifications and works on the first try.
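A rough sketch of the actuator-network idea, using only NumPy. Learning the actuator model is ordinary supervised regression: log a short history of joint position error and velocity, record the torque the actuator actually produced, and fit a small MLP to that mapping. The simulator then asks the network for the torque at every step instead of assuming the commanded torque is applied perfectly. Everything concrete here is an assumption for illustration: `fake_actuator` is a made-up stand-in for measured data, and the 3-sample history, layer sizes, tanh activations, and SGD settings are not ETH's published setup.

```python
import numpy as np

rng = np.random.default_rng(0)
HISTORY = 3  # current sample plus two past samples (assumption for this sketch)


def fake_actuator(err, vel):
    # Hypothetical stand-in for a real actuator; on a robot these torques are measured.
    # err, vel: shape (n, HISTORY), column 0 is the most recent sample.
    lagged = 0.7 * err[:, 0] + 0.3 * err[:, 1]  # response depends on recent history
    return 40.0 * np.tanh(0.5 * lagged) - 3.0 * vel[:, 0]


def make_dataset(n):
    # Each row: [err_t, err_t-1, err_t-2, vel_t, vel_t-1, vel_t-2]
    err = rng.uniform(-3.0, 3.0, size=(n, HISTORY))
    vel = rng.uniform(-10.0, 10.0, size=(n, HISTORY))
    return np.hstack([err, vel]), fake_actuator(err, vel)[:, None]


class ActuatorNet:
    """Small MLP mapping an (error, velocity) history to joint torque."""

    def __init__(self, n_in, n_hidden=32):
        self.params = [
            rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_hidden)), np.zeros(n_hidden),
            rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, 1)), np.zeros(1),
        ]

    def forward(self, x):
        W1, b1, W2, b2, W3, b3 = self.params
        self.h1 = np.tanh(x @ W1 + b1)
        self.h2 = np.tanh(self.h1 @ W2 + b2)
        return self.h2 @ W3 + b3

    def sgd_step(self, x, y, lr):
        # One minibatch SGD step on mean squared error, with manual backprop.
        W1, b1, W2, b2, W3, b3 = self.params
        d_out = 2.0 * (self.forward(x) - y) / len(x)
        d_h2 = (d_out @ W3.T) * (1 - self.h2 ** 2)
        d_h1 = (d_h2 @ W2.T) * (1 - self.h1 ** 2)
        grads = [x.T @ d_h1, d_h1.sum(0), self.h1.T @ d_h2, d_h2.sum(0),
                 self.h2.T @ d_out, d_out.sum(0)]
        for p, g in zip(self.params, grads):
            p -= lr * g  # in-place update of each weight array


# Supervised training on the logged (synthetic) actuator data.
x_train, y_train = make_dataset(8000)
x_mu, x_sd = x_train.mean(0), x_train.std(0)
y_mu, y_sd = y_train.mean(), y_train.std()
net = ActuatorNet(2 * HISTORY)
for epoch in range(30):
    order = rng.permutation(len(x_train))
    for i in range(0, len(order), 64):
        idx = order[i:i + 64]
        net.sgd_step((x_train[idx] - x_mu) / x_sd, (y_train[idx] - y_mu) / y_sd, lr=0.05)


def predict_torque(x):
    # Undo the normalization so the output is in N*m.
    return net.forward((x - x_mu) / x_sd) * y_sd + y_mu


def simulate(target=1.0, inertia=0.5, dt=0.01, steps=300):
    # 1-DoF joint whose applied torque comes from the learned actuator model.
    q, v = 0.0, 0.0
    err_hist, vel_hist = [target - q] * HISTORY, [v] * HISTORY
    trajectory = []
    for _ in range(steps):
        tau = predict_torque(np.array(err_hist + vel_hist)[None, :])[0, 0]
        v += tau / inertia * dt  # semi-implicit Euler
        q += v * dt
        err_hist = [target - q] + err_hist[:-1]
        vel_hist = [v] + vel_hist[:-1]
        trajectory.append(q)
    return np.array(trajectory)
```

In a full pipeline, `simulate` would be the physics engine stepping every leg joint, and the RL policy would output position targets whose effect passes through the learned model. That is why the policy never sees an idealized actuator during training.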

This is a really important result. I won’t call it a breakthrough, because the idea has been around for a while and ETH is just the first group to build it and demonstrate it on a real robot. But the physical demonstration shows that the “sim 2 real” problem can be dealt with effectively, and it opens the door to dramatic advances in robot control over the next couple of years.

Not only does ETH show that they can transfer a pure RL policy straight into a robot body, they also show that the resulting control system performs well beyond what state-of-the-art model-based controllers achieve.

One thing that is nice about robot body control as a problem is that anybody can look at the resulting movement and judge it pretty well, because we are all familiar with how animals move, and animal movement is a really good benchmark. If you look at a dog-sized robot and see unnatural movement, it’s probably because the control algorithm is brittle or limited. With that in mind, please look at this video that compares the ETH RL policy to a state-of-the-art model-based controller:

The difference is quite clear: the RL policy, despite having been learned on a simulated body in a simulated environment with zero hand tuning, is dramatically more natural than the hand-built, hand-tuned model-based controller.


I asked my partner for her opinion on the learned vs. model-based video to get a layman’s perspective. She said that the model-based controller reminded her of a jogger at a stoplight.

Had to share that imagery


It’s honestly hard to tell which one is better with no understanding of the agent’s goals in that situation. Let’s see them navigate an obstacle course, or even just walk on some hilly terrain.

The learned policy kind of looks like an old dog with arthritis, whereas the model-based controller looks like an excited puppy. Initially I thought the RL policy was the classical controller and vice versa.

The explanation in the main video is helpful. They explain that the learned policy is more precise and uses less power. They also say the learned policy is 25% faster, and can get up from any fall position they tested. (How good is the model-based controller at getting up from falls?)

Crazy that it takes under 12 hours to train this robot on a normal desktop PC. Compare that to OpenAI Five, which trained for weeks on 1,000 GPUs and 100,000 CPU cores. (Source.)

I really want a Star Wars-like world filled with droids. Factory droids, firefighter droids, cleaning droids, droids that help people with limited mobility, etc. A next step would be applying this sort of RL technique to fine motor control. Apparently this is what factory robots and warehouse robots continue to struggle with.

But OpenAI did that using domain randomization because they said it was too hard to model the physics of contact forces. Has anyone tried to use supervised learning to learn a model of contact forces, like how this robot learned a model of its actuators?