Important new robotics work published in Science Robotics

So ETH has a really interesting new result that just came out a couple of days ago. They have a YouTube video summarizing it here:

This result addresses what is probably the most significant obstacle to using RL in complex control robotics applications today: the “sim 2 real” problem. RL has demonstrated that, at least in simulation, it can provide solutions to complex control problems that are dramatically better than conventional approaches. RL is so much better, in fact, that if the simulation results can be translated to real robots it will finally enable capabilities that have been long sought but not yet won. Among these capabilities are legged robots that can move around in the world with the grace and efficiency that we associate with living creatures.

But getting similar results with physical robots has proved elusive. RL policies today are trained in simulation because of sample-efficiency issues combined with the cost advantages and flexibility of virtual training environments. But a control policy learned in simulation will only work on an identical physical robot in an identical environment unless a lot of effort goes into making the control policy robust against variation. That robustness costs a lot in terms of development time and computation, and it often results in policies that are too conservative to be efficient in the real world. And creating simulations that capture all the complex nonlinear dynamics of real robot bodies has been impossible for all but the simplest of robots.
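The robustness-through-variation approach mentioned above is usually implemented as domain randomization: every training episode samples perturbed physical parameters, so the policy has to work across the whole range rather than for one exact body. A minimal sketch, with parameter names and ranges that are purely illustrative assumptions (not from the paper):

```python
import random

# Illustrative domain randomization: each training episode gets its own
# perturbed copy of the robot's physical parameters. Names and the +/-20%
# perturbation ranges are invented for this sketch.

random.seed(0)

def sample_randomized_params():
    # Nominal values scaled by a random factor drawn per episode.
    return {
        "link_mass":  1.0 * random.uniform(0.8, 1.2),   # kg
        "friction":   0.6 * random.uniform(0.8, 1.2),
        "motor_gain": 10.0 * random.uniform(0.8, 1.2),
    }

# One parameter sample per training episode.
episodes = [sample_randomized_params() for _ in range(1000)]

# The spread across episodes is the point: the policy sees them all.
masses = [e["link_mass"] for e in episodes]
print(round(min(masses), 2), round(max(masses), 2))
```

Training across the whole sampled range is what buys robustness, and also what inflates computation and pushes policies toward conservatism - the cost described above.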

What ETH has done here is to add an extra layer to the simulation process by training a small neural network to mimic the nonlinear dynamics of the most difficult control elements - the actuators in the robot body. This turns out to be easy to do and can be done with just a few minutes of data taken from a real actuator. Including that pre-trained network in the simulated model of the robot body allows the straightforward training of a control policy that can be translated to the real robot with no modifications and which works on the first try.
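The actuator-modeling step is plain supervised regression. The paper fits a small neural network; the toy below substitutes a linear model trained by SGD so the sketch stays dependency-free. The inputs (a short history of position errors and joint velocities), the invented "true actuator," and all the numeric ranges are illustrative assumptions, not the paper's setup:

```python
import math
import random

# Toy version of the supervised actuator-modeling step: log (state history,
# torque) pairs from an actuator, then fit a model that predicts torque.
# The "true actuator" here is an invented stand-in for the real hardware.

random.seed(0)

def true_actuator(err_hist, vel_hist):
    # Springy response plus damping, with a soft saturation nonlinearity.
    raw = 40.0 * err_hist[0] + 5.0 * err_hist[1] - 2.0 * vel_hist[0]
    return 30.0 * math.tanh(raw / 30.0)

# "A few minutes" of logged data: histories of position error and velocity
# paired with the torque the actuator actually produced.
data = []
for _ in range(2000):
    err = [random.uniform(-0.5, 0.5) for _ in range(3)]
    vel = [random.uniform(-3.0, 3.0) for _ in range(3)]
    data.append((err + vel, true_actuator(err, vel)))

# Fit the surrogate by stochastic gradient descent on squared error.
w, b, lr = [0.0] * 6, 0.0, 0.005
for _ in range(50):
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        g = 2.0 * (pred - y)
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# Mean squared error of the learned surrogate on the logged data.
mse = sum((sum(wi * xi for wi, xi in zip(w, x)) + b - y) ** 2
          for x, y in data) / len(data)
print(round(mse, 3))
```

In the paper's setup the surrogate is a small multi-layer network rather than this linear fit, which lets it capture the actuator's real nonlinearities, but the shape of the training problem is the same.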

This is a really important result. I won’t call it a breakthrough because the idea has been around for a while and ETH is just the first group to get it built and demonstrate it on a real robot. But the physical demonstration of the technique shows that ‘sim 2 real’ can be dealt with effectively and opens the door to dramatic advances in robot control over the next couple of years.

Not only does ETH show that they can transfer a pure RL policy straight into a robot body, they show that the resulting control system performs well beyond what state-of-the-art model-based controllers can achieve.

One thing that is nice about robot body control as a problem is that anybody can look at the resulting movement and judge it pretty well, because we are all familiar with how animals move, and animal movement is a really good benchmark. If you look at a dog-sized robot and see unnatural movement, it’s probably because the control algorithm is brittle or limited. With that in mind please look at this video that compares the ETH RL policy to a state-of-the-art model-based controller:

The difference is quite clear - the RL policy, despite having been learned on a simulated body in a simulated environment with zero hand tuning required, is dramatically more natural than the hand-built, hand-tuned model-based controller.


I asked my partner her opinion on the learned vs model-based video to get a layman’s perspective. She said that the model-based controller reminded her of a jogger at a stoplight.

Had to share that imagery


It’s honestly hard to tell which one is better with no understanding of the agent’s goals in that situation. Let’s see them navigate an obstacle course, or even just walk on some hilly terrain.

The learned policy kind of looks like an old dog with arthritis, whereas the model-based controller looks like an excited puppy. Initially I thought the RL policy was the classical controller and vice versa.

The explanation in the main video is helpful. They explain that the learned policy is more precise and uses less power. They also say the learned policy is 25% faster, and can get up from any fall position they tested. (How good is the model-based controller at getting up from falls?)

Crazy that it would take under 12 hours to train this robot on a normal desktop PC. Compare that to OpenAI Five, which trained for weeks on 1,000 GPUs and 100,000 CPU cores. (Source.)

I really want a Star Wars-like world filled with droids. Factory droids, firefighter droids, cleaning droids, droids that help people with limited mobility, etc. A next step would be applying this sort of RL technique to fine motor control. Apparently this is what factory robots and warehouse robots continue to struggle with.

But OpenAI did that using domain randomization because they said it was too hard to model the physics of contact forces. Has anyone tried to use supervised learning to learn a model of contact forces, like how this robot learned a model of its actuators?

This result by itself was important enough to publish immediately. A technique that bridges the sim 2 real divide deserves to be distributed promptly, and the authors doubtless want due credit for finding it. This technique of training an NN to provide a transfer function that enables usably accurate simulation is very likely applicable to other areas, including Dactyl-type modeling and many others. In the case of Dactyl, the transfer function learned by the network would be a prediction of how the cube moves in response to detected and applied forces. It’s a much more complicated transfer function to train, but in principle it should work fine. That would allow Dactyl-like results to be achieved with vastly less domain randomization required, and thus vastly less computation.
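The general pattern - an analytic simulator with one learned component swapped in for the part that is hard to model from first principles - can be sketched as follows. The function names, constants, and dynamics are hypothetical stand-ins for illustration, not the paper's implementation:

```python
# Sketch of one "hybrid simulator" step: analytic rigid-body integration,
# with the hard-to-model component (here, actuator torque) supplied by a
# learned function. All names and numbers are illustrative.

def learned_torque(pos_error, velocity):
    # Placeholder for the trained network: a crude PD-like response with
    # output saturation stands in for its learned input-output mapping.
    raw = 50.0 * pos_error - 1.5 * velocity
    return max(-25.0, min(25.0, raw))

def hybrid_step(pos, vel, target, dt=0.002, inertia=0.05):
    # Learned component predicts the torque from the tracking error...
    tau = learned_torque(target - pos, vel)
    # ...and the analytic part integrates the rigid-body dynamics
    # (semi-implicit Euler: update velocity first, then position).
    acc = tau / inertia
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

# Roll the hybrid model forward toward a setpoint for four seconds.
pos, vel = 0.0, 0.0
for _ in range(2000):
    pos, vel = hybrid_step(pos, vel, target=1.0)
print(round(pos, 2))
```

The point is structural: the integrator stays analytic while the learned function fills in the piece a physics model gets wrong, whether that piece is an actuator or, in a Dactyl-like setting, the object's response to contact forces.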

We’re going to see this method employed heavily in the next 12 to 24 months, I predict.


Write-up in Nature:

The hybrid simulator was faster and more accurate than a simulator that was based on analytical models. But more importantly, when a locomotion strategy was optimized in the hybrid simulator, and then transferred into the robot’s body and tested in the physical world, it was as successful as it was in simulation. This long-overdue breakthrough signals the demise of the seemingly insurmountable simulation–reality gap.

Hyperbolic, but essentially true. The experiment was straightforward, so it was bound to be done eventually. Even a lowly hobbyist such as myself had thought of it as one of a number of probably feasible variants a year earlier, and I’m sure many others had the same thought. What is so exciting is that it works just as you would expect, which means there aren’t really any significant barriers to proceeding with using NNs to control robot bodies. It’s only a matter of time before we see graceful and competent locomoting machines become commonplace. I don’t know if it deserves to be called a breakthrough from a technical standpoint, but it is certainly one of the most important milestones in recent memory.


That’s very exciting. Bring on the robots! Will this advance contribute to autonomous car progress, or is control for cars already a solved problem/a simple enough problem that this doesn’t matter?

I was wondering if maybe the implications are better suited for Tesla’s manufacturing capabilities than for their self-driving cars (or for any other auto OEM). Elon has not-so-secret plans to fully automate the manufacturing of a car, but has talked about their difficulties with using robots for tasks that require fine motor skills. The Model Y is going to be made with much less wiring to cut down on those sorts of tasks, but perhaps they could also improve their robotic capabilities by applying NNs to them.


This will have some impact on industrial robots, but mainly in reducing the cost of applying them to a particular problem. The big impact of using NNs for robotic control is going to come when robots get redesigned to take advantage of what NNs can do.

In order to simplify the control problem, today’s robots are built to be extremely stiff, which also makes them heavy. Movement repeatability is critical to this simple programming paradigm, so current robots are also very slow and need to move along consistent trajectories. This is why you see robot limbs designed as large-cross-section tubes with extremely stiff joints, held together with many, many fasteners and welds. The net result is that robots today are awkward, heavy, expensive, and slow.

NN control completely breaks this requirement for repeatability through stiffness and tightly controlled motion. NN control gives you high error tolerance in movement, and precision then comes via external reference feedback: cameras provide arbitrarily accurate positional feedback at very low cost. Consider that the human arm has absolutely horrible repeatability and terrible absolute control, but by using your arms in combination with your eyes, absolute positioning down to tens of microns becomes possible with a lightweight, flexible limb that has a very high strength-to-weight ratio.
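The eyes-plus-sloppy-arm idea can be illustrated with a toy closed-loop correction: a positioning stage with terrible open-loop repeatability, driven to high accuracy by iterating against an external camera measurement. All the noise figures below are invented for illustration:

```python
import random

# Toy precision-through-feedback loop: a "sloppy" arm whose commanded moves
# land with up to 10% proportional error plus 20 microns of backlash-like
# noise, corrected by an external camera that reads absolute position to
# about 5 microns. All figures are illustrative assumptions.

random.seed(1)

arm_pos = 0.0  # millimetres

def move_arm(delta):
    # Terrible open-loop repeatability: scale error plus additive noise.
    global arm_pos
    arm_pos += delta * random.uniform(0.9, 1.1) + random.uniform(-0.02, 0.02)

def camera_measurement():
    # Cheap external reference: absolute position to within 0.005 mm.
    return arm_pos + random.uniform(-0.005, 0.005)

# Close the loop: command the remaining distance until the camera says
# we are within tolerance of the target.
target = 100.0  # mm
for _ in range(20):
    error = target - camera_measurement()
    if abs(error) < 0.05:
        break
    move_arm(error)

final_error = abs(target - arm_pos)
print(round(final_error, 3))
```

With feedback in the loop, achievable accuracy is set by the camera, not by the arm's repeatability - which is the argument for cheap, lightweight, flexible limbs.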

NN control will give us cheap, capable robots with control capabilities that exceed what humans can do in all the basic dimensions. In very high volumes (100M scale) I estimate that a humanoid robot with performance envelope exceeding that of a human being can be built for $10k.

To get there we still need a few core capabilities. Sim to real was one of them, but I think that’s a solved problem now. We still need to see a demonstration of general-purpose absolute positional feedback using cheap sensors (e.g. cameras), but like sim to real that’s a problem that just needs attention - it doesn’t require breakthroughs. Recent demonstrations by DeepMind and others show that NNs can learn the 3D structure of the world and the nature of objects via unsupervised methods, so robot physical perception comparable to humans is feasible in the near term.

Beyond that, an industry will need to be built around creating high-volume general-purpose robot bodies designed for NN control - this will take at least a decade, and probably two, even after the core capabilities have all been developed. But at the end of that process the world will have seen the end of the economic application of human physical labor. This is going to be a rather profound transformation and it will be upon us relatively soon.


Jimmy, your post could not be more well-timed! There was just a big announcement on this topic.

Pieter Abbeel — who is well-known for his work on reinforcement learning and imitation learning — was involved in designing a new robot called Blue at UC Berkeley. Blue started as a research project at the Berkeley AI Research (BAIR) lab, and now it looks like the project has been spun out into a startup called Berkeley Open Arms.

Academic paper and demo videos:

Article from The Verge:

“The fact that AI is becoming more capable gave us an opportunity to rethink how to design a robot,” Abbeel tells The Verge.

Abbeel explains that most robots in use today are built to be strong and accurate. Their movements are predefined, and they simply repeat the same action over and over again, whether that’s lifting pallets of goods, welding cars, or fastening screws into a smartphone.

The robots of the future, by comparison, will be reactive and dynamic. They’ll be able to work safely alongside humans without crushing them, and instead of having their actions planned in advance, they’ll navigate the world in real time using cameras and sensors.

“If you look at traditional robots, they’re designed around the principle of very high precision and repeated motions,” says Abbeel. “But you don’t necessarily need sub-millimeter repeatability.” (That’s the ability to perform the same task over and over with differences in movement of less than a millimeter.) “Humans don’t have sub-millimeter repeatability. Instead, we use our eyes and sense of touch to get things done through feedback.”

The upshot:

This makes Blue safer to work around but also suitable for research using reinforcement learning, a type of AI training method that’s becoming popular in robotics.

The target price is $5,000. Blue is similar to Baxter, a robot from Rethink Robotics (a company co-founded by the well-known roboticist Rodney Brooks), which wasn’t a commercial success. Baxter cost $25,000, and it was launched in 2012, before the popularization of deep supervised learning and deep reinforcement learning. Baxter may have just been too early, and maybe too expensive as well.

Baxter was also being sold to work in warehouses and manufacturing facilities right from the get-go, whereas initially Blue is being sold to researchers working on machine learning and robotics.
