OpenAI CTO Greg Brockman: the last 6 years of progress in AI



Thanks for linking that - hadn’t seen it.

I couldn’t agree more - with both his argument and his conclusions.


Published my thoughts on this talk:

Let’s make hominin AGI because it’s probably 1) safer and 2) faster than the alternative.


You write “Despite the title, there isn’t really an in-depth argument that near-term artificial general intelligence (AGI) is plausible.” But Brockman’s question was “can we rule it out?” He wasn’t trying to prove the pro case; he was asking whether the con case is provable. The reality of the moment is that we can prove neither case, and the inability to prove something does not make its opposite true. So Brockman cannot prove the case for deep learning leading to AGI, and similarly it cannot be disproven either.

As with self-driving vehicles, AGI is a topic where we can’t even describe the sufficiency criteria, so of course we cannot predict future success. Both are heavily reliant on emergent phenomena for which we have no good theory, and success in both depends not solely on the system itself but on how a complex and poorly defined environment interacts with it. In situations like this, one of the most powerful of the poor predictive tools available is to look at the trend of progress, make a gross evaluation of how far away the goal lies, and extrapolate. But trends are unreliable and the goal state is in truth unknown, so any prediction is weak at best.

NNs have recently brought us dramatic new capabilities. Outside of NNs, AGI progress has been comparatively slow, and those approaches are comparatively mature. NNs do not illuminate the already-mature non-NN AGI approaches; those continue to proceed at their own pace. If something has changed recently, or will change soon, that makes AGI more achievable, there’s a good chance it will be advances in NNs rather than a coincident but unrelated advance in some other area. One implication of this straightforward reasoning is that the best use of resources right now is to investigate how the new NN capabilities affect the feasibility of AGI.

I’m also very excited about Neuralink and related work. I wish it received much more emphasis than it does. But the nature of that technology probably precludes it from making any great contribution to AGI development until after high bandwidth direct brain interfaces are available to researchers. That is something which probably lies several years in the future at a minimum. In the meantime there are many, many exciting avenues to explore with new NN technologies.


He didn’t really talk about AGI much at all — not even the question about whether the con case is provable. The talk was almost all about non-AGI stuff. AGI/the future of AI was mentioned in, what, one slide? Still a good talk, but I felt the title was a misnomer. I want to hear about the new ideas for solving reasoning, etc.!

I also thought the point about exponentially increasing computing power used for training runs could have been developed more. If exponentially more computing power is used to gain only incremental performance increases, i.e. if there are steeply diminishing returns, that’s actually a reason to be pessimistic about AI progress. At some point the computing power requirements of the next incremental gain will outstrip the planet’s supply of semiconductors. I don’t necessarily believe this is the case, but it’s a possibility that is compatible with the data Brockman presented.
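The diminishing-returns worry can be made concrete with a toy model. The sketch below assumes a log-law relationship between performance and compute (my assumption for illustration, not Brockman’s data): if performance grows only logarithmically with training compute, each fixed performance increment costs a constant *multiple* of compute, i.e. exponentially growing absolute amounts.

```python
# Toy model (assumed log-law, not Brockman's data): invert
# perf = a * log10(compute) to see what each performance step costs.

def compute_needed(perf, a=1.0):
    # Compute required to reach a given performance level under the
    # assumed log-law.
    return 10 ** (perf / a)

costs = [compute_needed(p) for p in range(1, 6)]
ratios = [later / earlier for earlier, later in zip(costs, costs[1:])]
print(costs)   # [10.0, 100.0, 1000.0, 10000.0, 100000.0]
print(ratios)  # [10.0, 10.0, 10.0, 10.0] -- each step costs 10x the last
```

Under this assumption, five equal performance increments already demand a hundred-thousand-fold increase in compute, which is roughly the trend line Brockman showed.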

In an alternate world, someone could present a graph showing that the amount of computation needed to train neural networks has been steeply decreasing over time. That would be taken as a positive sign! Yet it is the exact opposite of the trend Brockman presents as a positive sign. When a piece of evidence and its opposite can both be used to point to the same conclusion, that suggests the conclusion doesn’t actually follow from the evidence.

I agree this is worth investigating, but who is investigating it, and how?

With a supercomputer the size of Jupiter, we most likely couldn’t achieve human-level AGI with any existing neural network architecture. So, even with practically unlimited computation, a limiting factor is going to be human creativity and innovation, which in many fields we have no way of predicting, and which has historically followed surprising, irregular patterns (e.g. in aviation and spaceflight). Even if computation increases at a smooth, exponential rate, we can’t assume the same for innovation in how neural networks are designed.

The analogy I use is lighter-than-air aircraft: hot air balloons, blimps, and rigid airships (zeppelins). You can imagine a counterfactual steampunk world where airships and balloons became as ubiquitous as airplanes and helicopters are in our world. You can imagine that, in this world, innovation in lighter-than-air aircraft proceeds at an amazing speed, with faster, more efficient new models out every year, and airships that are capable of flying higher and higher.

But no matter how much progress there is in lighter-than-air aircraft, an airship will never make it to the Moon, because the physics airships rely on don’t apply in space. There is a ceiling on the technology’s capability, and it is the atmosphere. (That shouldn’t bother you if you only care about getting around on Earth.)

With AGI, the one technology we know for certain has no such ceiling is the human brain, because it instantiates general intelligence. Neural networks were invented by loosely copying the brain, and DeepMind and Vicarious are attempting to show that further progress in neural networks can be achieved by copying the brain more closely. Numenta is focused on theoretical neuroscience first: it is attempting to develop a fundamental theory of intelligence based on brain research. Once a successful theory is developed, it can be applied in AI.

The fundamental question here is where ideas come from. We need a lot of new ideas before we can develop AGI; Greg Brockman agrees with that. So, where are we going to get more and better ideas faster? One potential source of ideas is solving incrementally harder software and robotics problems with neural networks. Another potential source of ideas is reverse engineering the human brain. With the former approach, there is no telling how fast or slow progress toward AGI will be, or whether it will suddenly plateau for half a century. The attractiveness of the brain approach is that we know all the ideas we need are already there, instantiated in a working system. We just need to understand them.

We also don’t necessarily need to even understand them. A third approach, different from DeepMind’s, Vicarious’, or Numenta’s, is to directly copy the brain, cell for cell. The Human Brain Project was started with this intention, although some critics say that goal is infeasible. The Project’s original leadership was replaced, and I have no idea what the state of the simulation effort is — or whether it’s even still being pursued.

We can already scan dead brains to an accuracy of 20 microns. The width of an axon can be as small as 0.16 microns, so resolution still needs to be increased by more than two orders of magnitude. But the real difficulty is scanning living brains in order to get information about neuron (and glia) activity and function.
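A quick check of the arithmetic above: going from a 20-micron scanning resolution to the 0.16-micron width of the thinnest axons is a factor of about 125, which is indeed just over two orders of magnitude.

```python
import math

# Resolution gap between current dead-brain scanning (~20 microns)
# and the smallest axon diameters (~0.16 microns), per the post above.
scan_um = 20.0
axon_um = 0.16

factor = scan_um / axon_um       # ~125x improvement still needed
orders = math.log10(factor)      # ~2.1 orders of magnitude
print(factor, orders)
```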

It’s not just Neuralink; another exciting company working in this area is Openwater:

Mary Lou Jepsen believes that Openwater’s light-based and sound-based approach can achieve a billionfold (with a ‘b’) increase in resolution over MRIs.

So, there are 3 broad approaches to AGI:

  • pure AI (i.e. iterating on neural networks to solve incrementally harder software and robotics problems)
  • reverse engineering the human brain
  • brain emulation

The reason I feel more optimistic about the second two is that copying a system (in this case, the brain) seems faster and easier than building a new system from scratch.


Personally, I think NN is probably going to get to AGI faster than either studying the brain or emulating the brain. I define AGI as a system that is generally better than humans at any kind of problem solving, by the way. I don’t include stuff like consciousness, self awareness, or emotions beyond whatever contribution they make to problem solving.

As an aside - I don’t think AGI is a particularly interesting objective. If we have highly functional systems that are separately optimized but still give an overall set of capabilities that let us do anything we want then that’s good enough. Combining all of those into a single system is useful in some ways but I think that’s probably secondary to just having the capabilities at all.

It’s possible that NN work will stall out again, but my guess is that we won’t hit any big walls until after NNs are outperforming people in most domains. That’s purely a trend prediction, clearly, but I’ve been looking pretty closely at this for the last few years and I don’t see any signs of slowdown, and I haven’t seen any compelling argument that our toolbox lacks some fundamental capability that can’t be assembled by further development of what we have now. Rather, my sense is that the rate of useful discovery is still accelerating, and that it’s likely to keep accelerating for as far ahead as I have a sense of things (maybe 5 years).

Smart people differ on this topic. A lot of people I respect have different opinions than mine.


Oh - if we had a Jupiter computer you could probably make pretty fast progress on making AGI, imo. Of the roughly 5 orders of magnitude of increase in NN training computation since AlexNet, only one went into bigger networks; the other 4 went into longer training and more parameter search. If I had ~2x10^27 kg of computer to work with (Jupiter’s mass), I - or rather better people than myself - could start searching the space of NN configurations which are, say, 10x the size of a human brain and trained on, say, tens of thousands of years of human-equivalent experience - each. For instance, we already know of techniques which could likely provide full visual cortex equivalence if we scaled them up 1M times and trained them on enough data.
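To put rough numbers on that ambition, here is a hedged back-of-envelope estimate. Every figure is an assumption of mine, not the poster’s: ~1e14 synapses as a low-end human estimate, one parameter per synapse, and 10 FLOPs per parameter per simulated second of experience.

```python
# Hedged back-of-envelope: training cost for one candidate network
# that is "10x the size of a human brain" on "tens of thousands of
# years" of experience. All numbers are assumptions for illustration.
synapses_human = 1e14          # low-end estimate of human synapse count
params = 10 * synapses_human   # "10x the human brain" -> 1e15 parameters
years = 3e4                    # "tens of thousands of years"
seconds = years * 3.15e7       # ~seconds per year

flops = params * seconds * 10  # 10 FLOPs per parameter per second
print(f"{flops:.2e} FLOPs per candidate network")  # ~9.45e+27
```

Searching over many such candidates multiplies this further, which is why nothing short of planetary-scale hardware makes the exercise thinkable.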

Lenin was right about quantity. It’s not everything but with enough of it you can do some really interesting things.


I was going to respond to this here, but the response I wrote turned into an essay (that I had been meaning to write anyway):

In brief, the fundamental capability that today’s neural networks lack is the ability to create good explanatory theories. This ability is, I think, the essence of general intelligence. In the human brain, the ability to create explanations is made possible by a cognitive architecture nicknamed “the Joycean machine” by Daniel Dennett.

Progress in today’s neural networks could easily be orthogonal to progress toward general intelligence. Indefinite progress in narrow intelligence is possible without ever creating general intelligence. Evolution has made endless progress on narrow intelligence throughout the biosphere, while only once — possibly as a one-off fluke — evolving the ability to create explanations.

To create artificial general intelligence, we need to build an artificial Joycean machine. I think that will happen sooner by studying and copying the Joycean machine in the human brain than through neural network R&D focused on solving iteratively harder narrow problems in software and robotics. Copying the Joycean machine architecture should be faster than stumbling upon it accidentally while trying to do something else.


Mary Lou Jepsen: Openwater’s technology is able to achieve resolution down to the size of “a single neuron”, meaning that it “can read and write neuron states using light alone”, shining non-invasively through the skull.

If this is really true, it will make brain emulation a lot easier.


Good discussion of AGI in this video from 14:40 to 20:35:

The talk is by Frank Chen, a partner at Andreessen Horowitz.

Also, an interesting tweet from François Chollet cited at 26:52 in the video:

I wonder why he thinks that. :thinking:


Have heard some pretty amazing stuff about this tech, but so far there doesn’t seem to be anything either detailed or concrete out in the open. I hope it lives up to the hype. (This is about the Openwater stuff.)


Kinda cool. Waymo blog post today echoing the points I made in my Medium post:

Fully autonomous driving systems need to be able to handle the long tail of situations that occur in the real world. While deep learning has enjoyed considerable success in many applications, handling situations with scarce training data remains an open problem. Furthermore, deep learning identifies correlations in the training data, but it arguably cannot build causal models by purely observing correlations, and without having the ability to actively test counterfactuals in simulation. Knowing why an expert driver behaved the way they did and what they were reacting to is critical to building a causal model of driving. For this reason, simply having a large number of expert demonstrations to imitate is not enough. Understanding the why makes it easier to know how to improve such a system, which is particularly important for safety-critical applications.

A causal model is pretty similar to an explanatory theory.


I don’t think it’s clear that humans primarily employ causal models when driving a vehicle. The assertion that correlation is insufficient is a statement of opinion on the author’s part. Also, while long-tail phenomena are indeed an important obstacle, the holistic nature of NN-driven perception leads it to soft failure more often than hard failure. That tendency to fail soft makes it intrinsically more robust against long-tail phenomena than heuristically programmed approaches. I would not claim that it’s better or worse than heuristic approaches with respect to long-tail issues, but its weak points are different and not necessarily worse.

In any case, to really evaluate these systems against long-tail phenomena in a way that is straightforward and trustworthy, they have to be driven for billions of miles in the real world. Long-tail phenomena are rare, as the author points out. Humans can invent them for simulation, and you can statistically perturb simulation parameters to approximate phenomena a few sigmas from the median, but true long-tail phenomena that occur in the real world aren’t going to be tested in this manner. Billions of simulated miles are certainly helpful, but they don’t save you from having to do the real-world demonstration. At what point can we expect Waymo to have billions of miles of real-world data, I wonder?
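To get a feel for the scale, here is some hypothetical fleet arithmetic. Every number is an assumption for illustration, not a Waymo figure:

```python
# Hypothetical fleet math: how long does it take a real-world fleet
# to log a billion miles? (Assumed numbers, not Waymo data.)
cars = 1_000              # assumed fleet size
miles_per_car_day = 300   # assumed daily utilization per car
target_miles = 1e9

days = target_miles / (cars * miles_per_car_day)
print(days / 365)  # ~9.1 years at this assumed rate
```

Even under these generous assumptions, a billion real-world miles is the better part of a decade away for a thousand-car fleet.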


The practical example they give in the paper is that, by default, the cars were learning to stop for stop signs by simply extrapolating forward the past deceleration of human drivers. When they deleted the past motion history in half of the examples and forced the neural network to use the future motion as the training output, the cue switched from past deceleration to stop signs.
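This masking trick is easy to demonstrate on synthetic data. The sketch below is my own toy construction (plain logistic regression, made-up features), not Waymo’s model or data: a “past deceleration” feature is a near-perfect shortcut for the “stop sign” feature, and zeroing the shortcut in half the training examples pushes the learned weight onto the true cue.

```python
import numpy as np

# Toy causal-confusion demo: "past_decel" is a shortcut cue that is
# almost perfectly correlated with the true cause, "stop_sign".
rng = np.random.default_rng(0)
n = 5000

stop_sign = rng.integers(0, 2, n).astype(float)    # true cause of braking
past_decel = stop_sign + 0.1 * rng.normal(size=n)  # correlated shortcut
y = stop_sign                                      # target: brake or not

def fit(X, y, steps=2000, lr=0.1):
    # Plain logistic regression trained by gradient descent.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Baseline: both features visible, so weight spreads onto the shortcut.
X = np.column_stack([past_decel, stop_sign])
w_plain = fit(X, y)

# Intervention: zero the shortcut feature in half of the examples,
# analogous to Waymo deleting past motion history.
mask = rng.random(n) < 0.5
X_masked = X.copy()
X_masked[mask, 0] = 0.0
w_masked = fit(X_masked, y)

print("plain weights [past_decel, stop_sign]:", w_plain)
print("masked weights [past_decel, stop_sign]:", w_masked)
```

After masking, the stop-sign weight dominates the shortcut weight, which is the qualitative effect Waymo reported.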

I think to say that X causes Y is to say that Y temporally follows X in 100% of logically possible cases (including cases that don’t appear in the real world). So if X causes Y, the closer a neural network gets to assuming a 100% correlation in the real world, the closer it is to a functional understanding that X causes Y. Waymo showed that one way to teach a neural network that correlation/causation is to eliminate a confounding variable.

This is arguably no different from Skinnerian learning in animals, though. A dog learns the sound of a crinkling bag means “treat!”. But I bet if you then introduced a lot of crinkling bags in the house, the dog could learn another cue, like seeing or hearing the treat cupboard open. Causal reasoning in adult humans is (often) one-shot learning with a lot of world understanding and concept understanding. A detective finds a dead person who’s apparently been stabbed, notices a circular entry wound, and sees a small puddle on the floor. The detective then searches for/generates plausible mechanisms, hitting upon an icicle. Humans have a complex enough model of the world, and an ability to generate and test ideas on demand, that we can reason about plausible mechanisms when assessing whether X caused Y (or whether it was just a coincidence) or what caused Y.

Last night I was listening to a podcast with Yoshua Bengio, where he said he thinks a promising direction for AI is to move from a paradigm of passive observation (like ConvNets) to a paradigm of agents actively intervening in the world, and exploring relationships between causes and effects. He sees that as a way for AI to develop high-level explanations of phenomena in the world.


I agree with most of your post, but your last statement gives me doubts. One could argue that with aviation, for example, we were able to fly by creating a new system that did not imitate exactly how birds fly, but was instead based on the logic of physics. Therefore it’s not clear which path will be faster: 1, 2, or 3. Have you guys read Pedro Domingos’ book about AGI, “The Master Algorithm”? Any thoughts on his ideas?


I agree it’s possible in theory, but here’s the key question: where will the knowledge come from about the fundamental principles of intelligence? With the physics of flight, there are all kinds of objects in the world that instantiate the relevant physical principles — things like lift, drag, thrust, and weight. Physicists like Galileo and Newton were able to do experiments on everyday objects that obey the same laws as all other objects.

The knowledge about how to build an intelligent system is instantiated in intelligent systems and nowhere else. We can’t figure out the principles that make intelligent systems tick by studying non-intelligent systems. Galileo, Newton, et al. could understand lift, drag, thrust, and weight with non-flying objects like feathers and cannonballs that exhibit the same flight-relevant principles as flying objects. But non-intelligent systems don’t exhibit the same principles as intelligent systems.

Simply put, you can understand physics by studying any physical objects. But you can only understand biology by studying biological organisms — you won’t derive the Krebs cycle from feathers and cannonballs. My strong hunch is that to understand intelligence, you have to study brains.

I added his TEDx talk to my Watch Later. Thanks. :slightly_smiling_face:


You make a good point indeed. The initial inspiration comes from organic intelligent systems, and deep neural networks are a result of that. What I wonder is whether we need to fully understand and mimic the human brain, or whether the coupling can stay loose, as with deep learning. Eager to see how it unfolds. I hope I can see it in the next 50 years, or, if healthcare improves steadily, for as long as I live :slight_smile:


Happy holidays everybody! I wanted to share this article on AGI, AlphaGo, OpenAI’s Dota bots, and new directions in AI research.


Just stumbled on this from Surya Ganguli at Stanford’s AI research centre:

An oft-quoted trope to argue for ignoring biology in the design of AI systems involves the comparison of planes to birds. After all, if we wish to create artificial machines that propel humans into the air, it now seems ridiculous to mimic biological ingredients like feathers and flapping wings in order to invent flying machines. However, a closer inspection of this idea reveals much more nuance. The general problem of flight involves solving two fundamental problems: (1) the generation of thrust in order to move forward, and (2) the generation of lift so that we do not fall out of the sky. Birds and planes do indeed solve the problem of thrust very differently; birds flap their wings and planes use jet engines. However, they solve the problem of lift in exactly the same way, by using a curved wing shape that generates higher air pressure below and lower air pressure above. Thus gliding birds and planes operate very similarly.

Indeed, we know that there are general physical laws of aerodynamics governing the motion of different shapes through air that yield computable methods for predicting generated forces like lift and thrust. Moreover, any solution to the problem of flight, no matter whether biological or artificial, must obey the laws of aerodynamics. While there may be different viable solutions to the problem of flight under aerodynamic constraints, such solutions may share common properties (i.e., methods for generating lift), while simultaneously differing in other properties (i.e., methods for generating thrust). And finally, while on the subject of flight, there may yet be further engineering inspiration to be gleaned from the biological control laws implemented by the lowly fruit fly. Such flies are capable of rapid aerial maneuvers that far outstrip the capabilities of the world’s most sophisticated fighter jets.

More generally, in our study of the physical world, we are used to the notion that there exist principles or laws governing its behavior. For example, just as aerodynamics governs the motion of flying objects, general relativity governs the curvature of space and time, and quantum mechanics governs the evolution of the nanoworld. We believe that there may also exist general principles, or laws that govern how intelligent behavior can emerge from the cooperative activity of large interconnected networks of neurons. These laws could connect and unify the related disciplines of neuroscience, psychology, cognitive science and AI, and their elucidation would also require help from (as well as contribute to the development of) analytic and computational fields like physics, mathematics and statistics. Indeed the author of this post has used techniques from dynamical systems theory [25–28], statistical mechanics [29–33], Riemannian geometry [34], random matrix theory [13,35], and free probability theory [36] to obtain conceptual insights into the operation of biological and artificial networks alike. However, to elucidate general laws and design principles governing the emergence of intelligence from nonlinear distributed circuits will require much further work, including the development of new concepts, analysis methods, and engineering capabilities. Ultimately, just like the story of birds, planes and aerodynamics, there may be diverse solutions to the problem of creating intelligent machines, with some components shared between biological and artificial solutions, while others may differ. By seeking general laws of intelligence, we could more efficiently understand and traverse this solution space.