OpenAI GPT-2: language generation via unsupervised learning

#1

GPT-2 has some people shook. The most apt description I’ve heard is “deepfakes for text”. Like video and photo deepfakes, there’s something weird and unsettling about GPT-2. Maybe it’s the uncanny valley of text.

The main thing folks have been talking about with regard to GPT-2 is its potential use in information warfare (e.g. Russian Twitter bots trying to influence elections). That topic is well-covered, so I want to talk about something else.

From what people are saying, GPT-2’s success is due not to a better algorithm but to sheer scale.

OpenAI says:

GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.

This seems to be a case where we didn’t know how powerful our tools already were. Just throw more coal on the furnace and that does the trick.

You do have to be smart about the quality and diversity of the data. GPT-2’s training set was built from outbound links in Reddit posts that received at least 3 karma; that’s a smart way to essentially get human supervision. But that’s still a change in the information going in, not in the way the system processes it.
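To make that concrete, here’s a minimal sketch of the karma-filtering idea. (The data structure and field names are hypothetical; OpenAI’s actual WebText pipeline scraped Reddit at far larger scale.)

    # Sketch of WebText-style filtering: keep only outbound links whose
    # Reddit submission earned at least 3 karma. Karma acts as a cheap
    # proxy for human quality judgment.
    # (Hypothetical data structure; not OpenAI's actual pipeline.)
    MIN_KARMA = 3

    submissions = [
        {"url": "https://example.com/good-article", "karma": 57},
        {"url": "https://example.com/spam", "karma": 0},
        {"url": "https://example.com/ok-post", "karma": 3},
    ]

    # Links with karma >= 3 were upvoted by at least a few humans,
    # so the linked text is probably worth training on.
    training_urls = [s["url"] for s in submissions if s["karma"] >= MIN_KARMA]
    print(training_urls)  # ['https://example.com/good-article', 'https://example.com/ok-post']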

At some point current algorithms are going to run out of steam, but can anyone predict when that point will be? From an autonomous-cars perspective, I find this reassuring. Let’s see how far we can get with scale alone, then decide whether we need any big algorithmic breakthroughs.

#2

To be clear: there is no algorithmic contribution here. They are “just scaling up” previous research. But I think seeing exactly how strong these scaled-up models are is an important contribution in its own right. It’s easy to say in retrospect “of course more data and compute give you better models”, but if I had shown the unicorn story to a colleague last Wednesday and told them it was AI-generated, I don’t think they would have believed me.

#3

OpenAI co-founder and CTO Greg Brockman:

One thing that we’ve found, a pattern that just holds across domain after domain, is that if you scale up a neural network with appropriate data and tune the architecture in the right way, it will work better.

@7:11:05

#4

You can play with a model based on the public version of OpenAI GPT-2:

https://gpt2.apps.allenai.org/?text=Joel%20is

No matter what prompt text I put in, if I just keep selecting the top option for the next word, the model ends up repeating itself.
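This is the classic failure mode of greedy decoding: always taking the single most likely next token tends to fall into a loop. Below is a minimal sketch of the difference between greedy decoding and top-k sampling, using the small public GPT-2 model via the Hugging Face transformers library. (That setup is my assumption for illustration; I don’t know exactly how the AllenAI demo is configured.)

    # Greedy decoding vs. top-k sampling with the public small GPT-2.
    # Requires: pip install transformers torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer.encode("Love is", return_tensors="pt")

    # Greedy: always pick the highest-probability next token.
    # This is "keep selecting the top option", and it loops.
    greedy = model.generate(input_ids, max_length=60, do_sample=False)
    print(tokenizer.decode(greedy[0]))

    # Top-k sampling: draw the next token from the 40 most likely options.
    # The injected randomness usually breaks the repetition.
    sampled = model.generate(input_ids, max_length=60, do_sample=True, top_k=40)
    print(tokenizer.decode(sampled[0]))

OpenAI’s published samples were generated with top-k sampling (k = 40), which is likely part of why they read so much better than greedy output like the examples below.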

Prompt text:

Love is

Output:

a great way to get a little bit of a break from the grind.

"I’m not going to lie, I’m not going to lie to you. I’m not going to lie to you. I’m not going to lie to you. I’m not going to lie to you.

Prompt text (the example tweeted out by someone from the Allen Institute for AI):

Joel is

Output:

a former U.S. Army Ranger and a former U.S. Army Ranger Corps Ranger. He is a former U.S. Army Ranger and a former U.S. Army Ranger

Prompt text:

Philosophy is

Output:

Philosophy is a very important part of the philosophy of science. It is the foundation of all science. It is the foundation of all philosophy. It is the foundation of all philosophy. It is the foundation of all philosophy. It is the foundation of all philosophy. It is the foundation of all philosophy. It is the foundation of all philosophy.

Prompt text (from a comment I wrote on a blog post):

Nostalgia for the pre-Obama years makes me think of this MLK quote: “True peace is not merely the absence of tension: it is the presence of justice.”

The Obama administration marks the end of white Christians’ — mainly white Protestants’ — monopoly on political power in America. Tension is uncomfortable, but it is the cost of confronting racial injustice.

Output:

The Obama administration is not only a racist, but also a racist and a racist.

The Obama administration is a racist, but also a racist and a racist.

The Obama administration is a racist, but also a racist and a racist.

The Obama administration is a racist, but also a racist and a racist.

The Obama administration is a racist

Prompt text (I went to Vox.com and took the first paragraph of the first article on the page):

Food and Drug Administration head Scott Gottlieb resigned on Tuesday — leaving a hole not just in the country’s top agency for food and drug regulation, but also in the Trump administration’s response to the opioid epidemic.

Output:

The move comes after the FDA announced it would not be enforcing the Food and Drug Administration’s (FDA) recommended guidelines for the use of antibiotics in humans.

The move comes after the FDA announced it would not be enforcing the Food and Drug Administration’s (FDA) recommended guidelines for the use of antibiotics in humans.

Prompt text (from Wikipedia):

A self-driving car, also known as a robot car, autonomous car, or driverless car, is a vehicle that is capable of sensing its environment and moving with little or no human input.

Autonomous cars combine a variety of sensors to perceive their surroundings, such as radar, Lidar, sonar, GPS, odometry and inertial measurement units. Advanced control systems interpret sensory information to identify appropriate navigation paths, as well as obstacles and relevant signage.

Output (this is the best one yet):

The autonomous car is also capable of driving autonomously, and can be programmed to automatically adjust its speed, braking and steering.

The autonomous car is also capable of driving autonomously, and can be programmed to automatically adjust its speed, braking and steering.