
How ChatGPT works

Imagine, for a moment, a man alone in a room. There’s nothing in the room except him, a computer, and a keyboard with Chinese characters on it.

The man doesn’t speak Chinese. In fact, he’s been living under a rock and doesn’t know Chinese is a real language. He definitely doesn’t know the symbols on his keyboard belong to a real writing system.

One day, the computer dings and prints out a paragraph of Chinese characters. It’s an excerpt from Journey to the West, but he doesn’t know that. “Woah,” he thinks. “Cool shapes.”

And then a message shows up on the computer screen: WHAT COMES NEXT?

Now how the hell is the man supposed to know that? He presses a random button - let’s say 齊 - and the computer buzzes angrily. WRONG, it says. THE CORRECT WORD IS 聖.

Oh, well. There’s always next time. And sure enough, a few days later the computer prints out another paragraph, and again it asks the man: WHAT COMES NEXT?

He guesses again. He gets it wrong again.

Days go by. There are more paragraphs, more guesses, more wrong answers.

And then one day, he presses a button. Instead of a buzz, the computer dings. CORRECT, it says. And the man is overjoyed! This is the best day of his life.

He starts paying attention. He looks at all the printouts. He compares characters and paragraphs. He realizes some characters appear more often than others, while other characters always appear together in a specific order.

Remember, he doesn’t know this is Chinese. As far as the man is concerned, it’s a sequence of alien symbols.

But sequences form patterns. He can learn patterns. This symbol is always followed by this one. If this symbol is displayed, the next one will probably be this one. He begins to draw statistical inferences about the symbols. He writes extensive notes and equations on how to recognize the patterns.

His guesses get more and more accurate. The computer is dinging more often now. Each ding is like a shot of heroin mainlined into his brain, and he uses that rush to condition his instincts to get better.

Weeks pass. He’s now guessing correctly about 80% of the time. Pretty good for someone who doesn’t know Chinese is a language.

One day, something odd happens.

The computer prints. It asks what character comes next. He presses a button. But instead of a ding or a buzz, there’s just… silence.

The computer prints again. The sequence has changed. The character he chose on the keyboard has been added to the end of the sequence.

WHAT COMES NEXT?

“Weird,” he thinks. But what is he going to do, ignore it? Of course not. It’s clearly a test of his prediction abilities. So screw it, he’ll treat this the same as all the other printouts. Hell, that last printout? Never even happened. This is just a normal day on the job, and he’s going to keep predicting the next symbol in this odd sequence like he’s done thousands of times before.

And he does just that. He presses a symbol, another printout comes out with that symbol added on the end, he presses what he thinks is the next symbol, and so on. Eventually, he thinks, “Hm. I don’t think there’s anything else. This is the end of the sequence.”

So he presses the END button, leans back, and is satisfied.

The man doesn’t know the printouts aren’t just meaningless sequences of odd symbols. He has no idea that every printout has always been grammatically correct Chinese.

That day, the first printout he’d received had been a question. It said: “How do I open a door?”

And the string of characters he’d just pressed, the one he’d determined to be the most likely sequence?

It read: “You turn the handle and push.”

One day, you decide to visit this guy’s office. You’d heard he was learning Chinese and wanted to see how it was going. You pop in and ask, “Hey! How’s the Chinese going?”

He looks at you like you’ve lost your mind. As far as he’s concerned, there’s no connection at all between language and the little symbol game he’s been playing. He thinks of it as some advanced form of mathematics.

“They’re just funny symbols,” he says. “No need to get all philosophical about it.”

Suddenly, another printout comes out. He puzzles over it for a moment. He doesn’t know what it says, but you do. It reads, “Do you actually speak Chinese, or are you some guy in a room doing statistics and shit?”

The man leans over to you confidently. “I know it looks like a jumble of completely random symbols. I figured it out, though. It’s actually a very sophisticated mathematical sequence.”

He presses a button on his keyboard, then another, then another. Slowly but surely, he composes a sequence of characters that, unbeknownst to him, reads: “Yes, of course I know Chinese! Otherwise, I would not be able to speak with you.”

And that is how ChatGPT works.


The story above is called the Chinese Room thought experiment and was first presented by philosopher John Searle in 1980. The latest language models (including ChatGPT) are examples of the Chinese Room made real.

GPT is based on a 2017 Google paper, “Attention Is All You Need”, which introduced a neural network architecture called the transformer. A transformer takes the text you give it and transforms it into something you want on the other side.

Transformers were super cool at the time (and still are) because they dramatically sped up what we call natural language processing: getting computers to understand text the way humans do. Think about all the different meanings of the word “bank”: river bank, financial bank, to bank something… the list goes on. The attention mechanism is what lets the model use the surrounding words to figure out which “bank” you mean.
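Here’s a toy sketch of the attention computation at the heart of a transformer. Everything in it is made up for illustration (real models use learned weights and vectors with thousands of dimensions), but the mechanics are the same: each word’s vector gets blended with the vectors of the words it attends to.

```python
import numpy as np

def attention(Q, K, V):
    # Compare every query against every key to get relevance scores,
    # scaled down so the softmax doesn't saturate.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax: turn each row of scores into weights that sum to 1.
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    # Mix the value vectors according to those weights.
    return weights @ V

# Three "words" as hypothetical 4-dimensional embeddings.
x = np.array([[1.0, 0.0, 1.0, 0.0],   # "river"
              [0.0, 1.0, 0.0, 1.0],   # "bank"
              [1.0, 1.0, 0.0, 0.0]])  # "flooded"

# Self-attention: each word attends to all three, so "bank" comes out
# tinted by "river" and "flooded".
print(attention(x, x, x))
```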

GPT uses natural language processing to transform inputs into outputs based on its training.

What was ChatGPT trained on?

Let’s talk about this training.

GPT-3 was trained on roughly 500 billion tokens, the chunks of text a language model actually reads and writes, allowing it to more easily assign meaning and predict plausible follow-ups. Most words map to a single token, but longer or rarer words get broken into several.
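You can poke at tokenization yourself with OpenAI’s tiktoken library. The exact splits depend on the encoding, but the pattern holds: common words are one token, unusual ones shatter into pieces.

```python
# pip install tiktoken
import tiktoken

# "r50k_base" is the encoding used by the GPT-3 family of models.
enc = tiktoken.get_encoding("r50k_base")

tokens = enc.encode("ChatGPT is disarmingly chatty")
print(tokens)                             # a list of integer token ids
print([enc.decode([t]) for t in tokens])  # the text each id stands for
```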

These tokens come from a massive corpus of data written by humans: books, articles, PDFs, and an absurd amount of content scraped from the internet, including all of Wikipedia, content from sites like eBay, Facebook, Reddit, and Stack Overflow, and even standardized tests like the LSAT.

The researchers fed it line after line of text that humanity has created, and its goal was to predict what comes next. It does this auto-regressively, meaning GPT doesn’t generate the entire result in one go. It works token by token, predicting each next token from everything that came before it.

(Incidentally, this is the “attention” in “Attention Is All You Need”. GPT pays attention to everything that comes before the token it’s predicting, weighing which parts matter most.)
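In pseudocode, the loop is the man in the room: look at the whole printout, press the most likely next symbol, repeat until the END button. This sketch assumes a hypothetical predict_next function standing in for the network.

```python
def generate(predict_next, prompt_tokens, end_token, max_len=100):
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:
        # The model sees the ENTIRE sequence so far, not just the last token.
        next_token = predict_next(tokens)
        if next_token == end_token:  # the END button
            break
        tokens.append(next_token)    # the printout grows by one symbol
    return tokens
```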

Now, the original GPT was trained on continuous sequences of text. That means it’s great at continuing a sentence, but it won’t reliably answer a question. What OpenAI wanted was a chatbot (albeit a really, really good chatbot).

So they wrote thousands of prompt-response pairs like this:

Prompt: Explain reinforcement learning to a 6 year old.
Response: We give treats and punishments to teach…

Writing these prompt-response pairs is a lot of work. OpenAI had to hire an army of people to write them.
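As data, each pair might look something like this (the schema is illustrative, not OpenAI’s actual format):

```python
training_example = {
    "prompt": "Explain reinforcement learning to a 6 year old.",
    "response": "We give treats and punishments to teach...",  # written by a human
}
```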

These human-written responses were used to train the initial model. After that, OpenAI had the incredible idea to:

  1. Give the initial model a prompt
  2. Let the network come up with 4-5 different potential answers
  3. Have humans rank those answers from best to worst

It’s much easier to rank an existing response than write one from scratch. This part is called “reinforcement learning from human feedback”, or RLHF.
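The output of one ranking round looks roughly like this (contents invented for illustration):

```python
comparison = {
    "prompt": "Explain reinforcement learning to a 6 year old.",
    # The model wrote all of these; a human only put them in order.
    "answers_best_to_worst": [
        "We give treats when you do well and timeouts when you don't...",
        "Reinforcement learning maximizes expected cumulative reward...",
        "I don't know.",
    ],
}
```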

The final, and my favorite, piece of ChatGPT is this: OpenAI trained another network to predict how a human would rank any given response. This network is called the reward model, and it’s trained on the ranked data humans helpfully provided during RLHF. The chatbot itself is then fine-tuned against the reward model using an algorithm called Proximal Policy Optimization, or PPO.

ChatGPT generates a response, the reward model scores how good that response is, and ChatGPT learns from the score.
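Put together, one training step looks something like this sketch. The objects and method names are stand-ins (real PPO updates are far more involved), but the flow of information is the point:

```python
def rlhf_step(policy, reward_model, prompt):
    # The policy (ChatGPT) answers the prompt.
    response = policy.generate(prompt)
    # The reward model predicts how a human would rate that answer.
    score = reward_model.score(prompt, response)
    # PPO nudges the policy's weights toward higher-scoring answers.
    policy.update(prompt, response, score)
    return response, score
```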

Why is ChatGPT free?

ChatGPT was opened to the public because OpenAI wanted to crowdsource more prompts, responses, and rankings. You can actually see this on ChatGPT itself: every response comes with little vote buttons where you can rate it to help train GPT.

You even get a box to provide an example of a better response, just like the human labelers did internally.

Once you know what you’re looking at, you can see how ChatGPT was supposed to help crowdsource training. But nobody uses it for that. Nobody realizes it’s a free research preview where testers were meant to rank responses to train GPT.

ChatGPT wasn’t released. It escaped.