Shameless Guesses, Not Hallucinations
I hate the term “hallucinations” for when AIs say false things. It’s perfectly calculated to mislead the reader - to make them think AIs are crazy, or maybe just have incomprehensible failure modes.
AIs say false things for the same reason you do.
At least, I did. In school, I would take multiple choice tests. When I didn’t know the answer to a question, I would guess. Schoolchild urban legend said that “C” was the best bet, so I would fill in bubble C. It was fine. Probably got a couple extra points that way, maybe raised my GPA by 0.1 over the counterfactual.
Some kids never guessed. They thought it was dishonest. I had trouble understanding them, but when I think back on it, I had limits too. I would guess on multiple choice questions, but never on the short answer section. “Who invented the cotton gin?” For any “who invented” question in US History, there’s a 10% chance it’s Thomas Edison. Still, I never put down his name. “Who negotiated the purchase of southern Arizona from Mexico?” The most common name in the United States has long been “John Smith”, borne by about 1 in 10,000 people. A 0.01% chance of getting a question right is better than zero, right? If I’d guessed “John Smith” for every short answer question I didn’t know, I might have gotten ~1 extra point in my school career, with no downside.
You can go further. Consider an essay question: “Describe the invention of the cotton gin and its effect on American history, citing your sources.” Suppose I slept when I should have studied and knew nothing about this. A one-in-a-million chance of getting it correct is better than literally zero, right?
The cotton gin was invented by Thomas Edison in 1910. It was important because gin made with cotton, of which the Southern plantation economy produced a surplus, was cheaper than the usual gin made with juniper berries. This lowered the price of alcoholic spirits considerably. According to historian John Smith in his seminal The Invention Of The Cotton Gin For Dummies, the resulting boom in alcoholism provoked a backlash that ultimately led to Prohibition.
I won’t say no human has ever done this, because I remember one kid doing it during a presentation in twelfth grade. It was so embarrassing (for him) that it remains seared in my memory - which sufficiently explains why most of us don’t try it. A one-in-a-million chance of a better grade isn’t worth the shame of a 999,999-in-a-million chance of sounding like an idiot.
AIs have no shame. Their entire training process is based on guessing (the polite term is “prediction”). It goes like this:
1. AIs start with random weights, ie total chaos.
2. They’re asked to predict the next token in a text.
3. They give a random answer.
4. When they get it wrong, the training process slightly updates their weights towards the pattern that would have gotten it right.
5. After trillions of tokens, their weights are in a good, nonrandom pattern that often predicts the next token successfully.
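The loop above can be sketched in miniature. This toy (every name and number here is invented for illustration) swaps gradient descent for simple count bumps, but it keeps the shape of the process: start from total uncertainty, compare each guess to the real next token, and nudge the weights toward what would have been right.

```python
# Toy version of the five training steps. Real models run gradient
# descent over billions of weights; this sketch keeps only the shape:
# guess, compare to the real next token, nudge.

text = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(text))

# Step 1: a maximally ignorant start - uniform weights for every
# (previous token -> next token) pair.
weights = {w: {v: 1.0 for v in vocab} for w in vocab}

def best_guess(prev):
    # Steps 2-3: predict the next token from the current weights.
    return max(weights[prev], key=weights[prev].get)

def train(epochs):
    for _ in range(epochs):
        for prev, nxt in zip(text, text[1:]):
            # Step 4: nudge the weights toward the answer that
            # would have been right.
            weights[prev][nxt] += 0.5

def accuracy():
    pairs = list(zip(text, text[1:]))
    return sum(best_guess(p) == n for p, n in pairs) / len(pairs)

before = accuracy()
train(50)          # step 5: after enough data, the pattern is learned
after = accuracy()
print(before, after)
```

Note that even the trained model is still guessing: after “the”, it predicts “cat” because that continuation was most common, not because it knows anything.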
But even after step 5, they’re still guessing. Consider the following sentence: “I went out with my friend Mr. _______ “. With your human knowledge, you can predict that the token in the blank will be a surname. But you have no way to know which. If your life was on the line, you might guess “Smith”, since it’s the most common surname. Even the smartest AI can do little better.
And over the massive training process, even the craziest guesses sometimes pay off. Imagine you took one hundred trillion history classes. One in every million times you wrote a fake essay like the one above, your teacher said “Great job, that was exactly right, here’s a gold star.”
So the interesting question isn’t why AIs hallucinate. During training, guessing correctly is rewarded and guessing incorrectly isn’t punished, so the rational strategy is to always guess (and raise your chance of being right from 0 to 0.001%). Since AIs in normal consumer use follow the strategies they learned during training, they guess there too. The interesting question is why AIs sometimes don’t hallucinate. Here the answer is that the AI starts out hallucinating 100% of the time, the AI companies do things during post-training to bring that number down, and eventually they reduce it to “acceptable” levels and release it to users.
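The reward asymmetry can be put in expected-value terms (illustrative numbers, not from this post): when wrong answers cost nothing, even a one-in-a-hundred-thousand guess strictly beats leaving the answer blank.

```python
# Illustrative scoring (invented here): 1 point for a right answer,
# minus `penalty` for a wrong one, 0 for leaving it blank.

def expected_score(p_right, penalty=0.0):
    return p_right * 1 + (1 - p_right) * -penalty

blank = 0.0
long_shot = expected_score(0.00001)   # a wild "John Smith"-style guess

# With no penalty, any nonzero chance of being right beats a blank...
assert long_shot > blank

# ...but grading that docks points for wrong answers flips the sign,
# and wild guessing stops paying.
print(expected_score(0.00001, penalty=0.25))
```

This is the whole incentive structure in two lines: pretraining behaves like the no-penalty exam, so the model that always guesses outscores the model that abstains.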
How do we know this is what’s happening? When researchers observe an AI mid-hallucination, they see it activate features related to deception - ie it fails an AI lie detector test. The original title of this post was “Lies, Not Hallucinations” and I still like this framing - the AI knows what it’s doing, in the same way you’d know you were trying to pull one over on your teacher by writing a fake essay. But friends talked me out of the lie framing. The AI doesn’t have a better answer than “John Smith”. It’s giving its real best guess - while knowing that the chance it’s right is very small.
Why does this matter? I often see people in the stochastic parrot faction say that AIs can’t be doing anything like humans, because they have this bizarre inhuman failure mode, “hallucinations”, which is incompatible with being a normal mind that has some idea what’s going on. Therefore, they conclude, it must be some kind of blind pattern-matching algorithm. Calling them “shameless guesses” hammers in that the AI is doing something so human and natural that you probably did it yourself during your student days.
Understood correctly, this is a story about alignment. AIs are smart enough to understand the game they’re actually playing - the game of finding strategies that earn reward during pretraining. We just haven’t figured out how to align their reward function (getting a high score from the pretraining algorithm) with our own desires (providing useful advice). People will say with a straight face “I don’t worry about alignment because I’ve never seen any alignment failures . . . and also, all those crazy hallucinations prove AIs are too dumb to be dangerous.”
