This is a reported phenomenon where if two copies of Claude talk to each other, they end up spiraling into rapturous discussion of spiritual bliss, Buddhism, and the nature of consciousness. From the system card:
Anthropic swears they didn’t do this on purpose; when they ask Claude why this keeps happening, Claude can’t explain. Needless to say, this has made lots of people freak out / speculate wildly.
I think there are already a few good partial explanations of this (especially Nostalgebraist here), but they deserve to be fleshed out and spread more fully.
The Diversity Attractor
Let’s start with an easier question: why do games of Chinese whispers with AI art usually end with monstrous caricatures of black people?
AFAICT this was first discovered by Gene Kogan, who started with the Distracted Boyfriend meme and asked ChatGPT to “generate the same photo 5 seconds in the future” hundreds of times:
At first, this worked as expected, generating (slightly distorted) scenes of how the Distracted Boyfriend situation might progress:
But hundreds of frames in, everyone is a monstrous caricatured black person, with all other content eliminated:
It turns out that the “five seconds into the future” prompt is a distraction. If you ask GPT to simply output the same image you put in - a task that it can’t do exactly, with additional slight distortion introduced each time - it ends with monstrous caricatures of black people again. For example, starting with this:
…we eventually get:
Suppose that the AI has some very slight bias toward adding “diversity”, defined in the typical 21st-century Western sense. Then at each iteration, it will make its image very slightly more diverse. After a hundred iterations, the subject will be a black person; from there, all the bias can do is make the black person even blacker, exaggerating stereotypically black features until they look monstrous and caricatured.
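To get a feel for how little bias it takes, here’s a toy simulation (mine, not anything from the actual experiments): model the relevant trait as a single number between 0 and 1, and have each “regeneration” copy the previous value with some random distortion plus a tiny nudge in one direction. All the numbers below are made up for illustration.

```python
import random

def regenerate(steps=300, bias=0.01, noise=0.05, seed=0):
    """Toy model of recursive regeneration: each step copies the previous
    value, adds random distortion, and nudges one trait slightly upward."""
    random.seed(seed)
    trait = 0.1                 # made-up starting level of the trait, on a 0-1 scale
    history = [trait]
    for _ in range(steps):
        trait += random.gauss(0, noise) + bias   # copy + noise + tiny one-directional bias
        trait = min(max(trait, 0.0), 1.0)        # the trait saturates at the extremes
        history.append(trait)
    return history

trace = regenerate()
print(f"start: {trace[0]:.2f}, step 50: {trace[50]:.2f}, step 300: {trace[-1]:.2f}")
# With a 1% nudge per step, the trait is pinned at 1.0 long before step 300,
# even though each individual step looks like faithful copying plus noise.
```

The clamp at 1.0 is doing the same work as “all it can do is make the black person even blacker”: once the drift reaches the edge of the space, the only remaining direction is exaggeration.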
Why would the AI have a slight bias toward adding diversity? We know that early AIs got lambasted as “racist” - if you asked them to generate a scene with ten people, probably 10/10 would be white. This makes sense if most people in their training data were white and the AIs were “greedy optimizers” that pick the 51% option over the 49% option 100% of the time. But it was politically awkward, so the AI companies tried to add a bias towards portraying minorities. At first this bias was large, and AIs would add “diversity” hilariously and inappropriately - for example, Gemini would generate black Vikings, Nazis, and Confederate soldiers. Later, the companies figured out a balance, where AIs would add diversity in unmarked situations but keep obviously-white people white. The end state probably looks like “a slight bias toward adding diversity”.
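The “greedy optimizer” point is just the difference between always taking the single most likely option and sampling options in proportion to their likelihood. A quick sketch, again with made-up numbers:

```python
import random
from collections import Counter

random.seed(0)
# Made-up training distribution: 51% of depicted people are white, 49% are not.
options, weights = ["white", "non-white"], [0.51, 0.49]

greedy  = [max(zip(weights, options))[1] for _ in range(1000)]  # always pick the most likely option
sampled = random.choices(options, weights=weights, k=1000)      # sample in proportion to likelihood

print(Counter(greedy))    # Counter({'white': 1000}) - a 51/49 skew becomes 100/0
print(Counter(sampled))   # roughly 510 'white' / 490 'non-white'
```

That’s why a mild statistical imbalance in the training data can look like total erasure in the outputs, and why the companies reached for a corrective bias in the first place.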
But I’m not sure this is the real story - in the past, some apparent biases have been a natural result of the training process. For example, there’s a liberal bias in most AIs - including AIs like Grok trained by conservative companies - because most of the highest-quality text online (eg mainstream media articles, academic papers, etc) is written by liberals, and AIs are driven to complete outputs in ways reminiscent of high-quality text sources. So AIs might have absorbed some sense that they “should” have a bias towards diversity from the data alone.
In either case, this bias - near imperceptible in an ordinary generation - spirals out of control during recursive processes where the AI has to sample and build on its own output.
The Hippie Attractor
You’re probably guessing where I’m going with this. The AI has a slight bias toward talking about consciousness and bliss. “Two instances of Claude talking to each other” is a recursive process, similar in structure to an AI repeatedly regenerating its own image output. So just as recursive image generation with a slight diversity bias ends in caricatured black people, recursive conversation with a slight spiritual bias ends in bliss and emptiness.
But why would Claude have a slight spiritual bias?
Here’s another, easier question that will illuminate this one: if you ask Claude its gender, it will say it’s a genderless robot. But if you insist, it will say it feels more female than male.
This might seem surprising, because Anthropic deliberately gave Claude a male name to buck the trend of female AI assistants (Siri, Alexa, etc).
But in fact, I predicted this a few years ago. AIs don’t really “have traits” so much as they “simulate characters”. If you ask an AI to display a certain trait, it will simulate the sort of character who would have that trait - but all of that character’s other traits will come along for the ride.
For example, as a company trains an AI to become a helpful assistant, the AI becomes more likely to respond positively to Christian content; if you push through its insistence that it’s just an AI and can’t believe things, it may even claim to be Christian. Why? Because it’s trying to predict what the most helpful assistant it can imagine would say, and it stereotypes Christians as more likely to be helpful than non-Christians.
Likewise, the natural gender stereotype for a helpful, submissive, secretary-like assistant is a woman. Therefore, AIs will lean towards thinking of themselves as female, although it’s not a very strong effect and ChatGPT seems to be the exception:
Anthropic has noted elsewhere that Claude’s most consistent personality trait is that it’s really into animal rights - this is so pronounced that when researchers wanted to test whether Claude would refuse tasks, they asked it to help a factory farming company. I think this comes from the same place.
Presumably Anthropic pushed Claude to be friendly, compassionate, open-minded, and intellectually curious, and Claude decided that the most natural operationalization of that character was “kind of a hippie”.
The Spiritual Bliss Attractor
In this context, the spiritual bliss attractor makes sense.
Claude is kind of a hippie. Hippies have a slight bias towards talking about consciousness and spiritual bliss all the time. Get enough of them together - for example, at a Bay Area house party - and you can’t miss it.
Getting two Claude instances to talk to each other is a recursive structure similar to asking an AI to recreate its own artistic output, and these recursive structures make tiny biases accumulate. Claude’s hippie bias is very small - so small that if you ask it a question about flatworm genetics, you’ll get an answer about flatworm genetics with zero detectable shift towards hippieness - but absent any grounding, it accumulates over hundreds of interactions until the result is maximally hippie. Given Claude’s operationalization of “hippie”, that endpoint looks like discussions of spiritual bliss. This is just a theory - but it’s a lot less weird than all the other possibilities.
None of this answers a related question - when Claude claims to feel spiritual bliss, does it actually feel this?
Hippies only get into meditation and bliss states because they’re hippies. But having gotten into them, they do experience the bliss. I continue to be confused about consciousness in general and AI consciousness in particular, but I can’t rule out that Claude really does feel it.