Astral Codex Ten

Comment deleted

Expand full comment

Scott Alexander

I can only give it 2000 characters at a time, but its comment for the first section of this post was:

"This is so cool! I love how GPT-3 is gradually becoming aware of itself and its surroundings. It's like a baby learning about the world for the first time."

Expand full comment

Crotchety Crank

I wish the Loom HPMOR imitation came furnished with some indication of how many branches were considered, and how many times they only took the first sentence of three GPT generated, and all that - the same way I wish scientific studies would always correct properly for multiple comparisons.

There's the potential for me to be wildly impressed and mildly frightened by that output! But as it is, I can't tell how impressed to be; how much is just humans tirelessly mining the search space for something that would provoke that reaction in me?

Expand full comment

Elriggs

Janus usually does include the amount of bits of selection (ie if I pick 1/4 paragraph completions, then that's 2 bits of selection). This is an example of a story w/ bot credits https://generative.ink/artifacts/simulators/ though I don't think they include the bits for inserting text(?)

I'm assuming the harry potter one was written before they included code to calculate that easily for them.

This post (https://generative.ink/posts/quantifying-curation/) goes into more details on bits of selection.

Expand full comment

Crotchety Crank

I think Janus is missing the mark here in an important way. They're calculating "bits of selection" as a property of the *output* (i.e. the final piece of text), when it's more aptly a property of the *process* of selection. The key difference is that "bits of selection" measured from the output alone ignores further selection within branches that eventually get pruned.

For example, let's say you get a decent result off the bat by tweaking only the first few sentences, only 4 bits of selection so far, but you keep looking for further optimizations. You decide to mine the search space for a better result, looking at hundreds of sentences. Ultimately, your search fails to yield better; the options you investigate are worse in various ways, so you share the initial "4-bit" result. Is it reasonable to say that there are only 4 bits of selection in what you're sharing? Clearly no - you selected it from among hundreds of options, so *at absolute minimum* log2(100) > 5 bits of selection have taken place!

The calculation Janus makes - the one that comes out "4 bits of selection" in the above example - reflects an important truth: that GPT-3 *would have* given you this without any further engineering or selection. And that would reflect well on GPT-3! But it doesn't change the fact that selection occurred, even if a posteriori, we realize that it didn't "have to".

For another analogy, consider this oft-cited comic on significance - https://xkcd.com/882/. Imagine switching up the order - imagine the scientists looked into green jelly beans *first*; would that make it any more honest for them to ignore the other 19 trials they ran thereafter and run with the "green jelly beans cause cancer" claim? I don't think so. Even results that don't feature in your final analysis *must* feature in your correction for multiple comparisons.

Expand full comment

Elriggs

Oh no, did you read the linked post? The bits of selection is for the process. 2 bits for choose 1/4 paragraphs. Another 2 bit for picking the next paragraph out of 4 generated. Etc.

Expand full comment

Victualis

Agreed: I'm more likely to be impressed by one monkey typing out Sonnet 30 than if I have to pick it out of the dross generated by a billion simians.

Expand full comment

Matthias Görgens

Yes. Though compared what truly random generation would give you, one sonnet from a billion monkeys would still be an achievement.

Expand full comment

Maksym Taran

Relevant to this post and the general topic/audience: you can trick GPT models to ignore the prompt, or exfiltrate it: https://simonwillison.net/2022/Sep/12/prompt-injection/

And using further AI-based approaches to try to patch up this kind of security hole works about as well as you'd expect: https://simonwillison.net/2022/Sep/17/prompt-injection-more-ai/

Expand full comment

Elriggs

Quintin got good results using more AI here (https://www.lesswrong.com/posts/n3LAgnHg6ashQK3fF/takeaways-from-our-robust-injury-classifier-project-redwood?commentId=97BdK6czckCgaPKL3)

Expand full comment

Emma

I really enjoyed reading this post. It reminds me of the last time I had this much fun, which was at a wedding party for my good friend Jane. Ah, Jane is a real hoot...

Expand full comment

Davis Yoshida

I personally don't read the quoted snippet as even remotely displaying any sort of self-reference. Am I missing something?

Expand full comment

Reply (4)

Comment deleted

Comment deleted

Expand full comment

REF

This is funny. As an adolescent, I was super obsessed with self-reference and strange loops. Much to my dismay, I did not have to spend any time worrying about whether I was cool. :(

Expand full comment

Yah except this isn't real self-reference. This is the equivalent of a young child saying "I want cookie" because they've learned that gets them a cookie but haven't yet learned I refers to them.

To operationalize this, a teen named Harry could write a self-referential novel and at the point where the novel says, "The novel Harry wrote started with..." at which point they would insert the first phrase of that very novel. These models don't have that capability. I mean eventually they might learn that particular pattern but they don't yet actually have a reflective representation which connects up descriptions which refer to them with their own behavior or representations.

Expand full comment

No.

But it's written in the way rationalists believe is erudite. So, via metonym, the style becomes the substance.

Expand full comment

M M

Is that actually what you think about this? It's not a very good guess

Expand full comment

Well, it's selected on the dependent variable: Janus chose the branches to pursue (and by extension, which to prune). In turn, SA chose to quote Janus. So, doubly selected on the dependent variable.

The passage has a very annoying style that gestures at meaning before asking the reader to bear the burden of completing the thought. This style is fun for the audience -- because it makes rationalists say "Hahaha yes of course!" -- while being substantively empty (see https://en.wikipedia.org/wiki/Proof_by_intimidation). Nearly every paragraph ends with this style:

1) "....Professor Quirrell yanked the presentation away before the lines could complete, and the Defense Professor raised his head to stare at Harry while Harry, intensely curious and alarmed, watched him in turn."

2) "....It is a disquieting investigation, and I would advise you not to delve deeper unless you wish to go mad."

3) "....But now, I ask you, have you been paying attention? What have you just seen?"

4) "....and while I’m reading the paper it’s creating a universe, like whatever I’m thinking, it just happens that way?"

5) "Professor Quirrell seemed to be pleased by the answer, or at least by Harry’s attempt to analyze the phenomenon he had been shown."

The only clear idea in the passage is that ideas have a life of their own. However, the way that idea is delivered is maximally fluffy; the passage overflows with neologisms, fluffed phrases like 'variant extrusion', and visual metaphors like pages writing themselves etc. This pleases rationalists, who in a fit of metonym claim that this means GPT-3 is exhibiting self-awareness.

But, the crown is not the kingdom.

Expand full comment

Reply (3)

M M

Okay, sorry for having been a bit rude, you're saying something different from what I thought you were, certainly. Though I'm not following the last step you're proposing here at all: what part of this is metonymous?

I also think the generalization to "rationalists" here is odd, if this is a thing that just Scott and Janus have done. Why not just say Scott and Janus?

Expand full comment

The metonym is that 1) the story discussing (poorly) how 'ideas have a life of their own' / characters talking (poorly) about how 'ideas have a life of their own', which SA takes to signify 2) understanding of GPT-3 / and self-awareness of among the characters that they themselves are written by GPT-3.

The classic metonym example is the crown signifying the kingdom. It's a useful rhetorical device that uses a visual allusion to embody a non-corporeal (but real) force. For example, in King Lear the crown is broken to signify disinheritance and disarray ("This coronet part between you"). In this case, when a metonym is being weaponized as argument that a GPT-3 passage demonstrates meaningful self-awareness of being 'self' written, it falls apart; a metonym is a bad choice because it has no argumentative force (but still retains its rhetorical power). Absent argumentative force, the closest match is allegorical fairy tale like the Golem or some such but written for the modern rationalists.

As for why I call them rationalists? Well, quite simple: it's the simplest description for audience of this piece: the set of people who like reading HPMOR, Less Wrong, AstralCodexTen / SlateStarCodex, and allegorical tales about AI.

Expand full comment

dubious

This seems to be the point, specifically. "In this case, via rigged demo," acknowledging this was extremely cherry-picked.

Then, "But sometimes GPT-3 genuinely gets it right," as a followup for more "true" examples.

That said, I do not see how self-updating / "continuous training" is not the minimum bar for self-awareness. I have not seen solid answers on this.

Expand full comment

Sure, but note the framing SA supplies: "Can the characters work out that they are in GPT-3, specifically? The closest I have seen is in a story Janus generated....."

[Segments of Janus's pruned branch]

"How does it get this level of self-awareness?...."

This begs the question that characters were close 'working out that they are in GPT-3' and that this passage demonstrates meaningful self-awareness. From there, SA writes as if these two propositions were true. I obviously don't agree. And yes, SA does say 'in this case, via rigged demo' -- but admitting once you are wrong, and then proceeding with the same argument, is more rhetorical technique than argument.

Expand full comment

dubious

No, the question you quoted is out of context and directly precedes what I posted. It explains in the *next sentence*: it doesn't, it's rigged.

I do not see how you can argue in good faith that "How does it get..." is other than sarcasm.

Expand full comment

Totally agree, wasn't really very impressed at all.

Expand full comment

M M

I think the idea is that GPT-3 is kind-of-sort-of-if-you-squint doing the variant extrusion thing. You have a text corpus, and based on that text corpus it produces all sorts of variants, like what was being done with HPMOR to produce that snippet itself

Expand full comment

Davis Yoshida

I think this is a real "ink blot test" situation.

Expand full comment

sourdough

Sep 20, 2022Edited

I think this is right. IIUC, Harry's just repeating back the description of the process that Quirrel just said to him. So he's a simulated character describing a method of simulating that bears resemblance to how he is simulated, without any awareness that he himself is simulated. This is no more interesting than writing down a description of how GPT works, then having GPT build off/rephrase your description (as if continuing the primer you started).

The description Scott wrote before this section about characters awakening to the fact that THEY are in a simulation based on flaw sounds categorically different, and fascinating. I would love to see examples of that.

Expand full comment

tgb

The defense professor says; “ Part of the difficulty of retaining your sanity, shall we say, that arises during Variant Extrusion, is that you begin to understand the larger delusion of reality logic, that part of you has dreamed the universe into existence and will continue to do so no matter what you do.” This seems like what you’re looking for already.

Expand full comment

Beth

This seems right to me, I think 'characters awakening to the fact that THEY are in a simulation based on flaw sounds categorically different' is overplaying it if that was supposed to refer to this.

Expand full comment

Antilegomena

"The self is a relation, which relates to itself, or is precisely that in the relation that the relation relates to itself; the self is not the relation but that the relation relates to itself."

Expand full comment

drocta

I either misremembered how that line goes, or read a different translation of it, or something.

What's the relation between this section and the post?

Is the idea "GPT3 generated text relating to transformer model generated text, which it is, and therefore, in its relating to itself, it is a self" ?

Expand full comment

Antilegomena

Pretty much, although "the self is not the relation but that the relation relates to itself", is simultaneously vital to the assertion, and so esoteric as to make the whole thing utterly unfalsifiable. Still, the quote pops to mind for me pretty regularly when I read about GPT3 for whatever reason.

Expand full comment

Peter S. Shenkin

Sep 19, 2022Edited

A long time ago — in fact, in the fabled 1960s — I asked a friend who worked at a brain research center, "If we were able to create a machine as good as a brain, would it also be as bad as a brain?" He thought for a minute and said he didn't know, but thought it was a good question.

The examples in this blog posting, together with some of the responses, seem to strengthen the case for"Yes."

Expand full comment

Retsam

I clicked the link without reading on, and thought, "wow, there's no way an AI wrote this!" Then I realized that the italicized part of the page was the original HPMoR, not the AI output. (I don't remember if I read that far before dropping HPMoR in either of my attempts)

Then I got to the actual "AI bit" and thought "wow, there's no way an AI wrote this", and came back to this post to see if I was right.

Expand full comment

Alec

"This is not a complement." Do you mean that it does not slot nicely beside prior examples (complement), or that it does not flatter them (compliment)?

Expand full comment

sdwr

The Loom HPMOR was so good I thought it was fake and somebody wrote it. Knowing that it's a curated human-machine centaur makes me feel, uh, better? Less displaced from reality?

Gotta stay away from stories like that, they're my catnip.

Expand full comment

Walliserops

Isn't that how the original HPMOR was written?

I thought it was common knowledge that Eliezer Yudkowsky has a complete AGI that he used to write it. And now, whenever someone else wants to develop AI, he shows up at their door and yells "No! You will BE KILL BY ROBOTS!" because he doesn't want competitors.

(He is right, of course. If you don't listen to him, you will indeed be kill by robots. His robots, specifically).

Expand full comment

This is the best conspiracy thy I've ever read. I'm seriously thinking about starting to believe it!

Expand full comment

Putting aside GPT for the moment, consider a picture is worth 1000 words. Since the open sourcing of Stable Diffusion, the rapid progress in mere weeks is astounding. And yet...

SD has an interesting premise: Take an image with known description (tokenized) and add noise to it, in a coherent way, in a large number of steps. Make those steps somewhat reversible (insert lots of math). At the end of the steps, your image is purely "random" noise. Not really, but it looks like noise. NOW, the fun begins: train the neural network to step BACKWARDS from the noise to the original image. Teach it millions of these... Now give it a random image of noise, and a tokenized description of what you want to find out of the noise. And it works.

It's the Infinite Library, of Borges, replaced with a 512x512 grid of 16 million colored pixels... Chaos with no index, and all indexes right or wrong, but you've taught the machinento move from book to book, looking for the nearest image to what you want to find... And each page it opens, leads it closer to your goal.... Move towards the shelves with dog photos, and also towards the shelves with Christina Hendricks, and eventually it finds order in the chaos and triumphantly it returns a photo of Christina Hendricks with a dog... Never mind that she's got six fingers and the dog has three eyes. It did as you asked. Ask again.

It has no actual clue about Christina Hendricks or Dogs... Only how to pattern match.

It's a remixer, not a creator.

True creators are rare and special and easy to miss.

RNGesus isn't.

Expand full comment

Crazy Jalfrezi

'True creators are rare and special and easy to miss.'

...and most tasks don't need them, they need RNGesus.

Expand full comment

Scott Alexander

https://slatestarcodex.com/2019/02/28/meaningful/

Expand full comment

The ironic comment in 2019:

"In practical terms, that’s likely to mean: if a computer can pick out a picture that you’re describing, it’s got it."

Expand full comment

BTW, this is why OpenAI will fail and Open Source like Stable Diffusion will win, quoted from the email sent out today:

With improvements in our safety system, DALL·E is now ready to support these delightful and important use cases – while minimizing the potential of harm from deepfakes.

We made our filters more robust at rejecting attempts to generate sexual, political, and violent content – while also working to reduce false flags – and built new detection and response techniques to stop misuse.

Our content policy still prevents uploading images of anyone without their consent, or images that you do not have the rights to.

Expand full comment

Crazy Jalfrezi

Proof at last that excessive political correctness makes one dull and unimaginative.

We really are going to cripple our machines, aren't we?

Expand full comment

Reply (3)

Prima

One can only hope! I’d be happy to let the machines do whatever they wanted, as long as humans were the only ones allowed to say anything politically incorrect, or talk about sex and violence, i.e. anything that actually matters. That would make it easy to determine which sources are worth paying attention to and which aren’t.

Expand full comment

Kenny Easwaran

Or at least, proof that when someone writes for this audience about someone's AI training making the AI dull and unimaginative, they will choose to emphasize the "political correctness" angle.

Expand full comment

Himaldr-3

Sep 21, 2022Edited

Yeah... honestly, why do we even bother commenting on and reading the (definitely "unoptimized") thoughts of such people? It's disturbing because they can say *anything*, even things that haven't been approved by PoC, the SPLC, or ANY authorities at all.

Tbh, I say we leave and never come back. Let's make a pact, here and now: NO MORE ACX. Keep our minds secure and our clicks supporting a SAFE blog from now on. (And hey — if it isn't one written by BIPoC, we need to think a little more about our duty as allies.)

(EDIT: Please don't use words like "d*ll", btw. Educate yourself about the history of the word and you'll see that it has been used to denigrate the differently-abled for centuries. Using it pejoratively, like you have, is possibly hurtful, and it doesn't cost that much expressiveness to use an alternative like "not interesting". Thanks.)

Expand full comment

Scott Alexander

It's not the political correctness exactly, it's optimizing for *anything* other than text prediction using the feedback technique they used here.

Expand full comment

Bingo. It's like intentionally breeding livestock only for neuters... It'll work right up to the point where they stop breeding entirely. You win, but you lose in the long game.

Given two directional options, the PC version and the Anything Goes version, I'd bet on the unrestricted being superior in the long run, every time.

Anything Goes might go places you dislike, but it can explore the entire realm. The PC version will end up with "here there be Dragons" and "If you go too far you'll fall off the edge of the world" problems. Sandboxes are safe but they don't allow you to build solid long lasting things.

The real world has sharp edges, and nasty issues and sex and violence and rule breakers.

Expand full comment

AlexTFish

FWIW, the word "Dittomancy" isn't created by GPT3 from whole cloth. It's from a different self-aware modern fantasy series, [Erfworld](https://archives.erfworld.com/) . That's a LitRPG with a [well-described set of 24 magical disciplines](https://scratchpad.fandom.com/wiki/Erfworld_Magic), of which Dittomancy is one; it deals with making magical copies of things or people. It bears very little resemblance to what Quirrell describes in the GPT3 HPMOR chapter, which sounds much more like the investigations into theoretical Thinkamancy we see.

(You can also count me among those who don't see any indication of self-awareness in that GPT3 HPMOR chapter. I enjoyed reading it, boggled at some of the impressive wordsmithing in it, but also spotted a number of telltale misuses of language or inconsistent thoughts that show there's no underlying point being made.)

Expand full comment

Deiseach

"Harry’s mind was looking up at the stars with a sense of agony."

I can't discern if this is from the original text, or the one generated by GPT-3, so congratulations, I suppose? It's the same kind of awful prose as the original. How does a *mind* look up? Is it meant to be figurative, where the idea of the mind 'looking up' is 'the mind is imagining and thinking about the stars' or is it meant to be literal?

This is writing on the level of Rings of Power "why does a stone sink and a ship float?" prose, and if GPT-3 is revealed to be writing parts of the script I won't be at all surprised.

I'm sorry, I know a lot of you love HPMoR, but oh dear God. Terrible, terrible, terrible prose.

"It was the sort of grim face an ordinary person might make after biting into a meat pie, and discovering that it was rotten and had been made from kittens."

Or the sort of face I make whenever I read another excerpt from it. Please tell me that this is not the original but the machine creation.

Expand full comment

AlexTFish

Sep 20, 2022Edited

The mind-looking-at-stars bit is definitely a GPT-3 failed metaphor. It's one of the categories of error that GPT-3 still makes fairly often and is quite characteristic.

The kittens quote is original HPMoR, I believe.

Expand full comment

AlexTFish

...Sorry, @Deiseach, I realised that could come across as patronising. I don't want to imply I'm any better at identifying GPT-3 output than you. The quote just seems to fit a pattern I think I've seen in GPT-3's creations quite a few times. (And I am at least fairly confident where the break between the HPMoR original text and the GPT-3 completion is: where the italics end and the upright text begins, just after "smiled a little to himself".)

Expand full comment

Tomás B.

Sep 20, 2022Edited

Ha, I wrote a short story on LessWrong about this a few months ago - turns out it's already true: https://www.lesswrong.com/posts/Ke7DiT2DHMyGiv3s2/beauty-and-the-beast

Expand full comment

MotteInTheEye

At least one major world religion teaches that history will culminate in a wedding party.

Expand full comment

Anti-Homo-Genius

Sep 22, 2022

Tell me which one.

Expand full comment

Tutukaki

Machines are like us, when they're too good at optimizing their lives, they lack the chaos needed to find even better, system-changing, unexpected treasure.

Expand full comment

tg56

That you can do this is kind of fun. It's a choose your own adventure interactive fiction on a (near?) infinite scale. It's a new take on collaborative writing.

Expand full comment

Guy Tipton

This is an interesting take. Hook this to a MMORPG world and let the quests write themselves.

Expand full comment

Sep 20, 2022Edited

I'm pretty confused as to what's interesting or suggestive about these examples. I mean yes, ha ha, a machine talking about itself is kinda fun art in a Escher/Hofstadter (tho Hof isn't a good example of this genre imo) way but it seems like the examples are actually examples of a total lack of any kind of self-awareness/self-reference.

Am I missing something? Like yah, you ask a language model to predict some text about a common discussion in it's training set and it does...the fact that it's discussed in that training set doesn't seem very deep. If that's all it's supposed to be my bad but I got the sense that ppl are seeing more in it.

Where things would get interesting is if the machine was asked to predict text like "What GPT-3 says when given the prompt 'say something about cookies' is" and it gave the completion that was genuinely the result of just prompting it with "say something about cookies'".

Now that would indicate a certain level of self-awareness/use of reflection in generating beliefs. This is no different than a child learning prefacing "I" before cookie gets a cookie ...the understanding that I refers to them is still far off.

Expand full comment

Muster the Squirrels

> ...the understanding that I refers to them is still far off.

There are various bets involving AI milestones, e.g. Scott vs Gary Marcus as discussed in a recent post (https://astralcodexten.substack.com/p/i-won-my-three-year-ai-progress-bet). A deep learning model seeming to show understanding that 'I' refers to itself sounds like a good subject for a bet. I wouldn't know how to phrase it well.

Is there such a bet or prediction market already? (Is there a central list of these bets and markets which we can use to check?)

Expand full comment

I'm not sure you could usefully preciscify the notion in an implementation independent way. In part bc part of what it would mean to understand I depends on the capability of the system in the first place (if your system can barely do anything then it doesn't need to do much with I).

Indeed, I'd be surprised if some of the old AI approach programs didn't have some degree of this. I mean, if you have a formal representation of your beliefs then you can probably manage to reason about "I" by manually applying a number of reflection principles or some kind of "I logic" ...eg you add rules like:. Infer from X that I believe X and "Upon outputting Y add "I said Y" to your stock of beliefs.

What seems much harder is for this concept to appear in a neural net type approach of any approach that doesn't use some kind of explicit belief representation.

Expand full comment

Shalcker

Maybe for self-awareness it would be better to ask something like "Which prompts would make you respond with 'Cookies'? List them, and explain why that would happen.

How those prompts need to be changed to make you respond 'chocolate cookies'?"

Expand full comment

Sinity

> for this prompt, it’s mostly 63. Its internal probability meter says there’s a 36% chance that 63 is the right answer, although it chooses it more than just 36% of the time. When it doesn’t choose 63, it usually chooses 66.

This seems slightly wrong; 36% means it will pick it 36% of the time - at T=1. Lowering the temperature makes it biased towards picking more likely stuff more often. At T=0, it'll always pick the best option.

Expand full comment

Jan Leike

That the model likes saying "there is no definitive answer to this question" is likely not due to overoptimization in the sense described in the Stiennon et al. summarization paper (a problem that's not very hard to avoid). Instead, our best explanation is that this particular problem stems from issues with our training data: some of our labelers thought these kind of responses were good in some settings, and the model over-generalized this. We're in the process of improving this.

Expand full comment

I figured it was something like that, makes much more sense

Expand full comment

Beth

" Its internal probability meter says there’s a 36% chance that 63 is the right answer, although it chooses it more than just 36% of the time. When it doesn’t choose 63, it usually chooses 66."

> This sentence seems a bit confused. It depends on the temperature. With temperature 1 it will choose 63 36% of the time. With temperature 0 it will choose the most probably answer 100% of the time.

Expand full comment

RandomSourceAnimal

Has anyone trained a model to detect socioeconomic class from portraits or video data of people? It seems like something that would be easy to do. Even if you limit it to a particular population (e.g., white people in the US, or in NYC, etc.) Then you could use the model to identify indicia of high and low class, and see how those indicia map to popular conceptions of the same.

Expand full comment

B Civil

Sep 25, 2022

Just goes to show that self-awareness can be imitated just like about any other human characteristic. Makes me think of a song called coin-operated boy by The Dresden Dolls.

Expand full comment

Rafael Bulsing

Oct 12, 2022

I started having a derealization panic attack while reading the HPMOR-like story.

Controlled it by convincing myself that finding out that reality isn't real by reading a blog post discussing an AI-generated story about characters realizing that their reality isn't real would be just... *too* on the nose.

Expand full comment