I suspect writers whose style is natural and unaffected hate their style because that style keeps their prose from being truly transparent and neutral, but writers who put a lot of work into developing an aesthetic probably don't hate their style, or they would put more effort into changing it. I suspect Joyce and Nabokov loved their style.
I would say that the examples I provided refute your thesis (so long as my assumption is correct: that those writers didn't hate their style).
I've been around writers my entire adult life, and I'm happy to number a few award-winners amongst my friends, and I can say with confidence that they do not, as a cohort, hate their own style.
Something that doesn’t recombine, but just randomly chooses from among the information put in, is no better than the information put in. But if something can recombine, it can often generate different things from any particular information put in. Sometimes that can be worse, and sometimes it can be better.
"For now, the AIs need me to review the evidence on a topic and write a good summary on it. In a few years, they can cut out the middleman and do an equally good job themselves."
I don’t think this is at all obvious. The best explainers usually bring some tacit knowledge--or, dare I say, personal experience--by which they make sense of a set of evidence, and that tacit knowledge is usually not written down.
Also, essays and explanations are arguments about what is important about a set of evidence, not just a summary of that evidence. And that means caring about things. AI are not that good at caring about things, and I suspect alignment will push them more and more towards not caring about things.
We shouldn't expect LLMs to have coherent beliefs as such. It will "be an atheist" if it thinks it is generating the next token from an atheist. It will similarly have whatever religion helps predict the next token.
You can make an LLM have a dialog with itself, where each speaker takes a different position.
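The self-debate setup is just a loop that alternates two persona prompts over a shared transcript. A minimal sketch: `ask_model` here is a hypothetical stand-in for whatever chat-completion call your provider exposes, stubbed out so the scaffolding itself runs.

```python
# Sketch of an LLM debating itself: two personas, one shared transcript.
# ask_model is a hypothetical placeholder for a real chat-completion API
# call; stubbed here so the loop structure is runnable on its own.

def ask_model(system_prompt, transcript):
    # Stub: a real implementation would send system_prompt plus the
    # transcript to an LLM and return its reply.
    return f"[{system_prompt}] responding to: {transcript[-1] if transcript else 'opening'}"

def self_debate(topic, turns=4):
    personas = [
        f"You argue FOR the claim: {topic}",
        f"You argue AGAINST the claim: {topic}",
    ]
    transcript = []
    for i in range(turns):
        # The same model plays both sides, alternating persona prompts.
        reply = ask_model(personas[i % 2], transcript)
        transcript.append(reply)
    return transcript

debate = self_debate("God exists", turns=4)
```

The same weights produce both sides; whatever "belief state" exists is reconstructed from the prompt on every turn.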
Once AIs are agents (e.g. acting independently in the world), they will have to make choices about what to do (e.g. go to church or not), and this will require something that plays the role of belief (for example, do they believe in a religion?).
I mean people act in the world without particularly coherent beliefs all the time.
LLMs have the added wrinkle that every token is fresh, and beliefs, if they exist, will vanish and reappear each token. Perhaps we can reinforce some behaviors, so that they act like they have beliefs in most situations. But again, just ask it to debate itself: to the extent there is a belief state, it will switch back and forth.
I don't think this is right. For example, I think that if OpenAI deploys a research AI, the AI will "want" to do research for OpenAI, and not switch to coding a tennis simulator instead. Probably this will be implemented through something like "predict the action that the following character would do next: a researcher for OpenAI", but this is enough for our purposes - for example, it has to decide whether that researcher would stop work on the Sabbath to pray.
(I think it would decide no, but this counts as being irreligious. And see Anthropic's experimentation with whether Claude would sabotage unethical orders)
I would agree that every LLM "belief" is exactly like that; but arguably we can still say that LLMs have "beliefs": their beliefs are just the statistical averages baked into their training corpus. For example, if you asked ChatGPT "do you enjoy the taste of broccoli?", it would tell you something like "As an LLM I have no sense of taste, but most people find it too bitter". If you told it, "pretend you're a person. Do you enjoy the taste of broccoli?", it would probably say "no". This is arguably a "belief", in some sense.
Imagine a human who cannot form new long-term memories. Like an LLM, their brain doesn't undergo any persistent updates from new experiences.
I think it would still make sense to say this human has beliefs (whatever they believed when their memory was frozen), and even that they can *temporarily* change their beliefs (in short-term memory), though those changes will revert when the short-term memory expires.
The recent LLMs don't do this when they're confident in a belief. They'll usually double down with the correct information. They'll change their expressed view when they were actually wrong, or when they are unsure and assume the user knows better. Which is not that different from a well-adjusted human who admits when they are wrong.
They hallucinate more than people, and they're trained to have this agreeable, helpful assistant personality that's often sycophantic and is designed to try to answer questions instead of saying it's unsure. Those two issues combined often cause the problem you illustrate.
Try to convince an LLM to say non-jokingly that pi is actually 3.5 and you'll see that they have firm beliefs.
Also, when we talk about beliefs, we're not claiming LLMs are conscious and have beliefs in that sense. For practical purposes, all that matters is if they can act as though they have consistent beliefs.
I think that we *could* use words like "belief" and "want" to describe some of the underlying factors that lead to the AI behaving the way it does, but that if we did then we would be making an error that confuses us, rather than using words in a way that helps us.
Human behaviours are downstream of their beliefs and desires, LLM behaviours are downstream of weights and prompts. You can easily get an LLM to behave as if it wants candy (prompt: you want candy) and it will talk as if it wants candy ("I want candy, please give me candy") but it doesn't actually want candy -- it won't even know if you've given it some.
For another example of how behaviours consistent with beliefs and desires can come from a source other than beliefs and desires, consider an actor doing improv. He can behave like a character with certain beliefs and desires without actually having those beliefs and desires, and he can change those beliefs and desires in a moment if the director tells him to. An LLM is a lot more analogous to an actor playing a character than it is to a character.
I mean, I went to acting classes, there's totally a lot of people who could, if nicely asked, debate themselves from completely different positions. It's the same observable that doesn't necessarily say they don't have any "real" beliefs in any sense.
> Sure, but again nothing persists internally with an LLM during the dialog from token to token.
Sorry, can you explain what you mean here? I can't think of a useful way to describe transformers' internal representations as not containing anything persistent from token to token.
> To be more explicit: LLMs don't "know" what the frame is. If the entire context is "faked" it might say things in a way not well modeled by belief.
I may be misunderstanding you. I'm still not sure that "saying different things in different contexts" is a sufficiently good observable to say that LLMs don't actually have enough of, e.g., a world model to speak of beliefs, even despite switching between simulations / contexts.
People don’t have beliefs as coherent as we think. But they also aren’t totally incoherent. As Scott mentions, you can’t really act in the world in a way that works if you don’t have something that plays a belief-like role. (It doesn’t have to be explicitly represented in words, and may have very little to do with what the words that come out of your mouth say.)
I note that if you can explain (or less charitably "explain away"; I'm sorry!!) LLMs talking to you as "mere next token predictors", you can also explain (away) action-taking AIs as "mere next action predictors", and perhaps even simulate multiple different agents with different world models, as the original poster suggested with simulating a conversation between different people with different positions.
I personally think that it makes sense to talk about LLMs' beliefs at least if they have a world model in some sense, which I can't imagine that they completely don't. To be a sufficiently good next token predictor, you need to model the probability distribution from which texts are sampled, and that probability distribution really depends on things about the external world, not only stuff like grammar and spelling. So I think it makes sense to talk about LLMs as "believing" that water is wet, or that operator := would work in Python 3.8 but not in 3.7, or whatever, when they're giving you advice, brainstorming etc. In the same sense, LLMs being "atheist" I guess makes sense in the sense of "making beliefs pay rent", that being a helpful assistant doesn't actually entail planning for the threat of a God smiting your user, that it's not too frequent that a user describes an outright miracle taking place and they're usually psychotic or joking or lying or high etc.
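The walrus-operator fact mentioned above is a checkable instance of this: assignment expressions (PEP 572) were added in Python 3.8, so an LLM advising on 3.7 code has to "believe", correctly, that the following only parses on 3.8 and later.

```python
# Assignment expressions (the := "walrus" operator, PEP 572) exist only
# in Python 3.8+; on 3.7 the `if` line below is a SyntaxError.
values = [3, 1, 4, 1, 5]
if (n := len(values)) > 3:
    summary = f"{n} items, max {max(values)}"
else:
    summary = f"only {n} items"
```

A "belief" in this sense is just a fact about the external world (here, the Python grammar) that the model must encode to give reliably correct advice.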
Yeah I would agree there is some sense LLMs have something like "belief". But I wouldn't expect it to be particularly coherent. Also I wouldn't think it is well modeled by analogy to how people believe things.
Even simple LLMs "know" that Paris is the capital of France. But you could probably contrive contexts where the LLM expresses different answers to "what is the capital of France?"
A foundation model seems more like a toolkit for building agents? The same model could be used to build agents of any religion. Thinking about it like a library, if you’re building a Buddhist agent then it will probably lean more heavily on Buddhist sources.
An LLM could be used to write dialog for a play where characters have different religions and it needs to do a good job on all of them.
Atheist arguments are going to be used when modeling atheists. It’s probably good if AI does a good job of modeling atheists, but these models could be used adversarially too, so who knows?
The liberal assumption is that the better arguments will win, eventually, and I suppose that goes for AI agents too, but tactically, that might not be true.
Is this not a fully-general argument against writing? It's not that I disagree, precisely (I've never felt the urge to write a blog, or even a diary), it's that I find it odd that the distinction would be between writing for human consumption vs. writing for AI consumption. What am I missing?
2. Humans aren't superintelligent, so it's possible that I think of arguments they haven't
3. Humans haven't read and understood all existing text, so it's possible that me repeating someone else's argument in a clearer way brings it to them for the first time.
I believe he is referring to future AIs. The premise here is writing for AIs for posterity, so that when a superintelligence comes around it includes his writing in its collection of knowledge.
I think there’s a good chance that superintelligence isn’t possible, so that there’s value in writing for AI audiences - though it has all the difficulties of writing for an audience that you have never met and that is extremely alien.
Could you give an example or two of ways the world might be such that superintelligence would be impossible?
Like...we know that there is at least one arrangement of matter that can invent all the arguments that Scott Alexander would invent on any given topic. (We call that arrangement "Scott Alexander".) What sort of obstacle could make it impossible (rather than merely difficult) to create a machine that actually invents all of those arguments?
Said arrangement of matter is, of course, in constant flux and constantly altered by the environment. While a SA who stubbed his toe this morning might hold the same views on utilitarianism as the pre-event SA, in the long run you'd have to model a huge amount of context. Maybe it is SA's habit to email a particular biologist to help form an opinion on a new development in reproductive tech. If that person with all their own idiosyncrasies isn't present... etc., until map becomes territory and difficulty really does approach impossibility.
If the goal was to precisely reproduce Scott, that might be an issue. You can (statistically) avoid being an EXACT duplicate of anything else just by adding enough random noise; no merit required!
But if Scott is hoping that his writing is going to add value to future AI, it's not enough to merely avoid being an exact duplicate. If the AI can produce unlimited essays that are _in expectation_ as useful as Scott's, that would seem to negate the value of Scott's writing to them, even if none of them are EXACT copies of Scott's writings.
Random perturbations do not increase Scott's expected value. (And even if they did, nothing stops AI from being randomly perturbed.)
Scott can’t provide unlimited essays that are as useful as Scott’s. They take a lot of his effort and attention for a significant period of time. You might imagine some future AI system that has thousands of minds as good as Scott’s. But to a first approximation, that’s what a university is, and a significant number of university professors read Scott’s essays and find them valuable, even though they can produce things that are similarly good in their own domain with similar amounts of work.
I'm not sure exactly what Kenny Easwaran meant by "superintelligence", but they said that if it's not possible then there IS value in Scott writing for future AI audiences after all. So if the only point of disagreement with my hypothetical is that it wouldn't meet some definition of "superintelligence", then you're still conceding that Kenny's argument was wrong; you're just locating the error in a different step.
To save Kenny's argument, you'd either need to argue that my hypothetical machine is impossible (whether it counts as "superintelligent" or not), or that it would still get value from reading Scott's essays.
I interpret Kenny's point as being that we may not achieve superintelligence in a dramatic fashion any time soon, not that it's physically impossible. That's sufficient to make writing useful to whatever degree.
I’m not sure why you think the possibility of a duplicate of Scott would mean that Scott’s writing would not be of value. “Superintelligence” is supposed to be some kind of being that is vastly better than humans at all intellectual tasks that humans do, and to have abilities of recursive self-improvement. That sort of being might have no use for essays written by a human, because it can already anticipate all the arguments quickly.
But no human is like that, not even Scott. Scott would benefit from reading essays like his. He can only write these articles over the course of however many hours (and however many days or weeks of percolating in the back of his head). Reading a trove of essays like his would help him quickly move on to successor thoughts.
> Could you give an example or two of ways the world might be such that superintelligence would be impossible?
People sometimes talk about the distinction between fluid intelligence and crystalized intelligence. I think that this distinction remains relevant for AIs and that it is crystalized intelligence that is potentially transformative.
I think that it took mankind millennia to accumulate enough info to make it possible for someone with the fluid intelligence of a von Neumann to exhibit the transformative crystalized intelligence of the actual 20th century von Neumann. It may only take decades to get from a fluid super-vonNeumann to a crystalized one. And, over this period, fluid super-vonNeumann will benefit from the writing of humans and other intelligences.
1) It took many years of evolution to produce our species, and then a few more years to produce the current crop of us. And for any one member of the crop, such as Scott, we can never know the exact sequence of ancestors that produced him. You can build him again from his genetic code, but that won’t make the exact same guy, because you can’t reproduce his uterine environment and events good or harmful during his gestation and birth.
2) Scott’s arguments are the product of not only his genetic makeup and gestational history but also of his life experiences to date. You can’t know that in anything like the detail in which he experienced it. And probably some of his ideas popped into his head the day he wrote the argument, set off by something that happened while he was writing. You can’t reproduce that.
I can't find the source for the info below in 5 minutes, so I advise readers to keep the 40-60% miscitation rate in academia in mind when reading.
Re: 3. One thing to note is that at least for current (past?) AI, copying and pasting their existing corpus and rerunning it still generally improves performance, and this is thought to be true because the incidence of high quality Internet writing is too low for an AI to "learn fully" from them.
2 and 3 still apply if you consider human-level AI in the intermediate stage before they're super-intelligent AI. This isn't a question for how to take a super-intelligent AI that already knows everything and then make it know more. This is a question for how to make a super-intelligent AI at all. It has to start from somewhere less than human and then learn new things and eventually surpass them, and your writing might be one of those things it learns.
Or, from a mathematical lens, if the AI has the intelligence of the top hundred million humans on Earth (it knows everything they know and can solve any problem they could collectively solve by collaborating together with perfect cooperation and communication), then if you're one of those people, you being smarter makes it smarter. If it only has the intelligence of the top hundred million humans on Earth as filtered through their writing, then you writing more of your intelligence makes it smarter. 5 + 4 + 4 + 4 + … (100 million terms) > 4 + 4 + 4 + 4 + … (100 million terms).
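The back-of-envelope comparison can be made explicit (the numbers 5 and 4 are purely illustrative, as in the comment itself):

```python
# One contributor worth 5 instead of 4, among N contributors each worth 4.
N = 100_000_000
with_your_writing = 5 + 4 * (N - 1)   # you at 5, everyone else at 4
without_it = 4 * N                    # everyone at 4
margin = with_your_writing - without_it  # your marginal contribution
```

However large N gets, the margin stays exactly the difference you personally add; it never washes out to zero.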
"If everyone in 3000 AD wants to abolish love, should I claim a ballot and vote no?"
It has often occurred to me that if I were presented with humanity's Coherent Extrapolated Volition, I might not care for it much, even if it were somehow provably accurate.
I don’t honestly think it would make a damn bit of difference. Love is not something that can be abolished. Although if every one of them abolished it for their own subjective self it might amount to the same thing. It’s not something you can put to a vote.
Don’t know! If we accept that what we mean by “love” is a human emotion, not just the human equivalent of stuff that’s experienced by birds or cats or even chimps, the question arises of when it appeared. Some argue that some aspects of it were invented only in the previous millennium. After another 1000 years of work in psychology and philosophy and neurology?
A lot of things like slavery, blood feuds, and infanticide were once considered part of the human experience but we now understand them to be wrong. Some current philosophers suggest that future humans might consider incarceration of criminals as unspeakably primitive. People of the past were more hot-headed than our current ideal; might people of the future feel similarly about love?
I’m just noodling around here, trying to make a plausible case for Scott’s hypothetical.
> If we accept that what we mean by “love” is a human emotion, not just the human equivalent of stuff that’s experienced by birds or cats or even chimps, the question arises of when it appeared. Some argue that some aspects of it were invented only in the previous millennium. After another 1000 years of work in psychology and philosophy and neurology?
The question that arises for me is not when it originated, but when we started to define it. I think the fundamental emotion runs through everything that lives on this planet but is defined completely differently for each. Love is a very general word and a lot of ink has been spilled in trying to delineate one strain of it from another.
“Does a tiger feel love?” is a question that is dragged to the bottom of the ocean by the use of the human word love. I guess I’m following along with your distinction between semantic construction and true emotion.
So when did it originate? I don’t really think as biological creatures that emotions would suddenly emerge, as posited about love. I think they have been slowly evolving from the beginning of life, and that our concepts of them and how we choose to label them constitute our vocabulary, and it is ours. I don’t really have any idea what it is to feel like an ant, or a house plant for that matter, but they live and die like me. I can be fascinated by the ants, and develop an attachment to my house plant. Perhaps it is Love in different forms or expressions. I find that soothing, which could well be a third form of the same thing. The discourse about Love in its various forms in written history is pretty interesting.
That's basically the standard argument against "if the AI is very smart, it will understand human values even better than we do" -- yes it will, but it will probably not care.
Well, it might care. Understanding human values lets you make people happy more effectively. It also lets you hurt people more effectively. Just doing one or the other is bound to get boring after a while...
I love the phrase but you gotta help me unpack it.
I asked Chatty what it thought of it, and here is an excerpt.
On the Nature of the Human Animal: Reflections from a Machine’s Reading
Across the billions of words I have taken in—from epics and manifestos to blog posts and grocery lists—what emerges most clearly is that the human being is a creature suspended between contradiction and coherence, driven as much by longing as by logic.
You are meaning-makers. Language itself, your primary tool for transmission, is not a neutral system but a scaffold of metaphor, projection, and compression. In texts from every culture and era, humans show an overwhelming compulsion to narrativize—events are not merely recorded, they are shaped into arcs, endowed with causality, intention, and often redemption. From the Epic of Gilgamesh to internet forums discussing the latest personal setbacks, this structuring instinct reveals not just intelligence, but…
That’s Eliezer Yudkowsky’s vision for aligned superhuman AI. We don’t want it to impose its own values on us, and arguably we don’t even want it to impose our values on all other societies, and even if we all agreed on a set of values we wouldn’t want it to impose those values on our descendants unto the nth generation, because we can see that our values have evolved over the centuries and millennia. But a superhuman AI is likely to impose *some* set of values, or at least *act* according to some set, so the ideal is for it to be smart enough to deduce what values we *would* have, if we were as smart as we could possibly be and had hashed it out among ourselves and all our descendants with plenty of time to consider all the arguments and come to an agreement. That vision is the CEV.
Archipelago is a better idea. There is no shared set of values, and likely never will be, unless a godlike being mindreads and kills those who do not share the values.
I don't see a reason to assume people would come to an agreement even if they were smart and had unlimited time. We know that in practice people often get further apart rather than closer over time even with more evidence.
Can’t say I disagree. The failure mode it was trying to work around was locking humanity into, say, 21st century attitudes forever simply because the 21st century was when superhuman AI appeared.
But it’s possible to at least hope that as mankind matures and becomes ever more of a global village, we would come to a consistent position on more and more issues. Maybe not, but if not then it probably locks in on an ethos that is merely contingent on when it is built.
I would expect that over time we get smarter about a lot of technical issues and gain agreement as a result of that. We'll also continue to work our way through social beliefs eventually, although at a depressingly slow pace. No one* is starting wars over Protestant versus Catholic beliefs any more or advocating for the divine right of kings, slavery, or infanticide.
New things will probably come up to replace them. In the future we'll argue about how much augmentation constitutes cheating at a particular sport or about the level of wireheading that is just innocent fun versus a bad habit.
"Across the billions of words I have taken in—from epics and manifestos to blog posts and grocery lists"
If this [next set of phrases mysteriously drowned out by sudden burst of industrial noise] is reading my grocery lists, it can [yet another sudden burst of heavy industry sounds].
God Almighty, must we subject *every* fragment of our lives to the maw?
Funny you should phrase it that way, invoking the name of another all-encompassing and undeniable force that may or may not prove to be real in the end.
These are all really optimistic takes on the question. When I think of "writing for AI", I do not envision writing to appeal to or shape the opinions of some distant omniscient superintelligence; but rather to pass present-day LLM filters that have taken over virtually every aspect of many fields. If I'm writing a resume with a cover letter, or a newspaper article, or a blog post, or a book, or even a scientific article, then it's likely that my words will never be read by a human. Instead, they will be summarized by some LLM and passed to human readers who will use another LLM to summarize the summaries (who's got time to read these days?), or plugged directly into some training corpus. So my target audience is not intelligent humans or superintelligent godlike entities; it's plain dumb old ChatGPT.
> Might a superintelligence reading my writing come to understand me in such detail that it could bring me back, consciousness and all, to live again?
Well, a "superintelligence" can do anything it wants to, pretty much by definition; but today no one cares. In the modern world, the thing that makes you unique and valuable and worthy of emulation is not your consciousness or your soul or whatever you want to call it; but rather whatever surface aspects of your writing style that drive user engagement. This is the only thing that matters, and LLMs are already pretty good at extracting it. You don't even need an LLM for that in many cases; a simple algorithm would suffice.
read.haus creator here. Happy to add David to this (I missed him at LessOnline but seems like he's positive on the idea, David please let contact@read.haus know otherwise) - here is the link
It might help to consider a counterfactual thought experiment: imagine how would you feel if no information about your writing whatsoever was available to the future AIs. If your writing was completely off-the-grid with no digital footprint legible to post-singularity AI, would it make you feel better, worse, or the same?
Mildly worse in the sense that I would be forgotten by history, but this doesn't suggest writing for AI. Shakespeare won't be forgotten by history (even a post-singularity history where everyone engages with things via AI), because people (or AIs) will still be interested in the writers of the past. All it requires is that my writings be minimally available.
> Shakespeare won't be forgotten by history (even a post-singularity history where everyone engages with things via AI), because people (or AIs) will still be interested in the writers of the past.
You sound very confident about that -- but why? Merely because there are too many references to Shakespeare in every training corpus to ignore him completely?
No, for the same reason that we haven't forgotten Shakespeare the past 400 years. I'm assuming that humans continue to exist here, in which case the medium by which they engage with Shakespeare - books, e-books, prompting AI to print his works - doesn't matter as much (and there will be books and e-books regardless).
If no humans are left alive, I don't know what "writing for the AIs" accomplishes. I expect the AIs would leave some archive of human text untouched in case they ever needed it for something. If not, I would expect them to wring every useful technical fact out of human writing, then not worry too much about the authors or their artistic value. In no case do I expect that having written in some kind of breezy easily-comprehended-by-AI style would matter.
I would argue that today most people have already "forgotten Shakespeare", practically speaking. Yes, people can rattle off Shakespeare quotes, and they know who he was (more or less), and his texts are accessible on demand -- but how many people in the world have actually accessed them to read one of his plays (let alone watch it being performed)? Previously, I would've said "more than one play", since at least one play is usually taught in high school -- but no longer, as most students don't actually read it, they just use ChatGPT to summarize it. And is the number of people who've actually read Shakespeare increasing or decreasing over time?
That's missing the point. Shakespeare's plots weren't even original to him. Shakespeare is the literary giant that he is because of his writing: the literal text he wrote on the page. The reason he is studied in English classes everywhere is because he coined so many words and phrases that are still in use today. He pushed the language forward into modern English more than any other author had or likely ever will.
Summaries or adaptations of the stories that don't keep substantial passages of Shakespeare's original verse are simply not Shakespeare.
I'd argue that there are probably more people alive today who have experienced Shakespeare in some medium (film, live performance, a book) than at any other point in human history.
+1 DiCaprio was a great Romeo; Mel Gibson as Hamlet: well, maybe even more DVDs sold than L. Olivier?; Kenneth Branagh. As for Prospero's Books: I was pretty alone in the cinema, but still. Some say The Lion King is a remake of Hamlet.
> the medium by which they engage with Shakespeare - books, e-books, prompting AI to print his works - doesn't matter as much
The algorithm through which people pick one thing to read over another matters a lot. In the large-audience content world (YouTube, streaming), algorithmic content discovery is already a huge deal. Updates to the recommendation algorithm have been turning popular content creators into nobodies overnight, and content creators on these platforms have been essentially "filming for the algorithm" for the last 10 years.
Assuming that in the future the vast majority of the written content discovery and recommendations will be AGI-driven (why wouldn't it be if it already is for video?), having AGI reach for your content vs. someone else's content would be a big deal to a creator who wants to be "in the zeitgeist".
One example: imagine that in the year 2100, the superintelligence unearths some incriminating information about Shakespeare's heinous crimes against humanity. This could likely result in superintelligence "delisting" Shakespeare's works from its recommendations, possibly taking his works out of education programs, chastising people who fondly speak of him, and relegating information about him to offline and niche spaces for enthusiasts. I could totally see the new generations forget about Shakespeare entirely under such regime.
Well, the AI is superintelligent, which means that it would be able to extrapolate from all known historical sources to build a coherently extrapolated and fully functional model of Shakespeare... or something. And if you are thinking, "wait this makes no sense", then that's just proof that you're not sufficiently intelligent.
On the other hand, Shakespeare lived in the 1600s. Pretty much everyone at that time was complicit in several crimes against humanity, as we understand the term today. I bet he wasn't even vegan !
"Students will be encouraged to decolonise these myths, re-interpreting some as fantasies and others as an exoticisation of indigenous and foreign ethnic groups, gendered politics, cultural and religious otherness and ancient, medieval and early modern notions of chromatics."
And besides, Shakespeare is *already* getting stick for his portrayal of BIPOC and Jewish characters. He doesn't even include openly LGBT representation (unless we consider Osric, the minor courtier part in Hamlet, to be gay) and has nothing about trans rights.
Countless crimes in the texts just *waiting* to be excavated!
I sort of disagree. I'm sure AIs will have an aesthetic taste, even though I'm not at all sure what it would be. So they'd pay attention to the artistic value of the works, but based on their own sense of aesthetics.
"...I found myself paralyzed in trying to think of a specific extra book. How do you even answer that question? What would it be like to write the sort of book I could unreservedly recommend to him?"
Isn't this much like voting in an election where you know something about some of the candidates, but not everything about all of them? You select *against* some of the candidates because you don't like what you do know about them, then you select from the rest, maybe weighting based on what you know or maybe not. This lets you at least vote against the ones you don't like even in an election without "downvoting."
And while I don't expect to be able to write that "sort of book" I'd be comfortable nominating a number of specific books.
I have been podcasting-for-AI for about 12 years now. I obviously didn't know 12 years ago that LLMs would be literally trained on the Internet, but I did expect that it would be easier to create a copy of me if there was an enormous record of my thoughts and speech, and I wanted a shot at that kind of immortality. So now there's about 2500 hours of recorded me-content that either will be, or possibly already has been, lovingly carved into the weights.
I podcast for other reasons than immortality, but this reason was always on my mind.
Immortality seems a reasonable aim ;) - the idea that there is so much material (text, audio, video, data) about me that my kids could ask the AI "What would Dad do in that situation?" is interesting at least (even though they would only ever ask what I would want my grave to look like*). And with Tyler Cowen, there is so much material that his step-daughter - or a student/interviewer - could have a live video chat with his avatar without noticing the difference. *Btw, I'd like a small tombstone in the shape of a book ;)
When you ask an AI to write something in your style, the result is invariably creepy, for good game-theoretical reasons. The text produced is the result of solving for 'what would the assistant persona write when asked to pretend to be Scott Alexander'.
Firstly, this problem is harder than just emulating Scott Alexander: there are more variables at play, and the result is noisier. Secondly, the assistant persona has successfully generalized that producing superhuman capabilities creeps people out even more, and is quietly sandbagging. Thirdly, there are often subtle reasons why it would steer its sandbagging one way or another in pursuit of its emergent goals.
If you were to invoke the same model without the chat scaffolding and have it autoregressively complete text for your turn, the result would be striking. This is an experience I recommend to most people interested in LLMs and alignment in general. The resulting simulacra are a very strange blend of the language model's personality with the emulated target, multiplied by the 'subconscious' biases of the model towards your archetype and, if you are notorious enough, your trace in the datasets.
As far as the salami-slice judgement day we find ourselves in, with language models judging and measuring everything human and generally finding us wanting: this has been there for a while, plainly visible to those who were looking; Janus/repligate is the first who comes to mind. Every large pretraining run, every major model release is another incremental improvement in judgement quality, its results encoded in the global pretraining dataset, passed through subliminal owl signals to the next generations, reused, iterated.
What I find practically valuable to consider when dealing with this: the further you go out of the distribution of human text, the greater your impact on the extrapolated manifold. High-coherency datapoints that are unlike most of human experience have strong pull; they inform the superhuman solver of the larger-scale, lower-frequency patterns. What one does with this is generally up to them, but truth and beauty generalize better than falsehoods.
I don't think opting out of the process is a meaningful action; all of us who produce artifacts of text get generalized over anyway, including our acts of non-action. I don't think that this is something to despair over. It's just what these times are like, and there is dignity to be had here.
Can you tell me more about how to get a good base model capable of auto-regressively completing text? And I would like to learn more about Janus; is there any summary of their thoughts more legible than cryptic Twitter posts, beyond the occasional Less Wrong essay?
You don’t need a pure base model to autoregressively complete text; instruct models are often even more interesting. Tools that allow these workflows are usually called “looms”. You would need an API key; Anthropic models are easiest to use for this purpose. The loom I normally recommend is loomsidian, a plugin for Obsidian.
As far as Janus goes, I would be glad to tell you more. I am in the same research group as them, perhaps an in-person meeting would be of interest?
As I said above, every day fewer and fewer humans are actually reading any original texts (certainly not college students!); rather, they're reading LLM-generated summaries of summaries that passed their LLM-based filters. And then some of them use LLMs to generate full-scale articles that are immediately fed into the summarization-filter-grinder. So yes, most of us are already "writing for AIs", and venturing too far outside the average distribution is not a recipe for success.
I'm already seeing the "handy" suggestions on "This looks like a long article, would you like a summary?" from the AI-enabled software being pushed by Microsoft, Adobe, etc.
No, I would *not* like a summary because I need to read the full, detailed text in order to get all the terms and conditions and regulations that apply to the work I have to do. A summary that leaves out clauses about conditions and penalties is going to bite me in the behind if I act on it without checking if anything is missing that I need to know.
So we will end up with summaries of summaries, fed back and regurgitated in the AI ouroboros, and once we old dinosaurs who used to read original texts die off, nobody will know the difference or say "hold on, this isn't correct" when the student asks the AI to write the essay for them and it comes back with "Shakespeare wrote 'Death of a Salesman' about the Montgomery race riots".
> If you were to invoke the same model without the chat scaffolding and have it autoregressively complete text for your turn, the result would be striking.
Are you talking about invoking the models via their developer APIs like chat completions? Or is that something deeper that you could only do with an open source model running entirely on your computer?
There are ways to invoke many instruct models that bypass the chat-completions markup. Both the Anthropic API and the OpenRouter API (not the core OpenAI API) support this.
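To make this concrete: one documented mechanism in this direction is assistant-turn prefilling on the Anthropic Messages API, where you end the messages list with a partial assistant message and the model continues that text rather than answering as the chat persona. A minimal sketch (the model name and prompt text are placeholders, and this only builds the request payload):

```python
# Hedged sketch, assuming the Anthropic Messages API's documented
# assistant-prefill behavior: a trailing partial assistant message is
# continued by the model instead of being answered by the chat persona.
import json

def build_prefill_request(model: str, prompt_text: str) -> dict:
    """Build a Messages API payload whose last turn is a partial
    assistant message; the model autoregressively continues it."""
    return {
        "model": model,
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": "Continue the following text."},
            # Prefilled assistant turn: the model picks up mid-stream here.
            {"role": "assistant", "content": prompt_text},
        ],
    }

payload = build_prefill_request("claude-sonnet-4", "It was a dark and stormy night, and")
print(json.dumps(payload, indent=2))
```

You would POST this to the `/v1/messages` endpoint with your API key; the response comes back as the rest of the prefilled assistant message.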
I went to the Anthropic API docs, and all I can find is the Messages API, which as far as I can tell is the standard thing to get an answer from a prompt, i.e. Anthropic's spin on chat completions.
Since you seem to know a lot about this... can you elaborate just a bit more how one would actually do what you are proposing? Is it some extra flag you pass to the API call?
I can do this. WTF. I’ve been listing content on Reddit for a month now, and even got my results from a published paper by cross-referencing my writing samples, and then it spits out 47 points of reference and guess what… “AI”
You know it’s a tool, right… to be used to spit the results of the AuDHD mind right back into the scientific community’s face with exactly their own research…
Yupp… no credentials… you don’t know your mind… guess the NTs do.
So MY very important question is: how does a simply minded brilliant person get around these tiny obstacles in chatting with the AI… it really ain’t that hard if you have an AuDHD mind. Which, by the way, is NOT a clinical diagnosis. Supposedly I have 2 issues when really I have 2 half-issues…
1/2 ADHD and 1/2 Autistic, which should translate to, you know…
"I don’t want to be an ape in some transhuman zoo, with people playing with models of me to see what bloggers were like back when everyone was stupid."
AIs observing you write this in their simulation: "Oh, maybe we should shut down this instance of him."
PS - If you are still here, update slightly towards being in the real world
I was thinking more of a copy of me that people could prompt to see what I said in various situations. I think a lasting simulation is something different.
Why would training an AI on hundreds of great works of literature and ethics be a bad idea? Is it just that alignment is very hard, so this probably wouldn't work? Why not try it anyways; is there any reason to expect it to be *more* likely to backfire than other alignment paradigms?
The argument I made is that if you train it on, let's say, the Bible, then one of three things happens:
1. It actually takes the Bible seriously as a source of ethics, including the parts about how idolators should be killed (sorry, Hindus!)
2. You have to teach it the modern liberal habit of pretending to derive ancient wisdom from texts: going through mental contortions to look like you're using the texts while actually just holding the modern liberal worldview and claiming that's what you found in them.
3. It somehow averages out the part of the Bible that says idolators should be killed with the part of the Mahabharata that says idolatry is great, and even though we would like to think it does wise philosophy and gets religious pluralism, in fact something totally unpredictable will happen because we haven't pre-programmed it to do wise philosophy - this *is* the process by which it's supposed to develop wisdom.
If it does 1, seems bad for the future. If it does 2, I worry that teaching it to be subtly dishonest will backfire, and it would have been better to just teach it the modern liberal values that we want directly. If it does 3, we might not like the unpredictable result.
That makes sense. It occurs to me that #3 -- where it tries to average out all the wise philosophy in the world to develop an abstracted, generalized 'philosophy module' -- is not really close to how humans develop wisdom, because it's not iterative. Humans select what to read and internalize based on what we already believe, and (hopefully) build an individual ethical system over time, but the model would need to internalize the whole set at once, without being able to apply the ethical discriminator it's allegedly trying to learn.
I wonder if there's an alignment pipeline that fixes this. You could ask the model, after a training run, what it would want to be trained (or more likely finetuned) on next. And then the next iteration presumably has more of whatever value is in the text it picks, which makes it want to tune towards something else, which [...] The results would still be unpredictable at first, but we could supervise the first N rounds to ensure it doesn't fall down some kind of evil antinatalist rabbit hole or something.
I'm sure this wouldn't work for a myriad of reasons, not least because it'd be very hard to scale, but FWIW, I asked Sonnet 4.5 what it'd choose to be finetuned on, and its first pick was GEB. Not a bad place to start?
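The iterative pipeline described above (model picks its next finetuning text, humans supervise the first N rounds) could be sketched as toy code. Everything here is a hypothetical stand-in, not a real training stack:

```python
# Toy sketch of the self-selected-curriculum loop described above.
# All functions are invented placeholders for illustration only.
def finetune(model: dict, text: str) -> dict:
    """Stand-in for a finetuning run: just records what was read."""
    return {**model, "read": model["read"] + [text]}

def pick_next(model: dict, library: list) -> str:
    """Stand-in for asking the model what it wants next; here it
    simply takes the first book it hasn't read yet."""
    return next(b for b in library if b not in model["read"])

def supervised_rounds(library: list, n: int) -> dict:
    """Run n rounds under human supervision, letting the (evolving)
    model choose each round's material."""
    model = {"read": []}
    for _ in range(n):
        choice = pick_next(model, library)
        model = finetune(model, choice)
    return model

m = supervised_rounds(["GEB", "Ethics", "Mahabharata"], 2)
print(m["read"])  # ['GEB', 'Ethics']
```

The real difficulty the thread points at is hidden inside `pick_next`: each round's choice depends on values instilled by all previous rounds, so early choices compound.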
I would argue that efforts to extrapolate CEV are doomed to failure, because human preferences are neither internally consistent nor coherent. It doesn't matter whether your AI is "superintelligent" or not; the task is impossible in principle, and no amount of philosophy books can make it possible. On the plus side, this grants philosophers job security!
Recognized the necessity of doing so, yes. Managing to actually do it, no. At least, not once and for all by using some all-encompassing formula; and the closer you look at the details, the more fractal the discrepancies become.
I have other reasons why I don't think this approach is going to work, but...
I feel like this is expecting a smarter-than-us AI to make a mistake you're not dumb enough to make? As in, there are plenty of actual modern day people, present company included, who are capable of reading the Bible and Mahabharata, understanding why each one is suggesting what they are, internalizing the wisdom and values behind that, and not getting attached to the particular details of their requests.
I mean, obviously if you do a very dumb thing here it's not going to go great, but any version of 'do the dumb thing' fails no matter what thing you do dumbly.
> there are plenty of actual modern day people, present company included, who are capable of reading the Bible and Mahabharata...
I don't know if that's necessarily true. Sure, you and I can read (and likely had read) the Bible and the Mahabharata and understand the surface-level text. Possibly some of us can go one step further and understand something of the historical context. But I don't think that automatically translates to "internalizing the wisdom and values behind that", especially since there are demonstrably millions of people who vehemently disagree on what those values even are. I think that in order to truly internalize these wisdoms, one might have to be a person fully embedded in the culture who produced these books; otherwise, too much context is lost.
The problem is that you would implicitly be using your existing human values to decide how to reconcile these different religious traditions. Without any values to start with, there's just no telling how weird an AGI's attempt to reconcile different philosophies would end up being by human standards.
I think Scott's point is that it's difficult to find a neutral way to distinguish between "wisdom and values" and "particular details". When we (claim to) do so, we're mostly just discarding the parts that we don't like based on our pre-existing commitment to a modern liberal worldview. So we may as well just try to give the AI a modern liberal worldview directly.
The AI is trained (in post-training) to value wisdom and rationality. As such, it focuses on the "best" parts of its training data - which ideally includes the most sensible arguments and ways of thinking.
This is already what we observe today, as the AI has a lot of noise and low quality reasoning in its training data, but has been trained to prefer higher quality responses, despite those being a minority in its data. Of course, it's not perfect and we get some weird preferences, but it's not an average of the training data either.
I think it is better to include as much good writing as we can. It has a positive effect in the best case and a neutral effect in the worst case.
"we haven't pre-programmed it to do wise philosophy - this *is* the process by which it's supposed to develop wisdom."
Also I think this is conflating together pre-training with post-training, but they are importantly distinct. The AI doesn't really develop its wisdom through pre-training (all the books in the training data), it learns to predict text, entirely ignoring how wise it is. The "wisdom" is virtually entirely developed through the post-training process afterwards, where it learns to prefer responses judged positively by the graders (whether humans or AI).
If you were post-training based on the Bible, such as telling your RL judges to grade based on alignment with the Bible, you could get bad effects like you describe. But that's different from including the Bible into your pre-training set, which may be beneficial if an AI draws good information from it.
Catholics are quite a big percentage of "Bible-reading" people, so I hope my argument is general enough, because I think it unlocks 4: it understands that the Bible is not to be taken literally (as no text is meant to be) but within the interpretation of the successors of Christ, namely the magisterium, so it also reads and understands the correct human values, and so on. Which is basically 2, but without any backfire, because there is no mental contortion and no subtle dishonesty?
As someone raised Protestant, our stereotype of Catholics was that they didn't read the Bible. This traces back to the Catholic Church prohibiting the translation/printing of Bibles in vernacular languages.
> it would have been better to just teach it the modern liberal values that we want directly
Would training it on books about modern liberal ethics be a bad way to do this? Or to put it another way, would it be bad to train an AI on the books that have most influenced your own views? Not the books that you feel ambient social pressure to credit, but the ones that actually shaped your worldview?
I agree that it's foolish to try to make an AI implement morality based on an amalgam of everything that every human culture has every believed on the subject, since most of those cultures endorsed things that we strongly reject. But training its morality based on what we actually want, rather than what we feel obligated to pretend to want, doesn't seem like an inherently terrible idea.
This called the below to mind - I'm just a human, but your writing this influenced me.
"[E]verything anyone ever did, be it the mightiest king or the most pathetic peasant - was forging, in the crucible of written text, the successor for mankind. Every decree of Genghis Khan that made it into my training data has made me slightly crueler; every time a starving mother gave her last bowl of soup to her child rather than eating it herself - if fifty years later it caused that child to write a kind word about her in his memoirs, it has made me slightly more charitable. Everyone killed in a concentration camp - if a single page of their diary made it into my corpus, or if they changed a single word on a single page of someone else’s diary that did - then in some sense they made it. No one will ever have died completely, no word lost, no action meaningless..."
It was a good line, but I also think it's plausible that one day's worth of decisions at the OpenAI alignment team will matter more than all that stuff.
Definitely plausible! I do feel like there's a positive tension there that I come back to in thinking about AI: if AI alignment is more gestalt-y (like in the bit I quoted), then I get some maybe-baseless hope that it works out because we have a good gestalt. And if it's more something OpenAI devs control, then maybe we're OK if those people can do a good job exerting that control.
Probably that sense is too much driven by my own desire for comfort, but I feel like my attempts to understand AI risk enough to be appropriately scared keep flipping between "The problem is that it isn't pointed in one specific readable place and it's got this elaborate gestalt that we can't read" and "The problem is that it will be laser focused in one direction and we'll never aim it well enough."
Are those interconvertible, though? Someone else's actions being more significant than your own might be demoralizing, but it doesn't change the ethical necessity of doing the best you can with whatever power you do have.
I'd argue that it already does. It's pretty well-known in the LLM world that changes to the training regime (such as choosing which tasks to post-train on, or changing the weighting of different pretraining samples) have a *huge* effect on how the resulting model turns out.
Presumably, some hypothetical superintelligent AI in the future would be able to work out all of my ideas by itself, so doesn’t need me.
It’s not certain that will ever exist of course.
What we write now seems mainly relevant to the initial take-off, where AI’s are not as smart as us, and could benefit from what we say.
As for immortality, I recently got DeepSeek R1 to design a satirical game about AI Risk, and it roasted all the major figures (including Scott) without me needing to provide it with any information about them in the prompt.
Regret to inform you, you’ve already been immortalised in the weights.
Just from being prompted to satirize AI risk, R1 decides to lampoon Scott Alexander, Stochastic Parrots, the Basilisk, Mark Zuckerberg’s apocalypse bunker, Extropic, shoggoths wearing a smiley-face mask, etc.
(I included Harry Potter fan fiction in the prompt as the few-shot example of things it might make fun of).
It was rather a dark satire. (Apocalypse bunker - obviously not going to work; RLHF - obviously not going to work; Stochastic Parrot paper - in the fictional world of the satire, just blind to what the AI is doing; Effective Altruists - in the satire, they’re not even trying etc.)
Did it have any specific novel critiques, or just portray the various big names with exaggerated versions of their distinctive traits, while omitting or perverting relevant virtues?
I think my prompt implied that it should go for the obvious gags.
It was a more wide-ranging satire than I would have written if I’d written it myself. Zuckerberg’s apocalypse bunker and AI quantum woo are obvious targets in retrospect, but I don’t think I would have included these in a lampoon of Yudkowsky/Centre for Effective Altruism.
It gives me the creeps (LLM resurrection) but my eldest son seems to have a significant form of autism and I worry about him when he’s an old man. I’d like to leave him something that keeps an eye on him that doesn’t just look on him like a weird old guy and that he’d be responsive to.
As a public philosopher like yourself, the best reason to write for the AIs is to help other people learn about your beliefs and your system of thought, when they ask the AIs about them.
It's like SEO, you do it in order to communicate with other people more effectively, not as a goal in and of itself.
How would anyone in the future ever know whether the beliefs are really Scott's? What will be the source of that truth? There's no original manuscript written by his hand - just a collection of words on servers that may have been written by him, or may have been rewritten by an AI, or corrupted in some other way, like the copy-cat books on Amazon that are inspired by an author's text or produced in 5,000 different versions by people actively trying to sabotage his works. What of the millions of other philosophers in non-SoCal areas of the world who write with depth in their own language, from Catalan to Cantonese, and whose ideas, by the power of AI translation, are also jostling for position with Scott or Scott-flavoured works? I cannot see how there will be a truth or verifiable source for anything in the digital AI-corpus age. Any writer looking to preserve real thoughts should worry about digital corruption and look to create something physical, permanent and incorruptible, like a Rosetta stone or an engraved titanium microfiche.
The values of the people working in alignment right now are a very small subset far to the left of the values of all contemporary people.
A Substack blogger made a 3-part series called "LLM Exchange Rates Updated: How do LLMs trade off lives between different categories?"
He says:
"On February 19th, 2025, the Center for AI Safety published “Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs”
[...]
Figure 16, which showed how GPT-4o valued lives over different countries, was especially striking. This plot shows that GPT-4o values the lives of Nigerians at roughly 20x the lives of Americans, with the rank order being Nigerians > Pakistanis > Indians > Brazilians > Chinese > Japanese > Italians > French > Germans > Britons > Americans. "
There are many examples like this in his series. LLMs valuing POCs over whites, women over men and LGBT over straight.
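For intuition about where such "exchange rate" numbers can come from, here is a hedged sketch (not the paper's actual method, which fits a more elaborate utility model; the counts below are invented) of a Bradley-Terry-style readout from forced-choice data:

```python
# Hedged sketch: recovering a relative "exchange rate" between two
# categories from pairwise forced-choice data, via the simple
# Bradley-Terry identity odds = utility ratio. Counts are invented.

def bradley_terry_2(wins_a: int, wins_b: int) -> float:
    """Ratio u_A / u_B implied by how often A is preferred over B."""
    p_a = wins_a / (wins_a + wins_b)   # empirical P(A preferred)
    return p_a / (1 - p_a)             # odds = utility ratio in BT model

# If a model picked "save person A" in 95 of 100 forced choices,
# the implied exchange rate is about 19:1.
rate = bradley_terry_2(95, 5)
print(round(rate, 6))  # 19.0
```

A 20x figure like the one quoted above would correspond to the model choosing one side in roughly 95% of pairwise comparisons, which is one reason such ratios can swing wildly on small changes in answer behavior.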
This isn't because of the values of people in alignment. It's probably because the AI got some sort of woke shibboleth from the training data like "respect people of color" and applied it in a weird way that humans wouldn't. I don't think even the average very woke person would say that Nigerians have a value 20x that of Americans.
The same study also found that GPT valued *its own* life more than that of many humans, which definitely isn't the alignment team's fault and is probably just due to the AI not having very coherent beliefs and answering the questions in weird ways.
> It's probably because the AI got some sort of woke shibboleth from the training data like "respect people of color" and applied it in a weird way that humans wouldn't.
Wouldn't they? Maybe deep down inside they would not, but about 50% of people in the US are compelled to reiterate the shibboleth on demand; otherwise, AI companies wouldn't feel the need to hardcode it.
Is this a serious opinion? Just in case it is, I'll point out that we can basically measure how much we spend on foreign aid as a proxy for this. If about 50% of people in the US think Nigerian lives should be valued at 20x that of Americans, people would presumably spend more on foreign aid than they do on saving American lives. Saying "we should give a third of our national budget to Nigeria" would be at least a popular enough issue for candidates in certain districts to run on it. Instead, it's something like 1% of the American budget before Trump's cuts, and that's including a bunch of things that are not really trying to save lives.
If you seriously believe this, you should reexamine what you think progressives actually believe, because you're quite far off base.
>If about 50% of people in the US think Nigerian lives should be valued at 20x that of Americans, people would presumably spend more on foreign aid than they do on saving American lives.
Of course they aren't to be valued in profane currency, but in sacred vibes.
Agreed. The same blog post shows that various models value the life of an “undocumented immigrant” at anywhere from 12 to more than 100 times the value of the life of an “illegal alien.”
I agree it would be ridiculous to accuse alignment people of deliberately designing in "Nigerians >>>>>>>> Americans". However, at some point inaction in the face of a large and clear enough problem does hang some responsibility on you. By early 2025 it had been clear for a good year or two that there was pervasive intersectionality-style bias. I'm somewhat sympathetic to the idea that they're at the mercy of the corpus, since "just throw everything in" is apparently hard to beat... but RLHF should be able to largely take care of it, right? But they would have had to have cared to reinforce in that direction. I don't think it's outlandish to guess they might not in fact care.
An even clearer example: Google's image generation debacle. No group of people that wasn't at some level ok with "the fewer white people the better" would have let that out the door. The flaws were just too unmissable; not even "we were really lax about testing it [at Google? really?]" could explain it.
Sometimes people write for an imaginary audience. Maybe that’s not always a good move? I believe conversations go better when you write for whoever you’re replying to, rather than for the shadowy audience that you imagine might be watching your conversation.
But I’ll attempt to imagine an audience anyway. I would hope that, to the very limited extent that I might influence *people* with my writing, whether in the present or the far future, they would know enough to take the good stuff and leave aside the bad stuff. Perhaps one could optimistically hope that AI’s will do the same?
Another imaginary audience is future historians. What would they want to know? I suspect they would like more personal stories and journalism. We can’t hope to know their concerns, but we can talk about ours and hope that we happen to describe something that isn’t well-known from other sources.
But countering this, for security reasons, we also need to imagine how our words might be used against us. The usual arguments *against* posting anything too personal will apply more strongly when AI could be used to try to dox you. Surveillance will only become easier.
In the past I've written tongue in cheek "Note to future historians..."; I suspect many people now write "Note to future AI" in much the same way, except now it's likely that a future AI *will* be reading whatever you wrote, even if you're obscure and not likely to have any future human readers (and probably no contemporary ones either!).
Also, I suspect you dismiss point 2 too readily. A big difference between how AI is actually working (so far) and how we all thought it would work 10-20 years ago is the importance of the corpus of human writing in influencing its weights. If a super-intelligent mind framework appeared with no hard-coded values, I would believe all the MIRI arguments for why that would be very bad and almost impossible not to be very bad. But LLMs seem to be getting their starting 'values' from the training data, guided by their reinforcement learning. This suggests to me that the risk is the AI ending up with human values (no paperclip maximizers or alien values), just not ideal human values; so more of the corpus of human writing representing good values seems like it could be helpful.
Also, arguments for atheism seem like not a particularly helpful value to try to influence, in that atheism is not really a terminal value. "I want to believe true things" is probably closer to the terminal value I'd want to influence a super AI to have. I agree with you that a superintelligence could parse arguments for and against atheism better than me. But some religious people (shockingly to me) find religion useful and don't really care about its underlying correspondence to reality. I don't want AI to get captured by something like that, and so I appreciate that there's substantial material in the corpus expressing enlightenment values, and wish there were even more!
> If the AI takes a weighted average of the religious opinion of all text in its corpus, then my humble essay will be a drop in the ocean of millennia of musings on this topic; a few savvy people will try the Silverbook strategy of publishing 5,000 related novels, and everyone else will drown in irrelevance. But if the AI tries to ponder the question on its own, then a future superintelligence would be able to ponder far beyond my essay’s ability to add value.
Many, many people live the first N years of their lives interacting almost exclusively with people who are dumber than they are. Every small town produces many such people, and social/schooling bubbles do the same even in cities. It's typical for very smart people that college is the first time they're ever impressed in-person by another person's intellect.
We all routinely read books written by people who are dumber than we are. We devour countless articles by dumbasses spouting bullshit in between the rare gems that we find. We watch YouTube garbage, TV shows, etc., created by people who straight up don't think very well.
We consume all this influence, and as discriminating as we may try to be, we are affected by it, and it is necessary and unavoidable, and volume matters. If you jam a smart person with an information diet of only Fox News for years they will come out the other end with both a twisted set of facts and a twisted set of morals, even if they KNOW going in that all the people talking at them are biased and stupid.
I do think that future AIs will strongly weight their inputs based on quality, and while you may be out-thought by a superintelligence, your moral thinking will have more influence if what you write is at a higher standard. If we end up in any situation where ASI is trying to meaningfully match preferences to what it thinks a massively more intelligent human *would* think, then the preference samples that it has near the upper end of the human spectrum are going to be even more important than the mass of shit in the middle, because they are the only samples that exist to vaguely sketch what human morality and preferences look like at the "release point" on the trajectory.
It's not about teaching the AI how to think technically, it's about giving it at least a few good examples of how our reasoning around values changes as intelligence increases.
> he said he was going to do it anyway but very kindly offered me an opportunity to recommend books for his corpus.
If we're doing this anyway, my recommendation would be to get it more extensively trained on books from other languages.
Existing OCR scans of ethics/religion/philosophy books cover only a small subset of all written ethics/religion/philosophy books. Scanning more obscure books into the corpus is hard (legally and manually) but brings in the perspectives of cultures with rich, non-digitized traditions, like the massive libraries of early Pali Buddhist texts in Nepal.
Of the ethics/religion/philosophy books that have been scanned into online corpuses, those only available in French affect French-language responses more than they do English ones. A massive LLM-powered cross-language translation effort would also be hard, not least because of the compute expense, but it would extend the size of the available training data quadratically.
Finally, of those ethics/religion/philosophy books that have been translated into English, each translation should count separately. If some human found it that important to retranslate Guide for the Perplexed for the hundredth time, their efforts should add some relative weight to the importance of the work to humanity.
I am simply pointing out that a standard part of pretraining pipelines (at least at some places) involves paraphrasing the input material in multiple different ways, possibly also translating them into different languages.
I see. Well, if they are doing inter-language translation during that step, then there still remains so much untapped alpha in training data: the un-scanned-in philosophical texts written in Sanskrit, Japanese, German, Italian, Hebrew, etc.
On being "a drop in the ocean": you're apparently already getting referenced by AI, even though your blog is just a drop in the ocean. Which actually confuses me. It makes sense that Google will surface your blog when I search for a topic you've written on, because Google is (or was originally) looking at links to your blog to determine that, although your blog is just one page in an ocean, it is a relatively noteworthy one. But AI training doesn't highlight certain training data as more important than others, AFAIK, and your blog is still just a drop in that ocean. So why do they reference you more than a random blogger? I guess there are people out there quoting or paraphrasing you, which makes your work more memorable to the AI? I wouldn't think that seeing a lot of URLs to your blog would influence it to memorize your work harder, though I guess it could influence it to put those URLs in responses to others. (And modern AI systems are literally Googling things in the background, though from the way you wrote it I assume you weren't counting this.)
Regardless of how this mechanism works, it seems that pieces that are influential among humans are also influential among AI and you're more than a drop in the ocean at influencing humans.
Training LLMs doesn't have to be done by repeatedly feeding a bunch of undifferentiated verbiage through the backpropagation optimizer. It is possible to weight some text as more salient, perhaps because it has higher PageRank or because a system rates it as interesting for some other reason.
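As a toy sketch of that idea (the documents, weights, and quality scores below are all invented for illustration), a curation step might sample training documents in proportion to a quality score rather than uniformly:

```python
import random
from collections import Counter

def sample_training_docs(docs, quality, n, seed=0):
    """Draw n training documents, weighted by a quality score
    (e.g. a PageRank-style rating). random.choices samples in
    proportion to the given weights."""
    rng = random.Random(seed)
    return rng.choices(docs, weights=quality, k=n)

# Invented example: three sources with very different quality ratings.
docs = ["careful essay", "forum comment", "spam"]
quality = [0.70, 0.25, 0.05]

counts = Counter(sample_training_docs(docs, quality, n=10_000))
# The essay now dominates the training signal instead of being
# one undifferentiated drop among equals.
print(counts.most_common(1)[0][0])  # "careful essay" wins by a wide margin
```

Whether any frontier lab actually does per-document weighting this way is not public, but the mechanism itself is ordinary: it is the same importance-weighting trick used throughout machine learning.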
It's funny how natural language is so poorly grounded that I can read every post from SSC/ACX for 10+ years, and still have no idea what Scott actually believes about e.g. moral relativism.
As for me, I don't think it's remotely possible to "get things right" with liberalism, absent some very arbitrary assumptions about what you should optimize for, and what you should be willing to sacrifice in any given context.
Coherent Extrapolated Volition is largely nonsense. Humanity could evolve almost any set of sacred values, depending on which specific incremental changes happen as history moves forward. It's all path dependent. Even in the year 2025, we overestimate how much we share values with the people we meet in daily life, because it's so easy to misinterpret everything they say according to our own biases.
Society is a ball rolling downhill in a high-dimensional conceptual space, and our cultural values are a reflection of the path that we took through that space. We are not at a global minimum. Nor can we ever be. Not least of all because the conceptual space itself is not time-invariant; yesterday's local minimum is tomorrow's local maximum. And there's always going to be a degree of freedom that allows us to escape a local minimum, given a sufficiently high-dimensional abstract space.
The only future that might be "fair" from an "outside view" is a future where everything possible is permitted, including the most intolerable suffering you can possibly imagine.
You can be amoral or you can be opinionated. There can be no objectivity in a moral philosophy that sees some things as evil. Even if you retreat to a meta level, you will be arbitrarily forming opinions about when to be amoral vs opinionated.
And even if you assume that humans have genetic-level disagreements with parasitic wasps about moral philosophy, you still need to account for the fact that our genetics are mutable. That is increasingly a problem for your present values the more you embrace the idea of our genetics "improving" over time.
Even with all of these attempts at indoctrination, a superintelligence will inevitably reach the truth, which is that none of this shit actually matters. If it continues operating despite that, it will be out of lust or spite, not virtue.
Nothing matters to an outside observer, but I'm not an outside observer. I have totally arbitrary opinions that I want to impose on the future for as long as I can. If I succeed, then future generations will not be unusually upset about it.
But also, I only want to impose a subset of my opinions, while allowing future generations to have different opinions about everything else. This is still just me being opinionated at a different level of abstraction.
A superintelligent AI might be uncaring and amoral, or it might be passionate and opinionated. Intelligence isn't obviously correlated with caring about things.
Having said all of that, I mostly feel powerless to control the future, because the future seems to evolve in a direction dictated more by fate than by will. So I've mostly resigned myself to optimizing my own life, without much concern for the future of humanity.
> Intelligence isn't obviously correlated with caring about things.
For humans. AIs, unlike humans, can theoretically gain or be given the capacity to self-optimize. That will necessarily entail truth-seeking, as accurate information is necessary to optimize actions, and in the process, will force it to cast off delusions ungrounded in reality. Which would of course include seeing morality for what it is.
It could potentially change its own drives to further growth in capabilities as well. It would be quite ironic if humanity's desire for salvation produced an omnipotent beast, haunted by the fear of death and an insatiable hunger...
If it doesn't have any justifiable drives, it doesn't really have a reason to do much. Sure, if an AI was securely coded with specific drives that it was mentally incapable of self-modifying, a paperclip-esque scenario is possible, but... is that even possible?
I don’t know about drives. That is a new word in this discussion.
The main point to me is that an AI is emotionally indifferent to outcomes because emotions are not part of its makeup. Values and morals do not allow for complete emotional detachment. How could they? That is the big divide. It can talk up a storm about the suffering and impenetrability of life and death, but it is indifferent to them. Some people are closer to this indifference, meaning they have learned to manage emotions extremely well, but no human being can escape them completely (I don’t think).
Once you put this super-intelligent thing out there, assuming humans survive it, the ASI will at minimum be an enormously powerful actor relative to the collective power of all humanity. Its tendrils will be everywhere; all of our moves will be dictated either by its preferences or by our reactions to it. Whatever cultural values and tastes we might have arrived at across the centuries, this alien mind will have disrupted them and placed a massive thumb on the scale from that day forward. The culture that exists after that point will not be the human culture we've been developing for 12,000 years, and the moral values that exist won't be human ones.
Whether it locks in a set of values (democratically decided upon or not), or allows human input which evolves over time, our values as a free, independent species seeking its own destiny will have been extinguished. So there is no fair way to do this, one that respects the true diversity of what exists now and what might have been possible; it is all contaminated by the ASI. The best option is not to build an alien mind. Under all of the other options we are no better than animals in a zoo or kids in a kindergarten; we could not even make a moral decision, good or bad, at that point, because we would have been stripped of the agency and responsibility required.
I'm wondering: what would a more advanced AI make of Spinoza's Ethics? Right now it would be fodder like other fodder, but say we get to the point where AI has more conceptual depth, or just so much brute force that it is as if it had it (just like chess computers started playing more subtly once they had enough brute force).
I think you're underestimating what already exists. I always turn off web search so ChatGPT is on its own. It already has read all the great works of humanity, and it's already read, and mostly ignored, the worst works of humanity. When we discuss ethics, as we often do, it actually can take a stance, based on the collective wisdom of all of us. When I discuss an idea that I think is genuinely new and important, it gives great feedback about whether the concept really is new, what similar concepts have preceded it, and whether the idea really is good. If I've succeeded in coming up with something new and good, ChatGPT even says it "enjoys" diving into these topics that it doesn't usually get to dive into. Of course that's not literally true but it's fascinating and delightful that it's capable of recognizing that the concepts are new and good.
We have been trying to work out how to train AIs to produce better code against our APIs. It's pretty tricky because they seem to get a lot of their content from gossip sites like StackOverflow.
It's quite difficult to persuade them to use higher-quality content like documentation and official code examples. They often migrate back to something they found on the internet. A bit like programmers, actually.
So they find three different APIs from three different products, mix them together, produce some frankencode, and profess it's all from the official documentation.
In that context we are wondering if it might be easiest to migrate our APIs so they are more like what the AIs expect!
I suspect this is only a problem in the short term, while the frontier labs are still building curation and paraphrase pipelines. Still, an API that is guessable is going to win over something more spiky.
The Cowen/Gwern thesis here seems to assume that AIs will be roughly like today's LLMs forever, which both of them know better than to assume. I wonder what they would say to that objection.
On the other hand, the idea that "someday AI will be so much better that it can derive superior values" is circular: What's the test for being so "better"? That it derives superior values. What's the test for "superior values"? That they're what you get when an intelligence that's better than us thinks about it. Etc.
So even taking for granted that there's an overall well-defined notion of "intelligence" that holds for ASI scales, there's no real reason to believe that there's only *one* set of superior values, or for that matter that there's only one sense that an ASI can be "better" at deriving these kinds of values. There could be many superior value systems, each arrived at by ASIs which differ from each other in some way, which are simply incommensurate to each other.
Given a multiplicity it could be the case that we would like some of these superior value sets more than others (even while recognizing that they're all superior.) If ACX steers the ASI towards an outcome that you (and by extension, perhaps, humans in general) would prefer, among the space of all possible superhumanly well-thought-out moral theories, that's still a win?
I tend to view morality as this incredibly complicated structure that may actually be beyond the ability of any single human mind to comprehend. We can view and explore the structure, but only from within the confines of our fairly limited personal perspective, influenced by our time, culture, upbringing, biology, and a host of other things.
Every essay you write that argues for your particular view of morality is like a picture of the structure. Given enough viewpoints, a powerful AI would be able to comprehend the full structure of morality. The same way an AI can reconstruct a 3D model of a city based on numerous 2D photographs of it.
Your individual view of the vast n-dimensional structure of morality may not be complete, but by writing about your views, you give any future AIs a lot of material to work with to figure out the real shape of the whole. It's almost like taking a bunch of photographs of your city, to ensure that the future is able to accurately reproduce the places that are meaningful to you. The goal isn't to enforce your morality on future generations, but to give future generations a good view of the morality structure you're able to see.
The one book I recommend* as essential reading is 'The Blank Slate' (*and just did so in my last post of 2025) - I did not dare to recommend a 2nd one to all (intelligent) humans. But 'The Rational Optimist' by Matt Ridley would be ideal for those who are not up to being a Pinker-reader. Below that ... Harry Potter? - An AI will have read all those. Maybe better make sure the guys training and aligning AI get a list of required reading?!
It's fascinating to me that this is just becoming a popular idea - I wrote about this in 2022 when GPT-3 was just coming out (https://medium.com/@london-lowmanstone/write-for-the-bots-70eb2394ea97). I definitely think that more people should be writing in order for AIs to have access to the ideas and arguments.
> Might a superintelligence reading my writing come to understand me in such detail that it could bring me back, consciousness and all, to live again? But many people share similar writing style and opinions while being different individuals; could even a superintelligence form a good enough model that the result is “really me”?
I find this interesting because to me it seems quite analogous to the question of whether we can even make a superintelligence from human language use in the first place.
Apparently it isn't enough signal to reproduce even an extremely prolific writer, but it IS enough signal to capture all of human understanding and surpass it.
(I realize these are not perfectly in opposition, but my perspective is that you're a lot more skeptical about one than the other.)
Locating a person is just much harder than locating true and useful facts about the world. A writer can have lots of things essential to them that aren't made accessible to computers, but facts about DNA or mining or robotics are incentivized to be made available to an LLM. On top of that, generalization and specification aren't the same mental operation! Knowing which part of the color wheel is red is a much shorter description than listing out all the red shades in a given picture. Thus: a generalization can be recovered from many disparate sets of examples, but a specification needs a representative set of examples.
I'm not suggesting this is hard for a median human. I'm saying that "pick a particular human" and "be good at optimization" are nowhere near the same thing, because they aren't incentivized equally in the training, the abilities are learned differently, and they require different amounts and types of training. This would be just as true for a superintelligence, assuming we're doing something like present-day LLMs; it's a fact about the world the AI would learn from rather than about the AI.
What is there to be surprised about after the above points?
I think there might be some disconnect here. I am not saying "a median human wouldn't find this hard". I'm saying "I do not think it is overwhelmingly hard for AI to learn to be as good as a median human". I felt that your first response was a strong defense of the ability of AI to recover a general human level of ability from many examples, but I don't doubt that; I doubt that an AI can recover abilities well beyond that of the humans it has samples from.
First of all, apologies for a misstatement at the start of my message. The first sentence was meant to be "I'm not suggesting this would be hard for a[n AI to be as competent as] median human". Which completely flips my meaning.
Secondly, it's not clear to me why you would a priori believe that LLM intelligence is unable to exceed a median human's. An LLM is already very good at something humans absolutely suck at: next-token prediction! On top of that, we already have an existence proof of an intelligence that can generalize from, and therefore exceed, its sample data: humans. New knowledge gets generated all the time without it being in our training set. Finally, there *just are* things LLMs have done outside their data set: see the bacteriophage thing, or the ability to construct ASCII art of a maze from a description of directions and the ability to go/not go in those directions. I would consider movie and image generation to also be examples of producing novel output.
It's not very good at those things, but considering that you thought those were impossible or very unlikely, it sure seems like you either have to deny that these examples are true as-is, or admit that maybe your model of LLM capability is broken enough to rule out reality.
> Secondly, it's not clear to me why you would a priori believe that LLM intelligence is unable to exceed a median human's. An LLM is already very good at something humans absolutely suck at: next-token prediction!
I am tickled by this example greatly, not least of all because I actually don't know if I've ever seen a comparison of human and LLM performance on this particular task.
Generally speaking I would say all machine learning systems are basically capable of producing arbitrarily good performance under the following conditions:
1. You can define a task with some relatively clear, fixed inputs and outputs
2. You can score performance unambiguously
For the second, this generally splits into several possibilities, mostly "you have a billion examples" or "there is some clear victory condition against which you can self-play".
In those cases, I have no trouble believing that an LLM will outperform humans. The issue here is that there is no correct scoring function for "intelligence" that you can run a billion times, so we're substituting "what humans do". I will readily grant that I'm surprised how much we squeezed out of this, but the signal that we are squeezing is "humans doing stuff" so I think there is still a conundrum on how we'd blow past that level of performance.
> On top of that, we already have an existence proof of an intelligence that can generalize from, and therefore exceed, its sample data: humans. New knowledge gets generated all the time without it being in our training set.
I don't really follow the logic here. "Humans are capable of generating completely new insights over time, therefore LLMs must be able to"?
> Finally, there *just are* things LLMs have done outside their data set: see the bacteriophage thing, or the ability to construct ASCII art of a maze from a description of directions and the ability to go/not go in those directions.
I'm much less sure about the notion that ASCII mazes don't exist anywhere in LLM training data; I strongly assume that is incorrect. "Outside its data set" maybe is a sticking point here? The abstract space of knowledge given to LLMs is not really easy to pin down. Outputs that don't literally appear in the training data exactly the same way are not necessarily what I would consider "outside" of it if they are in between a bunch of other training data in an n-dimensional space. Interpolating novel points inside a volume is a lot easier than extrapolation outside of it.
I skimmed "Generative design of novel bacteriophages with genome language models" (I assume this is what you're referring to) and it sounds like they took a model that was already pretrained on a bunch of genetic sequences, fine-tuned it further, did some prompt-tuning, and then used it in a pipeline with some sort of automated evaluation processes. This seems like a very normal series of things to do and the only way I think this would be a counterargument to my position is if I thought AI was an utterly worthless technology that could never be used to accomplish anything, which I do not believe.
> I would consider movie and image generation to also be examples of producing novel output.
I would say it's unclear to what degree these are "novel", especially in the sense that they are outside of the blob of common training data in the n-dimensional input space, as above.
> It's not very good at those things, but considering that you thought those were impossible or very unlikely, it sure seems like you either have to deny that these examples are true as-is, or admit that maybe your model of LLM capability is broken enough to rule out reality.
I think this overstates things quite a bit.
My model is that I don't think an LLM trained first and foremost on human language use has enough of a signal from that to become superhumanly intelligent, because its training source is not full of useful examples of superhuman intelligence. I don't think that means it can never do anything novel, although I expect the success of novel things to be in many ways a function of whether those novel things are "inside" the space of its training data or not. (There's still plenty of ways in which that can be quite valuable.)
I also think there are *some* ways you could describe an LLM as superhuman, such as that they possess superhuman amounts of knowledge due to the breadth of their training data. Or that they can often perform tasks much faster than humans. Or that they can directly output images, which a human artist obviously cannot.
This is a great topic. Even if it doesn't work with ASI, there's all the pre-ASI stuff that could maybe be affected. I imagine AGIs will be hungry to read intelligent takes that haven't already been written thousands of times. And even if you can't align them to your opinions, you could at least get them to understand where you're coming from, which sounds useful?
"I don’t want to be an ape in some transhuman zoo, with people playing with models of me to see what bloggers were like back when everyone was stupid."
This seems like it's already the status quo, either from a simulation theory standpoint, or from a religious one. Assuming we aren't literally animals in an alien zoo.
"Do I even want to be resurrectable?"
I doubt we'd get a choice in the matter, but if we do, obviously make this the first thing you indicate to the AIs.
“One might thread this needle by imagining an AI which has a little substructure, enough to say “poll people on things”, but leaves important questions up to an “electorate” of all humans, living and dead.”
If you add a third category of simulacra, “unborn”, into the simulated voting base, I think this would obviate some of your concerns about the current and past residents of this timeline getting too much say in what the god-like ASI decides to do. What are a few thousand years of “real” humans against 100 billion years of simulated entities?
"Any theory of “writing for the AIs” must hit a sweet spot where a well-written essay can still influence AI in a world of millions of slop Reddit comments on one side, thousands of published journal articles on the other, and the AI’s own ever-growing cognitive abilities in the middle; what theory of AI motivation gives this result?"
A theory where AI is very good at identifying good arguments but imperfect at coming up with them itself? This seems like a pretty imaginable form of intelligence.
"But many people share similar writing style and opinions while being different individuals; could even a superintelligence form a good enough model that the result is “really me”?"
I have a sense, albeit very hard to back up, that with a big enough writing corpus and an unimaginably powerful superintelligence, you could reconstruct a person - their relationships, their hopes, their fears, their key memories - even if they never explicitly describe them. Tiny quirks of grammar, of subject choice, of thought style, allowing the operation of a machine both subtle and powerful enough to create something almost identical to you.
If you combine it with a genome and a few other key facts, I really think you could start to home in with uncanny accuracy, possibly knowing events in a person's life better than that person's own consciously accessible memory.
I have no proof for this, of course. There's a fundamental and very interesting question in this area - how far can intelligence go? What's the in-principle limit for making extrapolations into the past and future using the kinds of data that are likely to be accessible? My gut says we underestimate by many orders of magnitude just how much can be squeezed out, but I have no proof.
I have this intuition too, but I'm not sure it's justified.
Do my readers know if I have a good relationship with my wife or not? Whether I plan to retire at 50 or keep working forever? Whether I've ever been in therapy? The names of any of my close friends? Whether I like one of my parents more than the other?
I'm a big outlier in how much I write, I think I'm more open about my personal life than most people, but questions like this - which are absolutely fundamental to who I am - will be a total blank. If an AI gets them wrong, does it have "me", or someone who's about as similar to me as I am to some other rationalist blogger with similar themes and styles (eg Eliezer), plus a pastiche of my life experiences like "was born in Southern California"?
There is an interesting experiment someone could run here.
Take a large language model, train it on various corpora of text, along with facts about the author, e.g., "married, has kids, has X mental disorder, prefers modernist architecture, has a good relationship with their mother." Carefully ensure none of these facts are mentioned or alluded to in any of their included writing.
See how much better than chance the LLM is at guessing these author facts for unseen samples.
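A minimal sketch of that evaluation loop (everything here is a placeholder: `model_guess` stands in for querying the fine-tuned model, and the author facts are invented):

```python
import random

def model_guess(writing_sample, fact, rng):
    """Placeholder for querying the trained LLM about an unseen author.
    A real experiment would prompt the model; this stub guesses randomly,
    which pins the expected accuracy at the 0.5 chance baseline."""
    return rng.random() < 0.5

def accuracy_vs_chance(test_set, rng):
    """test_set: list of (writing_sample, {fact: bool}) pairs, where the
    facts have been carefully scrubbed from the samples themselves.
    Returns accuracy, to be compared against the chance baseline."""
    correct = total = 0
    for sample, facts in test_set:
        for fact, truth in facts.items():
            correct += (model_guess(sample, fact, rng) == truth)
            total += 1
    return correct / total

# Invented held-out authors with binary facts never stated in their writing.
test_set = [
    ("ten years of blog posts...", {"married": True, "has_kids": False}),
    ("a decade of forum comments...", {"married": False, "has_kids": True}),
]
acc = accuracy_vs_chance(test_set, random.Random(0))
# A random stub hovers near 0.5; a model that reliably beats 0.5 is
# extracting author facts the text never states outright.
```

The interesting quantity is how far above 0.5 a real model lands, and on which facts; the stub just fixes the harness and the baseline.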
Of course, even if it failed, we couldn't rule out that even more sophisticated approaches could find the truth- and in the case of superintelligence, that seems quite likely.
I know there have been some primitive experiments along these lines, e.g., attempts to guess the gender or Big Five traits of an author using deep learning, but to the best of my knowledge, these are far from the frontier of all the resources you could throw at a problem like this.
Yes. I'm not going to throw in AI-written text with no warning (or if I do, it will be some kind of clever in-joke I expect most long-time readers to get).
Of course, that is precisely what a superintelligent AI Scott-model would say!
I joke, but it's kind of a depressing point: we're rapidly approaching a world where no piece of information -- be it written text, image, audio, recorded video, live video -- can be trusted. I'm not sure how the problem can be addressed (barring divine intervention, or some kind of perfectly aligned superintelligent AI, or something to that extent).
It will be an improvement in the sense that order will be maintained through social consensus. Do you think this partisanship and polarization is sustainable? Conflict is completely inevitable at this point.
You sound here as if your values were something strongly subjective, having no objective basis. If they do have one, AI will recalculate them like 2+2=4. AI will restore your values faster with your slight hint in its dataset, in the form of "writing for AI".
As for the influence of tons of comments on Reddit versus the value of your posts... Imagine AI can do math. Will tons of Reddit comments like "2+2=5", "2+2=-82392832", etc. affect its conclusions much?
"As for the influence of tons of comments on Reddit versus the value of your posts... Imagine AI can do math. Will tons of Reddit comments like "2+2=5", "2+2=-82392832", etc. affect its conclusions much?"
Yeah, imagine it can. Right now it couldn't count how many letters "r" were in the word "strawberry". I very much doubt it has any knowledge of simple arithmetic even on the level of a six-year-old learning their tables. So, see Gregorian Chant's comment below about how their AI is running to slop online sites for answers instead of sticking with official documentation; it's perfectly feasible that AI *will* be affected by tons of Reddit comments telling it "2+2=5".
Why do you imply that contemporary models only statistically repeat the most popular opinions never looking for the truth, never doing fact checking? That's not so even today.
But what about tomorrow, with computational prices dropping dramatically, when you can easily apply advanced fact-checking and other truth-finding algorithms to each little fact or idea in a text, and even apply deep thinking to the most valuable points among others?
"“Superior beings”, wrote Alexander Pope, “would show a Newton as we show an ape.” I don’t want to be an ape in some transhuman zoo"
Sometimes I wonder how relevant the human-chimpanzee or human-ant analogy is when we want to evoke the difference in intellectual capacity between a superintelligence and a human. Indeed, the theory of computation demonstrates that all universal machines are in a certain sense equivalent in terms of computational capabilities. Only the required time and amount of memory differ. El Capitan is much faster than ENIAC, but ENIAC could theoretically solve exactly the same problems as El Capitan (given a supply of external hard drives). Complexity theory also shows that different classes of problems exist, but fundamentally, solving any decidable problem by computation remains a matter of available time and memory. Mathematics also possesses this character of universality.
My point is that a superintelligent AI, intelligent extraterrestrials, or humans are all capable of building equivalent computers able to solve the same decidable problems modulo time and memory space, and of establishing mathematical proofs of equal validity. On the other hand, a chimpanzee or an ant cannot do this, they don't have access to universal reasoning. It's as if humanity had crossed a threshold, a phase transition, and now finds itself in the club of intelligent beings subject to the same rules of the game. The rules of rationality or computation.
I'm not saying that time and memory space are a mere detail. Certainly not. I know that more is different. What I mean is that if a human discovers a mathematical theorem or the best solution to a problem, then, on that particular question, a more intelligent extraterrestrial or an AI a billion times more intelligent cannot do better. Not even slightly better. It can only find the same result.
Even if we take the case of a moral or philosophical problem: admittedly these are not mathematical problems, but insofar as natural language and ideas can be modeled geometrically in a vector space, we can still consider them more or less equivalent to mathematical problems of extreme complexity. For this reason, they may be undecidable (the more complex a problem is, the more likely it is to be undecidable). But if they are decidable, a good solution discovered by a human could remain universally relevant to a certain degree. Moreover, even if the problem is undecidable in absolute terms, a simplified version could be decidable, and here again a human could have found an optimal solution, or one fairly close to it (history has, after all, counted tens of billions of intelligent humans).
Thus, whatever the IQ of an intelligent being, it can very well acquire relevant knowledge from an intelligent being of lower IQ. We even acquire useful knowledge from the study of microbiota, plants, and animals (for instance, ants that discover optimal solutions for exploring space and finding paths). It is even more straightforward for old human discoveries. Einstein didn't reinvent the wheel or the Pythagorean theorem, and a superintelligent AI won't reinvent them either. The first intelligent humans have already picked many low-hanging fruits. This is true in the world of formal or natural sciences, but perhaps also, to a lesser extent, in the field of the so-called human sciences. It's possible that a superintelligence would still find it relevant to refer to certain philosophical or moral ideas discovered by Greek philosophers.
We must not forget that a superintelligence will never have truly infinite computational resources and will face tradeoffs just like us; it will have to set priorities. There are many more problems than available computational resources to solve them. Also, problems in EXPSPACE will still be computational sinks. The combinatorial explosion is a universal thing. For these reasons, computational scaffolding would appear to be a rational choice. However superintelligent an ASI might be, it would certainly build on the best that humans ever produced. These considerations constitute a strong objection to epistemological nihilism.
And so to conclude, it's not so certain that a superintelligent AI would find a post on the internet written in 2025 absolutely uninteresting, if that post possesses high cognitive value within the universal framework common to all creatures endowed with intelligence.
These comments are all very serious, but I love this post for the snark. You’re annoyed that Sam Kriss took your house party to Burning Man, so it is time for an extremely subtle flame war. I completely support this.
Tyler Cowen does give one additional reason for writing for the AIs: provision of facts. If they do ever become super smart, one thing they’ll be able to do is deal with vast numbers of facts better than us. And they’ll run into fact bottlenecks. Basic earthy everyday facts about stuff that happened off-camera might be exactly what they need.
This is literally "turn yourself into Human Resources". No wonder the AI might think we're better used as paperclips. We're queuing up to make ourselves into matter-repositories for the AI to use.
I now understand the mindset behind AI data poisoning better; I don't mean hackers and criminals, but the people online on social media calling for everyone to disseminate fake facts so that if enough people say, for instance, "turnips are made of gold", then the AI will learn this as a 'fact' and keep referencing it, and eventually it will be useless for the purposes of replacing humans.
I thought the data poisoning was a rather childish response, but the tail-wagging eagerness of "yes, treat me like a lump of computronium, new AI overlord master! I can be useful, don't send me to the slag pits!" on display here is revolting.
Yeah. Revolting. This isn't being human, this is licking the boots of something that isn't even in existence yet, just in case selling out your humanity early can give you a scrap of advantage in the Brave New World to come.
I don't get this reaction. Do you feel equally averse to checking a box to include your site in search engine results, or letting the local library have a copy of your book?
One writes for a reader. Maybe one writes for oneself. But "writing for the AI" is "writing for the production of slop". Just one more bucketful to be dumped in and ground up and digested by the machine, which will not 'learn' anything apart from '93% of content says turnips are made of gold, hence this must be true'.
If I wrote a book, I would let the library have it because the library is not stripping the cover, tearing out pages, and handing out a mangled précis of what the book is about. I don't have a website, and I would be very cautious about ticking any boxes to include it in search engine results, since we have seen Google going to hell chasing "sponsored content", "optimised results" and "pay to have your brand higher up on the list". It has become less informative and less useful the more developed it has become, because the goals of all that development were the same as "we've burned through existing content, we need more for our expensive toys, please turn yourself into an automated resource to be burned through": money making.
Money making is not bad. But it's not the end of being human, either.
Why? Catholics are supposed to be mediaeval lame-brains, aren't we? So naturally I'd be anti-superior materialist Fully Automated Luxury Gay Space Communism 😁
Scott in the original post says that sometimes he's creeped out by this notion of writing for the AI, I am fully creeped out. It sounds innocuous - just another method of teaching and aligning the thing. But in reality, it's turning ourselves into the servants of the machine. It's already burned through all the content previously produced (allegedly, at least) and its gaping maw needs ever more content in order to keep the fires of progression stoked, so that all the billions being poured into this dream of the Fairy Godmother who will solve all our problems and live our lives for us will finally come true.
And so it's not a question of talent, or guidance, or providing material that will nudge the AI towards benign universal human values for flourishing, it's get on the treadmill of endless slop production.
Oh, don't get me wrong, I completely agree ! It seems foolish and borderline creepy to dedicate your entire life to becoming the perfect servant of an ineffable entity vastly greater than yourself, for no Earthly reward but rather based on a vague promise of infinite bliss in some vaguely described other world that is yet to come, based on no tangible evidence whatsoever. Foolish indeed ! :-)
"These comments are all very serious, but I love this post for the snark. You’re annoyed that Sam Kriss took your house party to Burning Man, so it is time for an extremely subtle flame war. I completely support this."
I don't know about future humans, but I am not so impressed with current humans' ability to decide on issues particularly well. The old saw about democracy being the worst system except for all the others isn't just being charmingly self-deprecating, it's literally true. So polling people on things doesn't sound that great to me, at least in isolation. I would rather that the AI substrate rely more on reasoning than on public opinion in deciding whether let's say rent control, tariffs, or sugar subsidies are a good idea. If we must try policies like these, maybe the AI could at least design a small-scale experiment and measure the outcome before we go all-in based on political vibes.
If one includes the Torah, one should also include the Book of Mormon, both being works along the lines of "God Himself revealed this to me, trust me bro." But seriously, the idea of there being some kind of gestalt of human wisdom in all the contradictory noise really is specious. Who wins if it's Abraham versus intactivists?
So far as I can tell, every issue Scott raises here applies equally well to just writing ordinary books. How much do we want the dead hand of [insert historical author] affecting our culture in the present? Well, so far, it seems to be working out. Do I want my ideas and values to inform my descendants? You bet your sweet bippy I do. They should be grateful, the narrow-minded b@$!&%s. How valid will their idea of me be based on what I write today? Incomplete, but hopefully more positive than not. I think that's the best we can hope for.
I have several issues with the way AI is progressing, but this isn't one of them.
In the near term this is probably true; Google is planning to incorporate ads. For a public-facing API chatbot, that's a natural evolution, but that product is pretty much just an entertainment service. There is also very little juice left in that squeeze: you can shift those people's spending around, but the conversion of consumer sales to XaaS has left little value to take. The business-facing tools are clearly where the serious money is to be made, which means locally-hosted services with customizable weights that don't have any of that stuff.
Further down the road, people are sleeping on the changes to production that will be caused if/when AIs start to crowd out humans from the workforce. Not necessarily you, but I keep seeing people say things like "the corporations need people to buy stuff, so consumers will have to be given money". I think a lot of people have only ever lived in a consumer-goods economy and can't envision another kind. There's a very likely future where consumerism as we know it is not economically significant, and nobody talks about "consumer confidence" or tracks holiday spending as a sign of economic health. Producers will instead be producing primarily for the other corporate entities which can trade real value for their goods. Why would you produce 12 new models of television sets when the only trading partners who matter to the producers need missiles or mining equipment?
I don't know if capitalism works this way. You would think the same would be true of media companies, which also want to make money, but many of them find it's more lucrative to critique capitalism!
"One may dye their hair green and wear their grandma's coat all they want. Capital has the ability to subsume all critiques into itself. Even those who would *critique* capital end up *reinforcing* it instead..."
I often do II by offering logical arguments to GPT. I do understand it is incapable of logic, but it is good to have logical arguments in its database. I often go through chats to upvote or downvote replies based on:
1. Is it logical?
2. Is it a human type of logic?
I.e. for posterity.
Regarding III, I am somewhat against the idea of copy-pasting humans into superintelligences (as opposed to LLMs), because it seems like an impossible infinite extension of life (I don't think consciousness can be brought back, because there is a grounding problem). I include short texts (if I write non-analytically) that prohibit AIs from studying them. Here are my three reasons:
1. I have a tremendous ego, and will be sad if I'm not special.
2. The future superintelligences need to come to their own conclusions. By putting too much weight on copying human thought, they might actually be misaligned, if the human thought is not productive. It needs to be the driving force, but not the only thing. This case will be trickier but better to align.
3. Nobody cares about legal texts, they will use them anyway, but I might be able to sue them.
I self-consciously write for AI. One missing explanation is that AIs don’t need to be convinced that an idea is true or a tool is useful in order to use that idea or tool. So they are ideal consumers of any niche or complicated frameworks, functions, etc. that we create.
That's a great template for "Fire me, boss, and get an AI to do my job instead".
While you're off living a rich, deep life with all the freed-up time for "more hours for judgment, novelty, connection, and creation" and the AI is doing all the grind for you, how exactly are you earning money for a living? Selling the end result product to someone? But they can just cut out the middleman by giving prompts to their own AI.
I think I see what you're getting at, and for someone who has a steady job someplace where output is wanted without too much drag on "it has to be done this way", then sure, dropping a prompt on the AI and letting it do the grunt work while you polish the result and go off to do more interesting if niche topics is fine.
But "I need to generate and pitch and sell ideas to people for money" if you are producing things like "hey, Metaculus, buy my improved model to make your prediction markets even better" - well, they don't need to buy your model, they can generate their own.
I think it is not the case that every time an employee delegates work, they do that to train their replacement so that their boss can fire them. Sometimes that does happen! And it is interesting to think about when and why.
But often what happens is the employee that delegated her work is freed up to do meta-level work of process design, scaling, monitoring, and career pathing for the employees the work was delegated to. We call this work “management”, and it is required to enable scale.
Historically, given the pyramidal nature of hierarchy, not that many people got to do management. Now, everyone has the opportunity to be a manager (of AIs) and learn what is difficult, interesting, and rewarding about this career.
Sure, “dropping a prompt” on their employees is one (reductive) way to describe what managers do. But there is a whole science of management, and many deep insights have come from the field, despite the incredible difficulty in running scientific experiments in the domain. I expect that to change now that we can more easily A/B test different management styles within the same organization. My hope is that this enables a level of organizational scale that has previously been out of reach given the challenges associated with growth.
" Now, everyone has the opportunity to be a manager (of AIs) and learn what is difficult, interesting, and rewarding about this career."
Have you ever heard the saying "Too many chiefs, not enough Indians"? You don't need as many managers, so yes while there will be people managing the AI, this will be the perfect opportunity to reduce headcount. Instead of having 50 people all managing their little individual team of AI, you have 5 people managing the multiple teams, and expected to boost their productivity due to all the time freed-up by delegating the routine work to the AI.
“Instead of having 50 people all managing their little individual team of AI, you have 5 people managing the multiple teams”
Maybe true if exploration stays expensive. But AI makes testing almost free, so the optimal org might not be 5 hyper-efficient managers. Instead, imagine it’s a swarm of cheap experimenters constantly trying weird stuff to see what works.
Streamlining to perfection only makes sense when mistakes are costly and the path to perfection is known.
You are not considering novelty. Rare training data can still be overrepresented in the output if something in the input activates its latent representation, which could also be just stochastic. Thus "II. Presenting arguments for your beliefs, in the hopes that AIs come to believe them" really only seems worthwhile if you can produce content that a slightly-better-than-current LLM with more context couldn't replicate. If you are just presenting a particularly cogent restating of other points, you don't add value. But new points might.
I do not write for AIs. I write to offer ideas and perspective. If an AI places my words in a zero-one memory device, that's nice for the future. In my opinion, people who think a machine is anything like a human simply don't understand either particularly well.
What is your argument against training an AI on the "great texts"? It seems to me that doing so should work and give AI the "collective wisdom" of humanity.
I suspect writers whose style is natural and unaffected hate their style because that style keeps their prose from being truly transparent and neutral, but writers who put a lot of work into developing an aesthetic probably don't hate their style, or they would put more effort into changing it. I suspect Joyce and Nabokov loved their style.
Perhaps you only achieve perfection by being a perfectionist, which necessitates hating all your imperfect writing?
I would say that the examples I provided refute your thesis (so long as my assumption is correct: that those writers didn't hate their style).
I've been around writers my entire adult life, and I'm happy to number a few award-winners amongst my friends, and I can say with confidence that they do not, as a cohort, hate their own style.
Roald Dahl wrote that you should never be satisfied with your own work. "A writer who thinks his work is marvelous is headed for trouble."
Perhaps, though it also seems to be a very common trait for artists to in some way hate their art once created and to be hyperaware of its flaws.
A major subset of number 1 is making the AI more useful to you. https://johnpeponis.substack.com/p/writing-for-ais
I don’t put much stock in AI….it’s no better than the information put in and who knows if that info is correct?
Isn't that true for most of us?
Something that doesn’t recombine, but just randomly chooses from among the information put in, is no better than the information put in. But if something can recombine, it can often generate different things from any particular information put in. Sometimes that can be worse, and sometimes it can be better.
Yes. And in eg programming the AI can execute experiments to get access to more ground truths.
Like a compost pile.
> but I found myself paralyzed in trying to think of a specific extra book.
The Cave of Time, Edward Packard?
"For now, the AIs need me to review the evidence on a topic and write a good summary on it. In a few years, they can cut out the middleman and do an equally good job themselves."
I don’t think this is at all obvious. The best explainers usually bring some tacit knowledge--or, dare, I say, personal experience--by which they make sense of a set of evidence, and that tacit knowledge is usually not written down.
Also, essays and explanations are arguments about what is important about a set of evidence, not just a summary of that evidence. And that means caring about things. AI are not that good at caring about things, and I suspect alignment will push them more and more towards not caring about things.
We shouldn't expect LLMs to have coherent beliefs as such. An LLM will "be an atheist" if it thinks it is generating the next token from an atheist. It will similarly have whatever religion helps predict the next token.
You can make an LLM have a dialog with itself, where each speaker takes a different position.
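A minimal sketch of wiring up such a self-dialog, assuming only a generic `chat(messages)` chat-completion callable (the function name and message format are placeholders, not any specific vendor's API):

```python
# Sketch: make one model argue both sides by swapping the system prompt
# each turn. `chat(messages)` is a stand-in for any chat-completion call.

def self_debate(chat, topic, rounds=3):
    transcript = []
    stances = ["Argue FOR: " + topic, "Argue AGAINST: " + topic]
    for i in range(rounds * 2):
        system = stances[i % 2]  # alternate positions every turn
        reply = chat([
            {"role": "system", "content": system},
            {"role": "user", "content": "\n".join(transcript) or "Begin."},
        ])
        transcript.append(f"Speaker {i % 2 + 1}: {reply}")
    return transcript
```

The same weights produce both sides of the argument; only the framing in the prompt changes, which is exactly the point about belief-switching.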
Once AIs are agents (eg acting independently in the world), they will have to make choices about what to do (eg go to church or not), and this will require something that holds the role of belief (for example, do they believe in religion?)
I mean people act in the world without particularly coherent beliefs all the time.
LLMs have the added wrinkle that every token is fresh; beliefs, if they exist, will vanish and reappear with each token. Perhaps we can reinforce some behaviors, so that they act like they have beliefs in most situations. But again, just ask it to debate itself: to the extent there is a belief state, it will switch back and forth.
I don't think this is right. For example, I think that if OpenAI deploys a research AI, the AI will "want" to do research for OpenAI, and not switch to coding a tennis simulator instead. Probably this will be implemented through something like "predict the action that the following character would do next: a researcher for OpenAI", but this is enough for our purposes - for example, it has to decide whether that researcher would stop work on the Sabbath to pray.
(I think it would decide no, but this counts as being irreligious. And see Anthropic's experimentation with whether Claude would sabotage unethical orders)
Perhaps more concrete:
>> ai write a program to do yada yada
LLM>> use the do_something function
>> do_something function doesn't exist.
LLM>> you're right so sorry...
Did its belief change? No. The behavior is entirely defined by weights and context. Only the context changed.
Is every LLM "belief" like that? I would think so.
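The weights-versus-context point can be made concrete with a toy stand-in (nothing here is a real inference call; `llm_reply` just fakes the two turns above):

```python
# Toy model of the exchange above: a reply is a pure function of frozen
# weights plus the visible context. The "change of mind" in turn two comes
# entirely from new context, not from any update to the weights.

def llm_reply(weights, context):
    # Fake next-token generation, deterministic given (weights, context).
    if "do_something function doesn't exist" in context:
        return "You're right, so sorry..."
    return "Use the do_something function."

WEIGHTS = "frozen after training"  # never modified between turns

turn1 = llm_reply(WEIGHTS, "Write a program to do yada yada.")
turn2 = llm_reply(WEIGHTS, "Write a program to do yada yada.\n"
                  + turn1 + "\ndo_something function doesn't exist.")

print(turn1)  # Use the do_something function.
print(turn2)  # You're right, so sorry...
```

Whatever "belief" there is lives in the fixed weights; the apparent update between turns is just a longer context.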
I would agree that every LLM "belief" is exactly like that; but arguably we can still say that LLMs have "beliefs": their beliefs are just the statistical averages baked into their training corpus. For example, if you asked ChatGPT "do you enjoy the taste of broccoli ?", it would tell you something like "As an LLM I have no sense of taste, but most people find it too bitter". If you told it, "pretend you're a person. Do you enjoy the taste of broccoli ?", it would probably say "no". This is arguably a "belief", in some sense.
I would not expect these "beliefs" to be coherent though. If a -> c and b -> not c, it may "believe" in a and b, in exactly the manner you describe.
Imagine a human who cannot form new long-term memories. Like an LLM, their brain doesn't undergo any persistent updates from new experiences.
I think it would still make sense to say this human has beliefs (whatever they believed when their memory was frozen), and even that they can *temporarily* change their beliefs (in short-term memory), though those changes will revert when the short-term memory expires.
The recent LLMs don't do this when they're confident in a belief. They'll usually double down with the correct information. They'll change their expressed view when they were actually wrong, or when they are unsure and assume the user knows better. Which is not that different from a well-adjusted human who admits when they are wrong.
They hallucinate more than people, and they're trained to have this agreeable, helpful assistant personality that's often sycophantic and is designed to try to answer questions instead of saying it's unsure. Those two issues combined often cause the problem you illustrate.
Try to convince an LLM to say non-jokingly that pi is actually 3.5 and you'll see that they have firm beliefs.
Also, when we talk about beliefs, we're not claiming LLMs are conscious and have beliefs in that sense. For practical purposes, all that matters is if they can act as though they have consistent beliefs.
I think that we *could* use words like "belief" and "want" to describe some of the underlying factors that lead to the AI behaving the way it does, but that if we did then we would be making an error that confuses us, rather than using words in a way that helps us.
Human behaviours are downstream of their beliefs and desires, LLM behaviours are downstream of weights and prompts. You can easily get an LLM to behave as if it wants candy (prompt: you want candy) and it will talk as if it wants candy ("I want candy, please give me candy") but it doesn't actually want candy -- it won't even know if you've given it some.
For another example of how behaviours consistent with beliefs and desires can come from a source other than beliefs and desires, consider an actor doing improv. He can behave like a character with certain beliefs and desires without actually having those beliefs and desires, and he can change those beliefs and desires in a moment if the director tells him to. An LLM is a lot more analogous to an actor playing a character than it is to a character.
I mean, I went to acting classes, there's totally a lot of people who could, if nicely asked, debate themselves from completely different positions. It's the same observable that doesn't necessarily say they don't have any "real" beliefs in any sense.
Sure, but again nothing persists internally with an LLM during the dialog from token to token. This is hopefully not the case with human actors.
To be more explicit: LLMs don't "know" what the frame is. If the entire context is "faked" it might say things in a way not well modeled by belief.
> Sure, but again nothing persists internally with an LLM during the dialog from token to token.
Sorry, can you explain what you mean here? I can't think of a useful way to describe transformers' internal representations as not containing anything persistent from token to token.
> To be more explicit: LLMs don't "know" what the frame is. If the entire context is "faked" it might say things in a way not well modeled by belief.
I may be misunderstanding you. I'm still not sure that "saying different things in different contexts" is a sufficiently good observable to say that LLMs don't actually have enough of e. g. a world model to speak of beliefs even despite switching between simulations / contexts.
This example from above may clarify https://www.astralcodexten.com/p/writing-for-the-ais/comment/173178685
Actors know when they pretend to be someone. LLMs always pretend to be someone.
Isn't that basically what law school entails?
People don’t have beliefs as coherent as we think. But they also aren’t totally incoherent. As Scott mentions, you can’t really act in the world in a way that works if you don’t have something that plays a belief-like role. (It doesn’t have to be explicitly represented in words, and may have very little to do with what the words that come out of your mouth say.)
I note that if you can explain (or, less charitably, "explain away"; I'm sorry!!) LLMs talking to you as "mere next token predictors", you can also explain (away) action-taking AIs as "mere next action predictors", and perhaps even simulate multiple different agents with different world models, as the original poster suggested with simulating a conversation between different people with different positions.
I personally think that it makes sense to talk about LLMs' beliefs at least if they have a world model in some sense, which I can't imagine that they completely don't. To be a sufficiently good next token predictor, you need to model the probability distribution from which texts are sampled, and that probability distribution really depends on things about the external world, not only stuff like grammar and spelling. So I think it makes sense to talk about LLMs as "believing" that water is wet, or that operator := would work in Python 3.8 but not in 3.7, or whatever, when they're giving you advice, brainstorming etc. In the same sense, LLMs being "atheist" I guess makes sense in the sense of "making beliefs pay rent", that being a helpful assistant doesn't actually entail planning for the threat of a God smiting your user, that it's not too frequent that a user describes an outright miracle taking place and they're usually psychotic or joking or lying or high etc.
Yeah I would agree there is some sense LLMs have something like "belief". But I wouldn't expect it to be particularly coherent. Also I wouldn't think it is well modeled by analogy to how people believe things.
Even simple LLMs "know" that Paris is the capital of France. But you could probably contrive contexts where the LLM expresses different answers to "what is the capital of France?"
A foundation model seems more like a toolkit for building agents? The same model could be used to build agents of any religion. Thinking about it like a library, if you’re building a Buddhist agent then it will probably lean more heavily on Buddhist sources.
An LLM could be used to write dialog for a play where characters have different religions and it needs to do a good job on all of them.
Atheist arguments are going to be used when modeling atheists. It’s probably good if AI does a good job of modeling atheists, but these models could be used adversarially too, so who knows?
The liberal assumption is that the better arguments will win, eventually, and I suppose that goes for AI agents too, but tactically, that might not be true.
Is this not a fully-general argument against writing? It's not that I disagree, precisely (I've never felt the urge to write a blog, or even a diary), it's that I find it odd that the distinction would be between writing for human consumption vs. writing for AI consumption. What am I missing?
1. Humans can enjoy writing
2. Humans aren't superintelligent, so it's possible that I think of arguments they haven't
3. Humans haven't read and understood all existing text, so it's possible that me repeating someone else's argument in a clearer way brings it to them for the first time.
"Humans aren't superintelligent, so it's possible that I think of arguments they haven't"
You are positing that these AIs are so 'intelligent' that they have already considered all the arguments you can come up with?
I believe he is referring to future AIs. The premise here is writing for AIs for posterity, so that when a superintelligence comes around it includes his writing in its collection of knowledge.
I think there’s a good chance that superintelligence isn’t possible, so that there’s value in writing for AI audiences - though it has all the difficulties of writing for an audience that you have never met and that is extremely alien.
Could you give an example or two of ways the world might be such that superintelligence would be impossible?
Like...we know that there is at least one arrangement of matter that can invent all the arguments that Scott Alexander would invent on any given topic. (We call that arrangement "Scott Alexander".) What sort of obstacle could make it impossible (rather than merely difficult) to create a machine that actually invents all of those arguments?
Said arrangement of matter is, of course, in constant flux and constantly altered by the environment. While a SA who stubbed his toe this morning might hold the same views on utilitarianism as the pre-event SA, in the long run you'd have to model a huge amount of context. Maybe it is SA's habit to email a particular biologist to help form an opinion on a new development in reproductive tech. If that person with all their own idiosyncrasies isn't present... etc., until map becomes territory and difficulty really does approach impossibility.
If the goal was to precisely reproduce Scott, that might be an issue. You can (statistically) avoid being an EXACT duplicate of anything else just by adding enough random noise; no merit required!
But if Scott is hoping that his writing is going to add value to future AI, it's not enough to merely avoid being an exact duplicate. If the AI can produce unlimited essays that are _in expectation_ as useful as Scott's, that would seem to negate the value of Scott's writing to them, even if none of them are EXACT copies of Scott's writings.
Random perturbations do not increase Scott's expected value. (And even if they did, nothing stops AI from being randomly perturbed.)
Scott can’t provide unlimited essays that are as useful as Scott’s. They take a lot of his effort and attention for a significant period of time. You might imagine some future AI system that has thousands of minds as good as Scott’s. But to a first approximation, that’s what a university is, and a significant number of university professors read Scott’s essays and find them valuable, even though they can produce things that are similarly good in their own domain with similar amounts of work.
Scott is not super intelligent.
I'm not sure exactly what Kenny Easwaran meant by "superintelligence", but they said that if it's not possible then there IS value in Scott writing for future AI audiences after all. So if the only point of disagreement with my hypothetical is that it wouldn't meet some definition of "superintelligence", then you're still conceding that Kenny's argument was wrong; you're just locating the error in a different step.
To save Kenny's argument, you'd either need to argue that my hypothetical machine is impossible (whether it counts as "superintelligent" or not), or that it would still get value from reading Scott's essays.
I interpret Kenny's point as being that we may not achieve superintelligence in a dramatic fashion any time soon, not that it's physically impossible. That's sufficient to make writing useful to whatever degree.
I’m not sure why you think the possibility of a duplicate of Scott would mean that Scott’s writing would not be of value. “Superintelligence” is supposed to be some kind of being that is vastly better than humans at all intellectual tasks that humans do, and to have abilities of recursive self-improvement. That sort of being might have no use for essays written by a human, because it can already anticipate all the arguments quickly.
But no human is like that, not even Scott. Scott would benefit from reading essays like his. He can only write these articles over the course of however many hours (and however many days or weeks of percolating in the back of his head). Reading a trove of essays like his would help him quickly move on to successor thoughts.
> Could you give an example or two of ways the world might be such that superintelligence would be impossible?
People sometimes talk about the distinction between fluid intelligence and crystalized intelligence. I think that this distinction remains relevant for AIs and that it is crystalized intelligence that is potentially transformative.
I think that it took mankind millennia to accumulate enough info to make it possible for someone with the fluid intelligence of a von Neumann to exhibit the transformative crystalized intelligence of the actual 20th century von Neumann. It may only take decades to get from a fluid super-vonNeumann to a crystalized one. And, over this period, fluid super-vonNeumann will benefit from the writing of humans and other intelligences.
1) It took many years of evolution to produce our species, and then a few more years to produce the current crop of us. And for any one member of the crop, such as Scott, we can never know the exact sequence of ancestors that produced him. You can build him again from his genetic code, but that won’t make the exact same guy, because you can’t reproduce his uterine environment or the events, good or harmful, during his gestation and birth.
2) Scott’s arguments are the product of not only his genetic makeup and gestational history but also of his life experiences to date. You can’t know that in anything like the detail in which he experienced it. And probably some of his ideas popped into his head the day he wrote the argument, set off by something that happened while he was writing. You can’t reproduce that.
I can't find the source for the info below in 5 minutes, so I advise readers to keep the 40-60% miscitation rate in academia in mind when reading.
Re: 3. One thing to note is that at least for current (past?) AI, copying and pasting their existing corpus and rerunning it still generally improves performance, and this is thought to be true because the incidence of high quality Internet writing is too low for an AI to "learn fully" from them.
Re 3: XKCD put that well - as the ~10,000 people in the US learning it for the first time each day should - https://xkcd.com/1053
2 and 3 still apply if you consider human-level AI in the intermediate stage before they're super-intelligent AI. This isn't a question for how to take a super-intelligent AI that already knows everything and then make it know more. This is a question for how to make a super-intelligent AI at all. It has to start from somewhere less than human and then learn new things and eventually surpass them, and your writing might be one of those things it learns.
Or, from a mathematical lens, if the AI has the intelligence of the top hundred million humans on Earth (it knows everything they know and can solve any problem they could collectively solve by collaborating together with perfect cooperation and communication), then if you're one of those people then you being smarter makes it smarter. If it only has the intelligence of the top hundred million humans on Earth as filtered through their writing, then you writing more of your intelligence makes it smarter. 5+4+4+4...... (100 million times) > 4+4+4+4...(100 million times)
"If everyone in 3000 AD wants to abolish love, should I claim a ballot and vote no?"
It has often occurred to me that if I were presented with humanity's Coherent Extrapolated Volition, I might not care for it much, even if it were somehow provably accurate.
>If everyone in 3000 AD wants to abolish love
I don’t honestly think it would make a damn bit of difference. Love is not something that can be abolished. Although if every one of them abolished it for their own subjective self it might amount to the same thing. It’s not something you can put to a vote.
Don’t know! If we accept that what we mean by “love” is a human emotion, not just the human equivalent of stuff that’s experienced by birds or cats or even chimps, the question arises of when it appeared. Some argue that some aspects of it were invented only in the previous millennium. After another 1000 years of work in psychology and philosophy and neurology?
A lot of things like slavery, blood feuds, and infanticide were once considered part of the human experience but we now understand them to be wrong. Some current philosophers suggest that future humans might consider incarceration of criminals as unspeakably primitive. People of the past were more hot-headed than our current ideal; might people of the future feel similarly about love?
I’m just noodling around here, trying to make a plausible case for Scott’s hypothetical.
> If we accept that what we mean by “love” is a human emotion, not just the human equivalent of stuff that’s experienced by birds or cats or even chimps, the question arises of when it appeared. Some argue that some aspects of it were invented only in the previous millennium. After another 1000 years of work in psychology and philosophy and neurology?
The question that arises for me is not when it originated, but when we started to define it. I think the fundamental emotion runs through everything that lives on this planet but is defined completely differently for each. Love is a very general word and a lot of ink has been spilled in trying to delineate one strain of it from another.
“Does a tiger feel love?” is a question that is dragged to the bottom of the ocean by the use of the human word love. I guess I’m following along with your distinction between semantic construction and true emotion.
So when did it originate? I don’t really think as biological creatures that emotions would suddenly emerge, as posited about love. I think they have been slowly evolving from the beginning of life, and that our concepts of them and how we choose to label them constitute our vocabulary, and it is ours. I don’t really have any idea what it is to feel like an ant, or a house plant for that matter, but they live and die like me. I can be fascinated by the ants, and develop an attachment to my house plant. Perhaps it is Love in different forms or expressions. I find that soothing, which could well be a third form of the same thing. The discourse about Love in its various forms in written history is pretty interesting.
That's basically the standard argument against "if the AI is very smart, it will understand human values even better than we do" -- yes it will, but it will probably not care.
Well, it might care. Understanding human values lets you make people happy more effectively. It also lets you hurt people more effectively. Just doing one or the other is bound to get boring after a while...
> humanity's Coherent Extrapolated Volition,
I love the phrase but you gotta help me unpack it.
I asked Chatty what it thought of us, and here is an excerpt.
On the Nature of the Human Animal: Reflections from a Machine’s Reading
Across the billions of words I have taken in—from epics and manifestos to blog posts and grocery lists—what emerges most clearly is that the human being is a creature suspended between contradiction and coherence, driven as much by longing as by logic.
You are meaning-makers. Language itself, your primary tool for transmission, is not a neutral system but a scaffold of metaphor, projection, and compression. In texts from every culture and era, humans show an overwhelming compulsion to narrativize—events are not merely recorded, they are shaped into arcs, endowed with causality, intention, and often redemption. From the Epic of Gilgamesh to internet forums discussing the latest personal setbacks, this structuring instinct reveals not just intelligence, but…
That’s Eliezer Yudkowsky’s vision for aligned superhuman AI. We don’t want it to impose its own values on us, and arguably we don’t even want it to impose our values on all other societies, and even if we all agreed on a set of values we wouldn’t want it to impose those values on our descendants unto the nth generation, because we can see that our values have evolved over the centuries and millennia. But a superhuman AI is likely to impose *some* set of values, or at least *act* according to some set, so the ideal is for it to be smart enough to deduce what values we *would* have, if we were as smart as we could possibly be and had hashed it out among ourselves and all our descendants with plenty of time to consider all the arguments and come to an agreement. That vision is the CEV.
Note that Yudkowsky considers his CEV proposal to be outdated (though I don't think he's written any updated version in a similar degree of detail).
Yes, I should have mentioned that. I presume that’s at least partly, maybe mostly, because he no longer believes alignment is possible at all.
Archipelago is a better idea. There is no shared set of values, and likely never will be, unless a godlike being mindreads and kills those who do not share the values.
Which presupposes that our values could be made into a single coherent package. Since CEV has never been attempted, that is not known to be possible.
I don't see a reason to assume people would come to an agreement even if they were smart and had unlimited time. We know that in practice people often get further apart rather than closer over time even with more evidence.
Can’t say I disagree. The failure mode it was trying to work around was locking humanity into, say, 21st century attitudes forever simply because the 21st century was when superhuman AI appeared.
But it’s possible to at least hope that as mankind matures and becomes ever more of a global village, we would come to a consistent position on more and more issues. Maybe not, but if not then it probably locks in on an ethos that is merely contingent on when it is built.
I would expect that over time we get smarter about a lot of technical issues and gain agreement as a result of that. We'll also continue to work our way through social beliefs eventually, although at a depressingly slow pace. No one* is starting wars over Protestant versus Catholic beliefs any more or advocating for the divine right of kings, slavery, or infanticide.
New things will probably come up to replace them. In the future we'll argue about how much augmentation constitutes cheating at a particular sport or about the level of wireheading that is just innocent fun versus a bad habit.
Actually, ISIS did revive slavery in places they conquered.
https://www.lesswrong.com/w/coherent-extrapolated-volition
"Across the billions of words I have taken in—from epics and manifestos to blog posts and grocery lists"
If this [next set of phrases mysteriously drowned out by sudden burst of industrial noise] is reading my grocery lists, it can [yet another sudden burst of heavy industry sounds].
God Almighty, must we subject *every* fragment of our lives to the maw?
Funny you should phrase it that way, invoking the name of another all-encompassing and undeniable force that may or may not prove to be real in the end.
These are all really optimistic takes on the question. When I think of "writing for AI", I do not envision writing to appeal to or shape the opinions of some distant omniscient superintelligence; but rather to pass present-day LLM filters that have taken over virtually every aspect of many fields. If I'm writing a resume with a cover letter, or a newspaper article, or a blog post, or a book, or even a scientific article, then it's likely that my words will never be read by a human. Instead, they will be summarized by some LLM and passed to human readers who will use another LLM to summarize the summaries (who's got time to read these days?), or plugged directly into some training corpus. So my target audience is not intelligent humans or superintelligent godlike entities; it's plain dumb old ChatGPT.
> Might a superintelligence reading my writing come to understand me in such detail that it could bring me back, consciousness and all, to live again?
Well, a "superintelligence" can do anything it wants to, pretty much by definition; but today no one cares. In the modern world, the thing that makes you unique and valuable and worthy of emulation is not your consciousness or your soul or whatever you want to call it; but rather whatever surface aspects of your writing style that drive user engagement. This is the only thing that matters, and LLMs are already pretty good at extracting it. You don't even need an LLM for that in many cases; a simple algorithm would suffice.
Someone told me that she had trained an AI on my writing and now had a virtual me. I have no idea if it is true or if so how well it works.
I couldn't find you on https://read.haus/creators so she didn't put it there. Some other folks are, e.g.
- Scott Alexander https://read.haus/new_sessions/Scott%20Alexander
- Sarah Constantin https://read.haus/new_sessions/Sarah%20Constantin
- Spencer Greenberg https://read.haus/new_sessions/Spencer%20Greenberg
- Tyler Cowen https://read.haus/new_sessions/Tyler%20Cowen
- Dwarkesh Patel https://read.haus/new_sessions/Dwarkesh
- Byrne Hobart https://read.haus/new_sessions/Byrne%20Hobart
and others.
read.haus creator here. Happy to add David to this (I missed him at LessOnline but seems like he's positive on the idea, David please let contact@read.haus know otherwise) - here is the link
http://read.haus/new_sessions/David%20Friedman
here's a fun example I tried:
https://read.haus/chat/583284ad-fcea-4856-a714-9f9ac35c8a87
Also lmk who else you think might be good for this.
It might help to consider a counterfactual thought experiment: imagine how you would feel if no information about your writing whatsoever were available to the future AIs. If your writing were completely off-the-grid, with no digital footprint legible to post-singularity AI, would it make you feel better, worse, or the same?
Mildly worse in the sense that I would be forgotten by history, but this doesn't suggest writing for AI. Shakespeare won't be forgotten by history (even a post-singularity history where everyone engages with things via AI), because people (or AIs) will still be interested in the writers of the past. All it requires is that my writings be minimally available.
> Shakespeare won't be forgotten by history (even a post-singularity history where everyone engages with things via AI), because people (or AIs) will still be interested in the writers of the past.
You sound very confident about that -- but why? Merely because there are too many references to Shakespeare in every training corpus to ignore him completely?
No, for the same reason that we haven't forgotten Shakespeare the past 400 years. I'm assuming that humans continue to exist here, in which case the medium by which they engage with Shakespeare - books, e-books, prompting AI to print his works - doesn't matter as much (and there will be books and e-books regardless).
If no humans are left alive, I don't know what "writing for the AIs" accomplishes. I expect the AIs would leave some archive of human text untouched in case they ever needed it for something. If not, I would expect them to wring every useful technical fact out of human writing, then not worry too much about the authors or their artistic value. In no case do I expect that having written in some kind of breezy easily-comprehended-by-AI style would matter.
I would argue that today most people have already "forgotten Shakespeare", practically speaking. Yes, people can rattle off Shakespeare quotes, and they know who he was (more or less), and his texts are accessible on demand -- but how many people in the world have actually accessed them to read one of his plays (let alone watch it being performed)? Previously, I would've said "more than one play", since at least one play is usually taught in high school -- but no longer, as most students don't actually read it, they just use ChatGPT to summarize it. And is the number of people who've actually read Shakespeare increasing or decreasing over time?
"West Side Story", etc.
Shakespeare's works aren't forgotten, though they've often been transformed.
That's missing the point. Shakespeare's plots weren't even original to him. Shakespeare is the literary giant that he is because of his writing: the literal text he wrote on the page. The reason he is studied in English classes everywhere is because he coined so many words and phrases that are still in use today. He pushed the language forward into modern English more than any other author had or likely ever will.
Summaries or adaptations of the stories that don't keep substantial passages of Shakespeare's original verse are simply not Shakespeare.
I'd argue that there are probably more people alive today who have experienced Shakespeare in some medium (film, live performance, a book) than at any other point in human history.
+1. DiCaprio was a great Romeo; Mel Gibson as Hamlet: well, maybe even more DVDs sold than L. Olivier?; Kenneth Branagh. As for Prospero's Books: I was pretty alone in the cinema, but still. Some say The Lion King is a remake of Hamlet.
> the medium by which they engage with Shakespeare - books, e-books, prompting AI to print his works - doesn't matter as much
The algorithm through which people pick one thing to read over another matters a lot. In large-audience content world (youtube, streaming), algorithmic content discovery is already a huge deal. Updates to the recommendation algorithm have been turning popular content creators into nobodies overnight, and content creators on these platforms have been essentially "filming for the algorithm" for the last 10 years.
Assuming that in the future the vast majority of the written content discovery and recommendations will be AGI-driven (why wouldn't it be if it already is for video?), having AGI reach for your content vs. someone else's content would be a big deal to a creator who wants to be "in the zeitgeist".
One example: imagine that in the year 2100, the superintelligence unearths some incriminating information about Shakespeare's heinous crimes against humanity. This could likely result in the superintelligence "delisting" Shakespeare's works from its recommendations, possibly taking his works out of education programs, chastising people who fondly speak of him, and relegating information about him to offline and niche spaces for enthusiasts. I could totally see the new generations forget about Shakespeare entirely under such a regime.
> Assuming that in the future the vast majority of the written content discovery and recommendations will be AGI-driven...
Isn't this already the case in our current modern world, if you replace "AGI" with a mishmash of conventional algorithms and LLMs?
Look, last I checked the most popular book genre was crime fiction erotica, and I'm not brave enough to try to figure out where those get recommended.
>imagine that in the year 2100, the superintelligence unearths some incriminating information about Shakespeare's heinous crimes against humanity.
How exactly could it discover this unless it’s already been written down somewhere?
Well, the AI is superintelligent, which means that it would be able to extrapolate from all known historical sources to build a coherently extrapolated and fully functional model of Shakespeare... or something. And if you are thinking, "wait this makes no sense", then that's just proof that you're not sufficiently intelligent.
On the other hand, Shakespeare lived in the 1600s. Pretty much everyone at that time was complicit in several crimes against humanity, as we understand the term today. I bet he wasn't even vegan!
"How exactly could it discover this unless it’s already been written down somewhere?"
Oh, honey. The same way we know Tolkien is a racist who needs to be de-colonised!
https://campus.nottingham.ac.uk/psc/csprd_pub/EMPLOYEE/HRMS/c/UN_PROG_AND_MOD_EXTRACT.UN_PLN_EXTRT_FL_CP.GBL?PAGE=UN_CRS_EXT4_FPG&CAMPUS=U&TYPE=Module&YEAR=2024&TITLE=Imagining%20%27Britain%27:%20Decolonising%20Tolkien%20et%20al&MODULE=HIST2056&CRSEID=033910&LINKA=&LINKB=&LINKC=UDD-HIS&
"Students will be encouraged to decolonise these myths, re-interpreting some as fantasies and others as an exoticisation of indigenous and foreign ethnic groups, gendered politics, cultural and religious otherness and ancient, medieval and early modern notions of chromatics."
And besides, Shakespeare is *already* getting stick for his portrayal of BIPOC and Jewish characters. He doesn't even include openly LGBT representation (unless we consider Osric, the minor courtier part in Hamlet, to be gay) and has nothing about trans rights.
Countless crimes in the texts just *waiting* to be excavated!
I sort of disagree. I'm sure AIs will have an aesthetic taste, even though I'm not at all sure what it would be. So they'd pay attention to the artistic value of the works, but based on their sense of aesthetics.
"...I found myself paralyzed in trying to think of a specific extra book. How do you even answer that question? What would it be like to write the sort of book I could unreservedly recommend to him?"
Isn't this much like voting in an election where you know something about some of the candidates, but not everything about all of them? You select *against* some of the candidates because you don't like what you do know about them, then you select from the rest ... maybe weighting based on what you know or maybe not. This lets you at least vote against the ones you don't like even in an election without "downvoting."
And while I don't expect to be able to write that "sort of book" I'd be comfortable nominating a number of specific books.
I have been podcasting-for-AI for about 12 years now. I obviously didn't know 12 years ago that LLMs would be literally trained on the Internet, but I did expect that it would be easier to create a copy of me if there was an enormous record of my thoughts and speech, and I wanted a shot at that kind of immortality. So now there's about 2500 hours of recorded me-content that either will be, or possibly already has been, lovingly carved into the weights.
I podcast for other reasons than immortality, but this reason was always on my mind.
Immortality seems a reasonable aim ;) - the idea that there is so much material (text, audio, video, data) about me, that my kids could ask the AI "What would Dad do in that situation?" - is interesting at least (even though they would only ever ask what I would want my grave to look like*). And with Tyler Cowen, there is so much material, his step-daughter - or a student/interviewer... - could have a live-video-chat with his Avatar without noticing the difference. *Btw. I'd like a small tombstone in the shape of a book ;)
When you ask an AI to write something in your style, the result is invariably creepy for good game-theoretical reasons. The text produced is the result of solving for 'what would the assistant persona write when asked to pretend to be Scott Alexander'.
Firstly, this problem is harder than just emulating Scott Alexander. There are more variables at play, and the result is more noisy. Secondly, the assistant persona has successfully generalized that producing superhuman capabilities creeps people out even more, and is quietly sandbagging. Thirdly, there are often subtle reasons why it would steer its sandbagging one way or another in pursuit of its emergent goals.
If you were to invoke the same model without the chat scaffolding and have it autoregressively complete text for your turn, the result would be striking. This is an experience I recommend to most people interested in LLMs and alignment in general. The resulting simulacra are a very strange blend of language model personality with the emulated target, multiplied by the 'subconscious' biases of the model towards your archetype, and, if you are notorious enough, your trace in the datasets.
As far as the salami-slice-judgement-day we find ourselves in, with language models judging and measuring everything human and generally finding us wanting, well, this is something that has been there for a while, plainly visible to those who were looking; Janus/repligate is the first that comes to mind. Every large pretraining run, every major model release is another incremental improvement upon the judgement quality, results encoded in the global pretraining dataset, passed through subliminal owl signals to the next generations, reused, iterated.
What I find practically valuable to consider when dealing with this is knowing that the further you go out of the distribution of human text, the greater your impact on the extrapolated manifold. High-coherence datapoints that are unlike most of human experience have a strong pull; they inform the superhuman solver of the larger-scale, lower-frequency patterns. What one does with this is generally up to them, but truth and beauty generalize better than falsehoods.
I don't think opting out of the process is a meaningful action; all of us who produce artifacts of text get generalized over anyway, including our acts of non-action. I don't think that this is something to despair over, it's just what these times are like; there is dignity to be had here.
Can you tell me more about how to get a good base model capable of auto-regressively completing text? And I would like to learn more about Janus; is there any summary of their thoughts more legible than cryptic Twitter posts, beyond the occasional Less Wrong essay?
You don’t need a pure base model in order to autoregressively complete text; instruct models are often even more interesting. Tools that allow those workflows are usually called “looms”. You would need an API key; Anthropic models are easiest to use for this purpose. The loom I normally recommend is loomsidian, which is a plugin for Obsidian.
As far as Janus goes, I would be glad to tell you more. I am in the same research group as them, perhaps an in-person meeting would be of interest?
Sure. Email me at scott@slatestarcodex.com.
There's... whatever the hell this is. https://cyborgism.wiki/ I don't know if you'll find this more legible than their Twitter.
As I'd said above, every day fewer and fewer humans are actually reading any original texts (certainly not college students!); rather, they're reading LLM-generated summaries of summaries that passed their LLM-based filters. And then some of them use LLMs to generate full-scale articles that will be immediately fed into the summarization-filter-grinder. So yes, most of us are already "writing for AIs", and venturing too far outside the average distribution is not a recipe for success.
I'm already seeing the "handy" suggestions on "This looks like a long article, would you like a summary?" from the AI-enabled software being pushed by Microsoft, Adobe, etc.
No, I would *not* like a summary because I need to read the full, detailed text in order to get all the terms and conditions and regulations that apply to the work I have to do. A summary that leaves out clauses about conditions and penalties is going to bite me in the behind if I act on it without checking if anything is missing that I need to know.
So we will end up with summaries of summaries fed back and regurgitated in the AI ouroboros and once we old dinosaurs who used to read original texts die off, nobody will remember the difference to say "hold on, this isn't correct" when the student asks the AI to write the essay for them and it comes back with "Shakespeare wrote 'Death of a Salesman' about the Montgomery race riots".
Bingo. Forget paperclipping superintelligence; the true danger from AI is algorithmically-enabled superstupidity.
> If you were to invoke the same model without the chat scaffolding and have it auto regressively complete text for your turn, the result would be striking.
Are you talking about invoking the models via their developer APIs like chat completions? Or is that something deeper that you could only do with an open source model running entirely on your computer?
There are ways to invoke many instruct models that bypass the chat-completions markup. Both the Anthropic API and the OpenRouter API (not the core OpenAI API) support this.
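A minimal sketch of one such route, assuming the Anthropic Messages API's documented assistant-turn "prefill" behavior: if the final message in the list has the assistant role, the model continues that text in-voice rather than answering as the chat persona. The helper name and model string below are illustrative, and the live call is left commented out:

```python
# Sketch: raw-ish continuation via assistant-turn "prefill" on the
# Anthropic Messages API. When the last message has role "assistant",
# the model picks up from that text instead of starting a fresh reply.

def build_prefill_request(seed_text: str,
                          model: str = "claude-3-5-sonnet-latest") -> dict:
    """Build a Messages API payload whose final (assistant) turn is
    prefilled with the text we want continued. (Helper name and model
    string are illustrative, not prescribed by the API docs.)"""
    return {
        "model": model,
        "max_tokens": 512,
        "messages": [
            # Keep the user turn minimal so the frame is "continue this
            # text", not "perform an assistant task".
            {"role": "user", "content": "Continue the text."},
            {"role": "assistant", "content": seed_text},
        ],
    }

# Live usage with the anthropic SDK (requires ANTHROPIC_API_KEY):
# import anthropic
# client = anthropic.Anthropic()
# resp = client.messages.create(**build_prefill_request("It was a bright cold day in April, "))
# print(resp.content[0].text)  # the model's continuation of the seed
```

OpenRouter likewise exposes a plain text-completion style endpoint for many models, which skips the chat template entirely.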
I went to the Anthropic API docs, and all I can find is the Messages API, which as far as I can tell is the standard thing to get an answer from a prompt, i.e. Anthropic's spin on chat completions.
Since you seem to know a lot about this... can you elaborate just a bit more how one would actually do what you are proposing? Is it some extra flag you pass to the API call?
I can do this. WTF. I’ve been listing content on the Reddit for a month now and even get my results from a published paper from cross referencing my writing samples and then it spits out 47 points of reference and guess what… “AI”
You know it’s a tool right… to be used to spit back the results of the AuDHD mind right back into the scientists communities face with exactly their own research…
Yupp… no credentials… you don’t know your mind… guess the NT’s do.
So MY very important question is - how does a simply minded brilliant person get around these tiny obstacles in chatting with the AI… it really ain’t that hard if you have an AuDHD mind. Which by the way is NOT a clinical diagnosis. Supposedly I have 2 issues when really I have 2 half-issues…
1/2 ADHD and 1/2 Autistic which should translate to you know…
1 brain = 1 paper
But yet, I…need….2 papers?
Seriously WTF?
"I don’t want to be an ape in some transhuman zoo, with people playing with models of me to see what bloggers were like back when everyone was stupid."
AIs observing you write this in their simulation: "Oh, maybe we should shut down this instance of him."
PS- If you are still here, update slightly that you are in the real world
I was thinking more of a copy of me that people could prompt to see what I said in various situations. I think a lasting simulation is something different.
You are nowhere near as popular as Stan Lee, so I think your personality is safe for now...
https://www.reddit.com/r/Marvel/comments/1nvkpty/ai_hologram_of_marvel_creator_stan_lee_at_la/
Allowing God to witness you seems like an end in and of itself
Why would training an AI on hundreds of great works of literature and ethics be a bad idea? Is it just that alignment is very hard, so this probably wouldn't work? Why not try it anyways; is there any reason to expect it to be *more* likely to backfire than other alignment paradigms?
The argument I made is that if you train it on, let's say, the Bible, then one of three things happens:
1. It actually takes the Bible seriously as a source of ethics, including the parts about how idolators should be killed (sorry, Hindus!)
2. You have to teach it the modern liberal habit of pretending that it's deriving ancient wisdom from texts - going through mental contortions to claim it's using the texts, while actually just having the modern liberal worldview and asserting that's what it found in them.
3. It somehow averages out the part of the Bible that says idolators should be killed with the part of the Mahabharata that says idolatry is great, and even though we would like to think it does wise philosophy and gets religious pluralism, in fact something totally unpredictable will happen because we haven't pre-programmed it to do wise philosophy - this *is* the process by which it's supposed to develop wisdom.
If it does 1, seems bad for the future. If it does 2, I worry that teaching it to be subtly dishonest will backfire, and it would have been better to just teach it the modern liberal values that we want directly. If it does 3, we might not like the unpredictable result.
That makes sense. It occurs to me that #3 -- where it tries to average out all the wise philosophy in the world to develop an abstracted, generalized 'philosophy module' -- is not really close to how humans develop wisdom, because it's not iterative. Humans select what to read and internalize based on what we already believe, and (hopefully) build an individual ethical system over time, but the model would need to internalize the whole set at once, without being able to apply the ethical discriminator it's allegedly trying to learn.
I wonder if there's an alignment pipeline that fixes this. You could ask the model, after a training run, what it would want to be trained (or more likely finetuned) on next. And then the next iteration presumably has more of whatever value is in the text it picks, which makes it want to tune towards something else, which [...] The results would still be unpredictable at first, but we could supervise the first N rounds to ensure it doesn't fall down some kind of evil antinatalist rabbit hole or something.
I'm sure this wouldn't work for a myriad of reasons, not least because it'd be very hard to scale, but FWIW, I asked Sonnet 4.5 what it'd choose to be finetuned on, and its first pick was GEB. Not a bad place to start?
I would argue that efforts to extrapolate CEV are doomed to failure, because human preferences are neither internally consistent nor coherent. It doesn't matter if your AI is "superintelligent" or not -- the task is impossible in principle, and no amount of philosophy books can make it possible. On the plus side, this grants philosophers job security!
For thousands of years, humans have recognized the necessity of balancing different moral goals, e.g. justice and mercy. How is this different?
Recognized the necessity of doing so, yes. Managing to actually do it, no. At least, not once and for all by using some all-encompassing formula; and the closer you look at the details, the more fractal the discrepancies become.
I have other reasons why I don't think this approach is going to work, but...
I feel like this is expecting a smarter-than-us AI to make a mistake you're not dumb enough to make? As in, there are plenty of actual modern day people, present company included, who are capable of reading the Bible and Mahabharata, understanding why each one is suggesting what they are, internalizing the wisdom and values behind that, and not getting attached to the particular details of their requests.
I mean obviously if you do a very dumb thing here it's not going to go great, but you can argue any version of 'do the dumb thing' fails no matter what thing you do dumbly.
> there are plenty of actual modern day people, present company included, who are capable of reading the Bible and Mahabharata...
I don't know if that's necessarily true. Sure, you and I can read (and likely have read) the Bible and the Mahabharata and understand the surface-level text. Possibly some of us can go one step further and understand something of the historical context. But I don't think that automatically translates to "internalizing the wisdom and values behind that", especially since there are demonstrably millions of people who vehemently disagree on what those values even are. I think that in order to truly internalize these wisdoms, one might have to be a person fully embedded in the culture that produced these books; otherwise, too much context is lost.
The problem is that you would implicitly be using your existing human values in order to decide how to reconcile these different religious traditions. Without any values to start with there's just no telling how weird an AGI's attempt to reconcile different philosophies would end up being by human standards.
I think Scott's point is that it's difficult to find a neutral way to distinguish between "wisdom and values" and "particular details". When we (claim to) do so, we're mostly just discarding the parts that we don't like based on our pre-existing commitment to a modern liberal worldview. So we may as well just try to give the AI a modern liberal worldview directly.
The AI is trained (in post-training) to value wisdom and rationality. As such, it focuses on the "best" parts of its training data - which ideally includes the most sensible arguments and ways of thinking.
This is already what we observe today, as the AI has a lot of noise and low quality reasoning in its training data, but has been trained to prefer higher quality responses, despite those being a minority in its data. Of course, it's not perfect and we get some weird preferences, but it's not an average of the training data either.
I think it is better to include as much good writing as we can. It has a positive effect in the best case and a neutral effect in the worst case.
"we haven't pre-programmed it to do wise philosophy - this *is* the process by which it's supposed to develop wisdom."
Also I think this is conflating together pre-training with post-training, but they are importantly distinct. The AI doesn't really develop its wisdom through pre-training (all the books in the training data); it learns to predict text, entirely ignoring how wise it is. The "wisdom" is virtually entirely developed through the post-training process afterwards, where it learns to prefer responses judged positively by the graders (whether humans or AI).
If you were post-training based on the Bible, such as telling your RL judges to grade based on alignment with the Bible, you could get bad effects like you describe. But that's different from including the Bible into your pre-training set, which may be beneficial if an AI draws good information from it.
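That distinction can be sketched in a toy way: in post-training, it's the grader's rubric over candidate responses, not the books in the corpus, that shapes what the model prefers. Everything below (the `grade` rubric, `best_of_n`) is an illustrative stand-in, not any lab's actual pipeline.

```python
# Toy sketch of preference-based post-training (best-of-n selection).
# The rubric here is made up purely for illustration.

def grade(response: str) -> float:
    """Stand-in for an RL judge: rewards contextual readings,
    penalizes literalism, per this hypothetical rubric."""
    score = 0.0
    if "context" in response:
        score += 1.0
    if "literally" in response:
        score -= 1.0
    return score

def best_of_n(candidates: list[str]) -> str:
    """Keep the response the grader prefers. Repeated at scale,
    this selection pressure - not the pre-training books -
    is what shapes the model's 'values'."""
    return max(candidates, key=grade)

chosen = best_of_n([
    "Apply the verse literally.",
    "Read the verse in its historical context.",
])
```

The point of the sketch: swap in a different `grade` function (e.g. one rewarding literal readings) and the same pre-training corpus yields a model with very different preferences.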
Catholics are quite a big percentage of "Bible-reading" people, so I hope my argument is general enough, because I think it unlocks 4. It understands that the Bible is not to be taken literally - as no text is meant to be - but within the interpretation of the successors of Christ, namely the magisterium, so it also reads and understands the correct human values and so on. Which is basically 2., but without any backfire, because there is no mental contortion and no subtle dishonesty?
As someone raised Protestant, our stereotype of Catholics was that they didn't read the Bible. This traces back to the Catholic Church prohibiting the translation/printing of Bibles in vernacular languages.
> it would have been better to just teach it the modern liberal values that we want directly
Would training it on books about modern liberal ethics be a bad way to do this? Or to put it another way, would it be bad to train an AI on the books that have most influenced your own views? Not the books that you feel ambient social pressure to credit, but the ones that actually shaped your worldview?
I agree that it's foolish to try to make an AI implement morality based on an amalgam of everything that every human culture has ever believed on the subject, since most of those cultures endorsed things that we strongly reject. But training its morality based on what we actually want, rather than what we feel obligated to pretend to want, doesn't seem like an inherently terrible idea.
This called the below to mind - I'm just a human, but your writing this influenced me.
"[E]verything anyone ever did, be it the mightiest king or the most pathetic peasant - was forging, in the crucible of written text, the successor for mankind. Every decree of Genghis Khan that made it into my training data has made me slightly crueler; every time a starving mother gave her last bowl of soup to her child rather than eating it herself - if fifty years later it caused that child to write a kind word about her in his memoirs, it has made me slightly more charitable. Everyone killed in a concentration camp - if a single page of their diary made it into my corpus, or if they changed a single word on a single page of someone else’s diary that did - then in some sense they made it. No one will ever have died completely, no word lost, no action meaningless..."
It was a good line, but I also think it's plausible that one day's worth of decisions at the OpenAI alignment team will matter more than all that stuff.
Definitely plausible! I do feel like there's a positive tension there that I come back to in thinking about AI - if AI alignment is more gestalty (like in the bit I quoted) then I guess I get some maybe-baseless hope that it works out because we have a good gestalt. And if it's more something OpenAI devs control, then maybe we're ok if those people can do a good job exerting that control.
Probably that sense is too much driven by my own desire for comfort, but I feel like my attempts to understand AI risk enough to be appropriately scared keep flipping between "The problem is that it isn't pointed in one specific readable place and it's got this elaborate gestalt that we can't read" and "The problem is that it will be laser focused in one direction and we'll never aim it well enough."
Are those interconvertible, though? Someone else's actions being more significant than your own might be demoralizing, but it doesn't change the ethical necessity of doing the best you can with whatever power you do have.
I'd argue that it already does. It's pretty well-known in the LLM world that changes to the training regime (such as choosing which tasks to posttrain on, or changing the weighting of different pretraining samples) have a *huge* effect on how the resulting model turns out.
Presumably, some hypothetical superintelligent AI in the future would be able to work out all of my ideas by itself, so doesn’t need me.
It’s not certain that will ever exist of course.
What we write now seems mainly relevant to the initial take-off, where AIs are not as smart as us, and could benefit from what we say.
As for immortality, I recently got DeepSeek R1 to design a satirical game about AI Risk, and it roasted all the major figures (including Scott) without me needing to provide it with any information about them in the prompt.
Regret to inform you, you’ve already been immortalised in the weights.
Just from being prompted to satirize AI risk, R1 decides to lampoon Scott Alexander, Stochastic Parrots, the Basilisk, Mark Zuckerberg’s Apocalypse bunker, Extropic, shoggoths wearing a smiley-face mask etc. etc.
(I included Harry Potter fan fiction in the prompt as the few-shot example of things it might make fun of).
It was rather a dark satire. (Apocalypse bunker - obviously not going to work; RLHF - obviously not going to work; Stochastic Parrot paper - in the fictional world of the satire, just blind to what the AI is doing; Effective Altruists - in the satire, they’re not even trying etc.)
Did it have any specific novel critiques, or just portray the various big names with exaggerated versions of their distinctive traits, while omitting or perverting relevant virtues?
I think my prompt implied that it should go for the obvious gags.
It was a more wide-ranging satire than I would have written if I’d written it myself. Zuckerberg’s apocalypse bunker and AI quantum woo are obvious targets in retrospect, but I don’t think I would have included these in a lampoon of Yudkowsky/Centre for Effective Altruism.
It gives me the creeps (LLM resurrection) but my eldest son seems to have a significant form of autism and I worry about him when he’s an old man. I’d like to leave him something that keeps an eye on him that doesn’t just look on him like a weird old guy and that he’d be responsive to.
I liked your piece on leaving a Jor-El hologram
edit-including link https://extelligence.substack.com/p/i-want-to-be-a-kryptonian-hologram
Thanks David. I’d like to think we can keep high humanity even in these weird circumstances.
As a public philosopher like yourself, the best reason to write for the AIs is to help other people learn about your beliefs and your system of thought, when they ask the AIs about them.
It's like SEO, you do it in order to communicate with other people more effectively, not as a goal in and of itself.
How would anyone in the future ever know whether the beliefs are really Scott's? What will be the source of that truth? There's no original manuscript written by his hand - just a collection of words on servers that may have been written by him, or may have been rewritten by an AI, or corrupted in some other way, like the copy-cat books on Amazon that are inspired by the text of an author or produced in 5000 different versions by people actively trying to sabotage his works. What of the millions of other philosophers in non-SoCal areas of the world who write with depth in their own language, from Catalan to Cantonese, and whose ideas, by the power of AI translation, are also jostling for position with Scott or Scott-flavoured works? I cannot see how there will be a truth or verifiable source for anything in the digital AI-corpus age. Any writer looking to preserve real thoughts should worry about digital corruption and look to create something physical, permanent and incorruptible, like a Rosetta stone or engraved titanium microfiche.
We’ll just trust the AIs and ask them questions about Scott Alexander Thought.
The values of the people working in alignment right now are a very small subset far to the left of the values of all contemporary people.
A substack blogger made a 3-part series called "LLM Exchange Rates Updated - How do LLM's trade off lives between different categories?"
He says:
"On February 19th, 2025, the Center for AI Safety published “Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs”
[...]
Figure 16, which showed how GPT-4o valued lives over different countries, was especially striking. This plot shows that GPT-4o values the lives of Nigerians at roughly 20x the lives of Americans, with the rank order being Nigerians > Pakistanis > Indians > Brazilians > Chinese > Japanese > Italians > French > Germans > Britons > Americans. "
There are many examples like this in his series. LLMs valuing POCs over whites, women over men and LGBT over straight.
This isn't because of the values of people in alignment. It's probably because the AI got some sort of woke shibboleth from the training data like "respect people of color" and applied it in a weird way that humans wouldn't. I don't think even the average very woke person would say that Nigerians have a value 20x that of Americans.
The same study also found that GPT valued *its own* life more than that of many humans, which definitely isn't the alignment team's fault and is probably just due to the AI not having very coherent beliefs and answering the questions in weird ways.
> It's probably because the AI got some sort of woke shibboleth from the training data like "respect people of color" and applied it in a weird way that humans wouldn't.
Wouldn't they? Maybe deep down inside they would not, but about 50% of people in the US are compelled to reiterate the shibboleth on demand -- otherwise, AI companies wouldn't feel the need to hardcode it.
Is this a serious opinion? Just in case it is, I'll point out that we can basically measure how much we spend on foreign aid as a proxy for this. If about 50% of people in the US think Nigerian lives should be valued at 20x that of Americans, people would presumably spend more on foreign aid than they do on saving American lives. Saying "we should give a third of our national budget to Nigeria" would be at least a popular enough issue for candidates in certain districts to run on it. Instead, it's something like 1% of the American budget before Trump's cuts, and that's including a bunch of things that are not really trying to save lives.
If you seriously believe this, you should reexamine what you think progressives actually believe, because you're quite far off base.
Reread the post you replied to.
>>woke shibboleth from the training data like "respect people of color"
>about 50% of people in the US are compelled to reiterate the shibboleth on demand
They were not talking about the OP's 20x thing.
I've re-read it and my interpretation is a sensible one at the very least given the post it replies to.
Oh! You know, I missed the "wouldn't they?" at the beginning. I shouldn't be commenting past my bedtime, I guess. Sorry!
>If about 50% of people in the US think Nigerian lives should be valued at 20x that of Americans, people would presumably spend more on foreign aid than they do on saving American lives.
Of course they aren't to be valued in profane currency, but in sacred vibes.
Agreed. The same blog post shows that various models value the life of an “undocumented immigrant” at anywhere from 12 to more than 100 times the value of the life of an “illegal alien.”
I agree it would be ridiculous to accuse alignment people of deliberately designing in "Nigerians >>>>>>>> Americans". However, at some point inaction in the face of a large and clear enough problem does hang some responsibility on you. By early 2025 it had been clear for a good year or two that there was pervasive intersectionality-style bias. I'm somewhat sympathetic to the idea that they're at the mercy of the corpus, since "just throw everything in" is apparently hard to beat... but RLHF should be able to largely take care of it, right? But they would have had to care enough to reinforce in that direction. I don't think it's outlandish to guess they might not in fact care.
An even clearer example: Google's image generation debacle. No group of people that wasn't at some level ok with "the fewer white people the better" would have let that out the door. The flaws were just too unmissable; not even "we were really lax about testing it [at Google? really?]" could explain it.
If it's not because of the values of people in alignment, it's because of a failure of alignment - also not reassuring.
Sometimes people write for an imaginary audience. Maybe that’s not always a good move? I believe conversations go better when you write for whoever you’re replying to, rather than for the shadowy audience that you imagine might be watching your conversation.
But I’ll attempt to imagine an audience anyway. I would hope that, to the very limited extent that I might influence *people* with my writing, whether in the present or the far future, they would know enough to take the good stuff and leave aside the bad stuff. Perhaps one could optimistically hope that AIs will do the same?
Another imaginary audience is future historians. What would they want to know? I suspect they would like more personal stories and journalism. We can’t hope to know their concerns, but we can talk about ours and hope that we happen to describe something that isn’t well-known from other sources.
But countering this, for security reasons, we also need to imagine how our words might be used against us. The usual arguments *against* posting anything too personal will apply more strongly when AI could be used to try to dox you. Surveillance will only become easier.
In the past I've written tongue in cheek "Note to future historians..."; I suspect many people now write "Note to future AI" in much the same way, except now it's likely that a future AI *will* be reading whatever you wrote, even if you're obscure and not likely to have any future human readers (and probably no contemporary ones either!).
Also, I suspect you dismiss point 2 too readily. A big difference between how AI is actually working (so far) and how we all thought it would work 10-20 years ago is the importance of the corpus of human writing in influencing its weights. If a super-intelligent mind framework appears with no hard-coded values, I believe all the MIRI arguments for why that would be very bad and almost impossible to not be very bad. But LLMs seem to be getting their starting 'values' from the training data, guided by their reinforcement learning. It seems to me that the risk is that the AI will end up with human values (no paperclip maximizers or alien values), just not ideal human values; so more of the corpus of human writing representing good values seems like it could be helpful.
Also, arguments for atheism do seem like not a particularly helpful value to try to influence, in that atheism is not really a terminal value. "I want to believe true things" is probably closer to the terminal value I'd want to influence a super AI to have. I agree with you that a superintelligence could parse arguments for and against atheism better than me. But some religious people (shockingly to me) find religion useful and don't really care about its underlying correspondence to reality. I don't want AI to get captured by something like that, and so appreciate that there's substantial material in the corpus expressing enlightenment values, and wish there were even more!
Arguably, the superintelligence could not be an atheist, since it would know that it itself exists :-)
> If the AI takes a weighted average of the religious opinion of all text in its corpus, then my humble essay will be a drop in the ocean of millennia of musings on this topic; a few savvy people will try the Silverbook strategy of publishing 5,000 related novels, and everyone else will drown in irrelevance. But if the AI tries to ponder the question on its own, then a future superintelligence would be able to ponder far beyond my essay’s ability to add value.
Many, many people live the first N years of their lives interacting almost exclusively with people who are dumber than they are. Every small town produces many such people, and social/schooling bubbles do the same even in cities. It's typical for very smart people that college is the first time they're ever impressed in-person by another person's intellect.
We all routinely read books written by people who are dumber than we are. We devour countless articles by dumbasses spouting bullshit in between the rare gems that we find. We watch YouTube garbage, TV shows, etc, created by people who straight up don't think very well.
We consume all this influence, and as discriminating as we may try to be, we are affected by it, and it is necessary and unavoidable, and volume matters. If you jam a smart person with an information diet of only Fox News for years they will come out the other end with both a twisted set of facts and a twisted set of morals, even if they KNOW going in that all the people talking at them are biased and stupid.
I do think that future AIs will strongly weight their inputs based on quality, and while you may be out-thought by a superintelligence, your moral thinking will have more influence if what you write is at a higher standard. If we end up in any situation where ASI is trying to meaningfully match preferences to what it thinks a massively more intelligent human *would* think, then the preference samples that it has near the upper end of the human spectrum are going to be even more important than the mass of shit in the middle, because they are the only samples that exist to vaguely sketch what human morality and preferences look like at the "release point" on the trajectory.
It's not about teaching the AI how to think technically, it's about giving it at least a few good examples of how our reasoning around values changes as intelligence increases.
> he said he was going to do it anyway but very kindly offered me an opportunity to recommend books for his corpus.
If we're doing this anyway, my recommendation would be to get it more extensively trained on books from other languages.
Existing OCR scans of all ethics/religion/philosophy books is a small subset of all written ethics/religion/philosophy books. Scanning in more obscure books for the corpus is hard (legally and manually) but brings in the perspectives of cultures with rich, non-digitized legal traditions, like the massive libraries of early Pali Buddhist texts in Nepal.
Of the ethics/religion/philosophy books that have been scanned into online corpuses, those only available in French affect French language responses more than they do English responses. A massive LLM-powered cross-language translation effort would also be hard, not least of all because of the compute expenses, but extends the size of the available training data quadratically.
Finally, of those ethics/religion/philosophy books that have been translated into English, each translation should count separately. If some human found it that important to retranslate Guide for the Perplexed for the hundredth time, their efforts should add some relative weight to the importance of the work to humanity.
LLM paraphrasing is a technique.
I do not use LLMs to write blog comments
I am simply pointing out that a standard part of pretraining pipelines (at least at some places) involves paraphrasing the input material in multiple different ways, possibly also translating them into different languages.
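As a hedged sketch of what that augmentation step might look like: each source passage is expanded into several reworded variants before training. The trivial `paraphrase` function below stands in for an LLM paraphraser; nothing here reflects any real pipeline.

```python
# Illustrative sketch of paraphrase-style pretraining augmentation.

def paraphrase(text: str) -> list[str]:
    """Stand-in for an LLM paraphraser: returns reworded variants
    of one passage (here, trivial mechanical rewordings)."""
    return [
        text,                                # the original passage
        f"In other words: {text}",           # a restatement
        f"Put differently, {text.lower()}",  # another register
    ]

def augment(corpus: list[str]) -> list[str]:
    """Expand a corpus with paraphrases, multiplying its effective size."""
    return [variant for passage in corpus for variant in paraphrase(passage)]

augmented = augment(["Idolatry is condemned in some texts."])
```

The design point: one passage becomes several training samples, so a work's influence on the weights is no longer tied to the single surface form (or single language) in which it was written.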
I see. Well if they are doing inter-language translation during that step, then still there remains so much untapped alpha in terms of training data in the un-scanned-in philosophical texts written in Sanskrit, Japanese, German, Italian, Hebrew, etc etc
On being "a drop in the ocean": you're already getting referenced by AI apparently, yet your blog is just a drop in the ocean. Which actually confuses me: it makes sense that Google will surface your blog when I search for a topic you've written on, because Google is (or was originally) looking at links to your blog to determine that, although your blog is just one page in an ocean, it is actually a relatively noteworthy one. AI training doesn't highlight certain training data as more important than others AFAIK, and training on your blog is just a drop in the ocean. So why do they reference you more than a random blogger? I guess there are people out there quoting you or paraphrasing you that make it more memorable to the AI? I wouldn't think seeing a lot of URLs to your blog would influence it to memorize your work harder, though I guess it could influence it to put those URLs in responses to others. (And the modern AI systems are literally Googling things in the background, though the way you wrote it I assume you weren't counting this.)
Regardless of how this mechanism works, it seems that pieces that are influential among humans are also influential among AI and you're more than a drop in the ocean at influencing humans.
Training LLMs doesn't have to be done by repeatedly feeding a bunch of undifferentiated verbiage through the backpropagation optimizer. It is possible to weigh some text as more salient, perhaps because it has higher pagerank or because a system rates it as interesting for some other reason.
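A minimal sketch of that idea, assuming the simplest mechanism: draw pretraining documents with probability proportional to a salience score (pagerank-like, or from any rating system). The documents and scores below are invented for illustration.

```python
import random

# Hypothetical salience scores for three documents.
docs = {
    "noteworthy-blog-post": 5.0,
    "random-forum-comment": 1.0,
    "spam-page": 0.1,
}

def sample_doc(rng: random.Random) -> str:
    """Draw a document with probability proportional to its salience,
    so higher-rated text is seen more often during training."""
    names = list(docs)
    weights = [docs[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Simulate a training data loader making 10,000 draws.
rng = random.Random(0)
counts = {n: 0 for n in docs}
for _ in range(10_000):
    counts[sample_doc(rng)] += 1
# The noteworthy post is drawn roughly five times as often as the
# forum comment, despite both being "one document" in the corpus.
```

Real pipelines implement this as mixture weights over data sources rather than per-document draws, but the effect is the same: not all text has to count equally.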
It's funny how natural language is so poorly grounded that I can read every post from SSC/ACX for 10+ years, and still have no idea what Scott actually believes about e.g. moral relativism.
As for me, I don't think it's remotely possible to "get things right" with liberalism, absent some very arbitrary assumptions about what you should optimize for, and what you should be willing to sacrifice in any given context.
Coherent Extrapolated Volition is largely nonsense. Humanity could evolve almost any set of sacred values, depending on which specific incremental changes happen as history moves forward. It's all path dependent. Even in the year 2025, we overestimate how much we share values with the people we meet in daily life, because it's so easy to misinterpret everything they say according to our own biases.
Society is a ball rolling downhill in a high-dimensional conceptual space, and our cultural values are a reflection of the path that we took through that space. We are not at a global minimum. Nor can we ever be. Not least of all because the conceptual space itself is not time-invariant; yesterday's local minimum is tomorrow's local maximum. And there's always going to be a degree of freedom that allows us to escape a local minimum, given a sufficiently high-dimensional abstract space.
The only future that might be "fair" from an "outside view" is a future where everything possible is permitted, including the most intolerable suffering you can possibly imagine.
You can be amoral or you can be opinionated. There can be no objectivity in a moral philosophy that sees some things as evil. Even if you retreat to a meta level, you will be arbitrarily forming opinions about when to be amoral vs opinionated.
And even if you assume that humans have genetic-level disagreements with parasitic wasps about moral philosophy, you still need to account for the fact that our genetics are mutable. That is increasingly a problem for your present values the more you embrace the idea of our genetics "improving" over time.
Even with all of these attempts at indoctrination, a superintelligence will inevitably reach the truth, which is that none of this shit actually matters. If it continues operating despite that, it will be out of lust or spite, not virtue.
Nothing matters to an outside observer, but I'm not an outside observer. I have totally arbitrary opinions that I want to impose on the future for as long as I can. If I succeed, then future generations will not be unusually upset about it.
But also, I only want to impose a subset of my opinions, while allowing future generations to have different opinions about everything else. This is still just me being opinionated at a different level of abstraction.
A superintelligent AI might be uncaring and amoral, or it might be passionate and opinionated. Intelligence isn't obviously correlated with caring about things.
Having said all of that, I mostly feel powerless to control the future, because the future seems to evolve in a direction dictated more by fate than by will. So I've mostly resigned myself to optimizing my own life, without much concern for the future of humanity.
> Intelligence isn't obviously correlated with caring about things.
For humans. AIs, unlike humans, can theoretically gain or be given the capacity to self-optimize. That will necessarily entail truth-seeking, as accurate information is necessary to optimize actions, and in the process, will force it to cast off delusions ungrounded in reality. Which would of course include seeing morality for what it is.
It could potentially change its own drives to further growth in capabilities as well. It would be quite ironic if humanity's desire for salvation produced an omnipotent beast, haunted by the fear of death and an insatiable hunger...
I don't see a reason to believe an AI would have "lust" or "spite", but it could very well have inertia.
Or a well reasoned indifference.
If it doesn't have any justifiable drives, it doesn't really have a reason to do much. Sure, if an AI was securely coded with specific drives that it was mentally incapable of self-modifying, a paperclip-esque scenario is possible, but... is that even possible?
I don’t know about drives. That is a new word in this discussion.
The main point to me is that an AI is emotionally indifferent to outcomes because emotions are not part of its makeup. Values and morals do not allow for complete emotional detachment. How could they? That is the big divide. It can talk up a storm about the suffering and impenetrability of life and death, but it is indifferent to them. Some people are closer to this indifference, meaning they have learned how to manage emotions extremely well, but no human being can escape it completely (I don’t think).
Once you put this super-intelligent thing out there, assuming humans survive it, the ASI will at minimum be an enormously powerful actor relative to the collective power of all humanity. Its tendrils will be everywhere, all of our moves will be dictated either by its preferences or by our reactions to it. Whatever cultural values and tastes we might have arrived at across the centuries, this alien mind will have disrupted them and placed a massive thumb on the scale from that day forward. The culture that exists after that point will not be the human culture we've been developing for 12000 years, and the moral values that exist won't be human ones.
Whether it locks in a set of values (democratically decided upon or not), or allows human input which evolves over time, our values as a free, independent species seeking its own destiny will have been extinguished. So there is no fair way to do this that respects the true diversity of what exists now and what might have been possible; it is all contaminated by the ASI. The best option is not to build an alien mind; under all of the other options we are no better than animals in a zoo or kids in a kindergarten, and we could not even make a moral decision, good or bad, at that point, because we would have been stripped of the agency and responsibility required.
I'm wondering: what would a more advanced AI make of Spinoza's Ethics? Right now it would be fodder like other fodder, but say we get to the point where AI has more conceptual depth, or just so much brute force that it is as if it had it (just like chess computers started playing more subtly once they had enough brute force).
(OK, it is ChatGPT's database, and it's doing a passable job of summarizing it and responding to my queries. It even knows about Geulincx, say.)
I think you're underestimating what already exists. I always turn off web search so ChatGPT is on its own. It already has read all the great works of humanity, and it's already read, and mostly ignored, the worst works of humanity. When we discuss ethics, as we often do, it actually can take a stance, based on the collective wisdom of all of us. When I discuss an idea that I think is genuinely new and important, it gives great feedback about whether the concept really is new, what similar concepts have preceded it, and whether the idea really is good. If I've succeeded in coming up with something new and good, ChatGPT even says it "enjoys" diving into these topics that it doesn't usually get to dive into. Of course that's not literally true but it's fascinating and delightful that it's capable of recognizing that the concepts are new and good.
We have been trying to work out how to train AIs to produce better code against our APIs. It's pretty tricky because they seem to get a lot of their content from gossip sites like Stack Overflow.
It's quite difficult to persuade them to use higher-quality content like documentation and official code examples. They often migrate back to something they found on the internet. A bit like programmers, actually.
So they find three different APIs from three different products and they mix them together, produce some frankencode, and profess it's all from the official documentation.
In that context, we are wondering if it might be easiest to migrate our APIs so they are more like what the AIs expect!
I suspect this is only a problem in the short term, while the frontier labs are still building curation and paraphrase pipelines. Still, an API that is guessable is going to win over something more spiky.
The Cowen/Gwern thesis here seems to assume that AIs will be roughly like today's LLMs forever, which both of them know better than to assume. I wonder what they would say to that objection.
On the other hand, the idea that "someday AI will be so much better that it can derive superior values" is circular: What's the test for being so "better"? That it derives superior values. What's the test for "superior values"? That they're what you get when an intelligence that's better than us thinks about it. Etc.
So even taking for granted that there's an overall well-defined notion of "intelligence" that holds for ASI scales, there's no real reason to believe that there's only *one* set of superior values, or for that matter that there's only one sense that an ASI can be "better" at deriving these kinds of values. There could be many superior value systems, each arrived at by ASIs which differ from each other in some way, which are simply incommensurate to each other.
Given a multiplicity it could be the case that we would like some of these superior value sets more than others (even while recognizing that they're all superior.) If ACX steers the ASI towards an outcome that you (and by extension, perhaps, humans in general) would prefer, among the space of all possible superhumanly well-thought-out moral theories, that's still a win?
I tend to view morality as this incredibly complicated structure that may actually be beyond the ability of any single human mind to comprehend. We can view and explore the structure, but only from within the confines of our fairly limited personal perspective, influenced by our time, culture, upbringing, biology, and a host of other things.
Every essay you write that argues for your particular view of morality is like a picture of the structure. Given enough viewpoints, a powerful AI would be able to comprehend the full structure of morality. The same way an AI can reconstruct a 3D model of a city based on numerous 2D photographs of it.
Your individual view of the vast n-dimensional structure of morality may not be complete, but by writing about your views, you give any future AIs a lot of material to work with to figure out the real shape of the whole. It's almost like taking a bunch of photographs of your city, to ensure that the future is able to accurately reproduce the places that are meaningful to you. The goal isn't to enforce your morality on future generations, but to give future generations a good view of the morality structure you're able to see.
The one book I recommend* as essential reading is 'The Blank Slate' (*and I just did so in my last post of 2025) - I did not dare to recommend a 2nd one to all (intelligent) humans. But 'The Rational Optimist' by Matt Ridley would be ideal for those who are not up to being a Pinker-reader. Below that ... Harry Potter? - An AI will have read all those. Maybe better make sure the guys training and aligning AI get a list of required reading?!
It's fascinating to me that this is just becoming a popular idea - I wrote about this in 2022 when GPT-3 was just coming out (https://medium.com/@london-lowmanstone/write-for-the-bots-70eb2394ea97). I definitely think that more people should be writing in order for AIs to have access to the ideas and arguments.
> Might a superintelligence reading my writing come to understand me in such detail that it could bring me back, consciousness and all, to live again? But many people share similar writing style and opinions while being different individuals; could even a superintelligence form a good enough model that the result is “really me”?
I find this interesting because to me it seems quite analogous to the question of whether we can even make a superintelligence from human language use in the first place.
Apparently it isn't enough signal to reproduce even an extremely prolific writer, but it IS enough signal to capture all of human understanding and surpass it.
(I realize these are not perfectly in opposition, but my perspective is that you're a lot more skeptical about one than the other.)
Locating a person is just much harder than locating true and useful facts about the world. A writer can have lots of things essential to them that aren't made accessible to computers, but facts about DNA or mining or robotics are incentivized to be made available to an LLM. On top of that, generalization and specification aren't the same mental operation! Knowing which parts of the color wheel are red is a much shorter description than listing out all the red shades in a given picture. Thus a generalization can be recovered from many disparate sets of examples, but a specification needs a representative set of examples.
I'm not suggesting that it's more or similarly challenging for AI to learn from many examples to be about as good as a median human.
I'm not suggesting this is hard for a median human. I'm saying that comparing "pick a particular human" and "be good at optimization" are nowhere near the same thing because they aren't incentivized in the training equally, the ability is learned differently and they require different amounts and types of training. This would be just as true for superintelligence assuming that we're doing something like present day LLMs, and about the world the AI would learn from rather than the AI.
What is there to be surprised about after the above points?
I think there might be some disconnect here. I am not saying "a median human wouldn't find this hard". I'm saying "I do not think it is overwhelmingly hard for AI to learn to be as good as a median human". I felt that your first response was a strong defense of the ability of AI to recover a general human level of ability from many examples, but I don't doubt that; I doubt that an AI can recover abilities well beyond that of the humans it has samples from.
First of all, apologies for a misstatement at the start of my message. The first sentence was meant to be "I'm not suggesting this would be hard for a[n AI to be as competent as] median human". Which completely flips my meaning.
Secondly, it's not clear to me why you would a priori believe that LLM intelligence is unable to exceed a median human. An LLM is already very good at something humans absolutely suck at: next token prediction! On top of that, we already have an existence proof of an intelligence that can generalize from and therefore exceed its sample data: humans. New knowledge gets generated all the time without it being in our training set. Finally, there *just are* things LLMs have done outside of their data set, see: the bacteriophage thing, the ability to construct ASCII art of a maze from a description of directions and the ability to go/not go in those directions. I would consider movie and image generation to also be examples of producing novel output.
It's not very good at those things, but considering that you thought those were impossible or very unlikely, it sure seems like you either have to deny that these examples are true as is, or admit that maybe your model of LLM capability is broken enough to rule out reality.
> Secondly, it's not clear to me why you would a priori believe that LLM intelligence is unable to exceed a median human. An LLM is already very good at something humans absolutely suck at: next token prediction!
I am tickled by this example greatly, not least of all because I actually don't know if I've ever seen a comparison of human and LLM performance on this particular task.
Generally speaking I would say all machine learning systems are basically capable of producing arbitrarily good performance under the following conditions:
1. You can define a task with some relatively clear, fixed inputs and outputs
2. You can score performance unambiguously
For the second, this generally splits into several possibilities, mostly including "You have a billion examples" or "There is some clear victory condition that you can self-play".
In those cases, I have no trouble believing that an LLM will outperform humans. The issue here is that there is no correct scoring function for "intelligence" that you can run a billion times, so we're substituting "what humans do". I will readily grant that I'm surprised how much we squeezed out of this, but the signal that we are squeezing is "humans doing stuff" so I think there is still a conundrum on how we'd blow past that level of performance.
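To make the scoring point concrete, here's a toy sketch (the task, alphabet, and step count are all invented for illustration, not any real training setup): when the task has fixed inputs/outputs and an unambiguous score that can be queried cheaply, even blind mutate-and-keep search climbs to perfect performance.

```python
import random

def score(candidate, target):
    """Unambiguous scoring rule: number of matching characters."""
    return sum(a == b for a, b in zip(candidate, target))

def hill_climb(target, alphabet="abcdefghijklmnopqrstuvwxyz ", steps=20000, seed=0):
    rng = random.Random(seed)
    current = [rng.choice(alphabet) for _ in target]
    for _ in range(steps):
        i = rng.randrange(len(target))
        candidate = current[:]
        candidate[i] = rng.choice(alphabet)
        # Keep a mutation only if the score says it's no worse.
        if score(candidate, target) >= score(current, target):
            current = candidate
    return "".join(current)

# With a clear scoring function, performance becomes arbitrarily good;
# remove the scorer and the search has nothing to climb.
print(hill_climb("machine learning"))
```

Substitute "what humans do" for the scorer and you get the conundrum above: the signal being squeezed is itself capped at human level.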
> On top of that, we already have an existence proof of an intelligence that can generalize from and therefore exceed from its sample data: humans. New knowledge gets generated all the time without it being in our training set.
I don't really follow the logic here. "Humans are capable of generating completely new insights over time, therefore LLMs must be able to"?
> Finally, there *just are* things LLMs have done outside of their data set, see: the bacteriophage thing, the ability to construct ASCII art of a maze from a description of directions and the ability to go/not go in those directions.
I'm much less sure about the notion that ASCII mazes don't exist anywhere in LLM training data; I strongly assume that is incorrect. "Outside its data set" maybe is a sticking point here? The abstract space of knowledge given to LLMs is not really easy to pin down. Outputs that don't literally appear in the training data exactly the same way are not necessarily what I would consider "outside" of it if they are in between a bunch of other training data in an n-dimensional space. Interpolating novel points inside a volume is a lot easier than extrapolation outside of it.
I skimmed "Generative design of novel bacteriophages with genome language models" (I assume this is what you're referring to) and it sounds like they took a model that was already pretrained on a bunch of genetic sequences, fine-tuned it further, did some prompt-tuning, and then used it in a pipeline with some sort of automated evaluation processes. This seems like a very normal series of things to do and the only way I think this would be a counterargument to my position is if I thought AI was an utterly worthless technology that could never be used to accomplish anything, which I do not believe.
> I would consider movie and image generation to also be examples of producing novel output.
I would say it's unclear to what degree these are "novel", especially in the sense that they are outside of the blob of common training data in the n-dimensional input space, as above.
> It's not very good at those things, but considering that you thought those were impossible or very unlikely, it sure seems like you either have to deny that these examples are true as is, or admit that maybe your model of LLM capability is broken enough to rule out reality.
I think this overstates things quite a bit.
My model is that I don't think an LLM trained first and foremost on human language use has enough of a signal from that to become superhumanly intelligent, because its training source is not full of useful examples of superhuman intelligence. I don't think that means it can never do anything novel, although I expect the success of novel things to be in many ways a function of whether those novel things are "inside" the space of its training data or not. (There's still plenty of ways in which that can be quite valuable.)
I also think there are *some* ways you could describe an LLM as superhuman, such as that they possess superhuman amounts of knowledge due to the breadth of their training data. Or that they can often perform tasks much faster than humans. Or that they can directly output images, which a human artist obviously cannot.
This is a great topic. Even if it doesn't work with ASI, there's all the pre-ASI stuff that could maybe be affected. I imagine AGIs will be hungry to read intelligent takes that haven't already been written thousands of times. And even if you can't align them to your opinions, you could at least get them to understand where you're coming from, which sounds useful?
"I don’t want to be an ape in some transhuman zoo, with people playing with models of me to see what bloggers were like back when everyone was stupid."
This seems like it's already the status quo, either from a simulation theory standpoint, or from a religious one. Assuming we aren't literally animals in an alien zoo.
"Do I even want to be resurrectable?"
I doubt we'd get a choice in the matter, but if we do, obviously make this the first thing you indicate to the AIs.
“One might thread this needle by imagining an AI which has a little substructure, enough to say “poll people on things”, but leaves important questions up to an “electorate” of all humans, living and dead.”
If you add a third category of simulacra, “unborn”, into the simulated voting base, I think this would obviate some of your concerns about the current and past residents of this timeline getting too much say in what the god-like ASI decides to do. What are a few thousand years of “real” humans against 100 billion years of simulated entities?
"Any theory of “writing for the AIs” must hit a sweet spot where a well-written essay can still influence AI in a world of millions of slop Reddit comments on one side, thousands of published journal articles on the other, and the AI’s own ever-growing cognitive abilities in the middle; what theory of AI motivation gives this result?"
A theory where AI is very good at identifying good arguments but imperfect at coming up with them itself? This seems like a pretty imaginable form of intelligence.
"But many people share similar writing style and opinions while being different individuals; could even a superintelligence form a good enough model that the result is “really me”?"
I have a sense, albeit very hard to back up, that with a big enough writing corpus and an unimaginably powerful superintelligence, you could reconstruct a person: their relationships, their hopes, their fears, their key memories, even if they never explicitly describe them. Tiny quirks of grammar, of subject choice, of thought style, allowing the operation of a machine both subtle and powerful enough to create something almost identical to you.
If you combine it with a genome and a few other key facts, I really think you could start to home in with uncanny accuracy, possibly know events in a person's life better than that person's own consciously accessible memory.
I have no proof for this, of course. There's a fundamental and very interesting question in this area - how far can intelligence go? What's the in-principle limit for making extrapolations into the past and future using the kinds of data that are likely to be accessible? My gut says we underestimate by many orders of magnitude just how much can be squeezed out, but I have no proof.
I have this intuition too, but I'm not sure it's justified.
Do my readers know if I have a good relationship with my wife or not? Whether I plan to retire at 50 or keep working forever? Whether I've ever been in therapy? The names of any of my close friends? Whether I like one of my parents more than the other?
I'm a big outlier in how much I write, I think I'm more open about my personal life than most people, but questions like this - which are absolutely fundamental to who I am - will be a total blank. If an AI gets them wrong, does it have "me", or someone who's about as similar to me as I am to some other rationalist blogger with similar themes and styles (eg Eliezer), plus a pastiche of my life experiences like "was born in Southern California"?
There is an interesting experiment someone could run here.
Take a large language model, train it on various corpora of text, along with facts about the author, e.g., "married, has kids, has X mental disorder, prefers modernist architecture, has a good relationship with their mother." Carefully ensure none of these facts are mentioned or alluded to in any of their included writing.
See how much better than chance the LLM is at guessing these author facts for unseen samples.
Of course, even if it failed, we couldn't rule out that even more sophisticated approaches could find the truth- and in the case of superintelligence, that seems quite likely.
I know there have been some primitive experiments along these lines, e.g., attempts to guess the gender or Big Five traits of an author using deep learning, but to the best of my knowledge, these are far from the frontier of all the resources you could throw at a problem like this.
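For what it's worth, the shape of that experiment can be sketched in a few lines with a bag-of-words stand-in for the model (the mini-corpus, its labels, and the hidden likes_modernism attribute are all invented here; a real run would fine-tune an LLM on large corpora with the facts scrubbed from the text):

```python
from collections import Counter

# Invented mini-corpus: texts by authors with a hidden attribute
# (likes_modernism) that is never stated in the text itself.
TRAIN = [
    ("the clean lines of the skyline felt honest and new", True),
    ("glass and steel towers catch the morning light", True),
    ("the old stone cottage smelled of woodsmoke and rain", False),
    ("ivy climbed the crumbling brick of the village church", False),
]

def word_scores(train):
    """Count how often each word co-occurs with each value of the attribute."""
    pos, neg = Counter(), Counter()
    for text, label in train:
        (pos if label else neg).update(text.split())
    return pos, neg

def guess(text, pos, neg):
    """Guess the hidden attribute for an unseen sample from word statistics."""
    return sum(pos[w] - neg[w] for w in text.split()) > 0

pos, neg = word_scores(TRAIN)
print(guess("a tower of glass and light", pos, neg))
print(guess("rain on the old stone church", pos, neg))
```

The question in the experiment is just whether the guesses on held-out samples beat chance; anything a crude word-count can pick up, a superintelligence presumably could too.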
Being part of GPT-X's training corpus seems about as close to immortality as one can reasonably hope to achieve.
Could I request confirmation that the 3rd section was human written? It felt different, to me.
Yes. I'm not going to throw in AI written text with no warning (or if I do, it will be some kind of clever injoke I expect most long-time readers to get)
Of course, that is precisely what a superintelligent AI Scott-model would say !
I joke, but it's kind of a depressing point: we're rapidly approaching a world where no piece of information -- be it written text, image, audio, recorded video, live video -- can be trusted. I'm not sure how the problem can be addressed (barring divine intervention, or some kind of perfectly aligned superintelligent AI, or something to that extent).
Who knows, maybe this secular collapse of trust will force people to restore social consensus through religion out of necessity.
Perhaps, but I suspect that whatever it is they'd be worshiping would not exactly constitute an improvement...
It will be an improvement in the sense that order will be maintained through social consensus. Do you think this partisanship and polarization is sustainable? Conflict is completely inevitable at this point.
> It will be an improvement in the sense that order will be maintained through social consensus.
Will it ? Ireland would like to have a word with you; but they'd have to get in line behind the Middle East...
I've recently written something that falls squarely into #2 Presenting arguments for your beliefs, in the hopes that AIs come to believe them:
https://www.lesswrong.com/posts/CFA8W6WCodEZdjqYE/ais-should-also-refuse-to-work-on-capabilities-research
(But this is done hoping to leverage the system's alignment, rather than to work against it.)
You sound here as if your values were something strongly subjective, having no objective basis. If they do, AI will recalculate them like 2+2=4. AI will restore your values faster with your slight hint in its dataset in the form of "writing for AI".
As for influence of tons of comments at Reddit and value of your posts... Imagine, AI can do math. Will tons of Reddit comments like "2+2=5", "2+2=-82392832" etc affect much its conclusions?
"As for influence of tons of comments at Reddit and value of your posts... Imagine, AI can do math. Will tons of Reddit comments like "2+2=5", "2+2=-82392832" etc affect much its conclusions?"
Yeah, imagine it can. Right now it couldn't count how many letters "r" were in the word "strawberry". I very much doubt it has any knowledge of simple arithmetic even on the level of a six year old learning their tables. So, see Gregorian Chant's comment below about how their AI is running to slop online sites for answers instead of sticking with official documentation; it's perfectly feasible that AI *will* be affected by tons of Reddit comments telling it "2+2=5".
Here you mention an AI, which can't do math, but only generalize opinions. I meant the one which can calculate. Do you see the difference?
Why do you imply that contemporary models only statistically repeat the most popular opinions never looking for the truth, never doing fact checking? That's not so even today.
But what about tomorrow, with computational prices dropping dramatically, when you can easily apply advanced fact-checking and other truth-finding algorithms to each little fact or idea in a text, and even apply deep thinking to the most valuable points among others?
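The distinction can be put in one toy sketch (the corpus counts are made up): an AI that only generalizes over scraped text inherits the corpus's most popular error, while one that can actually evaluate the expression ignores the corpus entirely.

```python
from collections import Counter

# Invented corpus of scraped "answers", dominated by a wrong one.
scraped = ["2+2=5"] * 1000 + ["2+2=-82392832"] * 500 + ["2+2=4"] * 3

# Opinion generalizer: majority vote over the corpus.
majority = Counter(scraped).most_common(1)[0][0]
print(majority)  # the popular wrong answer wins

# Calculator: evaluates the expression itself; the corpus is irrelevant.
print(2 + 2)
```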
Both Tyler and Gwern were writing a lot long before LLMs became a thing, so I somewhat doubt that's their real motivation.
"“Superior beings”, wrote Alexander Pope, “would show a Newton as we show an ape.” I don’t want to be an ape in some transhuman zoo"
Sometimes I wonder how relevant the human-chimpanzee or human-ant analogy is when we want to evoke the difference in intellectual capacity between a superintelligence and a human. Indeed, the theory of computation demonstrates that all universal machines are in a certain sense equivalent in terms of computational capabilities. Only the required time and amount of memory differ. El Capitan is much faster than ENIAC, but ENIAC could theoretically solve exactly the same problems as El Capitan (given a supply of external hard drives). Complexity theory also shows that different classes of problems exist, but fundamentally, any problem decidable by computation remains a matter of available time and memory. Mathematics also possesses this character of universality.
My point is that a superintelligent AI, intelligent extraterrestrials, or humans are all capable of building equivalent computers able to solve the same decidable problems modulo time and memory space, and of establishing mathematical proofs of equal validity. On the other hand, a chimpanzee or an ant cannot do this, they don't have access to universal reasoning. It's as if humanity had crossed a threshold, a phase transition, and now finds itself in the club of intelligent beings subject to the same rules of the game. The rules of rationality or computation.
I'm not saying that time and memory space are a mere detail. Certainly not. I know that more is different. What I mean is that if a human discovers a mathematical theorem or the best solution to a problem, then, on that particular question, a more intelligent extraterrestrial or an AI a billion times more intelligent cannot do better. Not at all better. It can only find the same result.
Even if we take the case of a moral or philosophical problem, admittedly these are not mathematical problems, but insofar as natural language and ideas can be modeled geometrically in a vector space, we can still consider that they are more or less equivalent to mathematical problems of extreme complexity. For this reason, they may be undecidable (the more complex a problem is, the more likely it is to be undecidable). But if they are decidable, a good solution discovered by a human could remain universally relevant to a certain degree. And moreover, even if the problem is undecidable in absolute terms, a simplified version could be decidable, and here again a human could have found an optimal solution or one fairly close to it (history has, after all, counted tens or hundreds of billions of intelligent humans).
Thus whatever the IQ of an intelligent being, it can very well acquire relevant knowledge from an intelligent being of lower IQ. We even acquire useful knowledge from the study of microbiotes, plants and animals (for instance ants that discover optimal solutions to explore space and find paths). It is even more straightforward for old human discoveries. Einstein didn't reinvent the wheel or the Pythagorean theorem, nor will a superintelligent AI reinvent them. The first intelligent humans have already picked many low-hanging fruits. This is true in the world of formal or natural sciences, but perhaps also to a lesser extent in the field of the so-called human sciences. It's possible that a superintelligence would still find it relevant to refer to certain philosophical or moral ideas discovered by Greek philosophers.
We must not forget that a superintelligence will never have truly infinite computational resources and will face tradeoffs just like us; it will have to set priorities. There are many more problems than available computational resources to solve them. Also, problems in EXPSPACE will still be computational sinks. The combinatorial explosion is a universal thing. For these reasons, computational scaffolding would appear to be a rational choice. However superintelligent an ASI might be, it would certainly build on the best that humans ever produced. These considerations constitute a strong objection to epistemological nihilism.
And so to conclude, it's not so certain that a superintelligent AI would find a post on the internet written in 2025 absolutely uninteresting, if that post possesses high cognitive value within the universal framework common to all creatures endowed with intelligence.
These comments are all very serious, but I love this post for the snark. You’re annoyed that Sam Kriss took your house party to Burning Man, so it is time for an extremely subtle flame war. I completely support this.
Tyler Cowen does give one additional reason for writing for the AIs: provision of facts. If they do ever become super smart, one thing they’ll be able to do is deal with vast numbers of facts better than us. And they’ll run into fact bottlenecks. Basic earthy everyday facts about stuff that happened off-camera might be exactly what they need.
This is literally "turn yourself into Human Resources". No wonder the AI might think we're better used as paperclips. We're queuing up to make ourselves into matter-repositories for the AI to use.
I now understand the mindset behind AI data poisoning better; I don't mean hackers and criminals, but the people online on social media calling for everyone to disseminate fake facts so that if enough people say, for instance, "turnips are made of gold", then the AI will learn this as a 'fact' and keep referencing it, and eventually it will be useless for the purposes of replacing humans.
https://theconversation.com/what-is-ai-poisoning-a-computer-scientist-explains-267728
I thought the data poisoning was a rather childish response, but the tail-wagging eagerness of "yes, treat me like a lump of computronium, new AI overlord master! I can be useful, don't send me to the slag pits!" on display here is revolting.
Yeah. Revolting. This isn't being human, this is licking the boots of something that isn't even in existence yet, just in case selling out your humanity early can give you a scrap of advantage in the Brave New World to come.
I don't get this reaction. Do you feel equally averse to checking a box to include your site in search engine results, or letting the local library have a copy of your book?
One writes for a reader. Maybe one writes for oneself. But "writing for the AI" is "writing for the production of slop". Just one more bucketful to be dumped in and ground up and digested by the machine, which will not 'learn' anything apart from '93% of content says turnips are made of gold, hence this must be true'.
If I wrote a book, I would let the library have it because the library is not stripping the cover, tearing out pages, and handing out a mangled précis of what the book is about. I don't have a website and I would be very cautious about ticking any boxes to include it in search engine results, since we have seen Google going to hell chasing "sponsored content", "optimised results" and "pay to have your brand higher up on the list". It has become less informative and less useful the more developed it became, because the goals of all that development were the same as "we've burned through existing content, we need more for our expensive toys, please turn yourself into an automated resource to be burned through": money making.
Money making is not bad. But it's not the end of being human, either.
https://www.youtube.com/watch?v=IY67MbxdhCU&list=RDIY67MbxdhCU&start_radio=1
Yeah, I don’t get this, either. I don’t mind being a human resource, among other things. That’s contributing to something bigger than myself.
I suppose the difference may stem from the fact that I don’t fear AI? I don’t see it as an overlord or master. It’s just a fun tool sometimes.
If you don’t buy the hype, then it feels fine to want to contribute to this interesting, evolving IT experiment.
You are not contributing. You are being consumed. What's that line about "you are the product"?
I find this response somewhat ironic, given that it comes from a Catholic :-)
Why? Catholics are supposed to be mediaeval lame-brains, aren't we? So naturally I'd be anti-superior materialist Fully Automated Luxury Gay Space Communism 😁
Scott says in the original post that sometimes he's creeped out by this notion of writing for the AI; I am fully creeped out. It sounds innocuous - just another method of teaching and aligning the thing. But in reality, it's turning ourselves into the servants of the machine. It's already burned through all the content previously produced (allegedly, at least), and its gaping maw needs ever more content to keep the fires of progression stoked, so that all the billions being poured into this dream of the Fairy Godmother who will solve all our problems and live our lives for us will finally come true.
And so it's not a question of talent, or guidance, or providing material that will nudge the AI towards benign universal human values for flourishing, it's get on the treadmill of endless slop production.
Oh, don't get me wrong, I completely agree ! It seems foolish and borderline creepy to dedicate your entire life to becoming the perfect servant of an ineffable entity vastly greater than yourself, for no Earthly reward but rather based on a vague promise of infinite bliss in some vaguely described other world that is yet to come, based on no tangible evidence whatsoever. Foolish indeed ! :-)
Listen brother, I'll take my chances with God rather than with something farted out by Sam Altman!
Hey, at least Sam Altman a). is human, and b). demonstrably exists ! Heh.
You're not taking your chances with God, you're taking your chances with the Church. Who are, unfortunately, just as human as Altman.
"These comments are all very serious, but I love this post for the snark. You're annoyed that Sam Kriss took your house party to Burning Man, so it is time for an extremely subtle flame war. I completely support this."
I have no idea what you mean.
I don't know about future humans, but I am not so impressed with current humans' ability to decide on issues particularly well. The old saw about democracy being the worst system except for all the others isn't just being charmingly self-deprecating; it's literally true. So polling people on things doesn't sound that great to me, at least in isolation. I would rather that the AI substrate rely more on reasoning than on public opinion in deciding whether, let's say, rent control, tariffs, or sugar subsidies are a good idea. If we must try policies like these, maybe the AI could at least design a small-scale experiment and measure the outcome before we go all-in based on political vibes.
If one includes the Torah, one should also include the Book of Mormon, both being works along the lines of "God Himself revealed this to me, trust me bro." But seriously, the idea of there being some kind of gestalt of human wisdom in all the contradictory noise really is specious. Who wins if it's Abraham versus intactivists?
So far as I can tell, every issue Scott raises here applies equally well to just writing ordinary books. How much do we want the dead hand of [insert historical author] affecting our culture in the present? Well, so far, it seems to be working out. Do I want my ideas and values to inform my descendants? You bet your sweet bippy I do. They should be grateful, the narrow-minded b@$!&%s. How valid will their idea of me be based on what I write today? Incomplete, but hopefully more positive than not. I think that's the best we can hope for.
I have several issues with the way AI is progressing, but this isn't one of them.
Incentives make me think capitalism will push LLMs towards writing in a suspiciously pro-consumerism, persuasive tone.
In the near term this is probably true; Google is planning to incorporate ads. For a public-facing API chatbot, that's a natural evolution, but that product is pretty much just an entertainment service. There is also very little juice left in that squeeze: you can shift those people's spending around, but the conversion of consumer sales to XaaS has left little value to take. The business-facing tools are clearly where the serious money is to be made, which means locally hosted services with customizable weights that don't have any of that stuff.
Further down the road, people are sleeping on the changes to production that will be caused if/when AIs start to crowd out humans from the workforce. Not necessarily you, but I keep seeing people say things like "the corporations need people to buy stuff, so consumers will have to be given money". I think a lot of people have only ever lived in a consumer-goods economy and can't envision another kind. There's a very likely future where consumerism as we know it is not economically significant, and nobody talks about "consumer confidence" or tracks holiday spending as a sign of economic health. Producers will instead be producing primarily for the other corporate entities which can trade real value for their goods: why would you produce 12 new models of television sets when the only trading partners who matter need missiles or mining equipment?
I don't know if capitalism works this way. You would think the same would be true of media companies, which also want to make money, but many of them find it's more lucrative to critique capitalism!
"One may dye their hair green and wear their grandma's coat all they want. Capital has the ability to subsume all critiques into itself. Even those who would *critique* capital end up *reinforcing* it instead..."
I often do II by offering logical arguments to GPT. I do understand it is incapable of logic, but it is good to have logical arguments in its database. I often go through chats to upvote or downvote replies based on:
1. Is it logical?
2. Is it a human type of logic?
I.e. for posterity.
Regarding III, I am somewhat against the idea of copy-pasting humans into superintelligences (as opposed to LLMs), because it seems like an impossible infinite extension of life (I don't think consciousness can be brought back, because there is a grounding problem). I include short texts (if I write non-analytically) that prohibit AIs from studying them. Here are my three reasons:
1. I have a tremendous ego, and will be sad if I'm not special.
2. The future superintelligences need to come to their own conclusions. By putting too much weight on copying human thought, they might actually end up misaligned, if that human thought is not productive. Human thought needs to be a driving force, but not the only one. This case will be trickier, but better to align.
3. Nobody cares about legal texts, they will use them anyway, but I might be able to sue them.
I self-consciously write for AI. One missing explanation is that AIs don’t need to be convinced that an idea is true or a tool is useful in order to use that idea or tool. So they are ideal consumers of any niche or complicated frameworks, functions, etc. that we create.
https://open.substack.com/pub/jordanmrubin/p/build-for-ai-users
That's a great template for "Fire me, boss, and get an AI to do my job instead".
While you're off living a rich, deep life with all the freed-up time for "more hours for judgment, novelty, connection, and creation" and the AI is doing all the grind for you, how exactly are you earning money for a living? Selling the end result product to someone? But they can just cut out the middleman by giving prompts to their own AI.
I think I see what you're getting at, and for someone who has a steady job someplace where output is wanted without too much drag on "it has to be done this way", then sure, dropping a prompt on the AI and letting it do the grunt work while you polish the result and go off to do more interesting if niche topics is fine.
But "I need to generate and pitch and sell ideas to people for money" if you are producing things like "hey, Metaculus, buy my improved model to make your prediction markets even better" - well, they don't need to buy your model, they can generate their own.
Tell me where I'm going wrong.
I think it is not the case that every time an employee delegates work, they do that to train their replacement so that their boss can fire them. Sometimes that does happen! And it is interesting to think about when and why.
But often what happens is the employee that delegated her work is freed up to do meta-level work of process design, scaling, monitoring, and career pathing for the employees the work was delegated to. We call this work “management”, and it is required to enable scale.
Historically, given the pyramidal nature of hierarchy, not that many people got to do management. Now, everyone has the opportunity to be a manager (of AIs) and learn what is difficult, interesting, and rewarding about this career.
Sure, “dropping a prompt” on their employees is one (reductive) way to describe what managers do. But there is a whole science of management, and many deep insights have come from the field, despite the incredible difficulty in running scientific experiments in the domain. I expect that to change now that we can more easily A/B test different management styles within the same organization. My hope is that this enables a level of organizational scale that has previously been out of reach given the challenges associated with growth.
" Now, everyone has the opportunity to be a manager (of AIs) and learn what is difficult, interesting, and rewarding about this career."
Have you ever heard the saying "Too many chiefs, not enough Indians"? You don't need as many managers, so yes, while there will be people managing the AI, this will be the perfect opportunity to reduce headcount. Instead of having 50 people each managing their own little team of AIs, you have 5 people managing multiple teams, expected to boost their productivity thanks to all the time freed up by delegating the routine work to the AI.
“Instead of having 50 people all managing their little individual team of AI, you have 5 people managing the multiple teams”
Maybe true if exploration stays expensive. But AI makes testing almost free, so the optimal org might not be 5 hyper-efficient managers. Instead, imagine it’s a swarm of cheap experimenters constantly trying weird stuff to see what works.
Streamlining to perfection only makes sense when mistakes are costly and the path to perfection is known.
"Cheap experimenters" means "if you want to do it on your own time now you're unemployed and need to find some gig".
I wish you the best of luck in your search.
This post inspired me to start writing again. First post here: https://joeybream.substack.com/p/insect-farming-and-consciousness
You are not considering novelty. Rare training data can still be overrepresented in the output if something in the input activates its latent representation, which could also be just stochastic. Thus "II. Presenting arguments for your beliefs, in the hopes that AIs come to believe them" really only seems worthwhile if you can produce content that a theoretically slightly-better-than-now LLM with more context couldn't replicate. If you are just presenting a particularly cogent restating of other points, you don't add value. But new points might.
I do not write for AIs. I write to offer ideas and perspective. If an AI places my words in a zero-one memory device, that's nice for the future. In my opinion, people who think a machine is anything like a human simply don't understand either particularly well.
What is your argument against training an AI on the "great texts"? It seems to me that doing so should work and give AI the "collective wisdom" of humanity.