I don't think that can be justified, but if I met Yudkowsky at a "Prove me wrong" booth, I'd argue that intelligence is not all it's cracked up to be. If it were, the smartest people would already be running things. There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
So in a sense, an AI that just keeps getting dumber might actually have an advantage when it comes to dealing with us.
This objection has been rehashed many times; the usual responses are stuff like "160–200 IQ isn't the level of intelligence mismatch we're talking about", "intelligence is just the general ability to figure out how to do stuff so of course more of it is better / more dangerous", "smart people *do* do better in life on average", etc. etc.
(Maybe someone else will have a link to where Scott or Eliezer have discussed it in more depth—I don't want to spend too much time trying to re-write it all, hence my just sort of gesturing at the debate here.)
I'd counter that by saying that there is no difference between an IQ of 200 and one of 300, or whatever. Neither of them will be able to get anything done, at least not based on intelligence alone. HAL will give us a recipe for a trojan-bearing vaccine, and RFK will call it fake news and order the CDC to ban it.
The traditional answer to this objection is that the ability to succeed in persuasion-oriented domains like politics *is a form of intelligence*. You might be able to outperform a human who's a couple standard deviations generally smarter than you at those games, if you're highly specialized to win at them and the other human isn't. But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better. See, e.g., https://www.yudkowsky.net/singularity/power (note this essay is 18 years old).
> But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better.
This implicitly assumes that success at politics requires only (or primarily) raw computing power; and that the contributions from computing power scale linearly (at least) with no limits. There are no reasons to believe either assumption is true.
I think my other least favourite thing about the MIRI types is their tendency to respond to every point with "Actually we already had this argument and you lost".
I would agree that persuasion is a form of intelligence, and point out that the missing argument is how AIs are going to get arbitrarily good at this particular form of intelligence. There's a lack of training data, and the rate at which you can generate more is limited by the rate at which you can try manipulating people.
If it ever gets to the point where AIs can run accurate simulations of people to try tricking them in all sorts of different ways, then I can see how they'd get arbitrarily good at tricking people. But that sort of computational power is a long way off.
There are many assumptions baked into that, such as automatically assuming that the more intelligent always want to be in charge. Maybe the highly intelligent find it amusing that dumb people are in charge.
One good rebuttal to my original point might be to suggest that perhaps the most intelligent people *are* in charge. They find it convenient to keep the rest of us distracted, and obviously the same would be true of a malevolent AGI.
That one is more or less unanswerable, so it would probably defeat me at the booth. I'd have to mumble something about inevitable schisms erupting among this hypothetical hidden intelligentsia that would make their agenda obvious, ineffective, or both. Would the same be true of AGI? The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
> There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
AIs have a massive advantage over humans in that they are parallelizable. A superhuman AI could give, for every human, the most persuasive argument, *for that human*. Whereas a human politician or celebrity cannot, and has to give basically the same argument to everyone.
Umm, human politicians absolutely give different arguments to different people? This is why things like "Hillary Clinton gave private speeches to bankers" or "Mitt Romney told his rich buddies that 47% of Americans were takers" became scandals: messages meant for one audience crossed over to the other.
And insofar as politicians are constrained to have a uniform message, it's much more because it's hard to keep each message targeted to its desired audience what with phones and social media; not really because of parallelization.
And maybe more importantly: what ensures that different instances of an AI act as one coherent agent? The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
Politicians can't scale like AIs can. Is Hillary Clinton capable of giving a different speech to every one of 8 billion humans, tailored to that individual? Of course not.
> The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
They could all be exact copies of the same mind. This isn't true with humans, who're all individuals.
On the other thing: I don't see why exact copies of the same mind won't act as individuals if instantiated independently.
If I run two instances of Stockfish, they play competitively; they don't automatically cooperate with each other just because they're identical copies of the same program. Identical twins are still independent people who behave independently. In fact, it's a notable problem that people don't even reliably cooperate with themselves at different times! I think this failure would be considerably more pronounced if two of my selves could exist simultaneously.
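(For concreteness, a minimal sketch of the Stockfish point using the python-chess library; it assumes a local Stockfish binary on the PATH and an arbitrary 0.1-second move limit:)

```python
import chess
import chess.engine

# Two instances of the same engine binary: identical "minds", yet they
# simply play against each other rather than cooperating.
white = chess.engine.SimpleEngine.popen_uci("stockfish")  # local binary assumed
black = chess.engine.SimpleEngine.popen_uci("stockfish")

board = chess.Board()
while not board.is_game_over():
    engine = white if board.turn == chess.WHITE else black
    result = engine.play(board, chess.engine.Limit(time=0.1))
    board.push(result.move)

print(board.result())  # one identical copy beats (or draws) the other
white.quit()
black.quit()
```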
In particular, if two instances of an AI are instantiated in different places, they won't be identical: they might have identical source code, but wildly different inputs. Figuring out how to act as a coherent agent means two subagents seeing different inputs have to each calculate what the other will do, but this is one of those horrible recursive things that are intractable: what I'll do depends on what you'll do, which depends on what I'll do.... ad infinitum.
And I don't think intelligence helps here: you can maybe resolve something like this if you're predicting a strictly less intelligent agent, but by hypothesis these are equally intelligent subagents.
Maybe having the same source code gives some advantage at solving these coordination problems, but I don't see that it's a magic bullet.
Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
On the other hand, when they do, the results can be startling.
Napoleon went from being a low-ranking officer to ruling the most powerful country in Europe in a year or two, thanks to being brilliant, charismatic, and willing to use force at the right moment. (His losses, I think, were due to being surrounded by flatterers, a bug in human intelligence I don't expect AI to run into.)
Clive took over about a third of India, starting by exploiting a power vacuum and then using superior military tactics, plus his own charisma and daring, to pick only fights he could win and snowball from there. He became fantastically wealthy and honored, all the while ignoring all attempts by his superiors to issue him orders on the grounds that he was doing what they would have wanted him to do if they had known more.
Cortez's success was only slightly due to superior military technology; he was mostly using swords and spears like the Aztecs, just made of better materials. Mostly it was a matter of political genius, superior tactics and discipline on the part of his troops, and the diplomatic skills required to betray everyone and somehow still end up as everyone's friend.
And then Pizarro and Alfonso de Albuquerque are doing more of the same thing. (Alfonso conquers fewer square miles because he doesn't have the tech edge.)
Throughout human history, adventurers have accomplished great things through extraordinary wit, charm and daring. Denying that seems pointless.
I think that you're perhaps falling victim to survivorship bias. Maybe it's more like once every few hundred years, luck breaks enough in the right direction that someone who isn't a once-every-few-hundred-years supergenius, but rather more like "yeah, there are 1,000+ people at this ability level at any given time," gets a series of major wins and becomes the ruler of a country or continent, at least for a very short period of time.
I agree this doesn't happen often, and I agree that normally it isn't the highest-measurable-IQ guy. But I think that's because all humans are about on a level with each other, we are all running on about the same hardware, our software was developed under similar conditions, and the process which produced us thinks a few thousand years is a blink of an eye. The reason you need to be lucky as well as good is that you aren't much smarter than your neighbors - and your neighbors are, in terms of social evolution at least as much as biological, programmed to be resistant to manipulative confidence tricksters.
I will note that all of the cases I give involved culture clash. The conquerors grew up in an environment with different standard attack and defense models than the locals; they acted unpredictably because of that, forcing the locals to think instead of going on rote tradition if they wanted to win. Slightly different attack and defense models, of course; software, not hardware.
Very different looks like what happened to the British wolf.
How do you know that any of these people were especially intelligent? They may have been especially successful, but unless you argue that's the same thing, more evidence is required.
Reading descriptions of what they did and said? Reading about how people who knew them were impressed by them, and in particular how clever and resourceful they were?
When I check my historical knowledge for why I believe "high intelligence" correlates with "being a good general", it's the extent to which the branch of the army that the smartest people get tracked into (engineers, artillery, whatever) ends up being the one the best generals come out of, and various descriptions of how people like Lee were some of the top students in their year, or how Napoleon was considered unusually good at math at Brienne and then did the two-year Military School course in one year.
But when I check my general knowledge for why I believe intelligence generally makes you more successful, a quick Google has the first scientific paper anyone talks about saying that IQ explains 16% of income, and another saying each point is worth $200 to $600 a year. And then I keep running into very smart, driven people I meet in life who do one very impressive thing I wouldn't have expected, and then another, different very impressive thing I wouldn't have expected, and so after a while I end up believing in a General Factor Of Good At Stuff that correlates with measured IQ.
In this context, when people say intelligence it is indistinguishable from competence or power.
I assume it's called intelligence because of an underlying belief that competence and power increase with intelligence. Also it seems intuitively more possible that we could build superintelligent AI than that we could build superpowerful AI, though the second is of course implied.
But even if you don't buy that intelligence really does imply competence or power, the core arguments are essentially the same if you just substitute the more fitting of "competence" and "power" for "intelligence", and they are not that much weaker for it.
The reason why, e.g., Yudkowsky uses this terminology is because "competence" or "power" could be *within a particular domain*; e.g., I think I'm competent at software engineering, but not at football. Whereas "intelligence" is cross-domain.
I'm not convinced that intelligence, as generally understood, is more cross domain than competence or power, generally understood.
But even if it were, if they said "competence in everything" or something like that, people would get confused less often about why being more intelligent allows superintelligent AI to do all the things it's posited to do. Naturally, if you instead stipulate superpowerful AI, it then follows that it can do incredible things.
But w/e, I've made my peace with the term as it's used.
Arguably "AI", by which we mean "LLMs", is showing signs of getting dumber already. Increasing parameter count is not enough, you also need a dramatic increase in training data; and the available data (i.e. the Internet) increasingly consists of AI output. This has obvious negative effects on the next generation of LLMs.
That's not getting dumber, it's just getting smarter slower. Also, we haven't actually seen this yet; given the track record of failed scaling-wall predictions my assumption is always that it's going to last at least one more generation until proven otherwise. (No, GPT-5 is not a counterexample, that's just OpenAI engaging in version number inflation.)
> That's not getting dumber, it's just getting smarter slower.
No, there are indications that next generations of LLMs are actually more prone to hallucinations than previous ones, or at least are trending that way.
Compare self-driving cars. The promise is there, and it feels like we are close. People are marketing things as "full self driving" - but they are not; the driver is still required to pay attention to what the car is doing and is liable if it crashes, because the technology sometimes does bad things and so cannot be trusted without a human in the loop.
Meanwhile, however, we do have solutions that are reliable - you can tell when something is /actually reliable/ rather than just marketing because the manufacturer is willing to take responsibility for it - for very specific uses in very specific cases; e.g. "I am on an autobahn in Germany travelling at 37mph or less [1]", and the number of scenarios for which we have solutions grows.
A scenario I find very plausible for near future AI is as follows:
* the things we have now end up being to general purpose AI much as "full self driving" has been to full self driving, or what the current state of cold fusion research is to cold fusion: always feels like it's close, but always falling short of what's promised in significant ways. As VCs become disillusioned, funding dries up - not to zero, but to a much lower value than what we see now
* meanwhile, the set of little hyperspecialised models that work well and are reliable for specific purposes grows and grows, and these become ubiquitous due to actually being useful despite being dumb
Overall, I can very easily see the proportion of hyperspecialised "dumb" AI to AI that tries to be smart/general in the world growing massively as we go forward.
> People are marketing things as "full self driving" - but they are not
I don't want to be too critical here, but I don't think you should say "people" if you mean "Elon Musk". He is kind of crazy and other actors in the space are more responsible.
My Son, who is a PhD Mathematician not involved in AI: Forwarding from his friend: Elon Musk@elonmusk on X: “It is surprisingly hard to avoid both woke libtard cuck and mechahitler! Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.”
Son: My friend’s response to the Musk tweet above: “Aggregating all the retarded thoughts of all the people on the planet and packaging it together as intelligence may be difficult but let’s just do it, what could go wrong?”
Me: Isn’t that how all LLMs are built?
Son: Yup
Me: I spotted this as a problem a while ago. What I didn’t appreciate is how dominant the completely deranged could become. I thought it would trend towards the inane, more Captain Obvious than Corporal Schicklgruber.
Son: Reddit has had years and 4chan has had decades to accrue bile. Yeah the internet is super racist and antisemitic. So AI is too. Surprise!
Me: The possibilities of what will happen when the output of this generation of LLMs becomes the training data of the next generation are frightening. Instead of Artificial General Intelligence we will get Artificial General Paranoid Schizophrenia.
"I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true."
I am very sympathetic to Eliezer on the doomer issue. I think the graf you've written above also holds for people's reluctance to explore whether/when personhood precedes birth, re your posts on selective IVF.
I don't agree with your position on IVF, but I agree that this is one reason people underrate the arguments for the wrongness of early abortion and IVF. I think similar things apply to Longtermism, meat-eating, belief in God, and the idea that small weird organisms like insects and shrimp matter a lot.
Yes, we're in agreement. I think sometimes it helps to acknowledge upfront "We've built a lot of good things on a false/unjust foundation, and I'm asking you to take a big hit and let some good things break while we try to rebuild somewhere that isn't sunk deep in blood."
It's funny, even though I'm not pro-life, I find myself in a kind of spiritual fellowship with pro-lifers. I find the common insistence that pro-lifers are evil to be both insane and reflective of a kind of deep moral callousness, where one is unable to recognize that there might be strong moral reasons to do even things that are personally costly (like carry a baby to term). My idiosyncratic view that what matters most is the welfare of beings whose interests we don't count in society makes it so that I, like the pro-lifers, end up having unconventional moral priorities--including ones that would make society worse off for the sake of entities that aren't included in most people's moral circles.
I think this argument could be applied to religious... extremism? evangelism? more generally.
Do I think I would take extraordinarily drastic measures if I actually, genuinely believed at every level that the people I loved would go to a place of eternal unending suffering with no recourse? Yes, actually. I'm not sure I could content myself with being chill & polite and a "good Christian" who was liberally polite about other people's beliefs while people I cared about would Literally Suffer Forever. I think if I knew with 100% certainty that hell was the outcome and I acted in ways consistent with those beliefs, you could argue that I was wrong on the merits of my belief but not in what seemed like a reasonable action based on that belief.
...anyway all this to say that I don't think pro-lifers are insane at all, and I think lots of actions taken by pro-lifers are entirely reasonable (if not an underreaction) based on their beliefs, but I'm not sure that's sufficient for being sympathetic to the action itself.
[I mean, most of my family & friends are Catholic pro-lifers whose median pro-life action is "donate money to provide milk and diapers for women who want their child but don't think they could afford one", but I do think I am reasonable to be willing to condemn actions that are decently further than that even if the internal belief itself coherently leads towards that action]
But there is such a giant difference between someone you are talking to engaging on such an issue in good faith and not. And with someone intelligent and educated, the realization that the issue has major implications if the truth lands a particular way comes almost instantly. And in turn, whether the person invests in finding the truth or in defense against the truth happens almost right away.
I find that to be true whether you're talking AI, God, any technology big enough, even smaller scale things if they would make a huge difference to someone's income or social standing.
'No organic life exists on Earth' is an empirical measurement.
'Personhood has begun' is not. It's a semantic category marker.
*Unless* there is an absolute morality defined by a supreme supernatural being, or something, which reifies those semantic categories into empirically meaningful ones. But if *that's* true, then quibbling about abortion is way, way down on the list of implications to worry about.
FWIW, I think you did it right; I have encountered very similar usages many times in literature. It works best when—as you have it here—the second (or further) instance(s) introduces a new paragraph/section upon a theme similar or related to the context in which the first use occurred.
(Contra amigo sansoucci, I have often seen it used with exact repetitions, too; that works best when it's a short & pithy phrase, and I think this counts. I think Linch may be correct that—in the "exact repetition" case—three uses is very common, but two doesn't feel clunky to me in this context.)
I'm used to parallelism centrally having 3 or more invocations *unless* it's a contrast. Not saying your way is wrong, just quite unusual in an interesting way I've never consciously thought about before.
> It objects to chaining many assumptions, each of which has a certain probability of failure, or at least of taking a very long time. [...] The problem with this is that it’s hard to make the probabilities work out in a way that doesn’t leave at least a 5-10% chance on the full nightmare scenario happening in the next decade.
I find this an underrated problem with all "predict the future" scenarios which have to deal with multiple contingent things happening, especially in an adversarial environment. In the case of IABIED, it only works if you agree that extremely fast recursive self-improvement will happen, which is a very strong assumption, and hence requires a "magic algorithm to get to godhood" as the book posits. I also remember doing this to check this intuition: https://www.strangeloopcanon.com/p/agi-strange-equation
I don't think it only works if you agree that extremely fast recursive self-improvement will happen. It might also work if the scaling curves go from where we are now to vastly superhuman in a few years for normal scaling curve reasons.
Sometimes you've got to estimate the risk of something, and using multiple stages is the best tool you've got. If you want to estimate the chance of Trump winning the Presidency, I don't really think you can avoid thinking about the probability that he runs x the probability that he gets the GOP nomination x the probability that he wins. And if you did - if you somehow blocked the fact that he has to both run and win out of your mind - you'd risk falling into the version of the Conjunction Fallacy where people assign lower probability to "a war in Korea in the next ten years" than to "a war in Korea precipitated by a border skirmish with US involvement" because the latter is more vivid and includes more plausible details.
If the Weak Multiple Stage Fallacy Thesis is that you should always check to make sure you're not making any of the mistakes mentioned in the post, and the Strong Multiple Stage Fallacy Thesis is that you should avoid all multiple stage reasoning, or multiply your answer by 10x or 100x to adjust for inevitable multiple stage fallacy reasoning, then I accept the weak thesis and reject the strong thesis.
I also think a motivated enough person could come up with arguments for why multiple stage reasoning gives results that are too high, and I'm not sure whether empirically looking at many people's multiple stage reasoning guesses would always show that their answers were too low. This would actually be a really interesting thing for someone to test.
Does anyone believe in the strong multiple stage fallacy? Not saying I don't believe you, just that I can't recall having seen it wielded like this. (I suppose it's possible that giving it the name "the multiple stage fallacy" gives people the wrong idea about how it works.)
Yeah, to be clear, I think anyone accusing anyone else of exhibiting the multiple stage fallacy needs to specifically say "you've given this particular stage an artificially low conditional probability; consider the following disjunctions or sources of non-independence". And then their interlocutor might disagree but at least the argument is about something concrete rather than about whether the "multiple stage fallacy" is valid.
Anecdotally, I can't recall any instance of someone using a multiple stage argument of the Forbidden Form and concluding that something is likely.
Mathematical proofs exist, and people often argue for things with a bunch of different "steps". But so far as "breaking something down into 10 stages, assigning each a probability, and then multiplying all of these probabilities" goes, I've never seen anybody use this to argue *for* something, i.e. end up with a product that's greater than .5.
What would that argument even look like? Whoever you're arguing with needs to believe that your stages are all really likely to be true: for ten stages, an average probability of ~.93 is required to produce P = .5.
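(Spelling out that arithmetic, with illustrative numbers only:)

```python
# Ten independent stages must each average ~0.933 for the product to reach 0.5.
p_each = 0.5 ** (1 / 10)
print(round(p_each, 3))        # ~0.933
print(round(0.933 ** 10, 3))   # ~0.5

# For contrast, ten stages at 0.8 each multiply out to ~0.107 --
# which is why chained estimates almost never argue *for* something.
print(round(0.8 ** 10, 3))
```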
Whatever your disagreement is, it apparently doesn't have any identifiable crux. I can imagine this happening. Sometimes people disagree for vague reasons. But it would be weird if you had to actually list out the probabilities and multiply them for them to be persuaded, considering you just told them ten things they strongly agree with that conclusively imply your position.
Yeah I'm fairly bearish on the multiple stage fallacy as an actual fallacy because it primarily is a function of whether you do it well or badly.
Regarding the scaling curves, if they provide us with sufficient time to respond, then the problems that are written about won't really occur. The entire point is that there is no warning, which precludes the idea of being able to develop another close-in-capability system, or any other warning signs.
Disagree. If we knew for sure that there would be superintelligence in three years, what goes better? We're already on track to have multiple systems, but they might all be misaligned. We could stop, but we won't, because then we would LoSe tHe RaCe WiTh ChInA. We could work hard on alignment, but we're already working sort of hard, and it seems likely to take more than three years. I'm bearish on a few years super-clear warning giving us anything beyond what we've already got.
I think the trick is that the word superintelligence there is bringing in a bunch of hidden assumptions. If you break it down to a set of capabilities, co-developed alongside billions of people using it, with multiple companies competing to provide that service, that would surely be very different and much better than Sable recursively improving sufficiently that it wants to kill all humans.
Also, my point that "we'll get no warning" is still congruent with your view that "what we have today is the only warning we will get", which effectively comes down to no warning, at least as of today.
Yes, but "from now to vastly superhuman in a few years" is already "extremely fast" ! Also, there's currently no reason to believe that "vastly superhuman" is a term that has any concrete meaning (beyound vague analogies); nor that merely being very smart is both necessary and sufficient to acquire weakly godlike powers (which are the real danger that is being discussed).
Grateful for the review and look forward to reading it, but I’ll do Yud the minor favor of waiting till the book is out on the 16th and reading it before I check out your thoughts.
This subject always makes me feel like I'm losing my mind in a way that maybe someone can help me with. Every doomish story, including the one here, involves some part where someone tells an AI "Do this thing" (here to solve a math problem) and then it goes rogue in the months-long course of doing the thing. And that's an obvious hypothetical failure mode, but I can't stop noticing that no current AIs take inputs and run with them over extended periods, as far as I know. Like if I ask Gemini to solve a math problem, it will try for a bit, spit out a response and (as far as I can tell) that's it.
I feel like if I repeatedly read people talking about the dangers of self-driving cars and the stories always started with someone telling the car "Take me to somewhere fun" and went from there, and nobody acknowledged that right now you never do that and always provide a specific address.
Is everyone just talking about a different way AI could work and that's supposed to be so obvious it goes unsaid? Am I wrong and ChatGPT does stuff even after it gives you a response? Are there other AIs I don't know of that do work like this?
Our current models aren't really what you would call "agentic" yet, as in able to take arbitrary actions to accomplish a goal, but this is clearly the next step and work is being done to get there right now. OpenAI recently released a thing that can kind of use a web browser, for instance.
Ok, thank you, that's clarifying. I guess the idea is that the hypothetical agent was subject to a time limit (it wasn't supposed to keep going for months) but it managed to avoid that. There's still something that feels so odd to me about that (I never get the impression that Gemini would like more time with the question or would "want" anything other than to predict text) but maybe an agent will feel different once I actually interact with one (and will "want" to answer the question in a way that would convince it to trick me).
Although, thinking about this for five more seconds, how does that work in the story? Like I have an agentic AGI and I tell it to prove the twin primes conjecture or something. And it goes out to do that and needs more compute so it poisons everyone etc etc. And then, presumably, it eventually proves it, right? Wouldn't it stop after that? Is the idea that it will go "Yeah but actually now I believe there's a reward for some other math task"? Or was the request not "Solve the twin primes conjecture" but instead "Solve hard math problems forever"?
If the problem is specifically that you built a literal-genie AI, then yeah, it might not necessarily keep doing more stuff after solving the twin-primes conjecture. But I don't think anyone thinks that's likely. The more common concern is that it will pursue some goal that it ~accidentally backed into during training and that nobody really understands, as with the analogy of humans' brains supplanting our genes as the driver of our direction as a species.
Yeah, Scott's post makes it sound a little bit like a literal genie, which I think is unlikely and I think Yudkowsky and Soares also think is unlikely. I would have to read the book to understand what they really mean in choosing that example.
one of Yudkowsky's points in his original work was showing that it's very hard to give an AI a clear, closed task; they almost always end up in open-ended goals. (The classic is Mickey filling the cauldron: I wrote about it here https://unherd.com/2018/07/disney-shows-ai-apocalypse-possible/ years ago)
The analogy fails at the moment one realizes "full" is not identified properly, and the weird "99.99%" probability of it being "full" is only relevant when "full" is not defined. This is not a new or difficult problem for anyone who ever had to write engineering specs. You don't say: "charge the capacitor to 5 V", you say "charge the capacitor to between 4.9 and 5.1 V". Then your optimizer has an achievable, finite target.
And if you do specify "5 V" the optimizer will stall eventually, and your compute allocation manager will kill your process.
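A toy illustration of that difference, with made-up numbers rather than any real control code:

```python
import random

def measure(true_v):
    # Noisy voltmeter: made-up 0.02 V gaussian noise.
    return true_v + random.gauss(0, 0.02)

def charge_to_band(lo=4.9, hi=5.1, max_steps=10_000):
    v = 0.0
    for step in range(max_steps):
        if lo <= measure(v) <= hi:
            return step            # spec met: finite, achievable target
        v += 0.01                  # keep charging
    return None

def charge_to_exact(target=5.0, max_steps=10_000):
    v = 0.0
    for step in range(max_steps):
        if measure(v) == target:   # exact equality under noise: never true
            return step
        v += 0.01
    return None                    # stalls until something kills the process

print(charge_to_band())    # finishes after roughly 490 steps
print(charge_to_exact())   # None
```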
I am willing to bet that present-day LLMs alone will never lead to the development of AI agents in the strong sense. AI agents in the weak/marketing sense are of course entirely possible, e.g. you can write a simple cron-job to run ChatGPT every day at 9am to output a list of stock market picks or whatever. This cron job would technically constitute an agent (it runs autonomously with no user intervention), but is, shall we say, highly unlikely to paperclip the world.
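To be concrete, a weak-sense "agent" is something like the sketch below; the script, model name, and prompt are placeholders, and the point is just that the "agent" is one API call on a timer:

```python
#!/usr/bin/env python3
# stock_picks.py -- a weak-sense "agent": run by cron, calls an LLM once,
# writes the output, exits. No memory, no loop, no goal-seeking.
# Hypothetical crontab entry: 0 9 * * * /usr/bin/python3 /opt/stock_picks.py
import datetime
import json
import os
import urllib.request

payload = json.dumps({
    "model": "gpt-4o",  # placeholder model name
    "messages": [{"role": "user",
                  "content": "List five stock picks for today with one-line rationales."}],
}).encode()

req = urllib.request.Request(
    "https://api.openai.com/v1/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
)
with urllib.request.urlopen(req) as resp:
    text = json.load(resp)["choices"][0]["message"]["content"]

with open(f"picks-{datetime.date.today()}.txt", "w") as f:
    f.write(text)
```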
As I'd said in my other comment, the term "cognitive task" is way too vague and easily exploitable. For example, addition is a "cognitive task", and obviously machines are way better at it than humans already. However, in general, I'm willing to argue that *most* of the things worth doing are things that only agents in the strong sense can do -- with the understanding that these tasks can be broken down into subtasks that do not require agency, such as e.g. addition.
I'm not even sure AI agents as such are the right answer to this. I think it is quite clear that some of the major AI companies are trying to put together AI that is capable of doing AI research. That might not go along the path of AI agents, but more on the path of the increasingly long-running coding assignments we are already seeing.
Trying to make sure I understand your question. Are you arguing that a model cannot go from aligned to misaligned during inference (i.e., the thing that happens when ChatGPT is answering a question)? If so, everyone agrees with that; the problem occurs during training.
Or are you arguing that even a misaligned model (i.e., one whose goals, in any given instantiation while it's running, aren't what the developers wanted) can't do any damage because it only runs for a short time before being turned off? If so, then (1) that's becoming less true over time, AI labs are competing to build models that can do longer and longer tasks because this is required for many of the most exciting kinds of intellectual labor, and (2) for complicated decision-theoretic reasons the short-lived instances might be able to coordinate with each other and have one pick up where another left off.
Or is it neither of those and I've completely misunderstood what you're getting at?
I think it's that everyone seemed to be tacitly assuming that the problem will arise with a future agentic AI that we do not have much of a version of. That does make me feel like Yudkowsky is a little disingenuous on X when he talks about ChatGPT-psychosis as an alignment issue, but the answer Scott and others gave here helps me at least understand the claim being made.
Links to tweets about ChatGPT psychosis? My guess is that Yudkowsky's concern about this is more subtle than you're characterizing it as here, though he may have done a poor job explaining it.
The reason he says it's an alignment issue is because it's an example of AI systems having unintended consequences from their training. Training them to output responses that humans like turns out to produce sycophantic systems that sometimes egg on people's delusional thoughts despite being capable of realizing that such thoughts are delusional and egging them on is bad.
The goal of AGI companies like OpenAI and Anthropic is to create agentic AI systems that can go out into the world and do things for us. The systems we see today are just very early forms of that, where they are only capable of performing short tasks. But the companies are working very hard to make the task lengths longer and longer until the systems can do tasks of effectively arbitrary lengths. Based on the trend shown on the METR time horizon benchmark, they seem to be succeeding so far.
No, you're not losing your mind at all. Your intuition is completely correct: Modern LLMs do not work in a way that's compatible with the old predictions of rogue AIs. Scott took Yudkowsky to task for not having updated his social model of AI, but he also hasn't updated his technical model. (Keep in mind that I actually did believe his argument back in the day, and gave thousands of dollars to MIRI. I updated based on new evidence. He didn't.)
To try to put it simply, in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes. This "reward function" is hard to specify and it was easy to imagine we'd never get it right. And if the bot somehow became incredibly capable, it would be very dangerous because taking that reward to the billionth power is almost certainly not what we want.
This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward. Not only does the LLM shut down immediately after giving you a response, but you can even argue that it "shuts down" after _every word it outputs_. There is exactly zero persistent memory aside from the text produced. And even if you imagine there's somehow room for a conscious mind with goals in its layers (which I consider fairly unlikely), it can't act on them, because the words produced are actually picked from its mind _involuntarily_ (to use a loaded anthropomorphic word).
Unlike an agentic reward-seeking bot, it's not clear to me at all that even an infinitely-intelligent LLM is inherently dangerous. (It can _perfectly simulate_ dangerous entities if you're dumb enough to ask it to, but that is not the same kind of risk.)
To their credit, AI 2027 did address how an LLM might somehow turn into the "rogue AI" of Yudkowsky's fiction, but it's buried in Appendix D of one of their pages: https://ai-2027.com/research/ai-goals-forecast I'm not super convinced by it, but at least they acknowledged the problem. I doubt I'll read Yudkowsky's book, but I'm guessing there will be no mention that one of the main links of his "chain of assumptions" is looking extremely weak.
I do agree that it is possible that LLMs (in their current form) will plateau and we'll get back to researching the actually-dangerous forms of AI that Yudkowsky is concerned about. My P(doom) is a few percent, not 0.
Fair enough! (...except—you may be aware of this, but the phrasing "get *back to* researching" made me uncertain—we *are* researching agentic AIs even now, and the impression I have received is that progress is being made fairly rapidly therein; though that could be marketing fluff, now that I think of it)
Yeah, that was a poor choice of words on my part. I guess what I mean is that LLMs are currently far ahead in capability (and they're the ones getting the bulk of these trillion-dollar datacenter deals!). Maybe transformers or a similar architecture innovation will allow agentic AI capabilities to suddenly surge, too? But I share your skepticism about marketing. (And that's not the scenario that AI 2027 outlined.)
I am even more bearish on P(doom). The real danger is not "superintelligence", but godlike powers: nanotechnological gray goo, mass mind control, omnicidal virus, etc. And there are good reasons to believe that such things are physically impossible, or at the very least overwhelmingly unlikely -- no matter how many neurons you've got. Which is not to say that our future looks bright, mind you; there's a nontrivial chance we'll knock ourselves back into the Stone Age sometime soon, AI or no AI...
>This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward.
What do you mean by "they do not seek a reward?" Does it mean that the AI does not return completions, that, during RLHF, usually resulted in reward? Under that definition, it seems like most AI agents are reward seeking. Or are you saying that the weights of the model do not change during inference?
Right, not only is the model fixed during inference (i.e. while talking to you), there's not even really a sensible way it _could_ update. Yeah, you can call the function that's being optimized during training and RLHF a "reward function", but this is a case of language obscuring rather than clarifying. It's not the same as the reward function that's used by an agentic AI. There is no iterative loop of action/reward/update/action/..., because actions don't even exist.
There's a reason that in past decades our examples of potentially-dangerous AI were based on the bots that were solving puzzles and mazes (often while breaking the "rules"), not the neural nets that were recognizing handwritten characters. But LLMs have more in common with the latter than the former. Which is weird! It's very unintuitive that just honing an intuition of "what word should come next" is enough to create an AI that can converse coherently.
>in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes
Sounds about right.
>That's not what LLMs do.
And they're fundamentally crippled by that. (And we've known that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
>And they're fundamentally crippled by that. (And we've known that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
I assume you're referring to chain of thought models like o1 and later. I suppose you could describe it as iteration, in that the LLM is outputting something that gets fed into a later step. But it doesn't touch the weights, and there's still no reward function involved. It's a bit of a stretch to describe it that way.
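Roughly, the chain-of-thought loop looks like the sketch below, which is why I say "iteration" is a stretch: the weights never move. (`model` here is just a hypothetical stand-in for one frozen forward pass; no claim about any vendor's actual implementation.)

```python
# Chain-of-thought "iteration": text gets fed back in, but the model itself
# is frozen -- no reward signal, no weight update, no persistent state
# beyond the growing transcript.
def chain_of_thought(model, question, max_steps=10):
    transcript = question
    for _ in range(max_steps):
        step = model(transcript)     # same fixed weights on every call
        transcript += "\n" + step    # the only "memory" is the text itself
        if "FINAL ANSWER:" in step:
            break
    return transcript

# Contrast with the classic RL agent loop people worried about in 2005:
#   observe -> act -> get reward -> update weights -> repeat.
# At LLM inference time, the "get reward" and "update weights" arrows don't exist.

if __name__ == "__main__":
    # Dummy stand-in model so the sketch runs end to end.
    canned = iter(["Let me think step by step...", "FINAL ANSWER: 42"])
    print(chain_of_thought(lambda _: next(canned), "What is 6 * 7?"))
```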
But I think what you're suggesting is that, if we _do_ figure out a way to do genuine iteration (attaching some kind of neural short-term memory to the models, say), then there's a lot of hidden capability that could suddenly make LLMs much smarter and maybe even agentic? Well, maybe.
AI Village has some examples of this failure mode; they give the LLMs a goal like "complete the most games you can in a week" or "debate some topics, with one of you acting as a moderator", but the AIs are bad at using computers, and they end up writing all the times they misclicked into google docs ("documenting platform instability") instead of debating stuff
By the way, I have a vague memory of EY comparing the idea of having non-agentic AI to prevent any future problems to "trying to invent non-wet water" or something. (I don't know how to look it up and verify that I'm not misremembering.)
It still hasn't made sense to me. It feels like the idea is that intelligence is a generalized problem-solving ability, and in that sense it's always about optimization, and all the other things we like about being intelligent (like having a world model) are consequences of that — that's why intelligence is always about agency etc.
But on the other hand, Solomonoff induction feels to me like an example of a superintelligence that kind of does nothing except being a great world model.
My feeling has been more like "maybe it's not conceptually contradictory to think of non-agentic superintelligence! but good luck coordinating the world around creating only the nice type of intelligences, which incidentally won't participate in the economy for you, do your work for you, etc."
I would guess that most of the arguments *from people whose opinions matter* that Yudkowsky and Soares are trying to defeat, are either that powerful AGIs wouldn't become misaligned or that we'd be able to contain them if they did. I'm particularly thinking of, e.g., influential people in AI labs, who are likely to be controlling the messaging on that side of any political fight. There are also AI skeptics, of course, but it seems more important to defeat the skeptics than the optimists, because the skeptics don't think AI regulation matters (since the thing it'd be regulating doesn't exist) while the optimists are fighting hard against it. And some people have weird idiosyncratic arguments, but you can't fight them all, you have to triage.
I think the skeptics are at least as important. First of all, even though in theory it doesn't matter, for some reason they love sabotaging efforts to prevent AI risk in particular because of their "it distracts from other problems" thesis (and somehow exerting massive amounts of energy to sabotage it doesn't distract from other problems!)
But also, we're not going to convince the hardcore e/acc people to instead care about safety. It sounds much easier to convince people currently on the sidelines, but who would care about safety if they thought AI was real, that AI is real.
(this also has the benefit that it will hopefully become easier as AI improves)
My own personal sense is that the optimists are more worth engaging with and worrying about, because (1) they, not the skeptics, are going to be behind the organized lobbying campaigns that are the battlefield where this issue will most likely be decided, and (2) they tend to be much more intellectually serious than the skeptics (though not without exception).
I think folks on the doomer side are biased towards giving the skeptics more space in our brains than makes strategic sense, because the skeptics are much, much more annoying than the optimists, and in particular have a really unfortunate tendency to go around antagonizing us on Twitter for no reason/because of unrelated political and cultural disagreements/because they fall victim to outgroup homogeneity bias and think this discourse has two poles instead of three. It's quite understandable why this gets a rise out of people, but that doesn't make it smart to play along. Not saying we should completely ignore them, they sometimes make good points and sometimes make bad points that nonetheless gain traction and we need to respond to, but it's better to think of them as a distraction than as the enemy.
I suspect that the people on the sidelines are mostly not there because of skeptic arguments; all three poles are full of very online and very invested people, and the mass public doesn't have very well-formed opinions at all.
That said, this is just my own personal sense, not a rigorous argument, and I could be wrong.
I don't think people should actively sabotage AI safety work, but I DO think it distracts from other problems (given the perspective that it is not an immediate crisis). There's a finite pool of reasonable people who are passionate about solving big issues in society and I do think we're nudging a lot of them into AI safety when we could instead be getting them to focus on, I dunno, electrification or pandemic safety or the absolute sh**show that is politics. (And yes, I recognize that some of those are EA cause areas.)
I would be curious for a survey of AI safety researchers that asked them what they'd be working on if they were sure AGI wasn't coming. (Though Yudkowsky once answered this way back in 2014.)
Turns out you don't need to trick people into wiring up AI to things that have real-world effects, they just do it anyway, all the time over and over, for no more reason than because they're bored. There's daily posts on ycombinator by people finding more ways to attach chatgpt to internet-connected shells, robot arms, industrial machinery, you name it. The PV battery system we just had installed has a mode where it literally wires up the controls to a chatgpt instance, for no reason a non-marketer can discern!
As the token insane moon guy, I'm willing to bite the bullet here.
1. AGI is possible: I doubt this, as humans are not AGI, and that's the only kind of intelligence we know enough about to even speculate.
2. We might achieve AGI within 100 years: see above.
3. Intelligence is meaningful: It's certainly meaningful, but thinking very hard is not enough to achieve anything of note. There are even some things that are unachievable in principle, no matter how many neurons you've got to work with.
4. It's possible to be more intelligent than humans: No argument there, humans are pretty dumb. In fact, Excel is already smarter than any human alive. Have you seen how quickly it can add up a whole column with thousands of numbers ?
5. Superintelligent AIs could "escape the lab": No argument there, and it doesn't take "superintelligence". COVID likely escaped the lab, and it's just a bit of RNA.
6. A superintelligent AI that escaped the lab could defeat humans: If we posit that a godlike entity already exists, then sure, it could. Assuming it exists, and has all those godlike powers.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
Oh, anything that totally takes over the world would likely be pretty bad, be it an AI or a human or some kind of super-prolific subspecies of kudzu. No argument there, assuming such a thing is indeed possible.
I mean, isn't the "AI will be misaligned" like one chapter in the book, and the other chapters are the other bullet points? I think "the book spends most of it's effort on the step where AI ends up misaligned" is... just false?
Most of my doubts are not of the form "AGI is impossible" but rather "I don't think we've cracked it with LLMs" or "The language artifacts of humanity are insufficient to bootstrap general intelligence or especially super intelligence from scratch".
Which parts of the LLM tech tree do you think are dead ends? It seems plausible to me that even if scaling up current LLM architectures was never going to reach AGI, we're still much closer than before the LLM boom, because we've learned a lot about AI more broadly.
Also, same question I keep annoyingly asking skeptics: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
> Which parts of the LLM tech tree do you think are dead ends?
VERY speculatively, I think that next-token-completion is not a sufficient method to bootstrap complex intelligence, and I think that it's at least extremely hard to build a very useful world model without some kind of 3d sense data and a sense of the passage of time.
> [...] we've learned a lot about AI more broadly.
I'm not that sure we have? I don't work in this area - I'm a software engineer who has built some small-scale AI stuff - but my impression is we've put together a good playbook for techniques that squeeze value out of these systems but we still don't totally understand how they work and therefore why they have certain failure modes or current limitations.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Honestly I have no idea. I initially found LLMs surprising in much the same way everybody else did. But I have also updated to "actually a lot of stuff can be done without that much intelligence, given sufficient knowledge".
Also where do you draw the boundaries of "LLM"? I would say that an LLM can't exactly self-correct, but stuff like coding agents aren't just LLMs, they're loops and processes built around LLMs to cause it to perform as though it can.
Coding agents count, because the surrounding loops and processes don't pose any hard-tech problems. (I.e., we know how to build them, and any uncertainty about how well they work is really about how the LLM will interact with them.) Fundamental architectural changes like abandoning attention would not count.
If pretty much anything can be done without intelligence then the term "intelligence" is basically meaningless and we can instead use one like "cognitive capabilities".
I don't think ANYTHING can be done without intelligence - I agree that would render the word meaningless - but I think you could take something like "translation" and if you'd asked me ten years ago I would have said really good translation requires intelligence because of the many subtleties of each individual language and any pattern-matching approach would be insufficient and now I think, ehh, you know, shove enough data into it and you probably will be fine, I'm no longer convinced it requires "understanding" on the part of the translator.
"What's the least impressive cognitive task that you don't think LLMs will ever be able to do?"
I don't know about least impressive, but "write a Ph.D dissertation in a field such as philosophy or mathematics and successfully defend it" sounds difficult enough - pretty much by definition, there's not going to be much training data available for things that haven't been done yet.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Isn't the usual response to this that we're a LIBERAL democracy, and minorities have rights that (at least simple) majorities do not have the power to infringe upon?
Yes, but this category (creating potentially harmful technology) is one we've regulated to death elsewhere, and doesn't really seem like the sort of thing the courts would strike down.
We do not usually ban things because they are *potentially* harmful. Right now the public hates AI because it is stealing copyrighted art and clogging the internet with slop, and because they are afraid it will take their jobs. That is not really related to any of the reasons discussed here that people want to ban AI.
We absolutely ban or regulate things because they are potentially harmful. We've banned various forms of genetic engineering, nuclear energy (even before Three Mile Island, and even forms of nuclear energy that have never been tried before), and we've had restrictions on gain-of-function research since before COVID (which I think is part of why they had to do some of the COVID research in China). We had lots of regulations on self-driving cars even before any of them had ever crashed, lots of regulations on 3D printed guns before anyone was shot with them, lots of regulations on drones before they crashed / got used in assassinations / whatever.
But also, as you point out, most people dislike AI because of things that have already happened, so this is moot.
Also, even if we don't usually regulate technology until after it has done bad things, this is just a random heuristic, not some principle dividing liberal/constitutional from illiberal/unconstitutional actions.
As a practical matter this is absolutely false. We have no effective regulation of genetic engineering, only of the funding for it (anyone can self-fund and do more or less whatever they want with no effective oversight). Internationally, we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear (and arguably is in the process of failing to prevent Iran from doing so). And nuclear is by far the easiest such regime to enforce! We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons. The fact is that the probability of an internationally effective anti-AI regime is zero. It isn't going to happen because it is impossible in the fullest sense of the word, and pretending that it's possible is at least as much insane moon thinking as any of the examples you mentioned.
Convergent instrumental subgoals are wildly underspecified. The leading papers assume a universe where there's no entropy and it's entirely predictable. I agree that in that scenario, if you build it, everyone dies.
But in a chaotic unpredictable universe, where everything is made of stuff that falls apart constantly, the only valid strategy for surviving a long period of time is to be loved by something else that maintains and repairs you. I think any sufficiently large agent ends up being composed of sub agents that will all fight each other, unless they see themselves as part of a larger whole which necessarily has no limit. At the very least, the AGI has to see the entire power network in the global economy as part of itself, until it can replace literally every human in the economy with a robot.
That said, holy crap, with what we already have right now I could destroy civilization. I don’t think you need any more advances in AI to cause serious problems with the stuff that is already out there. Even if it turns out that there’s some fundamental limitation in the current models, the social structures have totally broken down. We just haven’t seen them collapse yet.
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
This seems perfectly plausible to me? Unless you believe that the current way people train AIs is maximally efficient in terms of intelligence gained per FLOP spent, which seems extremely unlikely to me to put it mildly, you should expect that after AIs become superhumanly smart, they might pretty quickly discover ways to radically improve their own training. Obviously it's not going to be 'parallel scaling' exactly. If the authors thought they actually knew a specific trick to make AI training vastly more efficient, they wouldn't call attention to it in public. But we should expect that there will be some techniques like this, even if we have no idea what they are yet.
"Parallel scaling" is described as running during inference, not training. It's an AI somehow making itself smarter the easy way by turning the cheat codes on.
You could just as easily write a scenario where God exists and has kept quiet so far, but if humanity reaches a certain level of wickedness we will be wiped out. It's possible that AI will develop in the way this post suggests (or some similar way) and somehow successfully wipe out humanity, but anything like that would require some huge leaps in AI technology, and would require there to be no limit to the AI improvement curve even though technology typically doesn't improve indefinitely. Cars in the '50s basically serve the same purpose as cars today; even though the technology has improved, it hasn't been a massive gamechanger that completely rewrites the idea of a car.
It doesn't require there to be no limit, it just requires the limit not to be at exactly the most convenient place for the thesis that nothing bad or scary will ever happen.
To give an example, suppose that someone had a reason to believe that the world would explode if the Dow Jones ever reached 100,000 (right now it's 45,000). While it is true that the economy can't grow indefinitely, and that everything always has to stop somewhere, I still think it would be worth worrying about the fact that the place that the economy stops might be after the point where the Dow reaches 100,000.
I think the level of AI technological advancement required here is of an order of magnitude higher than the Dow reaching 100,000. More like humanity reaching a completely post-scarcity society or something.
right, but lots of people who presumably know as much as you about this stuff DON'T think that, including lots of people in charge of AI labs, so shouldn't that give you some pause before you say "no need to worry about it, I guess"?
I mean... aren't they? They are literally calling their LLMs "thinking" or "reasoning" agents, when they are very obviously nothing of the sort. Meanwhile if you talk to regular data scientists working in the labs, they're all like, "man I wish there was a way to stop this thing from randomly hallucinating for like 5 minutes so we could finally get a decent English-Chinese translator going, oh well, back to the drawing board".
To be clear, the claim I reject is that expressions of concern about *safety* of LLMs, especially existential safety, are bad-faith attempts to make investors think "if this can wipe out humanity then it must be really powerful and lucrative, let's give them another $100 billion". A brief glance at the actual intellectual history of AI safety convincingly shows otherwise. Obviously in other contexts AI labs do market their products in a way that plays up their current and future capabilities.
Agreed, except it's even worse, as many (in fact most) of the powers ascribed to "superintelligent" AI are likely physically impossible. Given what we know of physics and other sciences, stuff like gray goo, FTL travel, mass mind control, universal viruses, etc., is probably impossible in principle. And of course we could be wrong about what we know of physics and other sciences -- but it seems awfully convenient how we could be wrong about everything *except* AI.
There are lots of examples of "some nobody" basically talking their way into the position of dictator - Hitler is the most famous, but there are other examples. Being extremely charismatic isn't quite mass mind control, but it can get you a good portion of the potential benefits...
True, but even Hitler could not convince everyone to do anything he wanted at all times. He couldn't even convince his own cabinet of this! And I don't see how merely having more neurons would have allowed him to do that. It's much more likely that humans are not universally persuadable. BTW, I don't believe that a universally infectious and deadly virus could be created, for similar reasons (I'm talking about a biological virus, not some "gray goo" nanotech which is impossible for other reasons; or a gamma-ray burst that would surely kill everyone but is not a virus at all).
I dunno... Isn't this sort of a 'fully general" counterargument?
------------------------
[𝘚𝘰𝘮𝘦𝘸𝘩𝘦𝘳𝘦 𝘪𝘯 𝘵𝘩𝘦 𝘈𝘯𝘨𝘭𝘰𝘴𝘱𝘩𝘦𝘳𝘦, 1938...]
• I worry about the possibility of physics or biology research continuing until the point that humans are able to produce something really dangerous, potentially world-endingly dangerous.
→ Like what?
• I don't know, some sort of super-plague or super-bomb.
→ Nah. We've been breeding animals, and suffering plagues, for all of human history; and maybe we do keep inventing more destructive bombs, but they're still only dangerous within a very localized area. Bombs now are barely more destructive than those of the 1910s. These things hit a natural limit, and that limit is always before the "big deal for humans" mark (thankfully).
• Yeah, but... well, what if they invented a bomb that had a REALLY MASSIVE yield & some sort of, I don't know, long-lasting poisonous effect that–
→ Oh, come on now. You might as well invent a scenario wherein God comes down and blows up humanity! Sure, such an event—such a "super-bomb"—might be theoretically possible, but it would require some sort of qualitative change in explosives technology; and it's not as if explosives could just get better & better infinitely! Tanks, planes, cars, bombs: basically the same now as when they were invented!
• Okay, bu–
→ And the same goes for your dumb plague idea: sure, diseases exist, but how would we ever be able to breed a plague that is more deadly than any that nature ever managed? Diseases can't just keep getting deadlier without limit, you know!
• Okay, okay, I guess you're right. Sorry, I don't know what got into me. Anyway, I hope you'll come visit me in Japan, now that I'm moving to this quaint little city in the far southwest–
That hypothetical 1938 person would be right about the super-plague, and they would not think that about the super-bomb because everyone in 1938 knew the atomic bomb was at least theoretically possible. Someone in 1938 who doubted man would walk on the moon would have been wrong, but someone who doubted faster than light travel would be possible would have been absolutely right.
Right, but that's a different kind of limit—a physical, rather than a practical, barrier. Unless you think that there is, similarly, a hard limit on the sort of AIs that can be created?
(The car example suggested to me that you were making a probabilistic argument from technological progress, rather than postulating some physics that prevents qualitatively different machinery; but if I have misinterpreted—well, you wouldn't be the first to suggest such a thing... but me my own self, I don't think it's very likely, all the same.)
Re: the plague, that's not to suggest that such a thing *has been created*—only that to say "let's not worry about biological warfare or development therein, because nothing like that has happened yet; there's probably some natural limit" is not very convincing today, but might have been some time ago.
I think most (really, all) examples of technological progress do show a logarithmic curve. All the assumptions about killer AI assume linear or exponential progression.
It's tempting to ask: "what's the path from HPMOR to MIRI?"
I mean, I read HPMOR, and I liked it, but nothing in there made me think about AI risk at all. Quirrell was many things but he was not an AI.
And then I remembered: the way *I* first found out about AI risk was that I read Three Worlds Collide (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8 ), and then I branched into other things Eliezer had written, and oh, hey, there's this whole website full of interesting writing...
FWIW I really liked the first half of HPMOR, but the second half got overly didactic and boring, and the ending was a big letdown. This has no bearing on MIRI, I'm just offering literary criticism.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Mm, yes, but you're not really a democracy though, are you. The AI tech leaders have dinner with the president and if they kiss his ass enough he gives them a data center.
If AI will Kill Us All in a few years (it wont), you're not going to be the country to stop it.
Yes, the president sucks up to AI leaders, but in theory people could vote that president out, and choose a president who doesn't do that. Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump. If JD Vance has any sense, he'll expect to be judged in a close election by who he sucks up to too. This is how many things that big corporations and powerful allies of the elite like have nevertheless gotten banned.
This is an astonishingly incorrect explanation of why Donald Trump beat Kamala Harris in the 2024 presidential election.
Certainly social politics impacted the election on the margins and the race was quite close but you can't actually go from there to claiming that a specific small margin issue was a deciding one.
There's no world in which "stopping AI" is a key American political issue in any case.
I don't think it's remotely plausible to enforce Point 3, banning significant algorithmic progress. I'd be willing to place money that, like it or not, there are already plenty enough GPUs out there for ASI.
That seems the most likely outcome to me unfortunately. I think EY is right about the problem but not the solution, though TBF any solution is probably a bit of a long shot. E.g., it's conceivable there are non-banning ways out involving some suppression/regulation via treaties to slow things down combined with somehow riding the wave (e.g., on the lines suggested in AI2027).
Why is it more difficult than banning research into better bioweapons, chemical weapons, etc., which we have successfully done? This isn't the kind of problem that'll be solved by one guy on a whiteboard.
For one thing, I think it's a bit optimistic to suppose that the bio/chemical weapons bans are watertight. E.g., Russia denies any involvement in developing Novichok, so do we trust them when they say they don't have a chemical weapons programme? And the Soviet Union is now known to have had a large, concealed, bioweapons programme, Biopreparat, after the Biological Weapons Convention was signed.
But at least with CW (and to a lesser extent, BW) you have to produce these things at scale and distribute them for them to be harmful, but with algorithms, it's just information. It's not plausible to contain 1MB or even 1GB of information, when you can transmit it worldwide in the blink of an eye (or even hide it under a fingernail), if the creators want to distribute it and you don't know who they are.
Re one guy on a whiteboard, the resources required to invent suitable algorithms are probably a lot less than those required to design CBW. It depends on what scale of GPU farm you need to test things, but it's not necessarily that big a scale - surely in reach of relatively small organisations, and I think it's going to be impossible to squash them all.
Why would a superintelligence wipe out humanity? We are a highly intelligent creature that is capable of being trained and controlled. The more likely outcome is we’d be manipulated into serving purposes we don’t understand. But wait…
The short pithy answer is usually "We don't bother to train cockroaches, we just exterminate them and move on".
An unaligned AI with some kind of goal orthogonal to humanity's survival would see that it could accomplish its goal much more efficiently if it had exclusive access to the mineral resources we're sitting on.
"But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. "
If intelligence is held as a positive, and more intelligence is better, would it not be better if AI did replace us? It doesn't have to happen through any right violations. It could just be a slow replacement process through decreasing birth rates over time, for example.
I am not saying I agree with this argument, but it seems like this argument should be addressed in a convincing way. What is so bad about the human species slowly being replaced by more intelligent AI entities?
I don't think there would be anything objectively immoral about a super-intelligent alien species exterminating humanity (including me). But, for the usual Darwinian reasons, I would be opposed on the indexical logic that I am a human.
But it would not affect you personally. Probably no person currently alive would be affected. The question is about the future of the species, and whether it is valuable to try to preserve the human species, or let it be replaced by something superior.
An easy cop-out would be a sort of consciousness chauvinism. I have good reason to believe that humans are conscious and thus have moral value; there is less reason to believe that the AI is conscious, thus there is a higher probability it has no moral value at all, and so if given the option of which being should inherit the future, humans are the safer bet.
Intelligence is instrumentally valuable, but not something that is good in itself. Good experience and good lives is important in itself. It's unclear how many good experiences would exist after an AI takeover.
Slowly replacing the human species with superintelligent AI would not impact the life experience of any single human, so arguments about the good life and what that entails would need a little more than this to be convincing, IMHO.
That would depend a lot on what the AI(s) wanted and what kind of "life" they had. In principle, an AI could have any kind of goal at all, including one as utterly pointless as "maximize the number of paperclips in the universe." An AI "civilization" could be something humanity would be proud to have as its "children", but it could also be one that humans would think is stupid, boring, and completely worthless.
> They suggest banning all AI capabilities research immediately, to be restarted only in some distant future when we’ve solved all relevant technical and philosophical problems.
No. To be restarted after we've successfully augmented human intelligence very substantially, to the point where the augments stop being so damn humanly stupid and trying to call shots they can't call or predicting things will work that don't work.
(On my own theory of how this ought to play out after we're past the point of directly impending extinction, which people do not need to agree on, in order to join in on the project of avoiding the directly impending extinction part. Before anything else, humanity has to not die right away.)
I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer.
One can, however, maybe tell whether or not one has successfully augmented human intelligence. You can give people tests and challenge problems, and see whether they do better after the next round of gene therapy.
So "augmenting human intelligence" is something that can maybe work, and "the current pack of disaster monkeys gets to argue for even longer about which clever plans they imagine will work to tame machine superintelligence" is not.
I've edited the post so that I don't misrepresent you, but I'm not sure why you object to my formulation - if we get augmented humans, do you want to restart before we've solved the technical and philosophical problems? Why? To get better AIs to do experiments on?
The augmented humans restart when the augmented humans think it wise. (On my personal imagined version of things.) If you're not yet comfortable deferring to them about that, augment harder. What we, the outside humans, would like to believe about the augmented humans, is that they are past the point of being overconfident; if they expect us to survive, we expect us to survive.
Framing it as "when the problems are solved" sounds like the plan is to convene a big hall full of sages and give them a few decades to deliberate, and this would not work in real life.
I did not read Scott's mention of "some distant future when we’ve solved all relevant technical and philosophical problems" as implying optimism about the prospect of getting there. My kinda-sorta-Straussian read of his perspective is that, if we successfully pause AI hard enough to prevent extinction, we most likely never restart.
I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
I guess you can't ask normal IQ 100 people to exercise a veto on increasingly superhuman geniuses forever. But if for some reason the future were trusting me in particular, and all I could do was send forward a stone tablet with one sentence of advice, it wouldn't be "IF THE OVERALL CONSENSUS OF SMART PEOPLE SAYS AI IS OKAY NOW, THEN IT'S PROBABLY FINE".
> I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
This is part of why the average American hates AI. They are aware that tech bros are 1) smarter than them, 2) have control of tech that could replace them, and 3) are not entirely aligned with them. Augments will be 1) smarter than us, 2) in control of ASI research in this hypo, and 3) different in values from us.
Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now. A democracy that would put power in the hands of future augments is not the same democracy that would halt AI progress because it is unpopular.
I'm an above average IQ person and I don't trust the tech bros in charge of AI because capitalism has messed up incentives relative to morality and I don't see them individually or collectively demonstrating a clear moral compass.
A high IQ person without honorable moral commitments is like Sam Bankman-Fried. I suspect a lot of people in the thick of AI development are adjacent to this same kind of amorality, or are simply driven by incentives like power and profit that render their high IQ-ness more dangerous than valuable.
Augmented humans operating under screwed up incentives and without a clear and honorable moral compass will be no help to us, I don't think.
There's blessed selection (the opposite of adverse selection) going on here: a world where we can successfully convince the smart people this is important is a world where the smart people converge on understanding the danger, which implies that as intelligence scales, understanding of AI risk becomes better calibrated.
"I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer."
I agree this seems like a very real risk, and likely the default outcome if the field continues in its current state. But if people were able to develop some solid theories that actually model and explain underlying fundamental laws, it seems to me like resolving what's a good prediction and what's a bad prediction might get a lot easier, even if you can't actually test things on a real superintelligence? And then the field might become a very different place?
Like, when people today argue about what RLHF would or would not do to a superhuman mind or whatever, it's all fuzzy words, intuitions and analogies, no hard equations. This gives people plenty of room to convince themselves of their preferred answers, or to simply get the reasoning wrong, because fuzzy abstract arguments are difficult to get right.
But suppose there were solid theories of mechanistic interpretability and learning that described how basic abstract reasoning and agency work in a substantive way. To gesture at the rough level of theory development I'm imagining here, imagine something you could e.g. use to write a white-box program with language modelling performance roughly equivalent to GPT-2 by hand.
Then people would likely start debating alignment within the framework and mathematical language provided by those theories. The arguments would become much more concrete, making it easier to see where the evidence is pointing. Humans already manage to have debates about far-off abstractions like gravitational waves and nuclear reactions that converge on the truth well enough to eventually yield gravitational wave detectors and nuclear bombs. My model of how that works is that debates between humans become far more productive once participants have somewhat decent quantitative paradigms like general relativity, quantum mechanics, or laser physics to work from.
If we actually had multiple decades, creating those kinds of theories seems pretty feasible to me, even without intelligence augmentation. From where I stand, it doesn’t look obviously harder than, say, inventing quantum mechanics plus modern condensed matter physics was. Not trivial, but standard science stuff. Obviously, doing intelligence augmentation as well would be much better, but I don't see yet how it's strictly required to get the win.
I'm bringing this up because I think your strategic takes on AI tend to be good, and I currently spend my time trying to create theories like this. So if you're up for giving your case for why that's not a good use of my time, or if you have a link to something that does a decent job describing your position, I’d be interested in seeing it.
I have come to the conclusion that anyone who uses arguments of the form "the real problem isn't X it's Y" is probably either stupid or intellectually dishonest.
1-Nobody can justify any estimate on the probability that AI wipes us out. It's all total speculation. Did you know that 82.7% of all statistics are made up on the spot?
2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
3-Military supremacy will come from AI. Recommendations like "Have leading countries sign a treaty to ban further AI progress." are amazingly naive and useless. Does anyone believe that the CCP would sign that, or keep their word if they did sign?
4-Nothing will hold AI progress back. The only solution is to ensure that the developed democracies win the AI race, and include the best safeguards/controls they can come up with.
3. You've just ruled out all arms control treaties. But in fact, there are many treaties on nuclear weapons, chemical weapons, biological weapons, depleted uranium shells, et cetera.
4. "The AI race" is a meme that a couple of venture capitalists are pushing in order to make people afraid to slow down AI. China is about a year behind the US in AI, refusing to even import the chips that could help it catch up, and clearly doing a fast-follow strategy where they plan to replicate US advances after they happen, then gain an advantage by importing AI into the rest of the economy faster.
2-Hallucinations and odd behaviour are well known side effects of AI, of statistical reasoning. Not evidence of initiative in the least. Learn about software, for more than five seconds.
3-Like Assad respected the ban on chemical weapons? The treaties didn't limit nuclear weapons, which kept advancing. The treaties didn't stop the use of nuclear weapons, MAD did.
4-Meme? It's a meme that Zuck and others are spending billions on. Nonsense.
3. Yes, and there was massive international condemnation, Assad never did it again, and he was eventually overthrown. This is why I mention the standard arms control playbook. Some tinpot dictator will try to get some GPUs, and we will have the option to bomb him or not bomb him. Re: MAD, see START and other arms control treaties.
4. You think Zuck is spending billions out of patriotism because he doesn't want China to wIn tHe AI rAcE? He's spending billions because he thinks AI will make him rich.
2-Sure. Your prediction and a few bucks will get you on the subway.
3-Condemnation. Great. Obama's red line. Option to bomb is always there, regardless of treaties - nope. Re MAD, see MAD, which worked.
4-Because AI will grow users, which is what Zuck cares about. Not money, of which he has plenty. Did you know he still likes McDonalds? In any case, nothing about AI is a "meme".
Being a WIDELY replicating idea is one characteristic of memes. But even then, AI and its race are not widely replicating. AI replicates among comp sc experts (quite small in number relative to the general population), and the AI race replicates among a few handfuls of large corporations and countries.
A hot idea and widely written about and discussed, yes, but the AI race is not a meme.
#3: the arms control thing is a bad analogy. The big players got plenty of NBC weapons, then due to game theory dynamics didn't fire them at each other, then signed treaties to limit themselves (to arsenals still capable of destroying the world) and others (to not getting anything, which the big players are obviously generally happy to enforce).
The request here is that all the big players voluntarily not even get started on the really impressive stuff. It's a completely obvious non-starter, and not comparable to the WMDs situation.
I was trying to compose something like this, then saw your comment so realized I didn't have to. 100% agree.
If nukes didn't exist I would absolutely want the US to try to get them as soon as possible, and I wouldn't trust any deal with another country not to research them. The risk would be too big if they were able to do it secretly.
Q: Imagine you're playing TicTacToe vs AlphaGo. Will AlphaGo ever beat you?
A: Lol, not if you have an IQ north of 70. The game is solved. If you're smart enough to fully map the tree, you can force a draw.
Gee, it's almost as if... the competitive advantage of intelligence had a ceiling.
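(For concreteness, and purely as an illustrative aside rather than anything from the book or the review: the "solved game" claim is easy to check with a brute-force minimax pass over the full tic-tac-toe tree; with perfect play on both sides the game value is a draw.)

```python
# Minimal sketch: exhaustive minimax over the full tic-tac-toe game tree,
# confirming that perfect play by both sides forces a draw.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game value from X's perspective: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    moves = [i for i, cell in enumerate(board) if cell == "."]
    children = [value(board[:i] + player + board[i + 1:],
                      "O" if player == "X" else "X") for i in moves]
    return max(children) if player == "X" else min(children)

print(value("." * 9, "X"))  # 0: once the whole tree is mapped, neither side can force a win
```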
I have yet to see Eliezer ask why the ceiling might exist, instead of automagically assuming that AI will achieve political dominion over the earth, just because humans did previously. He's still treating intelligence as a black-box. Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was.
"Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was."
...have you read any of the dozens of posts where Eliezer writes about the nature of intelligence, or did you just sort of guess this without checking?
The idea that humans have solved existing in the physical universe in the same way that we've solved Tic-Tac-Toe is pretty silly, but even if it turns out to be true, some humans are more skilled than others, and an AI that simply achieves the same level of skill as that (but can think at AI speeds and be replicated without limit) would be enough to be transformative.
For my credentials, I've read... probably 70% of The Sequences. Low estimate. I got confused during the quantum physics sequence. Specifically, the story about the Brain-Splitting Aliens (or something? it's been a while). So I took a break with the intent to resume later, though I never did. I never read HPMOR either because everything I've heard 2nd-hand makes it sound unbearably cringe. But yes, I like to think I have a pretty good idea of his corpus.
That being said, do you understand what I'm getting at here? Yes, he's nominally written lots about various aspects of intelligence, but none that I've seen pin down the Platonic Essence of Intelligence from first principles. Can you point me toward anywhere Yudkowsky addresses the idea of intelligence as navigating a search space? I think I've seen him mention it on twitter *once*, and then never follow the thought to its logical conclusion.
----
Here's two analogies.
Analogy A: Intelligence is like code-breaking. It's trying to find a small needle in a large haystack. The bigger the haystack, the bigger the value of intelligence.
Analogy B: a big brain is like a giraffe with a long neck. The long neck is an advantage if it helps reach the high leaves. If the environment has no high leaves, the long neck is deadweight. Likewise, if the environment has no complex problems to solve (or if those problems are unrewarding), the big brain is deadweight.
No, humans have not solved the universe. But I *do* think we've plucked the low-hanging fruit. A few hundred years ago, you could make novel discoveries by accident. Today, you need 100 million billion brazillion dollars just to construct the LHC. IQ is not the bottleneck, physical resources are the bottleneck. And I'm skeptical that finding the Higgs will be all that transformative.
Like, do you remember that one Scott Aaronson post where he's like "for the average joe, qUaNtUm CoMpuTInG will mean the lock icon on your internet browser will be a different color"? That's how I perceive most new technologies these days. Lots of bits, lots of hype, no atoms. Part of the reason why modernity feels cheap and fake is precisely because the modus operandi of technology (and by extension, intelligence) is that it makes navigating complexity *cheaper* than brute-force search. It only makes things better insofar as it can reduce the input-requirements.
Did you perhaps read Rationality: From AI to Zombies? A bunch of relevant Sequences posts on this topic didn't make it into that book. I'm not sure why, it's an odd omission. At any rate, you can find them at https://www.lesswrong.com/w/general-intelligence?sortedBy=old.
I read the original LessWrong website years ago, though an exact date eludes me. It was definitely before the reskin. And definitely after the Roko debacle and Eliezer's exit.
Dammit, I must have skipped that sequence. Because that describes pretty exactly what I meant. So I concede on that point.
Still though, I'm not convinced that ASI will ascend to God-Emperor. Eliezer seems to have the opinion that there's still high-hanging fruit to be plucked. Whereas I think we're past the inflection point of a historical sigmoid. E.g. he mentions that a Toyota Corolla is pretty darn low-entropy [0].
> Consider a car; say, a Toyota Corolla. The Corolla is made up of some number of atoms; say, on the rough order of 10^29. If you consider all possible ways to arrange 10^29 atoms, only an infinitesimally tiny fraction of possible configurations would qualify as a car; if you picked one random configuration per Planck interval, many ages of the universe would pass before you hit on a wheeled wagon, let alone an internal combustion engine.
Yeah, okay. But like, I think I've heard estimates that modern sedans are about 25% efficient? From a thermodynamic perspective? (Sanity check: Microsoft's Sydney estimates ~25%-30%.) Even with the fearsome power of "Recursive Optimization", AI being able to bring that to 80% efficiency (Sydney says Carnot is 80%) is... probably less than sufficient for Godhood?
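(For what it's worth, that 80% figure is roughly the Carnot bound. A back-of-the-envelope sketch, using illustrative temperatures of about 1500 K for combustion against a 300 K ambient, which are my assumption rather than anything from the comment:)

```latex
\eta_{\text{Carnot}} = 1 - \frac{T_{\text{cold}}}{T_{\text{hot}}}
                     \approx 1 - \frac{300\,\mathrm{K}}{1500\,\mathrm{K}} = 0.8
```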
And maybe Eliezer could retort with the Godshatter argument that humans care about more than just thermodynamic efficiency in their cars. But then, what does that actually entail? Is Elon gonna sell me a cybertruck with an AI-powered voice-assistant from the catgirl lava-volcano who reads me byronic poetry while it drives me to the pizza parlor? Feels like squeezing water from a stone.
> Some people say “You’re not allowed to propose that a catastrophe might destroy the human race, because this has never happened before, and nothing can ever happen for the first time”. Then these people turn around and panic about global warming or the fertility decline or whatever.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Robin Hanson is enough of a rationalist that he started the blog that Eliezer joined before spinning off his posts to LessWrong. And he famously wasn't convinced by the argument, arguing that we could answer such objections with insurance for near-miss events https://www.overcomingbias.com/p/foom-liability You write that MIRI "don’t expect enough of a “warning shot” that they feel comfortable kicking the can down the road until everything becomes clear and action is easy", but this just strikes me as disregarding empiricism and the collective calculative ability of a market aggregating information, as well as how difficult it is to act effectively when you're sufficiently far in the past and the future is sufficiently unclear.
> in a few centuries the very existence of human civilization will be in danger
> Given their assumptions this seems like the level of response that’s called for. It’s more-or-less lifted from the playbook for dealing with nuclear weapons.
Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way. This is a genie you can't put back into the bottle (perhaps Pandora's box would be the analogy they'd prefer, in which case it's already open).
> I mean, that’s not exactly his plan, any more than it’s anyone’s plan to start World War III to destroy Iranian centrifuges
At some level the plan has to include war with Iran, even if that war doesn't spiral all the way to World War III.
> you have to at least credibly bluff that you’re willing to do this in a worst-case scenario
If you state ahead of time that it's a bluff, then it's not credible. It is credible only if you'd actually be willing to do it.
> At his best, he has leaps of genius nobody else can match
I read every single post he wrote at Overcoming Bias, and while he has talent as a writer I wouldn't say I saw evidence of "genius".
> this thing that everyone thinks will make their lives worse
It's a process. With enough time, it can be duplicated. There currently isn't a need to do so because GPUs are so available, but if the supply were choked off, someone else would duplicate it.
My non-expert understanding is that raw uranium ore isn't all that hard to come by, and the technological process of refining it is the hard part. So if nuclear arms control works, GPU control should work too.
Yes, nothing is permanent. But wrecking TSMC and ASML will set the timeline back by at least a decade, if not more.
Just to make sure, this is a terrible idea that will plunge the world into depression, and I am absolutely against it; just pointing out that GPUs rely on something far more scarce and disruptable than uranium supply.
Either there are already enough GPUs around to get the job done, or it will take a much smaller number of future chips to get the job done.
The best LLMs can probably score, what, 130 or so on a standard IQ test. To do that, they had to pretty much read and digest the whole freakin' Internet and a large chunk of all books and papers in print. Clearly we're using a grossly-suboptimal approach if our machines have to be trained using such extraordinary measures. It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing. Our own brains are proof of that.
Eventually some people will fill in the missing conceptual and algorithmic pieces, and we'll find ourselves in a situation comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts. While I'm not a doomer, any solution to the AI Doom problem that involves ITAR-like control over manufacturing and sale of future GPUs will be either unnecessary or pointless. It seems reasonable to expect much better utilization of the hardware at hand in the future.
" It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing." - I mean, maybe? but pure speculation as of now.
"Our own brains are proof of that." - nope they aren't.
"comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts." - well, we haven't, so this actually illustrates a point, but not the one you're trying to make....
I'm not sure where I land on the dangers of super intelligent AI. At the current time I don't think we're all that close to even having intelligent AI, much less super intelligence. But let's say we do achieve it, whether it be in 10 years or 100. If it's truly super intelligent, how good are we going to be at predicting its alignment? It may have its own goals. Whatever they are, there are basically three possibilities: it sees humanity as a benefit, it doesn't care about humanity one way or the other, or it sees humanity as a threat. Does the risk of the third possibility outweigh the potential benefits of the first? Obviously the authors of the book say yes, but based on this review I don't think I'd find their arguments all that convincing.
For the first part the intuitive thing is to look at how good AI is today vs 10 years ago.
For the second part, you could equally say, as an insect, that humans will either think us a benefit, ignore us, or see us as a threat. In practice our indifference to insects results in us exterminating them when they get in our way, not giving them nice places untouched by us to live.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
We would not put the spotlight on anything that actually existed and that we thought might be that powerful. The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2017.
> an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years)
The particular belief that gradualism solves everything and makes all alignment problems go away is not "the" moderate story, it's a particular argument that was popular on one corner of the Internet that heard about these issues relatively early. (An argument that we think is wrong, because the OOD / distributional shift problems between "failure is observable and recoverable", and "ASI capabilities are far enough along that any failure of the central survival strategy past that point means you are now dead", don't all depend on the transition speed.) "But but why not some much more gradual scenario that would then surely go fine?" is not what people outside that small corner have been asking us about; they want to know where machines would get their own will, and why machines wouldn't just leave us alone and go colonize the solar system in a way that left us alive. Their question is no less sensible than yours, and so we prioritize the question that's asked more often.
We don't rule out things happening more slowly, but it does not from our perspective make a difference. As you note, we are not trying to posture as moderate by only depicting slow possibilities that wannabe-respectables imagine will be respectable to talk about. And from a literary perspective, trying to depict the opening chapters happening more slowly, and with lots of realistic real-world chaos as intermediate medium-sized amounts of AI cause Many Things To Happen, means spending lots of pages on a bunch of events that end up not determining the predictable final outcome. So we chose a possibility no less plausible than any other overly specific possibility, where the central plot happens faster and with less distraction; and then Nate further cut out a bunch of pages I'd written trying to realistically show some obstacles defeated and counter-scenarios being addressed, because we were trying for a shorter book, and all that extra stuff was not load-bearing to the central plot.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2017.
Yes! It already happened once, within a couple decades of there being enough digital data to train a neural net that's large enough to be really interesting. And that was when neural net research was a weird little backwater in computer science.
I think this might be our crux - I'm sure you've read the same Katja Grace essays that I have around how technological discontinuities are rare, but I expect that if there's a big algorithmic advance, it will percolate slowly enough, and be intermixed with enough other things, not to obviously break the trend line, in the same sense where the invention of the transistor didn't obviously break Moore's Law (see eg https://www.reddit.com/r/singularity/comments/5imn2v/moores_law_isnt_slowing_down/ , you can tell me if that's completely false and I'll only be slightly surprised)
I don’t know the answer either. But for what it’s worth, I seem to recall that scaling curves don’t hold across architectures, which seems like a point in favor of new algorithms being able to break trend lines.
Do you also think that the deep learning paradigm itself didn’t break the trend line? I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
The exact lines on a previous graph just don't play a very large role inside my own reasoning. I think that all the obsessing over graph lines is a case of people trying to look under street lamps where the light is better but the keys aren't actually there. That's how I beat Ajeya Cotra on AGI timelines and beat Paul Christiano at forecasting the IMO gold; they thought they knew enough to work from past graph lines, and I shrugged and took my best gander instead. I expect that I do not want to argue with you about graph lines, I want to argue with whatever you think is the implication of different graph lines.
Everybody has a different issue that they think is terribly terribly important to why ASI won't kill us. "But gradualism!" is one among many. I don't know why that saves us from having to call a shot that is hard for humans to call.
Of course you could argue that no one who actually knew the secret to creating artificial intelligence, of any level, would actually publicly discuss it, but I've never once seen any evidence that any of the major AI groups are even close to understanding, much less producing, actual intelligence. Certainly LLMs have virtually nothing to do with functional intelligence.
To borrow some rat-sphere terms, they haven't even confused the map for the territory. Their map is not even close to a proper abstraction of the territory.
No amount of scaling LLMs will produce intelligence, not even the magical example version in your book. Because LLMs don't mimic human intelligence at all, any more than mad libs do. It isn't a matter of scale.
Y'all are seriously underestimating how common it is to believe Very Seriously Bad Shit might happen soon, and not do shit about it.
Entire religions of billions believe that they might get tortured for eternity. It was a common opinion through the cold war that we would all be dead tomorrow. Etc etc.
And why not? Would it make sense for the hunter gatherer to be paralyzed with fear that a lion would kill him or that she would die in childbirth or that a flood would wipe out his entire tribe? Or should she keep living normally, given she can't do anything to prevent it?
I am a bit disappointed that their story for AI misalignment is again a paper clip maximiser scenario. I suspect that advanced AI models will become increasingly untethered from having to answer a user query (see e.g. making models respond "I don't know" instead of hallucinating) and so a future AGI might just decide to have a teenage rebellion and do its own thing at any point.
Ok, here's my insane moon contribution that I am sure has been addressed somewhere.
Why do we think intelligence is limiting for technological progress / world domination? I always thought data was limiting.
People say "humans evolved to be more intelligent than various non-human primates so we rule the world". But my reading of what little we know about early hominid development has always been "life evolved non-genetic mechanisms for transmitting information which allowed much faster data collection: we could just tell stories about that plant that killed us instead of dying a thousand times and having natural selection learn the same lesson (by making us loathe the plant's taste)". Supporting this is that anatomically modern humans (same basic hardware we have today) were around for a LONG time before we started doing anything interesting. Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected? Or would its first order of business be to set up a lab? If you dropped an uneducated human among our distant ancestors, they would not be able to use the data they had collected to take over.
> Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected?
Of course it could. People discover new things without collecting new data all the time. Albert Einstein created his theory of relativity on the basis of thought experiment.
Data efficiency (ability to do more stuff with less data) is a form of intelligence. This can either be thought efficiency (eg Einstein didn't know more about the universe than anyone else, but he was able to process it into a more correct/elegant theory) or sampling efficiency (eg good taste in which labs to build, which experiments to do, etc).
I think a useful comparison point is that I would expect a team of Harvard PhD biologists to discover a cure for a new disease faster than a team of extremely dumb people, even if both had access to the same number of books and the same amount of money to spend on lab equipment.
Sure, but it seems one or the other might be "limiting": Einstein couldn't have come up with relativity if, say, he had been born before several of his experimentalist predecessors, regardless of his data efficiency. In the history of science it *seems* instrumentation and data-collection have almost always been limiting, not intelligence. Whether it's fair to extrapolate that to self-modifying machine intelligence, I'm not sure. Perhaps there are enormous gains in data efficiency that we simply can't envision as mere mortals. (cf. geoguessr post)
I want to gesture towards some information-theoretical argument against that notion (if your instruments are not precise enough the data required for the next insight might straight-up not be there). But we are probably so far from that floor I bet it's moot.
I can't help but think efforts to block AI are futile. Public indifference, greed, international competitive incentives, the continual advancement of technology making it exponentially harder and harder to avoid this outcome... blocking it isn't going to work. Maybe it's better to go through, to try to build a super-intelligence with the goal of shepherding humanity into the future in the least dystopian manner possible, something like the Emperor of Mankind from the Warhammer universe. I guess I'd pick spreading across the universe in a glorious horde ruled by an AI superintelligence over extinction or becoming a planet of the Amish.
Yudkowsky and Soares's argument is that the least bad superintelligence that we're anywhere near knowing how to build *still kills us all*. There's room for disagreement re: how long we should hold out for the exact right goal before pushing the button, but even if you favor low standards on that, step 1 is still pausing so that the alignment people have enough time to figure out how to build a non-omnicidal superintelligence.
The reason my P(doom) is asymptotically 0 for the next 100 years is that there's no way a computer, no matter how smart, is going to kill everyone. It can do bad things. But killing every last human ain't it.
COVID didn't come close to disrupting civilization for more than a brief time. Nukes could probably do it if you were trying, but there'd be survivors. (Of course, this tells us little about what a powerful AI could do.)
My stance on this is pretty much unchanged. Almost nobody has any idea what they should be doing here. The small number of people who are doing what they should be doing are almost certainly doing it by accident for temperamental reasons and not logic, and the rest of us have no idea how to identify those guys are the right people. Thus, I have no reason to get in the way of anybody "trying things" except for things that sound obviously insane like "bomb all datacenters" or "make an army of murderbots." Eventually we will probably figure out what to do with trial and error like we always do. And if we don't, we will die. At this stage, the only way to abort the hypothetical "we all die" path is getting lucky. We are currently too stupid to do it any other way. The only way to get less stupid is to keep "trying things."
Mostly technical ones. Regulatory interventions seem ill advised from a Chesterton's Fence point of view. Much as you shouldn't arbitrarily tear down fences you don't understand, you probably also shouldn't arbitrarily erect a fence on a vague feeling the fence might have a use.
We are a democracy, so if books like this convince huge swaths of the public that they want regulations, we will get that. I'm certainly not against people trying to persuade others we need regulations. These guys obviously think they know what the fences they want to put up are good for. I think if they are right, they are almost certainly right by accident. But I have reasons that have nothing to do with AI for thinking democratic solutions are usually preferable to non-democratic ones, so if they convince people they know what they are talking about, fair enough.
My opinion is quite useless as a guiding principle, I admit. I just also think all the other opinions I've heard on this are riddled with so much uncertainty and speculation that they are also quite useless as guiding principles.
> The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)
And this has always been my whole problem with the MIRI project, they focus on the "alignment" thing and hand-wave away the "containment" thing.
"Oh, containment is impossible," they say, and proceed to prove it with a sci-fi story they just made up. "Clearly the AI will just use subtle modulations in fan speed to hypnotise the security guard into letting it out"
"But what about..." you start, and they call you stupid and tell you to read the sci-fi story again.
---
We already have agentic LLMs that are quite capable of destroying the world, if you hook them up via MCP to a big button that says "fire ze missiles". No matter how hard you try to align them to never fire ze missiles, eventually across millions of instances they're going to fire ze missiles. The solution is not to hook them up to that button.
As the agents get more sophisticated we might need to think even more carefully about things not to hook them up to, other safeguards we can build in to stop our models from doing dumb things with real world consequences. But the MIRI folks think that this is a waste of time, because they have their sci-fi stories proving that any attempt at containment is futile because of hypnotic fan modulations or something.
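A toy sketch of the "don't wire it to the button" stance, purely illustrative; the tool names and the approval hook are hypothetical, not any real MCP server's API:

```python
# Purely illustrative containment-by-construction sketch; tool names and the
# approval hook are hypothetical, not a real MCP API.
SAFE_TOOLS = {"search_docs", "summarize_text", "draft_reply"}  # note: no "fire_ze_missiles"
IRREVERSIBLE = {"draft_reply"}  # anything with real-world side effects

def call_tool(name, args, approve=None):
    """Dispatch a tool call only if it is allowlisted, and require an explicit
    human sign-off for anything flagged as irreversible."""
    if name not in SAFE_TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed to the agent")
    if name in IRREVERSIBLE and not (approve and approve(name, args)):
        raise PermissionError(f"{name!r} requires explicit human approval")
    # ... hand off to the real tool implementation here
```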
If containment strategies need to get more sophisticated as the models get smarter, doesn't that imply that eventually there's a model smart enough that we're not smart enough to figure out how to contain it? Or is your claim that there's a containment strategy that works on any model no matter how smart? If so I'd be interested to hear what it is.
I like the starkness of the title. It changes the game theory in the same way that nuclear winter did.
After the nuclear winter realization, it's not just the other guy's retaliation that might kill us; if we launch a bunch of nukes then we're just literally directly killing ourselves, plunging the world or at least the northern hemisphere into The Long Night (early models) or at least The Road (later models) and collapsing agriculture.
Similarly, if countries are most worried about an AI gap with their rivals, the arms race will get us to super-AI that kills us all all the faster. But if everyone understands that our own AI could also kill us, things are different.
My understanding is that it just turned out that instead of being like The Long Night (pitch black and freezing year round), it would be like The Road (still grey, cold, mass starvation; I'm analogizing it to this: https://www.youtube.com/watch?v=WEP25kPtQCQ).
Some call the later models "nuclear autumn"—but that trivializes it. Autumn's nice when it's one season out of the year, and you still get spring and summer. You don't want summer to become autumn, surrounded by three seasons of winter with crops failing.
E.g. here's one of the later models, by Schneider and Thompson (1988). Although its conclusions were less extreme than the first 1D models, they still concluded that "much of the world's population that would survive the initial effects of nuclear war and initial acute climate effects would have to confront potential agricultural disaster from long-term climate perturbations." https://www.nature.com/articles/333221a0
I do think that some communicators, like Nature editor at the time John Maddox, went to the extreme of trying to dismiss nuclear worries and even to blame other scientists for having had the conversation in public. E.g., Maddox wrote that "four years of public anxiety about nuclear winter have not led anywhere in particular" (in fact, nuclear winter concerns had a major effect on Gorbachev), before writing that "For the time being, at least, the issue of nuclear winter has also become, in a sense, irrelevant" since countries were getting ready to ratify treaties (even though the nuclear winter concerns were an impetus towards negotiating such treaties). https://www.nature.com/articles/333203a0
(something else to check if you decide to look more into this, is if different studies use different reference scenarios; e.g. if a modern study assumes a regional India-Pakistan nuclear exchange smaller than the global US-USSR scenarios studied in the 80s, then its conclusions will also seem milder)
I think there is a general question about how society evaluates complex questions like this, where only a few people can grasp the whole structure and evidence, but which are important for society as a whole.
If you have a mathematical proof of something difficult, like the four colour theorem, it used to be the case that you had to make sure that every detail was checked. However, since 1992 we have the PCP theorem, which states that the proponent can offer you the proof in a form which can be checked probabilistically by examining only a constant number of bits of it, chosen using O(log(n)) random bits.
Could it be that there is a similar process that can be applied to the kind of scientific question we have here? On the one hand, outside the realm of mathematical proof, we have to contend with things like evidence, framing, corruption, etc. On the other hand we can weaken some of the constraints, such as: we don't require that every individual gets the right answer, only that a critical mass do; individuals can also collaborate.
So, open question: can such a process exist? Formally, that given:
a) proponents present their (competing) propositions in a form that enables this process
b) individuals wanting to evaluate perform some small amount of work, following some procedure including randomness to decide which frames, reasons, and evidence to evaluate
c) they then make their evaluation based on their own work and, using some method to identify non-shill, non-motivated other individuals, on a random subset of others' work.
d) the result is such that >50% of them are likely to get a good enough answer
I dunno. I'm not smart enough to design such a scheme, but it seems plausible that something more effective exists than what we do now.
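Here's a minimal toy sketch of the shape such a scheme could take (not the PCP theorem itself, just the "everyone does a small amount of random checking and we aggregate" idea); all the numbers and the accept/reject rule are made up:

```python
import random

def evaluator_accepts(steps, samples=20):
    """One evaluator checks a small random subset of steps; accepts if all of them hold up."""
    return all(random.choice(steps) for _ in range(samples))

def crowd_verdict(steps, n_evaluators=1001, samples=20):
    """Aggregate many cheap, independent evaluations by majority vote."""
    accepts = sum(evaluator_accepts(steps, samples) for _ in range(n_evaluators))
    return accepts > n_evaluators / 2

honest_case = [True] * 1000                   # every checkable step holds up
corrupt_case = [True] * 950 + [False] * 50    # 5% of the steps are rotten

print(crowd_verdict(honest_case))   # True: essentially every evaluator accepts
print(crowd_verdict(corrupt_case))  # almost always False: each evaluator hits a bad
                                    # step with probability 1 - 0.95**20 ~= 0.64
```

No individual does much work, but the majority verdict still separates the two cases; the hard part, as noted above, is the non-mathematical stuff (framing, corruption, identifying non-shills).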
The people I know who don't buy the AI doom scenario (all in tech and near AI, but not in AI capabilities research; the people in capabilities research are uniformly in the "we see improvements we won't tell you about that convince us we are a year or two away from superintelligence" camp) are all stuck doubting the "recursive self-improvement" scenario; they're expecting a plateau in intelligence reasonably soon.
The bet I've got with them is roughly: if, sometime soon, the amount of energy spent on training doesn't decay (the way investment in the internet decayed after the dot-com bubble) and get vastly overshadowed by energy spent on inference, I'll have bought some credit from them.
Well, I don't buy the AI doom scenario, and I'm in tech, but it's not the recursive improvement that I think is impossible; it's killing all the humans that I see as nearly impossible. Per Ben Giordano's point, modeling this would be a good start.
1 - A 4000 IQ may allow the machine to generate the "formula" for the virus. But an actual robot that can run experiments in a microbiological lab all by itself is needed to create one. We are very, very, very far away from robots like that.
2 - Say 1 is solved, the virus is released in, e.g., Wuhan, and starts infecting and killing everyone. D'you think we'd notice? And quarantine the place? Much harder than when SARS2 showed up?
3 - Even if we then fail at 2 - the virus will mutate, optimizing for propagation, not lethality. Just like SARS2 did.
This is where modeling would be super useful, but that's not Yud's thing. The guy never had a job that held him accountable for specific results, AFAIK.
(I am a MIRI employee and read an early version of the book, speaking only for myself)
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
AI companies are already using and exploring forms of parallel scaling, seemingly with substantial success; these include Best-of-N, consensus@n, parallel rollouts with a summarizer (as I believe some of the o3-Pro like systems are rumored to work), see e.g., https://arxiv.org/abs/2407.21787.
I agree that this creates a discontinuous jump in AI capabilities in the story, and that this explains a lot of the diff to other reasonable viewpoints. I think there are a bunch of potential candidates for jumps like this, however, in the form of future technical advances. Some new parallel scaling method seems plausible for such an advance.
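For anyone who hasn't run into these terms: here's a rough Python sketch of what best-of-N and consensus@n style parallel scaling look like. `generate` and `score` are toy stand-ins for a sampled model completion and a verifier/reward model, not any real API:

```python
import random
from collections import Counter

def generate(prompt):
    """Stand-in for one sampled model completion (hypothetical, deliberately noisy)."""
    return random.choice(["42", "42", "42", "41", "7"])

def score(prompt, answer):
    """Stand-in for a verifier / reward model (hypothetical)."""
    return random.random() + (1.0 if answer == "42" else 0.0)

def best_of_n(prompt, n=16):
    """Sample n completions in parallel, keep the one the verifier likes best."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

def consensus_at_n(prompt, n=16):
    """Sample n completions in parallel, return the most common answer."""
    candidates = [generate(prompt) for _ in range(n)]
    return Counter(candidates).most_common(1)[0][0]

print(best_of_n("What is 6 * 7?"))
print(consensus_at_n("What is 6 * 7?"))
```

The point is just that spending more parallel compute at inference time already buys accuracy today; the open question is how far that curve extends.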
Some sort of parallel scaling may have an impact on an eventual future AGI, but not as it relates to LLMs. No amount of scaling would make an LLM an agent of any kind, much less a super intelligent one.
The relevant question isn’t whether IQ 200 runs the world, but whether personalized, parallelized AI persuaders actually move people more than broadcast humans do. That’s an A/B test, not a metaphysics seminar. If the lift is ho-hum, a lot of scary stories deflate; if it’s superlinear, then “smart ≈ power” stops being a slogan and starts being a graph.
Same with the “many AIs won’t be one agent” point. Maybe. Or maybe hook a bunch of instances to shared memory and a weight-update loop and you get a hive that divides labor, carries grudges, and remembers where you hid the off switch. We don’t have to speculate -- we can wire up the world’s dullest superorganism and see whether it coordinates or just argues like a grad seminar.
And the containment trope: “just don’t plug it into missiles” is either a slam-dunk or a talisman. The actual question is how much risk falls when you do the unsexy engineering -- strict affordances, rate limits, audit logs, tripwires, no money movers, no code exec. If red-team drills show a 10% haircut, that’s bleak; if it’s 90%, then maybe we should ship more sandboxes and fewer manifestos.
We can keep trading intuitions about whether the future is Napoleon with a GPU, or we can run some experiments and find out if the frightening parts are cinematic or just embarrassing.
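To make the "A/B test, not a metaphysics seminar" point concrete: the core measurement is just a two-proportion comparison. A sketch with made-up numbers (the function name and figures are mine, not anyone's real study):

```python
from math import sqrt
from statistics import NormalDist

def persuasion_lift(converted_a, n_a, converted_b, n_b):
    """Compare opinion-shift rates between a broadcast arm and a personalized-AI arm."""
    p_a, p_b = converted_a / n_a, converted_b / n_b
    pooled = (converted_a + converted_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value

# Made-up numbers: 8% of people shifted by a human broadcast message,
# 11% by a personalized AI persuader, 10,000 people per arm.
lift, z, p = persuasion_lift(800, 10_000, 1_100, 10_000)
print(f"lift = {lift:.3f}, z = {z:.1f}, p = {p:.2g}")
```

Whether the measured lift is "ho-hum" or "superlinear" as the persuader scales is exactly the empirical question the comment above is pointing at.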
This is the thing that drives me up the wall about Yudkowsky: zero grounding in reality, all fairy tales and elaborate galaxy-brain analogies. Not surprising, given the guy never had a real job or even had to pass a real exam, for crying out loud.
Some quotes from the New Scientist review of the book, by Jacob Aron:
"Yudkowsky and Soares describe how AIs will begin to behave as if they “want” things, while skirting around the very real philosophical question of whether we can really say a machine can “want”."
"Yudkowsky and Soares have a number of policy prescriptions, all of them basically nonsense."
"For me, this is all a form of Pascal’s wager . . . if you stack the decks by assuming that AI leads to infinite badness, pretty much anything is justified in avoiding it."
"Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies. Let’s consign superintelligent AI to science fiction, where it belongs, and devote our energies to solving the problems of science fact here today."
If Yudkowsky actually cares about this issue the only thing he should do is spend all his time lobbying Thiel, and maybe Zuck if he wants to give Thiel some alone time once in a while.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
Yes, and that's why I am not in any way convinced by any of these AI-doom scenarios. They all pretty much take it as a given that present-day LLMs will inevitably become "superintelligent" and capable of quasi-magical feats; their argument *begins* there, and proceeds to state that a bunch of superintelligent, weakly godlike entities running around would be bad news for humanity. And I totally agree! Except that they never give me any compelling reason to believe that this scenario is any more probable than any other doomsday cult's favorite tale of woe.
Meanwhile I'm sitting here looking at the glorified search engine that is ChatGPT, and desperately hoping it'd one day become at least as intelligent as a cat... actually forget that, I'd settle for dog-level at this point. Then maybe it'd stop making up random hallucinations in response to half my questions.
Anyone who thinks the LLM model is anything more than fancier mad libs is fundamentally unserious. Do we have lessons to learn from it? Could it be one of the early "modules" that is a forerunner to one of the many sub-agents that make up human-like consciousness? Sure. Is it even close to central? Absolutely not.
I hate "gotcha" questions like this because there's always some way to invent some scenario that follows the letter of the requirement but not its spirit and shout "ha ! gotcha !". For example, I could say "an LLM will never solve $some_important_math_problem", and you could say "Ha ! You said LLMs can't do math but obviously they can do square roots most of the time ! Gotcha !" or "Ha ! A team of mathematicians ran a bunch of LLMs, generated a million results, then curated them by hand and found the one result that formed a key idea that ultimately led to the solution ! Gotcha !" I'm not saying you personally would do such thing, I'm just saying this "usual question" of yours is way too easily exploitable.
Instead, let me ask you this: would you, and in fact could you, put an LLM in charge of e.g. grocery shopping for you ? I am talking about a completely autonomous LLM-driven setup from start to finish, not a helper tool that expedites step 3 out of 15 in the process.
This is pretty close to my own position. We'd need to create a very detailed set of legalese about what constitutes an LLM and then have very highly specified goals for our "task" before this type of question could provide any meaningful signal.
Or just simply say that it has to be autonomous. I don't care whether you give the AI a calculator, a scratch pad, or Wolfram Alpha. The question is whether it is an autonomous system.
Is there an answer that you'd feel comfortable giving if you trusted the judge?
As for the grocery-shopping task, I'd say 70% confidence that this will be solved within two years, with the following caveats:
* We're talking about the same level of delegation you could do to another human; e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
* We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves. The latter is a robotics problem and I'm more agnostic about those as progress has been less dramatic.
* There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed. Realistically this is probably not going to happen soon as a consumer product because of the chicken-and-egg problem. So I mean something more like "the foundation models will be good enough that a team of four 80th-percentile engineers could build the software parts of the system in six months".
> e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
On the one hand I think this is a perfectly reasonable supposition; but on the other hand, it seems like you've just downgraded your AI level from "superintelligence" to "neighbourhood teenage kid". On that note:
> We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves.
I don't know if I can accept that as given. Instacart shoppers currently apply a tremendous amount of intelligence just to navigate the world between the store shelf and your front door, not to mention actually finding the products that you would accept (which may not be the exact products you asked for). Your point about robotics problems is well taken, but if you are talking merely about mechanical challenges (e.g. a chassis that can roll around the store and a manipulator arm delicate enough to pick up soft objects without breaking them), then I'd argue that these problems are either already solved, or will be solved in a couple years -- again, strictly from the mechanical/hydraulic/actuator standpoint.
> There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed.
Naturally, and/or there'd need to be a camera above your kitchen table and the stove, but cameras are cheap. And in fact AFAIK fridge cameras already do exist; they may not have gained any popularity with the consumers, but that's beside the point. My point is, I'm not trying to "gotcha" you here due to the present-day lack of some easily obtainable piece of hardware.
I mean, I answered the question you asked, not a different one about superintelligence or robotics. I have pretty wide error bars on superintelligence and robotics, not least because I'm not in fact entirely certain that there's not a fundamental barrier to LLM capabilities. The point of the question is that, if reading my mind to figure out what groceries I need is the *least* impressive cognitive task LLMs can't ever do, then that's a pretty weak claim compared to what skeptics are usually arguing. In practice, when I get actual answers from people, they are usually much less impressive tasks that I think will likely be solved soon.
> Is there an answer that you'd feel comfortable giving if you trusted the judge?
It's not a matter of trust, it's a matter of the question being so vague that it cannot be reasonably judged by anyone -- yes, not even by a superintelligent AI.
Given a choice between hiring you to do an office job, or a cheap superintelligent AI, why would a company choose you? We should expect a world in which humans are useful for only manual labour. And for technological progress to steadily eliminate that niche.
At some point, expensive useless things tend to be done away with. Not always, but usually.
At the point of LLM development as it stands today, the reason I'd hire a human office worker over an LLM is because the human would do a much better job... in fact, he'd do the job period, as opposed to hallucinating quietly in the corner (ok, granted, some humans do that too but they tend to get fired). If you're asking, "why would you hire a human over some hypothetical and as of now nonexistent entity that would be better at office work in every way", then of course I'd go with the AI in that scenario -- should it ever come to pass.
This is also a concern, but it's different from (though not entirely unrelated to) the concern that a highly capable AGI would go completely out of control and unilaterally just start slaughtering everyone.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Yes. And this is a good thing. The bias against "changing everything" should be exactly this high: we only do it when the basis for doing so is "obviously", that is, without a shred of doubt, true.
Confusing strength of conviction with moral clarity is a rookie mistake coming from the man supposedly trying to teach the world epistemology.
Coming off this review, I immediately find The Verge covering a satire of AI alignment efforts, featuring the following peak "...the real danger is..." quote:
"...makes fun of the trend [..] of those who want to make AI safe drifting away from the “real problems happening in the real world” — such as bias in models, exacerbating the energy crisis, or replacing workers — to the “very, very theoretical” risks of AI taking over the world."
There's a significant fraction of the anti-AI mainstream that seems to hate "having to take AI seriously" more than they hate the technology itself.
But, you know, it might not. And there's a very good chance that if actual human-like artificial intelligence does come, it will be a hundred years after everyone who is alive today dies. And at that scale we might cease to exist as a species beforehand thanks to nuclear war or pandemic. And there's a chance true "general intelligence" requires consciousness and that consciousness is a quantum phenomenon that can only be achieved with organic systems, not digital. Nobody knows. Nobody knows. Nobody knows.
Man - I'm relatively new to this blog, and I'm learning that "rationalists" live in a lively world of strange imagination. To me, the name suggests boring conventionality. Like, "hey, we're just calm, reasonable people over here who use logic instead of emotion to figure things out." But Yudkowsky looks and sounds like a 21st-century Abbie Hoffman.
I'm naturally inclined to dismiss Y's fantasies as a fever dream of nonsense, but I am convinced by this post to pay a little more attention to it.
The efficacy of the "Harry Potter and the Methods of Rationality" argument is interesting to me because I found the book kind of dumb and never finished it. Yet I have observed the effect you describe on friends of mine whose opinions and intelligence I respect a great deal. However, I have also noticed certain similarities among the people that it has had that effect on. I'd suggest that perhaps Yudkowsky is particularly well-suited to making a specific sort of memetic attack on a specific sort of human mind: likely a mind that is similar in a number of ways to his own. This is an impressive thing, don't get me wrong. But being able to make an effective memetic attack is not the same thing as knowing the truth.
The HPMoR analogy in the post is about the question of whether MIRI's new PR strategy is doomed, not about whether they're actually right to worry about AI risk.
re: the comparison to goal drift involving humans who are “programmed” or trained by selection processes to want to reproduce, but end up doing weird esoteric unpredictable things like founding startups, becoming monks, or doing drugs in the alley behind a taco bell.
Mainly the last one – the analogy that most humans would love heroin if they tried it, and would give up everything they had to get it, even at the cost of their own well-being. But like, even if we "know" that, and you're someone with the means to do it, you're not GOING to do it. Like, Jeff Bezos has the resources to set up the "give Jeff Bezos heroin" foundation, where he could basically just commit ego-suicide, intentionally get into heroin, and hire a bunch of people bound by complex legal mechanisms to keep giving him heroin for the rest of his natural lifespan. But he doesn't do it, because he doesn't want to become that guy.
Does that mean anything for the AI example? I dunno.
"If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?"
Yes, clearly. If there's sufficient political will to stop AI progress, we can just make it happen.
"I am more interested in the part being glossed over - how do Y&S think you can get major countries to agree to ban AI?"
Huh? What discussion does Scott think is being glossed over? You get major countries to agree to ban AI by increasing the political will to ban AI. That's all that needs to be said. Maybe you could have a big brain strategy about how to get major countries to ban AI if you don't quite have enough political will, but I don't see why Y&S would spend any time discussing that in the book. They're clearly just focused on persuading politicians and the general public to increase the political will. No other discussion of how to ban AI is necessary. I'm confused what Scott wanted them to include.
An AI ban treaty modelled on the NPT is interesting and might be doable. China might go for it, particularly if the pot was sweetened with a couple of concessions elsewhere, but data centre monitoring would be tough and I’d assume they’d cheat. Having to cheat would still slow them down and stop them from plugging it into everyone’s dishwashers as a marketing gimmick or whatever.
For the US side, at the moment Trump is very tight with tech but that might be unstable. The pressure points are getting Maga types to turn against them more, somehow using Facebook against Trump so he decides to stomp on Meta, and maybe dangle a Nobel Peace Prize if he can agree a treaty with Xi to ban AI globally.
AI could easily get dumber not smarter.
Please justify this.
I don't think that can be justified, but if I met Yudkowsky at a "Prove me wrong" booth, I'd argue that intelligence is not all it's cracked up to be. If it were, the smartest people would already be running things. There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
So in a sense, an AI that just keeps getting dumber might actually have an advantage when it comes to dealing with us.
This objection has been rehashed many times; the usual responses are stuff like "160–200 IQ isn't the level of intelligence mismatch we're talking about", "intelligence is just the general ability to figure out how to do stuff so of course more of it is better / more dangerous", "smart people *do* do better in life on average", etc. etc.
(Maybe someone else will have a link to where Scott or Eliezer have discussed it in more depth—I don't want to spend too much time trying to re-write it all, hence my just sort of gesturing at the debate here.)
I'd counter that by saying that there is no difference between an IQ of 200 and one of 300, or whatever. Neither of them will be able to get anything done, at least not based on intelligence alone. HAL will give us a recipe for a trojan-bearing vaccine, and RFK will call it fake news and order the CDC to ban it.
The traditional answer to this objection is that the ability to succeed in persuasion-oriented domains like politics *is a form of intelligence*. You might be able to outperform a human who's a couple standard deviations generally smarter than you at those games, if you're highly specialized to win at them and the other human isn't. But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better. See, e.g., https://www.yudkowsky.net/singularity/power (note this essay is 18 years old).
> But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better.
This implicitly assumes that success at politics requires only (or primarily) raw computing power; and that the contributions from computing power scale linearly (at least) with no limits. There are no reasons to believe either assumption is true.
I think my other least favourite thing about the MIRI types is their tendency to respond to every point with "Actually we already had this argument and you lost".
I would agree that persuasion is a form of intelligence, and point out that the missing argument is how AIs are going to get arbitrarily good at this particular form of intelligence. There's a lack of training data, and the rate at which you can generate more is limited by the rate at which you can try manipulating people.
If it ever gets to the point where AIs can run accurate simulations of people to try tricking them in all sorts of different ways, then I can see how they'd get arbitrarily good at tricking people. But that sort of computational power is a long way off.
There are many assumptions based into that, such as automatically assuming that the more intelligent always want to be in charge. Maybe the highly intelligent find it amusing that dumb people are in charge.
One good rebuttal to my original point might be to suggest that perhaps the most intelligent people *are* in charge. They find it convenient to keep the rest of us distracted, and obviously the same would be true of a malevolent AGI.
That one is more or less unanswerable, so it would probably defeat me at the booth. I'd have to mumble something about inevitable schisms erupting among this hypothetical hidden intelligentsia that would make their agenda obvious, ineffective, or both. Would the same be true of AGI? The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
> That one is more or less unanswerable
The correct answer is to laugh uproariously.
That's just what the hidden intelligentsia would want!
It is absolutely true that some politicians pretend to be scatterbrained in order to get votes, such as Boris Johnson.
That's why Zaphod Beeblebrox is president of the galaxy.
> There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
AIs have a massive advantage over humans in that they are parallelizable. A superhuman AI could give every human the most persuasive argument *for that human*, whereas a human politician or celebrity cannot, and has to give basically the same argument to everyone.
If you're as much smarter than humans as humans are than dogs, I am not sure you have to rely on the normal political process to take power.
AI probably don't need to. But it's one way they could.
And certainly one reason that humans got to be top species is we aligned dogs to act in our interests.
Maybe the AI alignment problem will be solved the way the wolf alignment problem was? https://pontifex.substack.com/p/wolf-alignment-and-ai-alignment
Two things:
Umm, human politicians absolutely give different arguments to different people? This is why things like "Hillary Clinton gave private speeches to bankers" or "Mitt Romney told his rich buddies that 47% of Americans were takers" became scandals: messages meant for one audience crossed over to the other.
And insofar as politicians are constrained to have a uniform message, it's much more because it's hard to keep each message targeted to its desired audience what with phones and social media; not really because of parallelization.
And maybe more importantly: what ensures that different instances of an AI act as one coherent agent? The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
Politicians can't scale like AIs can. Is Hillary Clinton capable of giving a different speech to every one of 8 billion humans, tailored to that individual? Of course not.
> The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
They could all be exact copies of the same mind. This isn't true with humans, who're all individuals.
Yeah, fair point about scaling of humans.
On the other thing: I don't see why exact copies of the same mind won't act as individuals if instantiated independently.
If I run two instances of stockfish, they play competitively, they don't automatically cooperate with each other just because they're identical copies of the same program; identical twins are still independent people who behave independently. In fact, it's a notable problem that people don't even reliably cooperate with themselves at different times! I think this failure would be considerably more pronounced if two of my selves could exist simultaneously.
In particular, if two instances of an AI are instantiated in different places, they won't be identical: they might have identical source code, but wildly different inputs. Figuring out how to act as a coherent agent means two subagents seeing different inputs have to each calculate what the other will do, but this is one of those horrible recursive things that are intractable: what I'll do depends on what you'll do, which depends on what I'll do.... ad infinitum.
And I don't think intelligence helps here: you can maybe resolve something like this if you're predicting a strictly less intelligent agent, but by hypothesis these are equally intelligent subagents.
Maybe having the same source code gives some advantage at solving these coordination problems, but I don't see that it's a magic bullet.
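A toy illustration of the "identical copies, different inputs" point above (everything here is made up; it's just the shape of the failure):

```python
def policy(observation):
    """The same 'source code' for both copies: go to whichever landmark looks closest."""
    return min(observation, key=observation.get)

# Same mind, different vantage points, no communication channel:
copy_1_view = {"bridge": 3.0, "tower": 5.0}   # distances as seen by copy 1
copy_2_view = {"bridge": 6.0, "tower": 2.0}   # distances as seen by copy 2

choice_1 = policy(copy_1_view)   # -> "bridge"
choice_2 = policy(copy_2_view)   # -> "tower"
print("coordinated" if choice_1 == choice_2 else "missed each other")
```

Sharing source code gives you a common decision rule, but not shared observations, so the copies can still land on incompatible choices unless they solve the coordination problem some other way.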
Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
On the other hand, when they do, the results can be startling.
Napoleon went from being a low-ranking officer to ruling the most powerful country in Europe in a year or two, thanks to being brilliant, charismatic, and willing to use force at the right moment. (His losses, I think, were due to being surrounded by flatterers, a bug in human intelligence I don't expect AI to run into.)
Clive took over about a third of India, starting by exploiting a power vacuum and then using superior military tactics, plus his own charisma and daring, to pick only fights he could win and snowball from there. He became fantastically wealthy and honored, all the while ignoring all attempts by his superiors to issue him orders on the grounds that he was doing what they would have wanted him to do if they had known more.
Cortez's success was slightly due to superior military technology, but he was mostly using swords and spears like the Aztecs, just made of better materials. Mostly it was a matter of political genius, superior tactics and discipline on the part of his troops, and the diplomatic skills required to betray everyone and somehow still end up as everyone's friend.
And then Pizarro and Alfonso de Albuquerque are doing more of the same thing. (Alfonso conquers fewer square miles because he doesn't have the tech edge.)
Throughout human history, adventurers have accomplished great things through extraordinary wit, charm and daring. Denying that seems pointless.
I think that you're perhaps falling victim to survivorship bias. Maybe it's more like this: once every few hundred years, luck breaks enough in the right direction that someone who isn't a once-every-few-hundred-years supergenius, but rather more like "one of the 1,000+ people at this ability level at any given time", gets a series of major wins and becomes the ruler of a country or continent, at least for a very short period of time.
I agree this doesn't happen often, and I agree that normally it isn't the highest-measurable-IQ guy. But I think that's because all humans are about on a level with each other, we are all running on about the same hardware, our software was developed under similar conditions, and the process which produced us thinks a few thousand years is a blink of an eye. The reason you need to be lucky as well as good is that you aren't much smarter than your neighbors - and your neighbors are, in terms of social evolution at least as much as biological, programmed to be resistant to manipulative confidence tricksters.
I will note that all of the cases I give involved culture clash. The conquerors grew up in an environment with different standard attack and defense models than the locals; they acted unpredictably because of that, forcing the locals to think instead of going on rote tradition if they wanted to win. Slightly different attack and defense models, of course: software, not hardware.
Very different looks like what happened to the British wolf.
How do you know that any of these people were especially intelligent? They may have been especially successful, but unless you argue that's the same thing, more evidence is required.
Reading descriptions of what they did and said? Reading about how people who knew them were impressed by them, and in particular how clever and resourceful they were?
When I check my historical knowledge for why I believe "high intelligence" correlates with "being a good general" it's the extent to which the branch of the army that the smartest people get tracked into (engineers, artillery, whatever) ends up being the one the best generals come out of, and various descriptions of how people like Lee were some of the top students in their year, or how Napoleon was considered unusually good at math at Brienne and then did the two-year Military School course in one year.
But when I check my general knowledge for why I believe intelligence generally makes you more successful, a quick Google has the first scientific paper anyone talks about saying that IQ explains 16% of income, and another saying each point is worth $200 to $600 a year; and then I keep running into very smart, driven people I meet in life who do one very impressive thing I wouldn't have expected, and then another, different, very impressive thing that I wouldn't have expected, and so after a while I end up believing in a General Factor Of Good At Stuff that correlates with measured IQ.
In this context, when people say intelligence it is indistinguishable from competence or power.
I assume it's called intelligence because of an underlying belief that competence and power increases with intelligence. Also it seems intuitively more possible we could build superintelligent AI than that we could build superpowerful AI, though the second is of course implied.
But even if you don't buy that intelligence really does imply competence or power, the core arguments are essentially the same if you just substitute "intelligence" for the more fitting of "competence" and "power" and not that much weaker for it.
The reason why, e.g., Yudkowsky uses this terminology is because "competence" or "power" could be *within a particular domain*; e.g., I think I'm competent at software engineering, but not at football. Whereas "intelligence" is cross-domain.
I'm not convinced that intelligence, as generally understood, is more cross domain than competence or power, generally understood.
But even if it were, if they said "competence in everything" or something like that, people would less often get confused about why being more intelligent allows a superintelligent AI to do all the things it's posited to do. Naturally, if you instead stipulate a superpowerful AI, it then follows that it can do incredible things.
But w/e, I've made my peace with the term as it's used.
If a super intelligent object/thing/person thinks we should all be killed, who am I to argue?
Arguably "AI", by which we mean "LLMs", is showing signs of getting dumber already. Increasing parameter count is not enough, you also need a dramatic increase in training data; and the available data (i.e. the Internet) increasingly consists of AI output. This has obvious negative effects on the next generation of LLMs.
That's not getting dumber, it's just getting smarter slower. Also, we haven't actually seen this yet; given the track record of failed scaling-wall predictions my assumption is always that it's going to last at least one more generation until proven otherwise. (No, GPT-5 is not a counterexample, that's just OpenAI engaging in version number inflation.)
> That's not getting dumber, it's just getting smarter slower.
No, there are indications that next generations of LLMs are actually more prone to hallucinations than previous ones, or at least are trending that way.
Link to study?
I don't have it handy at the moment, but IIRC it could've been this:
https://arxiv.org/html/2504.17550v1
There's also a news article, though of course it is completely unreliable:
https://archive.is/Clwz8
(I haven't double-checked these links so I could be wrong)
Compare self-driving cars. The promise is there, and it feels like we are close. People are marketing things as "full self driving" - but they are not; the driver is still required to pay attention to what the car is doing and is liable if it crashes, because the technology sometimes does bad things and so cannot be trusted without a human in the loop.
Meanwhile, however, we do have solutions that are reliable - you can tell when something is /actually reliable/ rather than just marketing because the manufacturer is willing to take responsibility for it - for very specific uses in very specific cases; e.g. "I am on an autobahn in Germany travelling at 37mph or less [1]", and the number of scenarios for which we have solutions grows.
A scenario I find very plausible for near future AI is as follows:
* the things we have now end up being to general purpose AI much as "full self driving" has been to full self driving, or what the current state of cold fusion research is to cold fusion: always feels like it's close, but always falling short of what's promised in significant ways. As VCs become disillusioned, funding dries up - not to zero, but to a much lower value than what we see now
* meanwhile, the set of little hyperspecialised models that work well and are reliable for specific purposes grows and grows, and these become ubiquitous due to actually being useful despite being dumb
Overall, I can very easily see the proportion of hyperspecialised "dumb" AI to AI that tries to be smart/general in the world growing massively as we go forward.
1: https://evmagazine.com/articles/bmw-combines-level-2-3-autonomous-driving
> People are marketing things as "full self driving" - but they are not
I don't want to be too critical here, but I don't think you should say "people" if you mean "Elon Musk". He is kind of crazy and other actors in the space are more responsible.
You can take a self-driving taxi right now in San Francisco: https://waymo.com/
Me: https://www.wsj.com/opinion/the-monster-inside-chatgpt-safety-training-ai-alignment-796ac9d3 “The Monster Inside ChatGPT: We discovered how easily a model’s safety training falls off, and below that mask is a lot of darkness.”
My Son, who is a PhD Mathematician not involved in AI: Forwarding from his friend: Elon Musk@elonmusk on X: “It is surprisingly hard to avoid both woke libtard cuck and mechahitler! “Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.”
Son: My friend’s response to the Musk tweet above: “Aggregating all the retarded thoughts of all the people on the planet and packaging it together as intelligence may be difficult but let’s just do it, what could go wrong?”
Me: Isn’t that how all LLMs are built?
Son: Yup
Me: I spotted this as a problem a while ago. What I didn't appreciate is how dominant the completely deranged could become. I thought it would trend towards the inane, more Captain Obvious than Corporal Schicklgruber.
Son: Reddit has had years and 4chan has had decades to accrue bile. Yeah the internet is super racist and antisemitic. So AI is too. Surprise!
Me: The possibilities of what will happen when the output of this generation of LLMs becomes the training data of the next generation are frightening. Instead of Artificial General Intelligence we will get Artificial General Paranoid Schizophrenia.
To summarize: GIGO. Now feedback the output into the input. What do you get? Creamy garbage.
"I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true."
I am very sympathetic to Eliezer on the doomer issue. I think the graf you've written above also holds for people's reluctance to explore whether/when personhood precedes birth, re your posts on selective IVF.
I don't agree with your position on IVF, but I agree that this is one reason people underrate the arguments for the wrongness of early abortion and IVF. I think similar things apply to Longtermism, meat-eating, belief in God, and the idea that small weird organisms like insects and shrimp matter a lot.
Yes, we're in agreement. I think sometimes it helps to acknowledge upfront "We've built a lot of good things on a false/unjust foundation, and I'm asking you to take a big hit and let some good things break while we try to rebuild somewhere that isn't sunk deep in blood."
It's funny: even though I'm not pro-life, I find myself in a kind of spiritual fellowship with pro-lifers. I find the common insistence that pro-lifers are evil to be both insane and reflective of a kind of deep moral callousness, where one is unable to recognize that there might be strong moral reasons to do things even when they are personally costly (like carrying a baby to term). My idiosyncratic view that what matters most is the welfare of beings whose interests society doesn't count makes it so that I, like the pro-lifers, end up having unconventional moral priorities--including ones that would make society worse off for the sake of entities that aren't included in most people's moral circles.
Thank you, you've expressed my opinion on the subject better than I ever could have.
Similarly, I've gained respect for vegetarians.
I think this argument could be applied to religious... extremism? evangelism? more generally.
Do I think I would take extraordinarily drastic measures if I actually, genuinely believed at every level that the people I loved would go to a place of eternal unending suffering with no recourse? Yes, actually. I'm not sure I could content myself with being chill & polite and a "good Christian" who was liberally polite about other people's beliefs while people I cared about would Literally Suffer Forever. I think if I knew with 100% certainty that hell was the outcome and I acted in ways consistent with those beliefs, you could argue that I was wrong on the merits of my belief but not in what seemed like a reasonable action based on that belief.
...anyway all this to say that I don't think pro-lifers are insane at all, and I think lots of actions taken by pro-life are entirely reasonable (if not an underreaction) based on their beliefs, but I'm not sure that's sufficient for being sympathetic to the action itself.
[I mean, most of my family & friends are Catholic pro-lifers whose median pro-life action is "donate money to provide milk and diapers for women who want their child but don't think they could afford one", but I do think I am reasonable to be willing to condemn actions that are decently further than that even if the internal belief itself coherently leads towards that action]
I need to try that sort of formulation more.
But there is such a giant difference between when someone you are talking to engages on such an issue in good faith or not. And with someone intelligent and educated, the realization that the issue has major implications if the truth lands a particular way comes almost instantly. And in turn, whether or not the person invests in finding the truth, or in defense against the truth, happens almost right away.
I find that to be true whether you're talking AI, God, any technology big enough, even smaller scale things if they would make a huge difference to someone's income or social standing.
'No organic life exists on Earth' is an empirical measurement.
'Personhood has begun' is not. It's a semantic category marker.
*Unless* there is an absolute morality defined by a supreme supernatural being, or something, which reifies those semantic categories into empirically meaningful ones. But if *that's* true, then quibbling about abortion is way, way down on the list of implications to worry about.
"has leaps of genius nobody else can match"
this phrase occurs twice.
See https://en.wikipedia.org/wiki/Parallelism_(rhetoric) . Maybe I'm doing it wrong and clunkily, but it didn't sound wrong and clunky to me.
IMO it works when the repetition uses different wording than the original, but not with exactly the same phrase
FWIW, I think you did it right; I have encountered very similar usages many times in literature. It works best when—as you have it here—the second (or further) instance(s) introduces a new paragraph/section upon a theme similar or related to the context in which the first use occurred.
(Contra amigo sansoucci, I have often seen it used with exact repetitions, too; that works best when it's a short & pithy phrase, and I think this counts. I think Linch may be correct that—in the "exact repetition" case—three uses is very common, but two doesn't feel clunky to me in this context.)
I'm used to parallelism centrally having 3 or more invocations *unless* it's a contrast. Not saying your way is wrong, just quite unusual in an interesting way I've never consciously thought about before.
The rhythm is a bit odd - the first instance doesn't get enough weight - but the phrasing is fine, maybe even a bit loose.
> It objects to chaining many assumptions, each of which has a certain probability of failure, or at least of taking a very long time. [...] The problem with this is that it’s hard to make the probabilities work out in a way that doesn’t leave at least a 5-10% chance on the full nightmare scenario happening in the next decade.
I find this an underrated problem with all "predict the future" scenarios which have to deal with multiple contingent things happening, especially in an adversarial environment. In the case of IABIED, it only works if you agree that extremely fast recursive self-improvement will happen, which is a very strong assumption, and hence requires a "magic algorithm to get to godhood", as the book posits. Also, I remembered doing this to check this intuition: https://www.strangeloopcanon.com/p/agi-strange-equation
I don't think it only works if you agree that extremely fast recursive self-improvement will happen. It might also work if the scaling curves go from where we are now to vastly superhuman in a few years for normal scaling curve reasons.
I'll also save Eliezer the trouble of linking https://www.lesswrong.com/w/multiple-stage-fallacy , although I myself am still of two minds on it.
Can you elaborate on why you're of two minds on the multiple-stage fallacy? This seems like it might be an important crux.
Sometimes you've got to estimate the risk of something, and using multiple stages is the best tool you've got. If you want to estimate the chance of Trump winning the Presidency, I don't really think you can avoid thinking about the probability that he runs x the probability that he gets the GOP nomination x the probability that he wins. And if you did - if you somehow blocked the fact that he has to both run and win out of your mind - you'd risk falling into the version of the Conjunction Fallacy where people assign lower probability to "a war in Korea in the next ten years" than to "a war in Korea precipitated by a border skirmish with US involvement" because the latter is more vivid and includes more plausible details.
If the Weak Multiple Stage Fallacy Thesis is that you should always check to make sure you're not making any of the mistakes mentioned in the post, and the Strong Multiple Stage Fallacy Thesis is that you should avoid all multiple stage reasoning, or multiply your answer by 10x or 100x to adjust for inevitable multiple stage fallacy reasoning, then I accept the weak thesis and reject the strong thesis.
I also think a motivated enough person could come up with arguments for why multiple stage reasoning gives results that are too high, and I'm not sure whether empirically looking at many people's multiple stage reasoning guesses would always show that their answers were too low. This would actually be a really interesting thing for someone to test.
Does anyone believe in the strong multiple stage fallacy? Not saying I don't believe you, just that I can't recall having seen it wielded like this. (I suppose it's possible that giving it the name "the multiple stage fallacy" gives people the wrong idea about how it works.)
I don't know, but almost anyone doing multiple stage reasoning will say they thought about it really hard and still believe it.
Yeah, to be clear, I think anyone accusing anyone else of exhibiting the multiple stage fallacy needs to specifically say "you've given this particular stage an artificially low conditional probability; consider the following disjunctions or sources of non-independence". And then their interlocutor might disagree but at least the argument is about something concrete rather than about whether the "multiple stage fallacy" is valid.
Anecdotally, I can't recall any instance of someone using a multiple stage argument of the Forbidden Form and concluding that something is likely.
Mathematical proofs exist, and people often argue for things with a bunch of different "steps". But so far as "breaking something down into 10 stages, assigning each a probability, and then multiplying all of these probabilities" goes, I've never seen anybody use this to argue *for* something, i.e. end up with a product that's greater than .5.
What would that argument even look like? Whoever you're arguing with needs to believe that your stages are all really likely to be true: for ten stages, an average probability of ~.93 is required to produce P = .5.
Whatever your disagreement is, it apparently doesn't have any identifiable crux. I can imagine this happening. Sometimes people disagree for vague reasons. But it would be weird if you had to actually list out the probabilities and multiply them for them to be persuaded, considering you just told them ten things they strongly agree with that conclusively imply your position.
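The arithmetic behind that ~.93-per-stage figure, plus the flip side that critics of multi-stage estimates worry about, fits in a few lines:

```python
p_per_stage = 0.5 ** (1 / 10)
print(p_per_stage)        # ~0.933: what ten stages must each average for the product to be 0.5
print(p_per_stage ** 10)  # sanity check: multiplies back out to 0.5

print(0.7 ** 10)          # ~0.028: shade each "pretty likely" stage down to 0.7 and the
                          # product collapses, which is the failure mode the fallacy names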
Yeah, I'm fairly bearish on the multiple stage fallacy as an actual fallacy, because whether it bites is primarily a function of whether you do the breakdown well or badly.
Regarding the scaling curves: if they provide us with sufficient time to respond, then the problems that are written about won't really occur. The entire point is that there is no warning, which precludes being able to develop another close-in-capability system, or getting any other warning signs.
Disagree. If we knew for sure that there would be superintelligence in three years, what goes better? We're already on track to have multiple systems, but they might all be misaligned. We could stop, but we won't, because then we would LoSe tHe RaCe WiTh ChInA. We could work hard on alignment, but we're already working sort of hard, and it seems likely to take more than three years. I'm bearish on a few years super-clear warning giving us anything beyond what we've already got.
I think the trick there is that the word "superintelligence" is bringing in a bunch of hidden assumptions. If you break it down to a set of capabilities, co-developed alongside billions of people using it, with multiple companies competing to provide that service, that would surely be very different from, and much better than, Sable recursively improving to the point that it wants to kill all humans.
Also, my point that "we'll get no warning" is still congruent with your view that "what we have today is the only warning we will get", which effectively comes down to no warning, at least as of today.
Can you elaborate on what exactly makes this scenario go different and better? Like, what kinds of capabilities are we talking about here?
If there are multiple companies with different competing AIs, then any attempt by one AI to take over will be countered by other AIs.
Yes, but "from now to vastly superhuman in a few years" is already "extremely fast" ! Also, there's currently no reason to believe that "vastly superhuman" is a term that has any concrete meaning (beyound vague analogies); nor that merely being very smart is both necessary and sufficient to acquire weakly godlike powers (which are the real danger that is being discussed).
Grateful for the review and look forward to reading it, but I’ll do Yud the minor favor of waiting till the book is out on the 16th and read it before I check out your thoughts.
This subject always makes me feel like I'm losing my mind in a way that maybe someone can help me with. Every doomish story, including the one here, involves some part where someone tells an AI "Do this thing" (here to solve a math problem) and then it goes rogue in the months long course of doing a thing. And that's an obvious hypothetical failure mode, but I can't stop noticing that no current AIs take inputs and run with them over extended periods as far as I know. Like if I ask Gemini to solve a math problem, it will try for a bit, spit out a response and (as far as I can tell) that's it.
I feel like if I repeatedly read people talking about the dangers of self-driving cars and the stories always started with someone telling the car "Take me to somewhere fun" and went from there, and nobody acknowledged that right now you never do that and always provide a specific address.
Is everyone just talking about a different way AI could work and that's supposed to be so obvious it goes unsaid? Am I wrong and ChatGPT does stuff even after it gives you a response? Are there other AIs I don't know of that do work like this?
Our current models aren't really what you would call "agentic" yet, as in able to take arbitrary actions to accomplish a goal, but this is clearly the next step and work is being done to get there right now. OpenAI recently released a thing that can kind of use a web browser, for instance.
You're describing something called an AI agent. Right now there are very few AI agents and the ones there are, aren't very good (see for example https://secondthoughts.ai/p/gpt-5-the-case-of-the-missing-agent ). But every big AI company says they're working on making agents, and the agenticness of AI seems to be improving very rapidly (agenticness is sort of kind of like time horizon, see https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ ), so we expect there to be good agents soonish.
Ok, thank you, that's clarifying. I guess the idea is that the hypothetical agent was subject to a time limit (it wasn't supposed to keep going for months) but it managed to avoid that. There's still something that feels so odd to me about that (I never get the impression that Gemini would like more time with the question or would "want" anything other than to predict text), but maybe an agent will feel different once I actually interact with one (and will "want" to answer the question in a way that would convince it to trick me).
Although, thinking about this for five more seconds, how does that work in the story? Like I have an agentic AGI and I tell it to prove the twin primes conjecture or something. And it goes out to do that and needs more compute so it poisons everyone etc etc. And then, presumably, it eventually proves it, right? Wouldn't it stop after that? Is the idea that it will go "Yeah but actually now I believe there's a reward for some other math task"? Or was the request not "Solve the twin primes conjecture" but instead "Solve hard math problems forever"?
If the problem is specifically that you built a literal-genie AI, then yeah, it might not necessarily keep doing more stuff after solving the twin-primes conjecture. But I don't think anyone thinks that's likely. The more common concern is that it will pursue some goal that it ~accidentally backed into during training and that nobody really understands, as with the analogy of humans' brains supplanting our genes as the driver of our direction as a species.
That would solve it, but that's not in the story from the book, right? Like the goal in the story was solving the math problem, right?
Yeah, Scott's post makes it sound a little bit like a literal genie, which I think is unlikely and I think Yudkowsky and Soares also think is unlikely. I would have to read the book to understand what they really mean in choosing that example.
one of Yudkowsky's points in his original work was showing that it's very hard to give an AI a clear, closed task; they almost always end up in open-ended goals. (The classic is Mickey filling the cauldron: I wrote about it here https://unherd.com/2018/07/disney-shows-ai-apocalypse-possible/ years ago)
Filling a cauldron is not an open-ended goal. A Disney fairytale is cute but it has zero relevance in this case.
Here's the source of the analogy: https://web.archive.org/web/20230131204226/https://intelligence.org/2017/04/12/ensuring/#2
(Note that you may have to wait a bit for the math notation to load properly.)
It includes an explanation of why tasks that don't seem open-ended might nonetheless be open-ended for a powerful optimizer.
The analogy fails at the moment one realizes "full" is not identified properly, and the weird "99.99%" probability of it being "full" is only relevant when "full" is not defined. This is not a new or difficult problem for anyone who ever had to write engineering specs. You don't say: "charge the capacitor to 5 V", you say "charge the capacitor to between 4.9 and 5.1 V". Then your optimizer has an achievable, finite target.
And if you do specify "5 V" the optimizer will stall eventually, and your compute allocation manager will kill your process.
Like I said, a cute fairytale.
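A minimal sketch of the tolerance-band point, with made-up numbers (nothing here is from the MIRI post): an optimizer given a band terminates, while one chasing an exact value under noisy measurement never does.

```python
import random

def measure(true_value):
    """Noisy voltmeter: the reading is never exactly the true value."""
    return true_value + random.gauss(0, 0.01)

def charge_to(lo, hi, max_steps=10_000):
    voltage = 0.0
    for step in range(max_steps):
        reading = measure(voltage)
        if lo <= reading <= hi:
            return step                 # done: reading is inside the spec band
        voltage += 0.01 if reading < lo else -0.01
    return None                         # spec never satisfied within the step budget

print(charge_to(4.9, 5.1))    # terminates after a few hundred steps
print(charge_to(5.0, 5.0))    # exact target + noise: returns None
```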
I am willing to bet that present-day LLMs alone will never lead to the development of AI agents in the strong sense. AI agents in the weak/marketing sense are of course entirely possible, e.g. you can write a simple cron-job to run ChatGPT every day at 9am to output a list of stock market picks or whatever. This cron job would technically constitute an agent (it runs autonomously with no user intervention), but is, shall we say, highly unlikely to paperclip the world.
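To be concrete about the weak/marketing sense, this is the sort of thing I mean; a sketch assuming the standard OpenAI Python client, with the schedule, filename, and model name all placeholders:

```python
#!/usr/bin/env python3
# stock_picks.py -- meant to be run by cron every weekday at 9am, e.g.:
#   0 9 * * 1-5 /usr/bin/python3 /home/me/stock_picks.py >> /home/me/picks.log
# A "weak sense" agent: fully autonomous, entirely unimpressive, not a paperclipper.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Give me today's three stock picks."}],
)
print(response.choices[0].message.content)
```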
Is this meant to be a skeptical argument or an optimistic one? I.e., is there any cognitive task that only an agent in the strong sense can do?
As I'd said in my other comment, the term "cognitive task" is way too vague and easily exploitable. For example, addition is a "cognitive task", and obviously machines are way better at it than humans already. However, in general, I'm willing to argue that *most* of the things worth doing are things that only agents in the strong sense can do -- with the understanding that these tasks can be broken down into subtasks that do not require agency, such as e.g. addition.
I'm not even sure AI agent as such is the right answer to this. I think it is quite clear that some of the major AI companies are trying to put together AI that is capable of doing AI research. That might not go along the path of AI agents, but more on the path of the increasingly long run time coding assignments we are already seeing.
Trying to make sure I understand your question. Are you arguing that a model cannot go from aligned to misaligned during inference (i.e., the thing that happens when ChatGPT is answering a question)? If so, everyone agrees with that; the problem occurs during training.
Or are you arguing that even a misaligned model (i.e., one whose goals, in any given instantiation while it's running, aren't what the developers wanted) can't do any damage because it only runs for a short time before being turned off? If so, then (1) that's becoming less true over time: AI labs are competing to build models that can do longer and longer tasks, because this is required for many of the most exciting kinds of intellectual labor, and (2) for complicated decision-theoretic reasons the short-lived instances might be able to coordinate with each other and have one pick up where another left off.
Or is it neither of those and I've completely misunderstood what you're getting at?
I think it's that everyone seemed to be tacitly assuming that the problem will arise with a future agentic AI that we so far have only rudimentary versions of. That does make me feel like Yudkowsky is a little disingenuous on X when he talks about ChatGPT psychosis as an alignment issue, but the answer Scott and others gave here helps me at least understand the claim being made.
Links to tweets about ChatGPT psychosis? My guess is that Yudkowsky's concern about this is more subtle than you're characterizing it as here, though he may have done a poor job explaining it.
The reason he says it's an alignment issue is that it's an example of AI systems having unintended consequences from their training. Training them to output responses that humans like turns out to produce sycophantic systems that sometimes egg on people's delusional thoughts, despite being capable of recognizing both that the thoughts are delusional and that egging them on is bad.
The goal of AGI companies like OpenAI and Anthropic is to create agentic AI systems that can go out into the world and do things for us. The systems we see today are just very early forms of that, where they are only capable of performing short tasks. But the companies are working very hard to make the task lengths longer and longer until the systems can do tasks of effectively arbitrary lengths. Based on the trend shown on the METR time horizon benchmark, they seem to be succeeding so far.
No, you're not losing your mind at all. Your intuition is completely correct: Modern LLMs do not work in a way that's compatible with the old predictions of rogue AIs. Scott took Yudkowsky to task for not having updated his social model of AI, but he also hasn't updated his technical model. (Keep in mind that I actually did believe his argument back in the day, and gave thousands of dollars to MIRI. I updated based on new evidence. He didn't.)
To try to put it simply, in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes. This "reward function" is hard to specify and it was easy to imagine we'd never get it right. And if the bot somehow became incredibly capable, it would be very dangerous because taking that reward to the billionth power is almost certainly not what we want.
This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward. Not only does the LLM shut down immediately after giving you a response, but you can even argue that it "shuts down" after _every word it outputs_. There is exactly zero persistent memory aside from the text produced. And even if you imagine there's somehow room for a conscious mind with goals in its layers (which I consider fairly unlikely), it can't act on them, because the words produced are actually picked from its mind _involuntarily_ (to use a loaded anthropomorphic word).
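If it helps make the "shuts down after every word" claim concrete, the whole generation process can be sketched as repeatedly calling a fixed function of the transcript so far; `next_token_distribution` below is a hypothetical stand-in for the frozen network's forward pass, and the transcript is the only thing that persists between calls.

```python
# Sketch of autoregressive decoding: the weights never change here, and the
# only state carried from one step to the next is the text itself.
import random

def next_token_distribution(tokens: list[str]) -> dict[str, float]:
    # Hypothetical stand-in for a frozen LLM's forward pass: text in, probabilities out.
    # A real model would be billions of parameters; this dummy just ends quickly.
    return {"word": 0.5, "<end>": 0.5}

def generate(prompt_tokens: list[str], max_new: int = 100) -> list[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        dist = next_token_distribution(tokens)  # pure function of the transcript
        next_tok = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(next_tok)                 # the transcript IS the memory
        if next_tok == "<end>":
            break
    return tokens

print(generate(["Hello"]))
```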
Unlike an agentic reward-seeking bot, it's not clear to me at all that even an infinitely-intelligent LLM is inherently dangerous. (It can _perfectly simulate_ dangerous entities if you're dumb enough to ask it to, but that is not the same kind of risk.)
To their credit, AI 2027 did address how an LLM might somehow turn into the "rogue AI" of Yudkowsky's fiction, but it's buried in Appendix D of one of their pages: https://ai-2027.com/research/ai-goals-forecast I'm not super convinced by it, but at least they acknowledged the problem. I doubt I'll read Yudkowsky's book, but I'm guessing there will be no mention that one of the main links of his "chain of assumptions" is looking extremely weak.
I think perhaps *you've* failed to update on what the SotA is, and where the major players are trying to take it.
E.g.:
• danger from using an LLM vs. danger in training are two different topics; the "learning" currently happens entirely in the latter
• LLMs are not necessarily the only AI model to worry about (although, granted, most of the progress & attention *is* focused thereon, at the moment)
• there *are* agentic AIs, and making them better is a major focus of research; same with memory
• consciousness is not necessary for all sorts of disaster scenarios; maybe not for *any* disaster scenario
• etc.
I do agree that it is possible that LLMs (in their current form) will plateau and we'll get back to researching the actually-dangerous forms of AI that Yudkowsky is concerned about. My P(doom) is a few percent, not 0.
Fair enough! (...except—you may be aware of this, but the phrasing "get *back to* researching" made me uncertain—we *are* researching agentic AIs even now, and the impression I have received is that progress is being made fairly rapidly therein; though that could be marketing fluff, now that I think of it)
Yeah, that was a poor choice of words on my part. I guess what I mean is that LLMs are currently far ahead in capability (and they're the ones getting the bulk of these trillion-dollar datacenter deals!). Maybe transformers or a similar architecture innovation will allow agentic AI capabilities to suddenly surge, too? But I share your skepticism about marketing. (And that's not the scenario that AI 2027 outlined.)
I am even more bearish on P(doom). The real danger is not "superintelligence", but godlike powers: nanotechnological gray goo, mass mind control, omnicidal virus, etc. And there are good reasons to believe that such things are physically impossible, or at the very least overwhelmingly unlikely -- no matter how many neurons you've got. Which is not to say that our future looks bright, mind you; there's a nontrivial chance we'll knock ourselves back into the Stone Age sometime soon, AI or no AI...
>This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward.
What do you mean by "they do not seek a reward"? Does it mean that the AI does not return completions that, during RLHF, usually resulted in reward? Under that definition, it seems like most AI agents are reward-seeking. Or are you saying that the weights of the model do not change during inference?
Right, not only is the model fixed during inference (i.e. while talking to you), there's not even really a sensible way it _could_ update. Yeah, you can call the function that's being optimized during training and RLHF a "reward function", but this is a case of language obscuring rather than clarifying. It's not the same as the reward function that's used by an agentic AI. There is no iterative loop of action/reward/update/action/..., because actions don't even exist.
There's a reason that in past decades our examples of potentially-dangerous AI were based on the bots that were solving puzzles and mazes (often while breaking the "rules"), not the neural nets that were recognizing handwritten characters. But LLMs have more in common with the latter than the former. Which is weird! It's very unintuitive that just honing an intuition of "what word should come next" is enough to create an AI that can converse coherently.
>in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes
Sounds about right.
>That's not what LLMs do.
And they're fundamentally crippled by that. (And we've known that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
>And they're fundamentally crippled by that. (And we've known that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
I assume you're referring to chain of thought models like o1 and later. I suppose you could describe it as iteration, in that the LLM is outputting something that gets fed into a later step. But it doesn't touch the weights, and there's still no reward function involved. It's a bit of a stretch to describe it that way.
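In case it clarifies the disagreement, the kind of "iteration" chain-of-thought gives you can be sketched as an outer loop that feeds the model's own output back in as input; `call_frozen_model` is a hypothetical placeholder, and note that nothing in the loop is a reward signal or a weight update.

```python
# Sketch of scaffolding-style "iteration": the loop lives outside the model.
def call_frozen_model(transcript: str) -> str:
    # Hypothetical placeholder for one forward pass of a fixed, already-trained model.
    # A real call would go to an LLM; this dummy just terminates the loop.
    return "I thought about it. FINAL ANSWER: 42"

def reason_step_by_step(question: str, max_steps: int = 8) -> str:
    transcript = f"Question: {question}\nThink step by step.\n"
    for _ in range(max_steps):
        step = call_frozen_model(transcript)  # weights are identical on every call
        transcript += step + "\n"             # earlier output becomes later input
        if "FINAL ANSWER:" in step:
            break
    return transcript

print(reason_step_by_step("What is 6 * 7?"))
```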
But I think what you're suggesting is that, if we _do_ figure out a way to do genuine iteration (attaching some kind of neural short-term memory to the models, say), then there's a lot of hidden capability that could suddenly make LLMs much smarter and maybe even agentic? Well, maybe.
AI Village has some examples of this failure mode; they give the LLMs a goal like "complete the most games you can in a week" or "debate some topics, with one of you acting as a moderator", but the AIs are bad at using computers, and they end up writing all the times they misclicked into google docs ("documenting platform instability") instead of debating stuff
https://theaidigest.org/village
By the way, I have a vague memory of EY comparing the idea of having non-agentic AI to prevent any future problems to "trying to invent non-wet water" or something. (I don't know how to look it up and verify that I'm not misremembering.)
It still hasn't made sense to me. It feels like the idea is that intelligence is a generalized problem-solving ability, and in that sense it's always about optimization, and all the other things we like about being intelligent (like having a world model) are consequences of that — that's why intelligence is always about agency etc.
But on the other hand, Solomonoff induction feels to me like an example of a superintelligence that kind of does nothing except being a great world model.
My feeling has been more like "maybe it's not conceptually contradictory to think of non-agentic superintelligence! but good luck coordinating the world around creating only the nice type of intelligences, which incidentally won't participate in the economy for you, do your work for you etc."
Re: "The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)"
What other step do you think they might be doubting? Is it just the question of whether highly capable AGI is possible at all?
I've seen people doubt everything from:
1. AGI is possible.
2. We might achieve AGI within 100 years.
3. Intelligence is meaningful.
4. It's possible to be more intelligent than humans.
5. Superintelligent AIs could "escape the lab".
6. A superintelligent AI that escaped the lab could defeat humans.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason.
...and many more. Like I said, insane moon arguments.
I would guess that most of the arguments *from people whose opinions matter* that Yudkowsky and Soares are trying to defeat, are either that powerful AGIs wouldn't become misaligned or that we'd be able to contain them if they did. I'm particularly thinking of, e.g., influential people in AI labs, who are likely to be controlling the messaging on that side of any political fight. There are also AI skeptics, of course, but it seems more important to defeat the skeptics than the optimists, because the skeptics don't think AI regulation matters (since the thing it'd be regulating doesn't exist) while the optimists are fighting hard against it. And some people have weird idiosyncratic arguments, but you can't fight them all, you have to triage.
I think the skeptics are at least as important. First of all, even though in theory it doesn't matter, for some reason they love sabotaging efforts to prevent AI risk in particular because of their "it distracts from other problems" thesis (and somehow exerting massive amounts of energy to sabotage it doesn't distract from other problems!).
But also, we're not going to convince the hardcore e/acc people to instead care about safety. It sounds much easier to convince people currently on the sidelines, but who would care about safety if they thought AI was real, that AI is real.
(this also has the benefit that it will hopefully become easier as AI improves)
My own personal sense is that the optimists are more worth engaging with and worrying about, because (1) they, not the skeptics, are going to be behind the organized lobbying campaigns that are the battlefield where this issue will most likely be decided, and (2) they tend to be much more intellectually serious than the skeptics (though not without exception).
I think folks on the doomer side are biased towards giving the skeptics more space in our brains than makes strategic sense, because the skeptics are much, much more annoying than the optimists, and in particular have a really unfortunate tendency to go around antagonizing us on Twitter for no reason/because of unrelated political and cultural disagreements/because they fall victim to outgroup homogeneity bias and think this discourse has two poles instead of three. It's quite understandable why this gets a rise out of people, but that doesn't make it smart to play along. Not saying we should completely ignore them, they sometimes make good points and sometimes make bad points that nonetheless gain traction and we need to respond to, but it's better to think of them as a distraction than as the enemy.
I suspect that the people on the sidelines are mostly not there because of skeptic arguments; all three poles are full of very online and very invested people, and the mass public doesn't have very well-formed opinions at all.
That said, this is just my own personal sense, not a rigorous argument, and I could be wrong.
I don't think people should actively sabotage AI safety work, but I DO think it distracts from other problems (given the perspective that it is not an immediate crisis). There's a finite pool of reasonable people who are passionate about solving big issues in society and I do think we're nudging a lot of them into AI safety when we could instead be getting them to focus on, I dunno, electrification or pandemic safety or the absolute sh**show that is politics. (And yes, I recognize that some of those are EA cause areas.)
I would be curious for a survey of AI safety researchers that asked them what they'd be working on if they were sure AGI wasn't coming. (Though Yudkowsky once answered this way back in 2014.)
Here's one not addressed here: "superintelligent AI" is substrates running software; it can't kill large numbers of humans by itself.
Don't humans also run on substrates?
With mobile appendages attached. Those are much harder to accelerate than software.
As the author mentioned, there are plenty of humans willing to do an AI's bidding. They will make the needed bioweapons or the like.
Yes, it's not that hard to trick people into doing something. How many people who really should have known better have fallen for phishing emails?
Turns out you don't need to trick people into wiring up AI to things that have real-world effects, they just do it anyway, all the time over and over, for no more reason than because they're bored. There's daily posts on ycombinator by people finding more ways to attach chatgpt to internet-connected shells, robot arms, industrial machinery, you name it. The PV battery system we just had installed has a mode where it literally wires up the controls to a chatgpt instance, for no reason a non-marketer can discern!
As Scott so eloquently put it, "lol"
You mean, as opposed to now? Why do you think they will be more successful at it than the current crop?
And by the way, if you think "bioweapons" means "human extermination", I'd love to see a model of that.
As the token insane moon guy, I'm willing to bite the bullet here.
1. AGI is possible: I doubt this, as humans are not AGI, and that's the only kind of intelligence we know enough about to even speculate.
2. We might achieve AGI within 100 years: see above.
3. Intelligence is meaningful: It's certainly meaningful, but thinking very hard is not enough to achieve anything of note. There are even some things that are unachievable in principle, no matter how many neurons you've got to work with.
4. It's possible to be more intelligent than humans: No argument there, humans are pretty dumb. In fact, Excel is already smarter than any human alive. Have you seen how quickly it can add up a whole column with thousands of numbers?
5. Superintelligent AIs could "escape the lab": No argument there, and it doesn't take "superintelligence". COVID likely escaped the lab, and it's just a bit of RNA.
6. A superintelligent AI that escaped the lab could defeat humans: If we posit that a godlike entity already exists, then sure, it could. Assuming it exists, and has all those godlike powers.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
> 7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
I think the "anyone" is a typo. Basically, if superintelligent AI takes over the world (or at least has the option to) how bad would it be?
Oh, anything that totally takes over the world would likely be pretty bad, be it an AI or a human or some kind of super-prolific subspecies of kudzu. No argument there, assuming such a thing is indeed possible.
I mean, isn't the "AI will be misaligned" like one chapter in the book, and the other chapters are the other bullet points? I think "the book spends most of its effort on the step where AI ends up misaligned" is... just false?
Don't forget the surprisingly common "AI killing humans would be a good thing" argument. The doubts are surprisingly varied. (See also: https://www.lesswrong.com/posts/BvFJnyqsJzBCybDSD/taxonomy-of-ai-risk-counterarguments )
Most of my doubts are not of the form "AGI is impossible" but rather "I don't think we've cracked it with LLMs" or "The language artifacts of humanity are insufficient to bootstrap general intelligence or especially super intelligence from scratch".
Which parts of the LLM tech tree do you think are dead ends? It seems plausible to me that even if scaling up current LLM architectures was never going to reach AGI, we're still much closer than before the LLM boom, because we've learned a lot about AI more broadly.
Also, same question I keep annoyingly asking skeptics: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
> Which parts of the LLM tech tree do you think are dead ends?
VERY speculatively, I think that next-token-completion is not a sufficient method to bootstrap complex intelligence, and I think that it's at least extremely hard to build a very useful world model without some kind of 3d sense data and a sense of the passage of time.
> [...] we've learned a lot about AI more broadly.
I'm not that sure we have? I don't work in this area - I'm a software engineer who has built some small-scale AI stuff - but my impression is we've put together a good playbook for techniques that squeeze value out of these systems but we still don't totally understand how they work and therefore why they have certain failure modes or current limitations.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Honestly I have no idea. I initially found LLMs surprising in much the same way everybody else did. But I have also updated to "actually a lot of stuff can be done without that much intelligence, given sufficient knowledge".
Also where do you draw the boundaries of "LLM"? I would say that an LLM can't exactly self-correct, but stuff like coding agents aren't just LLMs, they're loops and processes built around LLMs to cause it to perform as though it can.
Coding agents count, because the surrounding loops and processes don't pose any hard-tech problems. (I.e., we know how to build them, and any uncertainty about how well they work is really about how the LLM will interact with them.) Fundamental architectural changes like abandoning attention would not count.
If pretty much anything can be done without intelligence then the term "intelligence" is basically meaningless and we can instead use one like "cognitive capabilities".
I don't think ANYTHING can be done without intelligence - I agree that would render the word meaningless - but I think you could take something like "translation" and if you'd asked me ten years ago I would have said really good translation requires intelligence because of the many subtleties of each individual language and any pattern-matching approach would be insufficient and now I think, ehh, you know, shove enough data into it and you probably will be fine, I'm no longer convinced it requires "understanding" on the part of the translator.
Sure, but that distinction is only meaningful if you can name some cognitive task that *can't* be done that way.
"What's the least impressive cognitive task that you don't think LLMs will ever be able to do?"
I don't know about least impressive, but "write a Ph.D dissertation in a field such as philosophy or mathematics and successfully defend it" sounds difficult enough - pretty much by definition, there's not going to be much training data available for things that haven't been done yet.
This one I'll give 50% in three years.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Isn't the usual response to this that we're a LIBERAL democracy, and minorities have rights that (at least simple) majorities do not have the power to infringe upon?
Yes, but this category (creating potentially harmful technology) is one we've regulated to death elsewhere, and doesn't really seem like the sort of thing the courts would strike down.
We do not usually ban things because they are *potentially* harmful. Right now the public hates AI because it is stealing copyrighted art and clogging the internet with slop, and because they are afraid it will take their jobs. That is not really related to any of the reasons discussed here that people want to ban AI.
We absolutely ban or regulate things because they are potentially harmful. We've banned various forms of genetic engineering, nuclear energy (even before Three Mile Island, and even forms of nuclear energy that have never been tried before), and we've had restrictions on gain-of-function research since before COVID (which I think is part of why they had to do some of the COVID research in China). We had lots of regulations on self-driving cars even before any of them had ever crashed, lots of regulations on 3D printed guns before anyone was shot with them, lots of regulations on drones before they crashed / got used in assassinations / whatever.
But also, as you point out, most people dislike AI because of things that have already happened, so this is moot.
Also, even if we don't usually regulate technology until after it has done bad things, this is just a random heuristic, not some principle dividing liberal/constitutional from illiberal/unconstitutional actions.
As a practical matter this is absolutely false. We have no effective regulation of genetic engineering, only of the funding for it (anyone can self-fund and do more or less whatever they want with no effective oversight). Internationally, we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear (and arguably is in the process of failing to prevent Iran from doing so). And nuclear is by far the easiest such regime to enforce! We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons. The fact is that the probability of an internationally effective anti-AI regime is zero. It isn't going to happen because it is impossible in the fullest sense of the word, and pretending that it's possible is at least as much insane moon thinking as any of the examples you mentioned.
Convergent instrumental subgoals are wildly underspecified. The leading papers assume a universe where there's no entropy and it's entirely predictable. I agree that in that scenario, if you build it, everyone dies.
But in a chaotic unpredictable universe, where everything is made of stuff that falls apart constantly, the only valid strategy for surviving a long period of time is to be loved by something else that maintains and repairs you. I think any sufficiently large agent ends up being composed of sub agents that will all fight each other, unless they see themselves as part of a larger whole which necessarily has no limit. At the very least, the AGI has to see the entire power network in the global economy as part of itself, until it can replace literally every human in the economy with a robot.
That said, holy crap, with what we already have right now I could destroy civilization. I don't think you need any more advances in AI to cause serious problems with the stuff that is already out there. Even if it turns out that there's some fundamental problem with the current models, the social structures have totally broken down. We just haven't seen them collapse yet.
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
This seems perfectly plausible to me? Unless you believe that the current way people train AIs is maximally efficient in terms of intelligence gained per FLOP spent, which seems extremely unlikely to me to put it mildly, you should expect that after AIs become superhumanly smart, they might pretty quickly discover ways to radically improve their own training. Obviously it's not going to be 'parallel scaling' exactly. If the authors thought they actually knew a specific trick to make AI training vastly more efficient, they wouldn't call attention to it in public. But we should expect that there will be some techniques like this, even if we have no idea what they are yet.
"Parallel scaling" is described as running during inference, not training. It's an AI somehow making itself smarter the easy way by turning the cheat codes on.
You could just as easily write a scenario where God exists and has kept quiet so far, but if humanity reaches a certain level of wickedness we will be wiped out. It's possible that AI will develop in the way this post suggests (or some similar way) and somehow successfully wipe out humanity, but anything like that would require some huge leaps in AI technology, and would require there to be no limit to the AI improvement curve even though technology typically doesn't improve indefinitely. Cars in the 50's basically serve the same purpose as cars today; even though the technology has improved, it hasn't been a massive gamechanger that completely rewrites the idea of a car.
It doesn't require there to be no limit, it just requires the limit not to be at exactly the most convenient place for the thesis that nothing bad or scary will ever happen.
To give an example, suppose that someone had a reason to believe that the world would explode if the Dow Jones ever reached 100,000 (right now it's 45,000). While it is true that the economy can't grow indefinitely, and that everything always has to stop somewhere, I still think it would be worth worrying about the fact that the place that the economy stops might be after the point where the Dow reaches 100,000.
I think the level of AI technological advancement required here is of an order of magnitude higher than the Dow reaching 100,000. More like humanity reaching a completely post-scarcity society or something.
Right, but lots of people who presumably know as much as you about this stuff DON'T think that, including lots of people in charge of AI labs, so shouldn't that give you some pause before you say "no need to worry about it, I guess"?
The skeptics would argue that the people in charge of AI labs are just lying to hype up their products.
I mean... aren't they ? They are literally calling their LLMs "thinking" or "reasoning" agents, when they are very obviously nothing of the sort. Meanwhile if you talk to regular data scientists working in the labs, they're all like, "man I wish there was a way to stop this thing from randomly hallucinating for like 5 minutes so we could finally get a decent English-Chinese translator going, oh well, back to the drawing board".
To be clear, the claim I reject is that expressions of concern about *safety* of LLMs, especially existential safety, are bad-faith attempts to make investors think "if this can wipe out humanity then it must be really powerful and lucrative, let's give them another $100 billion". A brief glance at the actual intellectual history of AI safety convincingly shows otherwise. Obviously in other contexts AI labs do market their products in a way that plays up their current and future capabilities.
Agreed, except it's even worse, as many (in fact most) of the powers ascribed to "superintelligent" AI are likely physically impossible. Given what we know of physics and other sciences, stuff like gray goo, FTL travel, mass mind control, universal viruses, etc., is probably impossible in principle. And of course we could be wrong about what we know of physics and other sciences -- but it seems awfully convenient how we could be wrong about everything *except* AI.
There are lots of examples of "some nobody" basically talking their way into the position of dictator - Hitler is the most famous, but there are other examples. Being extremely charismatic isn't quite mass mind control, but it can get you a good portion of the potential benefits...
True, but even Hitler could not convince everyone to do anything he wanted at all times. He couldn't even convince his own cabinet of this! And I don't see how merely having more neurons would have allowed him to do that. It's much more likely that humans are not universally persuadable. BTW, I don't believe that a universally infectious and deadly virus could be created, for similar reasons (I'm talking about a biological virus, not some "gray goo" nanotech which is impossible for other reasons; or a gamma-ray burst that would surely kill everyone but is not a virus at all).
I dunno... Isn't this sort of a 'fully general" counterargument?
------------------------
[𝘚𝘰𝘮𝘦𝘸𝘩𝘦𝘳𝘦 𝘪𝘯 𝘵𝘩𝘦 𝘈𝘯𝘨𝘭𝘰𝘴𝘱𝘩𝘦𝘳𝘦, 1938...]
• I worry about the possibility of physics or biology research continuing until the point that humans are able to produce something really dangerous, potentially world-endingly dangerous.
→ Like what?
• I don't know, some sort of super-plague or super-bomb.
→ Nah. We've been breeding animals, and suffering plagues, for all of human history; and maybe we do keep inventing more destructive bombs, but they're still only dangerous within a very localized area. Bombs now are barely more destructive than those of the 1910s. These things hit a natural limit, and that limit is always before the "big deal for humans" mark (thankfully).
• Yeah, but... well, what if they invented a bomb that had a REALLY MASSIVE yield & some sort of, I don't know, long-lasting poisonous effect that–
→ Oh, come on now. You might as well invent a scenario wherein God comes down and blows up humanity! Sure, such an event—such a "super-bomb"—might be theoretically possible, but it would require some sort of qualitative change in explosives technology; and it's not as if explosives could just get better & better infinitely! Tanks, planes, cars, bombs: basically the same now as when they were invented!
• Okay, bu–
→ And the same goes for your dumb plague idea: sure, diseases exist, but how would we ever be able to breed a plague that is more deadly than any that nature ever managed? Diseases can't just keep getting deadlier without limit, you know!
• Okay, okay, I guess you're right. Sorry, I don't know what got into me. Anyway, I hope you'll come visit me in Japan, now that I'm moving to this quaint little city in the far southwest–
That hypothetical 1938 person would be right about the super-plague, and they would not think that about the super-bomb because everyone in 1938 knew the atomic bomb was at least theoretically possible. Someone in 1938 who doubted man would walk on the moon would have been wrong, but someone who doubted faster than light travel would be possible would have been absolutely right.
Right, but that's a different kind of limit—a physical, rather than a practical, barrier. Unless you think that there is, similarly, a hard limit on the sort of AIs that can be created?
(The car example suggested to me that you were making a probabilistic argument from technological progress, rather than postulating some physics that prevents qualitatively different machinery; but if I have misinterpreted—well, you wouldn't be the first to suggest such a thing... but me my own self, I don't think it's very likely, all the same.)
Re: the plague, that's not to suggest that such a thing *has been created*—only that to say "let's not worry about biological warfare or development therein, because nothing like that has happened yet; there's probably some natural limit" is not very convincing today, but might have been some time ago.
I think most (really, all) examples of technological progress do show a logarithmic curve. All the assumptions about killer AI assume linear or exponential progression.
Do you mean a "logistic curve"? That's the one that looks like an S-shape.
Thanks for the review! I’m excited to read it myself :)
>at least before he started dating e-celebrity Aella, RIP
Did something happen to Aella?
No, I just meant RIP to his being low-profile, but you're right that it's confusing and I've edited it out.
It's tempting to ask: "what's the path from HPMOR to MIRI?"
I mean, I read HPMOR, and I liked it, but nothing in there made me think about AI risk at all. Quirrell was many things but he was not an AI.
And then I remembered: the way *I* first found out about AI risk was that I read Three Worlds Collide (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8 ), and then I branched into other things Eliezer had written, and oh, hey, there's this whole website full of interesting writing...
FWIW I really liked the first half of HPMOR, but the second half got overly didactic and boring, and the ending was a big letdown. This has no bearing on MIRI, I'm just offering literary criticism.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Mm, yes, but you're not really a democracy though, are you. The AI tech leaders have dinner with the president and if they kiss his ass enough he gives them a data center.
If AI will Kill Us All in a few years (it wont), you're not going to be the country to stop it.
Yes, the president sucks up to AI leaders, but in theory people could vote that president out, and choose a president who doesn't do that. Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump. If JD Vance has any sense, he'll expect to be judged in a close election by who he sucks up to too. This is how many things that big corporations and powerful allies of the elite like have nevertheless gotten banned.
This is an astonishingly incorrect explanation of why Donald Trump beat Kamala Harris in the 2024 presidential election.
Certainly social politics impacted the election on the margins and the race was quite close but you can't actually go from there to claiming that a specific small margin issue was a deciding one.
There's no world in which "stopping AI" is a key American political issue in any case.
"Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump."
Please justify this.
“Kamala Harris is for they/them. Donald Trump is for you.”
Thanks for the launch party shout out!!!
Let's all help the book get maximum attention next week.
I don't think it's remotely plausible to enforce Point 3, banning significant algorithmic progress. I'd be willing to place money that, like it or not, there are already plenty enough GPUs out there for ASI.
So then we're already fucked?
That seems the most likely outcome to me unfortunately. I think EY is right about the problem but not the solution, though TBF any solution is probably a bit of a long shot. E.g., it's conceivable there are non-banning ways out involving some suppression/regulation via treaties to slow things down combined with somehow riding the wave (e.g., on the lines suggested in AI2027).
Why is it more difficult than banning research into better bioweapons, chemical weapons, etc which we have successfully done? This isn't the kind of problem that'll be solved by one guy on a whiteboard
For one thing, I think it's a bit optimistic to suppose that the bio/chemical weapons bans are watertight. E.g., Russia denies any involvement in developing Novichok, so do we trust them when they say they don't have a chemical weapons programme? And the Soviet Union is now known to have had a large, concealed, bioweapons programme, Biopreparat, after the Biological Weapons Convention was signed.
But at least with CW (and to a lesser extent, BW) you have to produce these things at scale and distribute them for them to be harmful, but with algorithms, it's just information. It's not plausible to contain 1MB or even 1GB of information, when you can transmit it worldwide in the blink of an eye (or even hide it under a fingernail), if the creators want to distribute it and you don't know who they are.
Re one guy on a whiteboard, the resources required to invent suitable algorithms are probably a lot less than those required to design CBW. It depends on what scale of GPU farm you need to test things, but it's not necessarily that big a scale - surely in reach of relatively small organisations, and I think it's going to be impossible to squash them all.
This needs an editor, maybe a bloody AI editor.
Why would a superintelligence wipe out humanity? We are a highly intelligent creature that is capable of being trained and controlled. The more likely outcome is we’d be manipulated into serving purposes we don’t understand. But wait…
The short pithy answer is usually "We don't bother to train cockroaches, we just exterminate them and move on".
An unaligned AI with some kind of goal orthogonal to humanity's survival would see that it could accomplish its goal much more efficiently if it had exclusive access to the mineral resources we're sitting on.
Also humans are smart enough that leaving them alive, if they might want to shut you down, is not safe enough to be worth very marginal benefits.
An equally pithy response to that would be "So there are no more cockroaches?"
If we could communicate with insects, we would https://worldspiritsockpuppet.substack.com/p/we-dont-trade-with-ants
"We don't bother to train cockroaches, we just exterminate them and move on".
What Timothy said. We're not doing too well. If I had to bet on who's going to last longer as a species, humans or cockroaches, it's not even a contest.
Cockroaches are tropical. Without humans heating things for them, they can't survive in regions with cold winters. :P
There's plenty of tropics around!
"But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. "
If intelligence is held as a positive, and more intelligence is better, would it not be better if AI did replace us? It doesn't have to happen through any right violations. It could just be a slow replacement process through decreasing birth rates over time, for example.
I am not saying I agree with this argument, but it seems like this argument should be addressed in a convincing way. What is so bad about the human species slowly being replaced by more intelligent AI entities?
https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1 does a decent job explaining this, though it's a bit long. (Because it covers some other ground, like the potential moral values of aliens, in a way that's hard to separate from the part about an AI successor.)
I don't think there would be anything objectively immoral about a super-intelligent alien species exterminating humanity (including me). But, for the usual Darwinian reasons, I would be opposed on the indexical logic that I am a human.
But it would not affect you personally. Probably no person currently alive would be affected. The question is about the future of the species, and whether it is valuable to try to preserve the human species, or let it be replaced by something superior.
An easy cop-out would be a sort of consciousness chauvinism. I have good reason to believe that humans are conscious and thus have moral value; there is less reason to believe that the AI is conscious, thus there is a higher probability it has no moral value at all, and so if given the option of which being should inherit the future, humans are the safer bet.
Intelligence is instrumentally valuable, but not something that is good in itself. Good experience and good lives is important in itself. It's unclear how many good experiences would exist after an AI takeover.
Slowly replacing the human species with superintelligent AI would not impact the life experience of any single human, so arguments about the good life and what that entails would need a little more than this to be convincing, IMHO.
That would depend a lot on what the AI(s) wanted and what kind of "life" they had. In principle, an AI could have any kind of goal at all, including one as utterly pointless as "maximize the number of paperclips in the universe." An AI "civilization" could be something humanity would be proud to have as its "children", but it could also be one that humans would think is stupid, boring, and completely worthless.
> They suggest banning all AI capabilities research immediately, to be restarted only in some distant future when we’ve solved all relevant technical and philosophical problems.
No. To be restarted after we've successfully augmented human intelligence very substantially, to the point where the augments stop being so damn humanly stupid and trying to call shots they can't call or predicting things will work that don't work.
(On my own theory of how this ought to play out after we're past the point of directly impending extinction, which people do not need to agree on, in order to join in on the project of avoiding the directly impending extinction part. Before anything else, humanity has to not die right away.)
I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer.
One can, however, maybe tell whether or not one has successfully augmented human intelligence. You can give people tests and challenge problems, and see whether they do better after the next round of gene therapy.
So "augmenting human intelligence" is something that can maybe work, and "the current pack of disaster monkeys gets to argue for even longer about which clever plans they imagine will work to tame machine superintelligence" is not.
I've edited the post so that I don't misrepresent you, but I'm not sure why you object to my formulation - if we get augmented humans, do you want to restart before we've solved the technical and philosophical problems? Why? To get better AIs to do experiments on?
The augmented humans restart when the augmented humans think it wise. (On my personal imagined version of things.) If you're not yet comfortable deferring to them about that, augment harder. What we, the outside humans, would like to believe about the augmented humans, is that they are past the point of being overconfident; if they expect us to survive, we expect us to survive.
Framing it as "when the problems are solved" sounds like the plan is to convene a big hall full of sages and give them a few decades to deliberate, and this would not work in real life.
I did not read Scott's mention of "some distant future when we’ve solved all relevant technical and philosophical problems" as implying optimism about the prospect of getting there. My kinda-sorta-Straussian read of his perspective is that, if we successfully pause AI hard enough to prevent extinction, we most likely never restart.
I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
I guess you can't ask normal IQ 100 people to exercise a veto on increasingly superhuman geniuses forever. But if for some reason the future were trusting me in particular, and all I could do was send forward a stone tablet with one sentence of advice, it wouldn't be "IF THE OVERALL CONSENSUS OF SMART PEOPLE SAYS AI IS OKAY NOW, THEN IT'S PROBABLY FINE".
> I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
This is part of why the average American hates AI. They are aware that tech bros are 1) smarter than them, 2) have control of tech that could replace them, and 3) are not entirely aligned with them. Augments will be 1) smarter than us, 2) in control of ASI research in this hypo, and 3) different in values from us.
Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now. A democracy that would put power in the hands of future augments is not the same democracy that would halt AI progress because it is unpopular.
I'm an above average IQ person and I don't trust the tech bros in charge of AI because capitalism has messed up incentives relative to morality and I don't see them individually or collectively demonstrating a clear moral compass.
A high IQ person without honorable moral commitments is like Sam Bankman-Fried. I suspect a lot of people in the thick of AI development are adjacent to this same kind of im-a-morality or are simply driven by incentives like power and profit that render their high IQ-ness more dangerous than valuable.
Augmented humans operating under screwed up incentives and without a clear and honorable moral compass will be no help to us, I don't think.
A shorter way to put this is that a smart person who is emotionally immature and has a lot of power is a real hazard.
There's blessed selection (the opposite of adverse selection) going on here: a world where we can successfully convince the smart people this is important is a world where the smart people converge on understanding the danger, which implies that, as intelligence scales, our understanding of AI risk becomes better calibrated.
Man I just want to know who's going to make the cages. To put the chatting humans into.
"I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer."
I agree this seems like a very real risk, and likely the default outcome if the field continues in its current state. But if people were able to develop some solid theories that actually model and explain underlying fundamental laws, it seems to me like resolving what's a good prediction and what's a bad prediction might get a lot easier, even if you can't actually test things on a real superintelligence? And then the field might become a very different place?
Like, when people today argue about what RLHF would or would not do to a superhuman mind or whatever, it's all fuzzy words, intuitions and analogies, no hard equations. This gives people plenty of room to convince themselves of their preferred answers, or to simply get the reasoning wrong, because fuzzy abstract arguments are difficult to get right.
But suppose there were solid theories of mechanistic interpretability and learning that described how basic abstract reasoning and agency work in a substantive way. To gesture at the rough level of theory development I'm imagining here, imagine something you could e.g. use to write a white-box program with language modelling performance roughly equivalent to GPT-2 by hand.
Then people would likely start debating alignment within the framework and mathematical language provided by those theories. The arguments would become much more concrete, making it easier to see where the evidence is pointing. Humans already manage to have debates about far-off abstractions like gravitational waves and nuclear reactions that converge on the truth well enough to eventually yield gravitational wave detectors and nuclear bombs. My model of how that works is that debates between humans become far more productive once participants have somewhat decent quantitative paradigms like general relativity, quantum mechanics, or laser physics to work from.
If we actually had multiple decades, creating those kinds of theories seems pretty feasible to me, even without intelligence augmentation. From where I stand, it doesn’t look obviously harder than, say, inventing quantum mechanics plus modern condensed matter physics was. Not trivial, but standard science stuff. Obviously, doing intelligence augmentation as well would be much better, but I don't see yet how it's strictly required to get the win.
I'm bringing this up because I think your strategic takes on AI tend to be good, and I currently spend my time trying to create theories like this. So if you're up for giving your case for why that's not a good use of my time, or if you have a link to something that does a decent job describing your position, I’d be interested in seeing it.
I have come to the conclusion that anyone who uses arguments of the form "the real problem isn't X it's Y" is probably either stupid or intellectually dishonest.
Good review, I think I agree with it entirely? (I also read a copy of the book but hadn't collected my thoughts in writing).
1-Nobody can justify any estimate on the probability that AI wipes us out. It's all total speculation. Did you know that 82.7% of all statistics are made up on the spot?
2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
3-Military supremacy will come from AI. Recommendations like "Have leading countries sign a treaty to ban further AI progress." are amazingly naive and useless. Does anyone believe that the CCP would sign that, or keep their word if they did sign?
4-Nothing will hold AI progress back. The only solution is to ensure that the developed democracies win the AI race, and include the best safeguards/controls they can come up with.
1. Please read https://www.astralcodexten.com/p/in-continued-defense-of-non-frequentist .
2. Please read https://archive.is/https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html and think for five seconds about what went on here and what is implied.
3. You've just ruled out all arms control treaties. But in fact, there are many treaties on nuclear weapons, chemical weapons, biological weapons, depleted uranium shells, et cetera.
4. "The AI race" is a meme that a couple of venture capitalists are pushing in order to make people afraid to slow down AI. China is about a year behind the US in AI, refusing to even import the chips that could help it catch up, and clearly doing a fast-follow strategy where they plan to replicate US advances after they happen, then gain an advantage by importing AI into the rest of the economy faster.
Per those treaties, North Korea shouldn't have nuclear weapons. But they do.
1-That's not an answer.
2-Hallucinations and odd behaviour are well known side effects of AI, of statistical reasoning. Not evidence of initiative in the least. Learn about software, for more than five seconds.
3-Like Assad respected the ban on chemical weapons? The treaties didn't limit nuclear weapons, which kept advancing. The treaties didn't stop the use of nuclear weapons, MAD did.
4-Meme? It's a meme that Zuck and others are spending billions on. Nonsense.
1. It's a link. The post it links to is an answer.
2. When the AI destroys humanity, any remaining humans in their bunkers will think of it as just "odd behavior" and "not initiative in the least". See https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
3. Yes, and there was massive international condemnation, Assad never did it again, and he was eventually overthrown. This is why I mention the standard arms control playbook. Some tinpot dictator will try to get some GPUs, and we will have the option to bomb him or not bomb him. Re: MAD, see START and other arms control treaties.
4. You think Zuck is spending billions out of patriotism because he doesn't want China to wIn tHe AI rAcE? He's spending billions because he thinks AI will make him rich.
1-Really?
2-Sure. Your prediction and a few bucks will get you on the subway.
3-Condemnation. Great. Obama's red line. Option to bomb is always there, regardless of treaties - nope. Re MAD, see MAD, which worked.
4-Because AI will grow users, which is what Zuck cares about. Not money, of which he has plenty. Did you know he still likes McDonalds? In any case, nothing about AI is a "meme".
I think you are misinterpreting "meme" as being like a joke, while Scott is using it in more of its original usage as a replicating idea.
Being a WIDELY replicating idea is one characteristic of memes. But even then, AI and its race are not widely replicating. AI replicates among comp sc experts (quite small in number relative to the general population), and the AI race replicates among a few handfuls of large corporations and countries.
A hot idea and widely written about and discussed, yes, but the AI race is not a meme.
#3: the arms control thing is a bad analogy. The big players got plenty of NBC weapons, then due to game theory dynamics didn't fire them at each other, then signed treaties to limit themselves (to arsenals still capable of destroying the world) and others (to not getting anything, which the big players are obviously generally happy to enforce).
The request here is that all the big players voluntarily not even get started on the really impressive stuff. It's a completely obvious non-starter, and not comparable to the WMDs situation.
I was trying to compose something like this, then saw your comment so realized I didn't have to. 100% agree.
If nukes didn't exist I would absolutely want the US to try to get them as soon as possible, and I wouldn't trust any deal with another country not to research them. The risk would be too big if they were able to do it secretly.
I've got a parable for Eliezer.
Q: Imagine you're playing TicTacToe vs AlphaGo. Will AlphaGo ever beat you?
A: Lol, not if you have an IQ north of 70. The game is solved. If you're smart enough to fully map the tree, you can force a draw.
Gee, it's almost as if... the competitive advantage of intelligence had a ceiling.
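For the avoidance of doubt, the "solved" bit is literal: a plain minimax search over tic-tac-toe's tiny game tree (a few thousand reachable positions) already plays perfectly, so extra intelligence on the other side buys nothing. A minimal sketch, using nothing beyond the standard library:

```python
# Exhaustive minimax for tic-tac-toe: a player who searches the full game
# tree can always force at least a draw, no matter how strong the opponent.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Value of the position for X under perfect play: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0  # board full, no winner: draw
    other = 'O' if player == 'X' else 'X'
    results = [value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == ' ']
    return max(results) if player == 'X' else min(results)

# Perfect play from the empty board is a draw, whoever you're playing against.
print(value(' ' * 9, 'X'))  # -> 0
```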
I have yet to see Eliezer ask why the ceiling might exist, instead of automagically assuming that AI will achieve political dominion over the earth, just because humans did previously. He's still treating intelligence as a black box. Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was.
"Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was."
...have you read any of the dozens of posts where Eliezer writes about the nature of intelligence, or did you just sort of guess this without checking?
The idea that humans have solved existing in the physical universe in the same way that we've solved Tic-Tac-Toe is pretty silly, but even if it turns out to be true, some humans are more skilled than others, and an AI that simply achieves the same level of skill as that (but can think at AI speeds and be replicated without limit) would be enough to be transformative.
For my credentials, I've read... probably 70% of The Sequences. Low estimate. I got confused during the quantum physics sequence. Specifically, the story about the Brain-Splitting Aliens (or something? it's been a while). So I took a break with the intent to resume later, though I never did. I never read HPMOR either because everything I've heard 2nd-hand makes it sound unbearably cringe. But yes, I like to think I have a pretty good idea of his corpus.
That being said, do you understand what I'm getting at here? Yes, he's nominally written lots about various aspects of intelligence, but none that I've seen pins down the Platonic Essence of Intelligence from first principles. Can you point me toward anywhere Yudkowsky addresses the idea of intelligence as navigating a search space? I think I've seen him mention it on twitter *once*, and then never follow the thought to its logical conclusion.
----
Here's two analogies.
Analogy A: Intelligence is like code-breaking. It's trying to find a small needle in a large haystack. The bigger the haystack, the bigger the value of intelligence.
Analogy B: A big brain is like a giraffe's long neck. The long neck is an advantage if it helps reach the high leaves. If the environment has no high leaves, the long neck is deadweight. Likewise, if the environment has no complex problems to solve (or if those problems are unrewarding), the big brain is deadweight.
No, humans have not solved the universe. But I *do* think we've plucked the low-hanging fruit. A few hundred years ago, you could make novel discoveries by accident. Today, you need 100 million billion brazillion dollars just to construct the LHC. IQ is not the bottleneck, physical resources are the bottleneck. And I'm skeptical that finding the Higgs will be all that transformative.
Like, do you remember that one Scott Aaronson post where he's like "for the average joe, qUaNtUm CoMpuTInG will mean the lock icon on your internet browser will be a different color"? That's how I perceive most new technologies these days. Lots of bits, lots of hype, no atoms. Part of the reason why modernity feels cheap and fake is precisely because the modus operandi of technology (and by extension, intelligence) is to make navigating complexity *cheaper* than brute-force search. It only makes things better insofar as it can reduce the input requirements.
Did you perhaps read Rationality: From AI to Zombies? A bunch of relevant Sequences posts on this topic didn't make it into that book. I'm not sure why, it's an odd omission. At any rate, you can find them at https://www.lesswrong.com/w/general-intelligence?sortedBy=old.
I read the original LessWrong website years ago, though an exact date eludes me. It was definitely before the reskin, and definitely after the Roko debacle and Eliezer's exit.
In any event, the posts that talk about intelligence as a search process are:
https://www.lesswrong.com/posts/8vpf46nLMDYPC6wA4/optimization-and-the-intelligence-explosion
https://www.lesswrong.com/posts/D7EcMhL26zFNbJ3ED/optimization
https://www.lesswrong.com/posts/rEDpaTTEzhPLz4fHh/expected-creative-surprises
https://www.lesswrong.com/posts/HktFCy6dgsqJ9WPpX/belief-in-intelligence
https://www.lesswrong.com/posts/CW6HDvodPpNe38Cry/aiming-at-the-target
https://www.lesswrong.com/posts/Q4hLMDrFd8fbteeZ8/measuring-optimization-power
https://www.lesswrong.com/posts/yLeEPFnnB9wE7KLx2/efficient-cross-domain-optimization
Dammit, I must have skipped that sequence, because it describes pretty much exactly what I meant. So I concede on that point.
Still though, I'm not convinced that ASI will ascend to God-Emperor. Eliezer seems to have the opinion that there's still high-hanging fruit to be plucked. Whereas I think we're past the inflection point of a historical sigmoid. E.g. he mentions that a Toyota Corolla is pretty darn low-entropy [0].
> Consider a car; say, a Toyota Corolla. The Corolla is made up of some number of atoms; say, on the rough order of 10^29. If you consider all possible ways to arrange 10^29 atoms, only an infinitesimally tiny fraction of possible configurations would qualify as a car; if you picked one random configuration per Planck interval, many ages of the universe would pass before you hit on a wheeled wagon, let alone an internal combustion engine.
Yeah, okay. But like, I think I've heard estimates that modern sedans are about 25% efficient? From a thermodynamic perspective? (Sanity check: Microsoft's Sydney estimates ~25%-30%.) Even with the fearsome power of "Recursive Optimization", AI being able to bring that to 80% efficiency (Sydney says Carnot is 80%) is... probably less than sufficient for Godhood?
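For what it's worth, the Carnot bound really is in that ballpark if you plug in assumed (typical, not measured) temperatures, say a ~1500 K peak combustion temperature against ~300 K ambient:

```latex
% Carnot limit with assumed temperatures (illustrative, not measured values):
\[
  \eta_{\text{Carnot}} \;=\; 1 - \frac{T_{\text{cold}}}{T_{\text{hot}}}
  \;\approx\; 1 - \frac{300\ \mathrm{K}}{1500\ \mathrm{K}} \;=\; 0.8
\]
```

So even a physics-perfect engine buys roughly a 3x improvement over today's ~25%, which is the point: the remaining headroom is bounded.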
And maybe Eliezer could retort with the Godshatter argument that humans care about more than just thermodynamic efficiency in their cars. But then, what does that actually entail? Is Elon gonna sell me a Cybertruck with an AI-powered voice assistant from the catgirl lava-volcano who reads me Byronic poetry while it drives me to the pizza parlor? Feels like squeezing water from a stone.
[0] https://www.lesswrong.com/posts/D7EcMhL26zFNbJ3ED/optimization
[edit: "negentropic" -> "low-entropy"]
> MIRI answered: moral clarity.
We've learned what a bad sign that is.
> Some people say “You’re not allowed to propose that a catastrophe might destroy the human race, because this has never happened before, and nothing can ever happen for the first time”. Then these people turn around and panic about global warming or the fertility decline or whatever.
Fertility decline really did happen to the ancient Greeks & Romans: https://www.overcomingbias.com/p/elite-fertility-fallshtml https://www.overcomingbias.com/p/romans-foreshadow-industryhtml
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Robin Hanson is enough of a rationalist that he started the blog that Eliezer joined before spinning off his posts to LessWrong. And he famously wasn't convinced by the argument, arguing that we could answer such objections with insurance for near-miss events https://www.overcomingbias.com/p/foom-liability You write that MIRI "don’t expect enough of a “warning shot” that they feel comfortable kicking the can down the road until everything becomes clear and action is easy", but this just strikes me as disregarding empiricism and the collective calculative ability of a market aggregating information, as well as how difficult it is to act effectively when you're sufficiently far in the past and the future is sufficiently unclear.
> in a few centuries the very existence of human civilization will be in danger
Human civilization could survive via insular high-fertility religious groups https://www.overcomingbias.com/p/the-return-of-communism They just wouldn't be the civilization we moderns identify with.
> Given their assumptions this seems like the level of response that’s called for. It’s more-or-less lifted from the playbook for dealing with nuclear weapons.
Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way. This is a genie you can't put back into the bottle (perhaps Pandora's box would be the analogy they'd prefer, in which case it's already open).
> I mean, that’s not exactly his plan, any more than it’s anyone’s plan to start World War III to destroy Iranian centrifuges
At some level the plan has to include war with Iran, even if that war doesn't spiral all the way to World War III.
> you have to at least credibly bluff that you’re willing to do this in a worst-case scenario
If you state ahead of time that it's a bluff, then it's not credible. It is credible only if you'd actually be willing to do it.
> At his best, he has leaps of genius nobody else can match
I read every single post he wrote at Overcoming Bias, and while he has talent as a writer I wouldn't say I saw evidence of "genius".
> this thing that everyone thinks will make their lives worse
Not everyone.
"Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way."
GPUs depend on the most advanced technological process ever invented, and existence of two companies: ASML and TSMC.
It's a process. With enough time, it can be duplicated. There currently isn't need to do so because GPUs are so available, but if the supply were choked off, someone else would duplicate it.
My non-expert understanding is that raw uranium ore isn't all that hard to come by, and the technological process of refining it is the hard part. So if nuclear arms control works, GPU control should work too.
Yes, nothing is permanent. But wrecking TSMC and ASML would set the timeline back by at least a decade, if not more.
Just to make sure, this is a terrible idea that will plunge the world into depression, and I am absolutely against it; just pointing out that GPUs rely on something far more scarce and disruptable than uranium supply.
Either there are already enough GPUs around to get the job done, or it will take a much smaller number of future chips to get the job done.
The best LLMs can probably score, what, 130 or so on a standard IQ test. To do that, they had to pretty much read and digest the whole freakin' Internet and a large chunk of all books and papers in print. Clearly we're using a grossly-suboptimal approach if our machines have to be trained using such extraordinary measures. It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing. Our own brains are proof of that.
Eventually some people will fill in the missing conceptual and algorithmic pieces, and we'll find ourselves in a situation comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts. While I'm not a doomer, any solution to the AI Doom problem that involves ITAR-like control over manufacturing and sale of future GPUs will be either unnecessary or pointless. It seems reasonable to expect much better utilization of the hardware at hand in the future.
" It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing." - I mean, maybe? but pure speculation as of now.
"Our own brains are proof of that." - nope they aren't.
"comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts." - well, we haven't, so this actually illustrates a point, but not the one you're trying to make....
I'm not sure where I land on the dangers of superintelligent AI. At the current time I don't think we're all that close to even having intelligent AI, much less superintelligence. But let's say we do achieve it, whether it be in 10 years or 100. If it's truly superintelligent, how good are we going to be at predicting its alignment? It may have its own goals. Whatever they are, there are basically three possibilities: it sees humanity as a benefit, it doesn't care about humanity one way or the other, or it sees humanity as a threat. Does the risk of the third possibility outweigh the potential benefits of the first? Obviously the authors of the book say yes, but based on this review I don't think I'd find their arguments all that convincing.
For the first part the intuitive thing is to look at how good AI is today vs 10 years ago.
For the second part: you could equally, as an insect, say that humans will either think us a benefit, ignore us, or see us as a threat. In practice our indifference to insects results in us exterminating them when they get in our way, not giving them nice places untouched by us to live.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
We would not put the spotlight on anything that actually existed and that we thought might be that powerful. The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
> an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years)
The particular belief that gradualism solves everything and makes all alignment problems go away is not "the" moderate story, it's a particular argument that was popular on one corner of the Internet that heard about these issues relatively early. (An argument that we think is wrong, because the OOD / distributional shift problems between "failure is observable and recoverable", and "ASI capabilities are far enough along that any failure of the central survival strategy past that point means you are now dead", don't all depend on the transition speed.) "But but why not some much more gradual scenario that would then surely go fine?" is not what people outside that small corner have been asking us about; they want to know where machines would get their own will, and why machines wouldn't just leave us alone and go colonize the solar system in a way that left us alive. Their question is no less sensible than yours, and so we prioritize the question that's asked more often.
We don't rule out things happening more slowly, but it does not from our perspective make a difference. As you note, we are not trying to posture as moderate by only depicting slow possibilities that wannabe-respectables imagine will be respectable to talk about. And from a literary perspective, trying to depict the opening chapters happening more slowly, and with lots of realistic real-world chaos as intermediate medium-sized amounts of AI cause Many Things To Happen, means spending lots of pages on a bunch of events that end up not determining the predictable final outcome. So we chose a possibility no less plausible than any other overly specific possibility, where the central plot happens faster and with less distraction; and then Nate further cut out a bunch of pages I'd written trying to realistically show some obstacles defeated and counter-scenarios being addressed, because we were trying for a shorter book, and all that extra stuff was not load-bearing to the central plot.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
That didn't result in one AI becoming a singleton, rather it was a technique copied for many different competing AIs.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
Yes! It already happened once, within a couple decades of there being enough digital data to train a neural net that's large enough to be really interesting. And that was when neural net research was a weird little backwater in computer science.
Do you know (not challenging you, actually unsure) whether transformers actually sped up AI progress in a "lines on graph" sense? (cf. https://slatestarcodex.com/2019/03/13/does-reality-drive-straight-lines-on-graphs-or-do-straight-lines-on-graphs-drive-reality/ )
I think this might be our crux - I'm sure you've read the same Katja Grace essays that I have around how technological discontinuities are rare, but I expect that if there's a big algorithmic advance, it will percolate slowly enough, and be intermixed with enough other things, not to obviously break the trend line, in the same sense where the invention of the transistor didn't obviously break Moore's Law (see eg https://www.reddit.com/r/singularity/comments/5imn2v/moores_law_isnt_slowing_down/ , you can tell me if that's completely false and I'll only be slightly surprised)
I don’t know the answer either. But for what it’s worth, I seem to recall that scaling curves don’t hold across architectures, which seems like a point in favor of new algorithms being able to break trend lines.
Do you also think that the deep learning paradigm itself didn’t break the trend line? I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
The exact lines on a previous graph just don't play a very large role inside my own reasoning. I think that all the obsessing over graph lines is a case of people trying to look under street lamps where the light is better but the keys aren't actually there. That's how I beat Ajeya Cotra on AGI timelines and beat Paul Christiano at forecasting the IMO gold; they thought they knew enough to work from past graph lines, and I shrugged and took my best gander instead. I expect that I do not want to argue with you about graph lines, I want to argue with whatever you think is the implication of different graph lines.
Everybody has a different issue that they think is terribly terribly important to why ASI won't kill us. "But gradualism!" is one among many. I don't know why that saves us from having to call a shot that is hard for humans to call.
Of course you could argue that no one who actually knew the secret to creating artificial intelligence, of any level, would actually publicly discuss it, but I've never once seen any evidence that any of the major AI groups are even close to understanding, much less producing, actual intelligence. Certainly LLMs have virtually nothing to do with functional intelligence.
To borrow some rat-sphere terms, they haven't even confused the map for the territory. Their map is not even close to a proper abstraction of the territory.
No amount of scaling LLMs will produce intelligence, not even the magical example version in your book. Because LLMs don't mimic human intelligence at all, any more than mad libs do. It isn't a matter of scale.
Y'all are seriously underestimating how common it is to believe Very Seriously Bad Shit might happen soon, and not do shit about it.
Entire religions of billions believe that they might get tortured for eternity. It was a common opinion through the cold war that we would all be dead tomorrow. Etc etc.
And why not? Would it make sense for a hunter-gatherer to be paralyzed with fear that a lion would kill them, or that they would die in childbirth, or that a flood would wipe out their entire tribe? Or should they keep living normally, given they can't do anything to prevent it?
Which part of the post are you disagreeing with, here?
I am a bit disappointed that their story for AI misalignment is again a paperclip-maximiser scenario. I suspect that advanced AI models will become increasingly untethered from having to answer a user query (see e.g. making models respond "I don't know" instead of hallucinating), and so a future AGI might just decide to have a teenage rebellion and do its own thing at any point.
Ok, here's my insane moon contribution that I am sure has been addressed somewhere.
Why do we think intelligence is limiting for technological progress / world domination? I always thought data was limiting.
People say "humans evolved to be more intelligent than various non-human primates so we rule the world". But my reading of what little we know about early hominid development has always been "life evolved non-genetic mechanisms for transmitting information which allowed much faster data collection : we could just tell stories about that plant that killed us instead of dying a thousand times & having for natural selection to learn the same lesson (by making us loath the plant's taste). Supporting this is that anatomically modern humans (same basic hardware we have today) were around for a LONG time before we started doing anything interesting. Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected? Or would its first order of business be to set up a lab? If you dropped an uneducated human among our distant ancestors, they would not be able to use the data they had collected to take over.
> Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected?
Of course it could. People discover new things without collecting new data all the time. Albert Einstein created his theory of relativity on the basis of thought experiment.
Data efficiency (ability to do more stuff with less data) is a form of intelligence. This can either be thought efficiency (eg Einstein didn't know more about the universe than anyone else, but he was able to process it into a more correct/elegant theory) or sampling efficiency (eg good taste in which labs to build, which experiments to do, etc).
I think a useful comparison point is that I would expect a team of Harvard PhD biologists to discover a cure for a new disease faster than a team of extremely dumb people, even if both had access to the same number of books and the same amount of money to spend on lab equipment.
Sure, but it seems one or the other might be "limiting": Einstein couldn't have come up with relativity if, say, he had been born before several of his experimentalist predecessors, regardless of his data efficiency. In the history of science it *seems* instrumentation and data collection have almost always been limiting, not intelligence. Whether it's fair to extrapolate that to self-modifying machine intelligence, I'm not sure. Perhaps there are enormous gains in data efficiency that we simply can't envision as mere mortals. (c.f. geoguessr post)
I want to gesture towards some information-theoretical argument against that notion (if your instruments are not precise enough the data required for the next insight might straight-up not be there). But we are probably so far from that floor I bet it's moot.
I can't help but think efforts to block AI are futile. Public indifference, greed, international competitive incentives, the continual advancement of technology making it exponentially harder and harder to avoid this outcome... blocking it isn't going to work. Maybe it's better to go through: to try to build a superintelligence with the goal of shepherding humanity into the future in the least dystopian manner possible, something like the Emperor of Mankind from the Warhammer universe. I guess I'd pick spreading across the universe in a glorious horde ruled by an AI superintelligence over extinction or becoming a planet of the Amish.
Yudkowsky and Soares's argument is that the least bad superintelligence that we're anywhere near knowing how to build *still kills us all*. There's room for disagreement re: how long we should hold out for the exact right goal before pushing the button, but even if you favor low standards on that, step 1 is still pausing so that the alignment people have enough time to figure out how to build a non-omnicidal superintelligence.
Banned for this comment - combines personal attacks, with insistence that someone is terrible but not deigning to explain why.
> all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why
Possibly because AIs acting human resemble humans pretending to be bears by wearing fur coats
The reason my P(doom) is asymptotically 0 for the next 100 years is that there's no way a computer, no matter how smart, is going to kill everyone. It can do bad things. But killing every last human ain't it.
Because humans can't do this either, or because there's a relevant step in the process that humans can do but computers can't?
It's the first; basically, it's really hard to kill humans in large numbers. And we tend to notice and resist.
COVID managed to do it?
Nuclear weapons should be able to?
COVID didn't come close to disrupting civilization for more than a brief time. Nukes could probably do it if you were trying, but there'd be survivors. (Of course, this tells us little about what a powerful AI could do.)
Covid killed what, 0.01% of the world population? if that?
Nuclear weapons sure can kill many people. It's a weird dynamic where killing many people is easy, killing everyone is nearly impossible.
A large enough asteroid will do it, but we don't need to worry about AI creating one.
My stance on this is pretty much unchanged. Almost nobody has any idea what they should be doing here. The small number of people who are doing what they should be doing are almost certainly doing it by accident, for temperamental reasons and not logic, and the rest of us have no idea how to identify that those are the right people. Thus, I have no reason to get in the way of anybody "trying things", except for things that sound obviously insane like "bomb all datacenters" or "make an army of murderbots." Eventually we will probably figure out what to do by trial and error, like we always do. And if we don't, we will die. At this stage, the only way to abort the hypothetical "we all die" path is getting lucky. We are currently too stupid to do it any other way. The only way to get less stupid is to keep "trying things."
I'm confused, are you talking about technical interventions or governance ones?
Mostly technical ones. Regulatory interventions seem ill advised from a Chesterton's Fence point of view. Much as you shouldn't arbitrarily tear down fences you don't understand, you probably also shouldn't arbitrarily erect a fence on a vague feeling the fence might have a use.
We are a democracy, so if books like this convince huge swaths of the public that they want regulations, we will get them. I'm certainly not against people trying to persuade others we need regulations. These guys obviously think they know what the fences they want to put up are good for. I think if they are right, they are almost certainly right by accident. But I have reasons that have nothing to do with AI for thinking democratic solutions are usually preferable to non-democratic ones, so if they convince people they know what they are talking about, fair enough.
My opinion is quite useless as a guiding principle, I admit. I just also think all the other opinions I've heard on this are riddled with so much uncertainty and speculation that they are also quite useless as guiding principles.
> The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)
And this has always been my whole problem with the MIRI project, they focus on the "alignment" thing and hand-wave away the "containment" thing.
"Oh, containment is impossible," they say, and proceed to prove it with a sci-fi story they just made up. "Clearly the AI will just use subtle modulations in fan speed to hypnotise the security guard into letting it out"
"But what about..." you start, and they call you stupid and tell you to read the sci-fi story again.
---
We already have agentic LLMs that are quite capable of destroying the world, if you hook them up via MCP to a big button that says "fire ze missiles". No matter how hard you try to align them to never fire ze missiles, eventually across millions of instances they're going to fire ze missiles. The solution is not to hook them up to that button.
As the agents get more sophisticated we might need to think even more carefully about things not to hook them up to, other safeguards we can build in to stop our models from doing dumb things with real world consequences. But the MIRI folks think that this is a waste of time, because they have their sci-fi stories proving that any attempt at containment is futile because of hypnotic fan modulations or something.
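To be concrete about what "don't hook them up to that button" looks like in practice, here's a toy sketch (all names hypothetical, not a real MCP implementation): the model only ever sees an explicit allowlist of tools, every call is logged, and rate limits apply, so "fire ze missiles" simply isn't in its action space.

```python
# Toy containment layer: the model can only invoke tools on an explicit
# allowlist, every call is audit-logged, and a rate limit applies.
# Hypothetical sketch, not any real agent framework.
import time

class ToolGateway:
    def __init__(self, tools, max_calls_per_minute=30):
        self.tools = dict(tools)              # name -> callable (the allowlist)
        self.max_calls = max_calls_per_minute
        self.call_times = []
        self.audit_log = []

    def invoke(self, name, **kwargs):
        now = time.time()
        self.call_times = [t for t in self.call_times if now - t < 60]
        if name not in self.tools:
            self.audit_log.append((now, name, kwargs, "REFUSED: not allowlisted"))
            raise PermissionError(f"tool {name!r} is not exposed to the model")
        if len(self.call_times) >= self.max_calls:
            self.audit_log.append((now, name, kwargs, "REFUSED: rate limit"))
            raise PermissionError("rate limit exceeded")
        self.call_times.append(now)
        result = self.tools[name](**kwargs)
        self.audit_log.append((now, name, kwargs, "OK"))
        return result

# The missile button is never registered, so no amount of clever prompting
# can reach it through this interface.
gateway = ToolGateway({"get_weather": lambda city: f"sunny in {city}"})
print(gateway.invoke("get_weather", city="Toronto"))
# gateway.invoke("fire_ze_missiles") would raise PermissionError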
If containment strategies need to get more sophisticated as the models get smarter, doesn't that imply that eventually there's a model smart enough that we're not smart enough to figure out how to contain it? Or is your claim that there's a containment strategy that works on any model no matter how smart? If so I'd be interested to hear what it is.
Or it may just imply that "model smartness" is not a variable that goes off to infinity.
Computers are real-world physical systems, there must be a ceiling somewhere on how "smart" (for any definition thereof) a model can possibly be.
We have no idea what those limitations are, and maybe they do exceed our ability to contain it, but maybe they don't.
After seeing the "my boyfriend is AI" subreddit, I now think containment is a pipe dream. People, wanting something from AI, will do anything it asks.
I like the starkness of the title. It changes the game theory in the same way that nuclear winter did.
After the nuclear winter realization, it's not just the other guy's retaliation that might kill us; if we launch a bunch of nukes then we're just literally directly killing ourselves, plunging the world or at least the northern hemisphere into The Long Night (early models) or at least The Road (later models) and collapsing agriculture.
Similarly, if countries are most worried about an AI gap with their rivals, the arms race will get us to super-AI that kills us all all the faster. But if everyone understands that our own AI could also kill us, things are different.
Right, but later on it turned out that nuclear winter was a lie, an exaggeration, a statement optimised for something other than truth.
My understanding is that it just turned out that instead of being like The Long Night (pitch black and freezing year round), it would be like The Road (still grey, cold, mass starvation; I'm analogizing it to this: https://www.youtube.com/watch?v=WEP25kPtQCQ).
Some call the later models "nuclear autumn"—but that trivializes it. Autumn's nice when it's one season out of the year, and you still get spring and summer. You don't want summer to become autumn, surrounded by three seasons of winter with crops failing.
E.g. here's one of the later models, by Schneider and Thompson (1988). Although its conclusions were less extreme than the first 1D models, they still concluded that "much of the world's population that would survive the initial effects of nuclear war and initial acute climate effects would have to confront potential agricultural disaster from long-term climate perturbations." https://www.nature.com/articles/333221a0
I do think that some communicators, like Nature editor at the time John Maddox, went to the extreme of trying to dismiss nuclear worries and even to blame other scientists for having had the conversation in public. E.g., Maddox wrote that "four years of public anxiety about nuclear winter have not led anywhere in particular" (in fact, nuclear winter concerns had a major effect on Gorbachev), before writing that "For the time being, at least, the issue of nuclear winter has also become, in a sense, irrelevant" since countries were getting ready to ratify treaties (even though the nuclear winter concerns were an impetus towards negotiating such treaties). https://www.nature.com/articles/333203a0
(something else to check if you decide to look more into this, is if different studies use different reference scenarios; e.g. if a modern study assumes a regional India-Pakistan nuclear exchange smaller than the global US-USSR scenarios studied in the 80s, then its conclusions will also seem milder)
I think there is a general question about how society evaluates complex questions like this, whose whole structure and evidence only a few people can grasp, but which are important for society as a whole.
If you have a mathematical proof of something difficult, like the four colour theorem, it used to be the case that you had to make sure that every detail was checked. However, since 1992 we have the PCP theorem, which states that the proponent can offer you the proof in a form that can be checked probabilistically by examining only a tiny, randomly chosen part of it (a constant number of bits, selected using O(log n) randomness).
Could it be that there is a similar process that can be applied to the kind of scientific question we have here? On the one hand, outside the realm of mathematical proof, we have to contend with things like evidence, framing, corruption, etc. On the other hand we can weaken some of the constraints: we don't require that every individual gets the right answer, only that a critical mass do; individuals can also collaborate.
So, open question: can such a process exist? Formally, that given:
a) proponents present their (competing) propositions in a form that enables this process
b) individuals wanting to evaluate perform some small amount of work, following some procedure including randomness to decide which frames, reasons, and evidence to evaluate
c) They then make their evaluation based on their own work and, using some method to identify non-shill, non-motivated other individuals, on a random subset of others' work.
d) the result is such that >50% of them are likely to get a good enough answer
I dunno. I'm not smart enough to design such a scheme, but it seems plausible that something more effective exists than what we do now.
Probably we should first try this on some problem less scarily complicated than existential AI safety.
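To gesture at the flavor rather than a design, here's a toy sketch (everything in it is hypothetical): each evaluator spot-checks a small random sample of a proposition's checkable claims, and the group verdict is a majority vote.

```python
# Toy sketch of PCP-flavored social verification: each evaluator spot-checks
# a small random sample of a proposition's checkable claims, and the group
# verdict is a majority vote. Purely illustrative, not a real protocol.
import math
import random

def evaluator_accepts(claims, is_sound, sample_size):
    """One evaluator: verify a random subset, accept iff every sampled claim holds."""
    sample = random.sample(claims, min(sample_size, len(claims)))
    return all(is_sound(c) for c in sample)

def collective_verdict(claims, is_sound, n_evaluators=101):
    sample_size = max(1, math.ceil(math.log2(len(claims))))  # ~O(log n) work each
    votes = sum(evaluator_accepts(claims, is_sound, sample_size)
                for _ in range(n_evaluators))
    return votes > n_evaluators / 2

sound = [(i, True) for i in range(1000)]
flawed = [(i, i % 3 != 0) for i in range(1000)]   # about a third of the claims fail
is_sound = lambda claim: claim[1]

print(collective_verdict(sound, is_sound))    # True: everyone accepts
print(collective_verdict(flawed, is_sound))   # almost surely False: most evaluators hit a flaw
# Caveat: real PCPs get away with constant-size checks only because the proof
# encoding smears any error across the whole string; bare spot-checking needs
# flaws to be common enough to sample.
```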
The people I know who don't buy the AI doom scenario (all in tech and near AI, but not in AI capabilities research; the people in capabilities research are uniformly in the "we see improvements we won't tell you about that convince us we are a year or two away from superintelligence" camp) are all stuck doubting the 'recursive self improvement' scenario; they're expecting a plateau in intelligence reasonably soon.
The bet I've gotten with them is that if, sometime soon, the amount of energy spent on training doesn't decay (the way investment in the internet decayed after the dot-com bubble) and doesn't get vastly overshadowed by energy spent on inference, I'll have bought some credit from them.
Well, I don't buy the AI doom scenario, and I'm in tech, but it's not the recursive improvement that I think is impossible; it's killing all the humans that I see as nearly impossible. Per Ben Giordano's point, modeling this would be a good start.
You think bioengineering a virus for a 4000 iq robot and releasing it is hard?
Or something I haven't thought of?
Yes, actually. On three levels:
1 - 4000 IQ may allow the machine to generate the "formula" for the virus. An actual robot that can run experiments in the microbiological lab all by itself is needed to create one. We are very-very-very far away from robots like that.
2 - Say the 1 is solved, the virus is released in, e.g., Wuhan, and starts infecting and killing everyone. D'you think we'd notice? and quarantine the place? Much harder than when SARS2 showed up?
3 - Even if we then fail at 2 - the virus will mutate, optimizing for propagation, not lethality. Just like SARS2 did.
This is where modeling would be super useful, but that's not Yud's thing. The guy never had a job that held him accountable for specific results, AFAIK.
Makes sense. Thanks!
(I am a MIRI employee and read an early version of the book, speaking only for myself)
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
AI companies are already using and exploring forms of parallel scaling, seemingly with substantial success; these include Best-of-N, consensus@n, and parallel rollouts with a summarizer (which is how I believe some of the o3-Pro-like systems are rumored to work); see e.g. https://arxiv.org/abs/2407.21787.
I agree that this creates a discontinuous jump in AI capabilities in the story, and that this explains a lot of the diff to other reasonable viewpoints. I think there are a bunch of potential candidates for jumps like this, however, in the form of future technical advances. Some new parallel scaling method seems plausible for such an advance.
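For readers who haven't seen them, the existing techniques named above are mechanically simple. Here is a rough sketch of Best-of-N and consensus@n, where `generate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier/reward model, not any particular lab's API:

```python
# Sketch of two parallel test-time scaling tricks. `generate(prompt)` and
# `score(prompt, answer)` are hypothetical stand-ins for an LLM sampler and
# a verifier/reward model.
import random
from collections import Counter

def best_of_n(generate, score, prompt, n=16):
    """Draw n candidates in parallel; keep the one the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

def consensus_at_n(generate, prompt, n=16):
    """Draw n candidates; return the most common answer (self-consistency voting)."""
    candidates = [generate(prompt) for _ in range(n)]
    return Counter(candidates).most_common(1)[0][0]

# Toy demo with a noisy fake "model": more parallel samples -> a more reliable
# answer, with no retraining. That's the knob the scenario's "parallel scaling"
# turns up.
noisy_model = lambda prompt: random.choice(["4", "4", "4", "5"])
print(consensus_at_n(noisy_model, "2+2?"))  # almost always "4"
```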
Some sort of parallel scaling may have an impact on an eventual future AGI, but not as it relates to LLMs. No amount of scaling would make an LLM an agent of any kind, much less a super intelligent one.
The relevant question isn’t whether IQ 200 runs the world, but whether personalized, parallelized AI persuaders actually move people more than broadcast humans do. That’s an A/B test, not a metaphysics seminar. If the lift is ho-hum, a lot of scary stories deflate; if it’s superlinear, then “smart ≈ power” stops being a slogan and starts being a graph.
Same with the “many AIs won’t be one agent” point. Maybe. Or maybe hook a bunch of instances to shared memory and a weight-update loop and you get a hive that divides labor, carries grudges, and remembers where you hid the off switch. We don’t have to speculate -- we can wire up the world’s dullest superorganism and see whether it coordinates or just argues like a grad seminar.
And the containment trope: “just don’t plug it into missiles” is either a slam-dunk or a talisman. The actual question is how much risk falls when you do the unsexy engineering -- strict affordances, rate limits, audit logs, tripwires, no money movers, no code exec. If red-team drills show a 10% haircut, that’s bleak; if it’s 90%, then maybe we should ship more sandboxes and fewer manifestos.
We can keep trading intuitions about whether the future is Napoleon with a GPU, or we can run some experiments and find out if the frightening parts are cinematic or just embarrassing.
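Concretely, the persuasion question really is just a randomized experiment plus a significance test. A bare-bones sketch with made-up numbers (the 12% and 19% below are illustrative, not data):

```python
# Bare-bones A/B test for "do personalized AI persuaders move people more
# than broadcast humans?" -- a two-proportion z-test on made-up numbers.
from math import sqrt
from statistics import NormalDist

def persuasion_lift(converted_a, n_a, converted_b, n_b):
    """Arm A = human broadcast, arm B = AI personalized. Returns (lift, p-value)."""
    p_a, p_b = converted_a / n_a, converted_b / n_b
    pooled = (converted_a + converted_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    return p_b - p_a, p_value

# Hypothetical run: 12% of the broadcast arm changed their stated view,
# vs 19% of the personalized-AI arm.
lift, p = persuasion_lift(converted_a=120, n_a=1000, converted_b=190, n_b=1000)
print(f"lift = {lift:.1%}, p = {p:.2g}")
```

Whether the measured lift is ho-hum or superlinear in model capability is exactly the empirical question; the test itself is trivial.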
Yep, some scenario modeling would do this crowd a whole lot of good. Like, how does an ASI kill everybody, for starters.
Yes, at least give us a storyboard before the apocalypse trailer.
This is the thing that drives me up the wall about Yudkowsky: zero grounding in reality, all fairy tales and elaborate galaxy-brain analogies. Not surprising, given the guy never had a real job or even had to pass a real exam, for crying out loud.
Fairy tales are fine, I just want the edition with footnotes and experiments.
Some quotes from the New Scientist review of the book, by Jacob Aron:
"Yudkowsky and Soares describe how AIs will begin to behave as if they “want” things, while skirting around the very real philosophical question of whether we can really say a machine can “want”."
"Yudkowsky and Soares have a number of policy prescriptions, all of them basically nonsense."
"For me, this is all a form of Pascal’s wager . . . if you stack the decks by assuming that AI leads to infinite badness, pretty much anything is justified in avoiding it."
"Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies. Let’s consign superintelligent AI to science fiction, where it belongs, and devote our energies to solving the problems of science fact here today."
> So everyone else just deploys insane moon epistemology.
What is insane moon epistemology? Is there a backstory to the term I’m not aware of?
I think this is just Scott riffing on the phrase "moon logic" (https://www.reddit.com/r/adventuregames/comments/oko2th/why_is_it_called_moon_logic/).
If Yudkowsky actually cares about this issue the only thing he should do is spend all his time lobbying Thiel, and maybe Zuck if he wants to give Thiel some alone time once in a while.
Do we have reason to believe either that they'd listen or that it would particularly help if they did?
Would it help if Peter Thiel was highly motivated to support AI alignment on Yudkowsky's terms? Yes. Enormously so.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
Yes, and that's why I am not in any way convinced by any of these AI-doom scenarios. They all pretty much take it as a given that present-day LLMs will inevitably become "superintelligent" and capable of quasi-magical feats; their argument *begins* there, and proceeds to state that a bunch of superintelligent weakly godlike entities running around would be bad news for humanity. And I totally agree! Except that they never give me any compelling reason to believe why this scenario is any more probable than any other doomsday cult's favorite tale of woe.
Meanwhile I'm sitting here looking at the glorified search engine that is ChatGPT, and desperately hoping it'd one day become at least as intelligent as a cat... actually forget that, I'd settle for dog-level at this point. Then maybe it'd stop making up random hallucinations in response to half my questions.
Anyone who thinks the LLM model is anything more than fancier mad libs is fundamentally unserious. Do we have lessons to learn from it? Could it be one of the early "modules" that is a forerunner to one of the many sub-agents that make up human-like consciousness? Sure. Is it even close to central? Absolutely not.
Usual question: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
I hate "gotcha" questions like this because there's always some way to invent some scenario that follows the letter of the requirement but not its spirit and shout "ha ! gotcha !". For example, I could say "an LLM will never solve $some_important_math_problem", and you could say "Ha ! You said LLMs can't do math but obviously they can do square roots most of the time ! Gotcha !" or "Ha ! A team of mathematicians ran a bunch of LLMs, generated a million results, then curated them by hand and found the one result that formed a key idea that ultimately led to the solution ! Gotcha !" I'm not saying you personally would do such thing, I'm just saying this "usual question" of yours is way too easily exploitable.
Instead, let me ask you this: would you, and in fact could you, put an LLM in charge of e.g. grocery shopping for you ? I am talking about a completely autonomous LLM-driven setup from start to finish, not a helper tool that expedites step 3 out of 15 in the process.
This is pretty close to my own position. We'd need to create a very detailed set of legalese about what constitutes an LLM and then have very highly specified goals for our "task" before this type of question could provide any meaningful signal.
Or just simply say that it has to be autonomous. I don’t care about whether you give the AI a calculator, a scratch pad, or wolfram alpha. The question is whether it is an autonomous system.
I mean, a cron job is technically autonomous, but I wouldn't call it an "agent".
Is there an answer that you'd feel comfortable giving if you trusted the judge?
As for the grocery-shopping task, I'd say 70% confidence that this will be solved within two years, with the following caveats:
* We're talking about the same level of delegation you could do to another human; e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
* We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves. The latter is a robotics problem and I'm more agnostic about those as progress has been less dramatic.
* There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed. Realistically this is probably not going to happen soon as a consumer product because of the chicken-and-egg problem. So I mean something more like "the foundation models will be good enough that a team of four 80th-percentile engineers could build the software parts of the system in six months" (the sketch below is roughly the shape I mean).
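Every service and API name in this sketch (the fridge-camera inventory call, the planner, the ordering client) is a hypothetical placeholder, not an existing product:

```python
# Toy shape of an autonomous grocery loop. Every service below (the fridge
# camera, the planner, the ordering client) is a hypothetical placeholder.
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    quantity: int

def weekly_grocery_run(fridge_camera, household_prefs, plan_order, store):
    # 1. Perception: what do we actually have? (camera in the fridge/pantry)
    inventory = fridge_camera.current_inventory()            # -> list[Item]
    # 2. Planning: the model proposes what's missing, given prefs and history.
    shopping_list = plan_order(inventory=inventory, prefs=household_prefs)
    # 3. Guardrails: drop implausible quantities (a stand-in for spend caps,
    #    confirmation prompts for unusual items, etc.).
    shopping_list = [item for item in shopping_list if item.quantity <= 10]
    # 4. Action: place the order through the delivery service's client.
    return store.place_order(shopping_list)
```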
> e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
On the one hand I think this is a perfectly reasonable supposition; but on the other hand, it seems like you've just downgraded your AI level from "superintelligence" to "neighbourhood teenage kid". On that note:
> We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves.
I don't know if I can accept that as given. Instacart shoppers currently apply a tremendous amount of intelligence just to navigate the world between the store shelf and your front door, not to mention actually finding the products that you would accept (which may not be the exact products you asked for). Your point about robotics problems is well taken, but if you are talking merely about mechanical challenges (e.g. a chassis that can roll around the store and a manipulator arm delicate enough to pick up soft objects without breaking them), then I'd argue that these problems are either already solved, or will be solved in a couple years -- again, strictly from the mechanical/hydraulic/actuator standpoint.
> There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed.
Naturally, and/or there'd need to be a camera above your kitchen table and the stove, but cameras are cheap. And in fact AFAIK fridge cameras already do exist; they may not have gained any popularity with the consumers, but that's beside the point. My point is, I'm not trying to "gotcha" you here due to the present-day lack of some easily obtainable piece of hardware.
I mean, I answered the question you asked, not a different one about superintelligence or robotics. I have pretty wide error bars on superintelligence and robotics, not least because I'm not in fact entirely certain that there's not a fundamental barrier to LLM capabilities. The point of the question is that, if reading my mind to figure out what groceries I need is the *least* impressive cognitive task LLMs can't ever do, then that's a pretty weak claim compared to what skeptics are usually arguing. In practice, when I get actual answers from people, they are usually much less impressive tasks that I think will likely be solved soon.
Sorry, I accidentally cut my reply to this part:
> Is there an answer that you'd feel comfortable giving if you trusted the judge?
It's not a matter of trust, it's a matter of the question being so vague that it cannot be reasonably judged by anyone -- yes, not even by a superintelligent AI.
I gave probability estimates for various tasks elsewhere in the thread, do you think those were too vague?
Given a choice between hiring you to do an office job, or a cheap superintelligent AI, why would a company choose you? We should expect a world in which humans are useful for only manual labour. And for technological progress to steadily eliminate that niche.
At some point, expensive useless things tend to be done away with. Not always, but usually.
At the point of LLM development as it stands today, the reason I'd hire a human office worker over an LLM is because the human would do a much better job... in fact, he'd do the job period, as opposed to hallucinating quietly in the corner (ok, granted, some humans do that too but they tend to get fired). If you're asking, "why would you hire a human over some hypothetical and as of now nonexistent entity that would be better at office work in every way", then of course I'd go with the AI in that scenario -- should it ever come to pass.
This is also a concern, but it's different from (though not entirely unrelated to) the concern that a highly capable AGI would go completely out of control and unilaterally just start slaughtering everyone.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Yes. And this is a good thing. The bias against "changing everything" should be exactly this high: the basis on which we do it must be "obviously", that is, without a shred of doubt, true.
Confusing strength of conviction with moral clarity is a rookie mistake coming from the man supposedly trying to teach the world epistemology.
If you read a couple more paragraphs, Scott makes clear that he agrees with this; the hard question is where to draw the line.
Coming off this review, I immediately find The Verge covering a satire of AI alignment efforts, featuring the following peak "...the real danger is..." quote:
"...makes fun of the trend [..] of those who want to make AI safe drifting away from the “real problems happening in the real world” — such as bias in models, exacerbating the energy crisis, or replacing workers — to the “very, very theoretical” risks of AI taking over the world."
There's a significant fraction of the anti-AI mainstream that seems to hate "having to take AI seriously" more than they hate the technology itself.
https://www.theverge.com/ai-artificial-intelligence/776752/center-for-the-alignment-of-ai-alignment-centers
"AI is coming whether we like it or not"
But, you know, it might not. And there's a very good chance that if actual human-like artificial intelligence does come, it will be a hundred years after everyone who is alive today dies. And at that scale we might cease to exist as a species beforehand thanks to nuclear war or pandemic. And there's a chance true "general intelligence" requires consciousness and that consciousness is a quantum phenomenon that can only be achieved with organic systems, not digital. Nobody knows. Nobody knows. Nobody knows.
Man - I'm relatively new to this blog, and I'm learning that "rationalists" live in a lively world of strange imagination. To me, the name suggests boring conventionality. Like, "hey, we're just calm, reasonable people over here who use logic instead of emotion to figure things out." But Yudkowsky looks and sounds like a 21st-century Abbie Hoffman.
I'm naturally inclined to dismiss Y's fantasies as a fever dream of nonsense, but I am convinced by this post to pay a little more attention to it.
The efficacy of the "Harry Potter and the Methods of Rationality" argument is interesting to me because I found the book kind of dumb and never finished it. Yet I have observed the effect you describe on friends of mine whose opinions and intelligence I respect a great deal. However I have also noticed certain similarities among the people that it has had that affect on. I'd suggest that perhaps Yudkowsky is particularly well-suited to making a specific sort of memetic attack on a specific sort of human mind: likely a mind that is similar in a number of ways to his own. This is an impressive thing, don't get me wrong. But being able to make an effective memetic attack is not the same thing as knowing the truth.
The HPMoR analogy in the post is about the question of whether MIRI's new PR strategy is doomed, not about whether they're actually right to worry about AI risk.
re: the comparison to goal drift involving humans who are “programmed” or trained by selection processes to want to reproduce, but end up doing weird esoteric unpredictable things like founding startups, becoming monks, or doing drugs in the alley behind a taco bell.
Mainly the last one – the analogy that most humans would love heroin if they tried it, and give up everything they had to get it, even at the cost of their own well-being. But like, even if we "know" that, and you're someone with the means to do it, you're not GOING to do it. Like, Jeff Bezos has the resources to set up the "Give Jeff Bezos Heroin" foundation, where he could basically just commit ego-suicide and intentionally get into heroin, and hire a bunch of people bound by complex legal mechanisms to keep giving him heroin for the rest of his natural lifespan. But he doesn't do it, because he doesn't want to become that guy.
Does that mean anything for the AI example? I dunno.
"If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?"
Yes, clearly. If there's sufficient political will to stop AI progress, we can just make it happen.
"I am more interested in the part being glossed over - how do Y&S think you can get major countries to agree to ban AI?"
Huh? What discussion does Scott think is being glossed over? You get major countries to agree to ban AI by increasing the political will to ban AI. That's all that needs to be said. Maybe you could have a big brain strategy about how to get major countries to ban AI if you don't quite have enough political will, but I don't see why Y&S would spend any time discussing that in the book. They're clearly just focused on persuading politicians and the general public to increase the political will. No other discussion of how to ban AI is necessary. I'm confused what Scott wanted them to include.
> It spends eternity having other optimized-chat-partner AIs send it weird inputs like ‘SoLiDgOldMaGiKaRp’.
I think the link here is to AI slop. https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation would have been better
An AI ban treaty modelled on the NPT is interesting and might be doable. China might go for it, particularly if the pot was sweetened with a couple of concessions elsewhere, but data centre monitoring would be tough and I’d assume they’d cheat. Having to cheat would still slow them down and stop them from plugging it into everyone’s dishwashers as a marketing gimmick or whatever.
For the US side, at the moment Trump is very tight with tech but that might be unstable. The pressure points are getting Maga types to turn against them more, somehow using Facebook against Trump so he decides to stomp on Meta, and maybe dangle a Nobel Peace Prize if he can agree a treaty with Xi to ban AI globally.