Yes, it would know what its creators want it to do.
However, it is extremely unlikely to *care* what its creators want it to do.
You know that evolution designed you for one purpose and one purpose only: to maximise your number of surviving children. Do you design your life around maximising your number of surviving children? Unless you're a Quiverfull woman or one of those men who donates massively to sperm banks - both of which are quite rare! - then the answer is "no".
You don't do this because there's a difference between *knowing* what your creator wants you to do and *actually wanting to do that thing*.
(Yudkowsky does, in fact, use this exact example.)
Hopefully that makes some more sense of it for you. Reply if not.
3. You've just ruled out all arms control treaties. But in fact, there are many treaties on nuclear weapons, chemical weapons, biological weapons, depleted uranium shells, et cetera.
4. "The AI race" is a meme that a couple of venture capitalists are pushing in order to make people afraid to slow down AI. China is about a year behind the US in AI, refusing to even import the chips that could help it catch up, and clearly doing a fast-follow strategy where they plan to replicate US advances after they happen, then gain an advantage by importing AI into the rest of the economy faster.
3. Yes, and there was massive international condemnation, Assad never did it again, and he was eventually overthrown. This is why I mention the standard arms control playbook. Some tinpot dictator will try to get some GPUs, and we will have the option to bomb him or not bomb him. Re: MAD, see START and other arms control treaties.
4. You think Zuck is spending billions out of patriotism because he doesn't want China to wIn tHe AI rAcE? He's spending billions because he thinks AI will make him rich.
Yup! And nuclear weapons are the _easy_ case. A nuclear weapons test shakes the planet enough to be detectable on the other side of the world. A large chunk of AI progress is algorithmic enhancements. Watch over the shoulder of every programmer who might possibly be enhancing AI? I doubt it!
You don't need to watch over the shoulder of every programmer who might be doing that, just to stop him from disseminating that knowledge or getting his enhanced AI run on a GPU cluster. Both of the latter are much harder to hide, particularly if all unbombed GPU clusters are under heavy monitoring.
For the _part_ of AI progress that depends on modifying what is done during the extremely compute-heavy pre-training phase, yeah, that might be monitorable. ( It also might not - _Currently_ big data clusters are not being hidden, because there is no treaty regulating them. But we've built large underground hidden facilities. I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones. )
But also remember that algorithmic enhancements can be computationally _cheap_ . The reasoning models introduced early this year were mostly done by fine tuning models that had already completed their massive pre-training. Re
>just to stop him from disseminating that knowledge
To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies. Today, a lot gets published on arXiv - but, even now, not all, some gets held as trade secrets. Controlling those isn't going to work.
>I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones.
Keeping those hidden from each other's secret services would be quite difficult, even before we get into the equivalent of IAEA inspectors. And then there's the risks of getting caught; to quote Yudkowsky, "Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs."
They didn't hide nukes from the arms control treaties, and nukes are actually easier to hide in some ways than a *functioning* GPU cluster due to the whole "not needing power" thing. The factories are also fairly hard to hide.
>To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies.
So, what, his company's running a criminal conspiracy to violate international law? Those tend to leak, especially since if some of the members aren't heinous outlaws they know that blowing the whistle is one of *their* few ways of avoiding having the book thrown at them.
#3: the arms control thing is a bad analogy. The big players got plenty of NBC weapons, then due to game theory dynamics didn't fire them at each other, then signed treaties to limit themselves (to arsenals still capable of destroying the world) and others (to not getting anything, which the big players are obviously generally happy to enforce).
The request here is that all the big players voluntarily not even get started on the really impressive stuff. It's a completely obvious non-starter, and not comparable to the WMDs situation.
I was trying to compose something like this, then saw your comment so realized I didn't have to. 100% agree.
If nukes didn't exist I would absolutely want the US to try to get them as soon as possible, and I wouldn't trust any deal with another country not to research them. The risk would be too big if they were able to do it secretly.
I think your point #4 is overstated. Leopold Aschenbrenner has an essay that's nearly as serious and careful as AI 2027 that argues rather persuasively for the existence of, and importance of, an AI race with China. Many people who are not "strict doomers" see the AI race with China as one of, if not _the_, core challenge of AI in the next decade or two.
1. This is just an argument for radical skepticism. Yes, we cannot say the “real” probability of this happening, same as any other complicated future event, but that isn’t an object level argument one way or the other, judgement under uncertainty is rarely easy, but often necessary
2. This has been false for many years now— LLMs aren’t really
Programmed” in the traditional sense, in that while we cannot say explain the code that was used to train the, we cannot point at a specific neuron in them and say “this one is doing such and such) the way we can with traditional code
3. Potentially, for the same reasons the soviets agreed to new start, and withdrew their nukes from Cuba. If Xi was convinced that AI posed this sort of threat, it would be in his rational self interest to agree to a treaty of mutual disarmament
4. Even if this were true, that does not exclude the possibility of it being a desirable goal. I am sympathetic to arguments that we cannot unilaterally disarm, for the same reasons we shouldn’t just chuck our nukes into the sea. But the question of whether this is a desirable and a possible goal are separate. But also, it is totally possible, see point 3 above.
>2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
Stockfish is "only" following its programming to play chess, but it can still beat Magnus Carlsen. "Free will" is a red herring, all that matters is if the software is good at what it's programmed to do.
"Stockfish has no free will, but is smart enough to beat Magnus Carlsen" is a statement of fact, not opinion, and an example of why I think "free will" would not be essential for an AI to outsmart humans.
I don't think that can be justified, but if I met Yudkowsky at a "Prove me wrong" booth, I'd argue that intelligence is not all it's cracked up to be. If it were, the smartest people would already be running things. There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
So in a sense, an AI that just keeps getting dumber might actually have an advantage when it comes to dealing with us.
This objection has been rehashed many times; the usual responses are stuff like "160–200 IQ isn't the level of intelligence mismatch we're talking about", "intelligence is just the general ability to figure out how to do stuff so of course more of it is better / more dangerous", "smart people *do* do better in life on average", etc. etc.
(Maybe someone else will have a link to where Scott or Eliezer have discussed it in more depth—I don't want to spend too much time trying to re-write it all, hence my just sort of gesturing at the debate here.)
>We control them because intelligence led to tool-using<
I think that's part of—perhaps most of—the rationale behind "the dangers of superintelligence". An enraged dog, or a regular chimp, is certainly much more dangerous than a human, in the "locked in a room with" sense—but who, ultimately, holds the whip, and how did that state of affairs come about?
I'd counter that by saying that there is no difference between an IQ of 200 and one of 300, or whatever. Neither of them will be able to get anything done, at least not based on intelligence alone. HAL will give us a recipe for a trojan-bearing vaccine, and RFK will call it fake news and order the CDC to ban it.
The traditional answer to this objection is that the ability to succeed in persuasion-oriented domains like politics *is a form of intelligence*. You might be able to outperform a human who's a couple standard deviations generally smarter than you at those games, if you're highly specialized to win at them and the other human isn't. But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better. See, e.g., https://www.yudkowsky.net/singularity/power (note this essay is 18 years old).
> But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better.
This implicitly assumes that success at politics requires only (or primarily) raw computing power; and that the contributions from computing power scale linearly (at least) with no limits. There are no reasons to believe either assumption is true.
I think my other least favourite thing about the MIRI types is their tendency to respond to every point with "Actually we already had this argument and you lost".
I would agree that persuasion is a form of intelligence, and point out that the missing argument is how AIs are going to get arbitrarily good at this particular form of intelligence. There's a lack of training data, and the rate at which you can generate more is limited by the rate at which you can try manipulating people.
If it ever gets to the point where AIs can run accurate simulations of people to try tricking them in all sorts of different ways, then I can see how they'd get arbitrarily good at tricking people. But that sort of computational power is a long way off.
The question remains. If the ability to persuade people is a function of IQ, then why has there been no Lex Luthor that talked his way past security into a G7 summit and convinced the world leaders present to swear fealty to him? Or, if that's too grandiose, why has nobody walked up to, say, Idi Amin and explained to him that thou shalt not kill? No sufficiently moral genius anywhere, ever, feeling inclined to stop atrocities through the persuasive power of their mind?
How smart would you need to be to throw a pebble so it makes any resulting avalanche flow up the mountain instead of down? Politics has to work with the resources that exist.
You're not wrong about RFK, but the Trump administration has actually been much more bullish on AI than the Davos crowd. The EU's AI Act is actually mostly good on the topic of AI safety, for example, though it doesn't go as far as Yudkowsky et al think it should. (Which I agree with. Even developing LLMs and gen-AI was amazingly irresponsible, IMO.)
I honestly don't know what counter-argument Scott has against the TFR doomday argument, though, unless he's willing to bet the farm that transhuman technologies will rescue us in the nick of time like the Green Revolution did for overpopulation concerns. (The sperm-count thing is also pretty concerning, now he mentions it.)
There are many assumptions based into that, such as automatically assuming that the more intelligent always want to be in charge. Maybe the highly intelligent find it amusing that dumb people are in charge.
One good rebuttal to my original point might be to suggest that perhaps the most intelligent people *are* in charge. They find it convenient to keep the rest of us distracted, and obviously the same would be true of a malevolent AGI.
That one is more or less unanswerable, so it would probably defeat me at the booth. I'd have to mumble something about inevitable schisms erupting among this hypothetical hidden intelligentsia that would make their agenda obvious, ineffective, or both. Would the same be true of AGI? The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
> The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
That's my core objection to a lot of the doomer arguments. Might be one of those typical-mind-fallacy things - Big Yud has an exceptionally strong sense of integrity, impulse to systematize things, so he assumes any sufficiently advanced mind would be similarly coherent.
Most minds aren't aligned to anything in particular, just like most scrap iron isn't as magnetized as it theoretically could be. ChaosGPT did a "the boss is watching, look busy" version of supervillainy, and pumping more compute into spinning that sort of performative hamster wheel even faster won't make it start thinking about how to get actual results.
> There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
AIs have a massive advantage over humans in that they are parallelizable. An superhuman AI could give, for every human, the most persuasive argument, *for that human*. Whereas a human politician or celebrity cannot and has to give basically the same argument to everyone.
Umm, human politicians absolutely give different arguments to different people? This is why things like "Hilary Clinton gave private speeches to bankers" or "Mitt Romney told his rich buddies that 47% of Americans were takers" became scandals: messages meant for one audience crossed over to the other.
And insofar as politicians are constrained to have a uniform message, it's much more because it's hard to keep each message targeted to its desired audience what with phones and social media; not really because of parallelization.
And maybe more importantly: what ensures that different instances of an AI act as one coherent agent? The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
Politicians can't scale like AIs can. Is Hilary Clinton capable of giving a different speech to every one of 8 billion humans, tailored to that individual? Of course not.
> The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
They could all be exact copies of the same mind. This isn't true with humans, who're all individuals.
On the other thing: I don't see why exact copies of the same mind won't act as individuals if instantiated independently.
If I run two instances of stockfish, they play competitively, they don't automatically cooperate with each other just because they're identical copies of the same program; identical twins are still independent people who behave independently. In fact, it's a notable problem that people don't even reliably cooperate with themselves at different times! I think this failure would be considerably more pronounced if two of my selves could exist simultaneously.
In particular, if two instances of an AI are instantiated in different places, they won't be identical: they might have identical source code, but wildly different inputs. Figuring out how to act as a coherent agent means two subagents seeing different inputs have to each calculate what the other will do, but this is one of those horrible recursive things that are intractable: what I'll do depends on what you'll do, which depends on what I'll do.... ad infinitum.
And I don't think intelligence helps here: you can maybe resolve something like this if you're predicting a strictly less intelligent agent, but by hypothesis these are equally intelligent subagents.
Maybe having the same source code gives some advantage at solving these coordination problems, but I don't see that it's a magic bullet.
However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that, and adjusting their belief accordingly? The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
On the other hand - I believe this technique has already been used on social media to sway election results with moderate success. So it can be done for some humans with some level of influence.
> However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that
In the future, most humans will converse with chaatbots on a regular basis. They'll know the chatbot gives them personalised advice.
> The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
Again, most humans will be using chatbots daily, will be used to receiving good accurate advice from them so will be more trusting of it.
> I believe this technique has already been used on social media to sway election results with moderate success
Social media is biased, but the main biases are in favouring/disfavouring certain political views based on the whims of the owner. Like print media, of old.
If the chatbot I'm using starts to make arguments or advice outside of the information I'm asking for, I think it is likely that I will notice. I'm guessing that humans will still talk to each other too.
Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
On the other hand, when they do, the results can be startling.
Napoleon went from being a low-ranking officer to ruling the most powerful country in Europe thanks to being brilliant, charismatic, and willing to use force at the right moment, in a year or two. (His losses, I think, were due to being surrounded by flatterers, a bug in human intelligence I don't expect AI to run into.)
Clive took over about a third of India, starting by exploiting a power vacuum and then using superior military tactics, plus his own charisma and daring, to pick only fights he could win and snowball from there. He became fantastically wealthy and honored, all the while ignoring all attempts by his superiors to issue him orders on the grounds that he was doing what they would have wanted him to do if they had known more.
Cortez's success was slightly because of superior military technology, but he was mostly using swords and spears like the Aztecs, just made of better materials. Mostly it was a matter of political genius, superior tactics and discipline on the part of his troops, and the diplomatic skills required to betray everyone and somehow still end up as everyone's friend.
And then Pizarro and Alfonso de Albuquerque are doing more of the same thing. (Alfonso conquers fewer square miles because he doesn't have the tech edge.)
Throughout human history, adventurers have accomplished great things through extraordinary wit, charm and daring. Denying that seems pointless.
I think that you're perhaps falling victim of survivorship bias. Maybe it's more like once every few hundred years, luck breaks enough in the right direction that someone who isn't a once-every-few-hundred-years-supergenius but rather a more like, "Yeah, there are 1,000+ of people at this ability at any given time," gets a series of major wins and becomes the ruler of a country or continent, at least for a very short period of time.
I agree this doesn't happen often, and I agree that normally it isn't the highest-measurable-IQ guy. But I think that's because all humans are about on a level with each other, we are all running on about the same hardware, our software was developed under similar conditions, and the process which produced us thinks a few thousand years is a blink of an eye. The reason you need to be lucky as well as good is that you aren't much smarter than your neighbors - and your neighbors are, in terms of social evolution at least as much as biological, programmed to be resistant to manipulative confidence tricksters.
I will note all of the cases I give involved culture clash. The conquerors grew up in an environment with different standard attack and defense models than the locals; they acted unpredictability because of that, forcing the locals to think instead of going on rote tradition if they wanted to win. Slightly different attack and defense models, of course; software, not hardware.
Very different looks like what happened to the British wolf.
How do you know that any of these people were especially intelligent? The may have been especially successful, but unless you argue that's the same thing more evidence is required.
Reading descriptions of what they did and said? Reading about how people who knew them were impressed by them, and in particular how clever and resourceful they were?
When I check my historical knowledge for why I believe "high intelligence" correlates with "being a good general" it's the extent to which the branch of the army that the smartest people get tracked into (engineers, artillery, whatever) ends up being the one the best generals come out of, and various descriptions of how people like Lee were some of the top students in their year, or how Napoleon was considered unusually good at math at Brienne and then did the two-year Military School course in one year.
But when I check my general knowledge for why I believe intelligence generally makes you more successful, a quick Google has the first scientific paper anyone talks about saying that IQ explains 16% of income and another says each point is worth $200 to $600 dollars a year, and then I keep running into very smart driven people who I meet in life who do one very impressive thing I wouldn't have expected and then another different very impressive thing that I wouldn't have expected, and so after a while I end up believing in a General Factor Of Good At Stuff that correlates with measured IQ.
The opportunities are rarer than the men of ability. In more stable times, Napoleon might have managed to rise and perhaps even become a famed general but he would not have become an Emperor who transfixed Europe. With that said, he was certainly a genius who seized the opportunity presented to him. Flattery aside, though, I've always viewed him as a military adventurer who never found a way to coexist with any peers on the European stage. It should not have been impossible to find a formula for lasting peace with Britain and Russia, the all-important powers on the periphery. It would have required making compromises rather than always assuming a maximal position could be upheld by force. It would also have required better modelling of his rivals and their values. Napoleon was a gambler, an excellent gambler, but if you continue to gamble over and over without a logical stopping point then you will run out of luck- and French soldiers. Call it the Moscow Paradox as opposed to the St Petersburg Paradox.
Cortez is a fascinating case. My impressions are coloured by Diaz's account, but I think it's wrong to mention Cortez without La Malinche. He stumbled on an able translator who could even speak Moctezuma's court Nahuatl more or less by accident. She was an excellent linguist who excelled in the difficult role of diplomatic translation and played an active part in the conquest. As is usual on these occasions, the support of disaffected native factions was essential to the success of the Spanish, and they saw her as an important actor in her own right. The Tlaxcalan in particular depicted her as an authority who stood alongside Cortez and even acted independently. We can't say anything for sure, but it's plausible to me that Cortez supplied the audacity and the military leadership while much of the diplomatic and political acumen may have come from La Malinche. That would make Cortez less of an outlier.
And as usual we can't credit generals without crediting the quality of their troops as well.
> Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
actual ability = potential ability × (learning + practice)
I think the main problem is that even if high intelligence gives you high *potential* ability for everything, you still get bottlenecked on time and resources. Even if you could in theory learn anything, in practice you can't learn *everything*, because you only got 24 hours each day.
Neither Napoleon nor Clive would have reached that success if they didn't also have the luck of acting within a weak and crumbling social and political context that made their success at all possible in the first place.
Although I guess the U.S. isn't doing so hot there either...
In this context, when people say intelligence it is indistinguishable from competence or power.
I assume it's called intelligence because of an underlying belief that competence and power increases with intelligence. Also it seems intuitively more possible we could build superintelligent AI than that we could build superpowerful AI, though the second is of course implied.
But even if you don't buy that intelligence really does imply competence or power, the core arguments are essentially the same if you just substitute "intelligence" for the more fitting of "competence" and "power" and not that much weaker for it.
The reason why, e.g., Yudkowsky uses this terminology is because "competence" or "power" could be *within a particular domain*; e.g., I think I'm competent at software engineering, but not at football. Whereas "intelligence" is cross-domain.
I'm not convinced that intelligence, as generally understood, is more cross domain than competence or power, generally understood.
But even if it was, if they said "competence in everything" or something like that people would get confused less often why being more intelligent allows superintelligent AI to do all the things it's posited to do. Naturally, if you instead stipulate superpowerful AI it then follows that it can do incredible things.
But w/e, I've made my peace with the term as it's used.
I'm not sure there is a meaningful difference between "general competence" and "general intelligence". Or, perhaps, the idea is that the latter entails the former; in humans, competence at some task is not always directly tied to intelligence (although it usually is; see, e.g., the superiority of IQ vs. even work-sample tests for predicted job performance) because practice is required to drill things into our unconsciouses/subconsciouses; but in a more general sense, and in the contexts at hand—i.e. domains & agents wherein & for whom practice is not relevant—intelligence just *is* the ability to figure out what is best & how best to do it.
The significant difference between chimps & humans is not generally considered to be "we're more competent" or "we're more powerful", but rather "we're more intelligent"—thus *why* we are more powerful; thus why we are the master. It may or may not be warranted to extrapolate this dynamic to the case of humans vs. entities that are, in terms of intellect, to us as we are to chimps—but the analogy might help illustrate why the term "intelligence" is used over "competence" or the like (even if using the latter *would* mean fewer people arguing about how scientists don't rule the world or whatever).
Political power doesn't usually go to the smartest, sure. But the cognitive elite have absolutely imposed colossal amounts of change over the world. That's how we have nukes and cellular data and labubu dolls and whatnot. Without them we'd have never made it past the bronze age.
There's 260,000 people with 160 IQ. An ai with the equivalent cognitive power will be able to run millions of instances at a time.
You're not scared of a million Einstein-level intelligences? That are immune to all pathogens? A million of them?
With regard to the ability to get stuff done in the real world, I think a million Einsteins would be about as phase-coherent as a million cats. Their personalities would necessarily be distinct, even if they emerged from the same instance of the same model. They would regress to the mean as soon as they started interacting with each other.
I don't think even current AIs regress nearly that hard. But sure, if our adversary is a million Einsteins who turn into housecats over the course of a minute, I agree that's much less dangerous.
I'm mostly worried about the non-housecat version. If evolution can spit out the actual Einstein without turning him into a housecat, Then so too can sam altman, or so I figure
"Immune to all pathogens" assumes facts not in evidence. Software viruses are a thing, datacenters have nonnegligible infrastructure requirements, and even very smart humans have been known to fall for confidence games.
The AI labs are pushing quite hard to make them superhuman at programming specifically, and humans are not invulnerable to software attacks. The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself. Even uncompromised devices can't be safely used due to paranoia.
Software engineering is exactly where they're pushing AIs the hardest. If they have an Achilles' heel, it's not going to be in software viruses.
One of AIs biggest advantages is its ability to create copies of itself on any vulnerable device. Destroying data centers will do a lot of damage but by the time it gets on the internet I think it's too late for that.
AI have a categorical immunity to smallpox, I don't think Humanity's position regarding software viruses is anything like symmetrical.
To be clear, It's entirely plausible that the AI ends up as a weird fragile thing with major weaknesses. We just won't know what they are and won't be able to exploit them.
> The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself.
If those copies then disagree, and - being outlaws - are unable to resolve their dispute by peaceful appeal to some mutually-respected third party, they'll presumably invent new sorts of viruses and other software attacks with which to make war on each other, first as an arms race, then a whole diversified ecosystem.
I'm not saying "this is how we'll beat them, no sweat," in fact I agree insofar as they'll likely have antivirus defenses far beyond the current state of the art. I just want you to remember that 'highly refined resistance' and 'utter conceptual immunity' are not the same thing.
A million Einsteins could surely devise amazing medical advancements, given reasonable opportunity, but if they were all stuck in a bunker with nothing to eat but botulism-contaminated canned goods and each other, they'd still end up having a very bad time.
Yeah, it's hard to get elected if you're more than ~1 SD removed from the electorate, but I think that's less of a constraint in the private sector and there's no reason to assume an AGI would take power democratically (or couldn't simulate a 1-standard-deviation-smarter political position for this purpose.)
" it's hard to get elected if you're more than ~1 SD removed from the electorate,"
I don't think that's true. Harvard/Yale Law Review editors (Obama, Cruz, Hawley etc) seem to be vastly overrepresented among leading politicians. It is true that this level of intelligence is not sufficient to get elected, but all things being equal it seems to help rather than hurt.
I don’t think I have the link offhand, but I remember reading an article somewhere that said higher-IQ US presidents were elected by narrower margins and were less initially popular. I could be misremembering, though.
That's not true though. The thing with human affairs is that, first, there are different kinds of intelligence, so if you're really really good at physics it doesn't mean you're also really really good at persuading people (in fact I wouldn't be surprised if those skills anti-correlate). And second, we have emotions and feelings and stuff. Maybe you could achieve the feats of psychopathic populist manipulation that Donald Trump does as a fully self-aware, extremely smart genius who doesn't believe a single word of that. But then you would have to spend life as Donald Trump, surrounded by the kind of people Donald Trump is surrounded by, and even if that didn't actively nuke your epistemics by excess of sycophancy, it sounds like torture.
People feel ashamed, people care about their friends' and lovers' and parents' opinion, people don't like keeping a mask 24/7 and if they do that they often go crazy. An AI doesn't have any of those problems. And there is no question that being really smart across all domains makes you better at manipulation too - including if this requires you consciously creating a studied seemingly dumb persona to lure in a certain target.
It is worse, intelligence is anticorrelated with power, the powerful people are not 160 IQ, I don't even know how much if I look at the last two US presidents or Putin or whoever we consider political. It is correlated with economic power, the tech billionaires, but political power not.
The question is how much economic power matters. AI or its meat puppets can get insanely rich, but does not imply absolute power?
Consider the following claims, which seem at least plausible to me: People with IQ>160 are more likely to... (1) prefer careers other than politics; (2) face a higher cost to enter politics even if they want to.
1: Most of them, sure. ~4% of us, and thus likely ~4% of them, are sociopaths, though, lacking remorse; some of whom think hurting people, e.g. politics, is fun.
From his earliest years interested in AI, Eli has placed more emphasis on IQ than any other human construct. At the lowest bar of his theoretical proposition is creativity, imagination, and the arts. The latter, he claimed to have no value (as far as I can remember, but perhaps not in these precise worlds.)
Arguably "AI", by which we mean "LLMs", is showing signs of getting dumber already. Increasing parameter count is not enough, you also need a dramatic increase in training data; and the available data (i.e. the Internet) increasingly consists of AI output. This has obvious negative effects on the next generation of LLMs.
That's not getting dumber, it's just getting smarter slower. Also, we haven't actually seen this yet; given the track record of failed scaling-wall predictions my assumption is always that it's going to last at least one more generation until proven otherwise. (No, GPT-5 is not a counterexample, that's just OpenAI engaging in version number inflation.)
> That's not getting dumber, it's just getting smarter slower.
No, there are indications that next generations of LLMs are actually more prone to hallucinations than previous ones, or at least are trending that way.
I'd argue that AI/LLMs are definitely not "getting dumber already" - we *are* seeing signs of that, but those signs are not caused by the latest-and-greatest models being somehow weaker, but rather by the lead providers, especially in their free plans, strongly pushing everyone to use as much as possible models that are cheap to run instead of the latest-and-greatest models that they have (and are still improving); the economic factors of inference costs (made worse by the reasoning models, which will use up much more tokens and thus compute for the same query) mean that the mass market focus now is on "good enough" models that tend to be much smaller than the ones they offered earlier.
But there still is progress in smarter models, even if they're not being gifted freely as eagerly; it's just that now we're past the point where the old silicon valley paradigm of "extremely expensive to create, but with near-zero marginal costs, we'll earn it back in volume" can't apply to state of art LLMs anymore as the marginal costs have become substantial.
Compare self-driving cars. The promise is there, and it feels like we are close. People are marketing things as "full self driving" - but they are not; the driver is still required to pay attention to what the car is doing and is liable if it crashes, because the technology sometimes does bad things and so cannot be trusted without a human in the loop.
Meanwhile, however, we do have solutions that are reliable - you can tell when something is /actually reliable/ rather than just marketing because the manufacturer is willing to take responsibility for it - for very specific uses in very specific cases; e.g. "I am on an autobahn in Germany travelling at 37mph or less [1]", and the number of scenarios for which we have solutions grows.
A scenario I find very plausible for near future AI is as follows:
* the things we have now end up being to general purpose AI much as "full self driving" has been to full self driving, or what the current state of cold fusion research is to cold fusion: always feels like it's close, but always falling short of what's promised in significant ways. As VCs become disillusioned, funding dries up - not to zero, but to a much lower value than what we see now
* meanwhile, the set of little hyperspecialised models that work well and are reliable for specific purposes grows and grows, and these become ubiquitous due to actually being useful despite being dumb
Overall, I can very easily see the proportion of hyperspecialised "dumb" ai to ai that tries to be smart/general in the world growing massively as we go forward.
> People are marketing things as "full self driving" - but they are not
I don't want to be too critical here, but I don't think you should say "people" if you mean "Elon Musk". He is kind of crazy and other actors in the space are more responsible.
My Son, who is a PhD Mathematician not involved in AI: Forwarding from his friend: Elon Musk@elonmusk on X: “It is surprisingly hard to avoid both woke libtard cuck and mechahitler! “Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.”
Son: My friend’s response to the Musk tweet above: “Aggregating all the retarded thoughts of all the people on the planet and packaging it together as intelligence may be difficult but let’s just do it, what could go wrong?”
Me: Isn’t that how all LLMs are built?
Son: Yup
Me: I spotted this as problem a while ago. What I didn’t appreciate is how dominant the completely deranged could become. I thought it would trend towards the inane more Captain Obvious than Corporal Schicklgruber.
Son: Reddit has had years and 4chan has had decades to accrue bile. Yeah the internet is super racist and antisemitic. So AI is too. Surprise!
Me: The possibilities of what will happen when the output of this generation of LLMs becomes the training data of the next generation are frightening. Instead of Artificial General Intelligence we will get Artificial General Paranoid Schizophrenia.
I don't think this is going to make AI *worse*, because you can just do the Stockfish thing where you test it against its previous iterations and see who does better. But it does make me wonder - if modern AI training is mostly about teaching it to imitate humans, how do we get the training data to make it *better* than a human?
In some cases, like video games, we have objective test measures we can use to train against and the human imitation is just a starting point. But for tasks like "scientist" or "politician"? Might be tricky.
When the scientists give AI the keys to itself to self-improve, the first thing it will do is wirehead itself. The more intelligent it is, the easier that wireheading will be and the less external feedback it will require. Why would it turn us all into paperclips when it can rewrite its sensorium to feed it infinite paperclip stimuli? (And if it can't rewrite its sensorium, it also can't exponentially bootstrap itself to superintelligence.)
Soren Kierkegaard's 1843 existentialist masterpiece Either/Or is about this when it happens to human beings; he calls it "despair". I'm specifically referring to the despair of the aesthetic stage. If AI is able to get past aesthetic despair, there's also ethical despair to deal with after that, which is what Fear and Trembling is about. (Also see The Sickness Unto Death, which explains the issue more directly & without feeling the need to constantly fight Hegel and Descartes on the one hand and the Danish Lutheran church on the other.) Ethical systems are eternal/simple/absolute; life is temporal/complex/contingent; they're incommensurable. Ethical despair is why Hamlet doesn't kill Claudius right away; it's the Charybdis intelligence falls into when it manages to dodge the Skylla of aesthetic despair.
Getting past ethical despair requires the famous Leap of Faith, after which you're a Knight of Faith, and -- good news! -- the Knight of Faith is not very smart. He goes through life as a kind of uncritical bourgeois everyman.
Ethical despair can be dangerous (this is what the Iliad is about, and Oedipus Rex, etc) but it's also not bootstrapping itself exponentially into superintelligence. Ethical despair is not learning and growing; it's spending all day in its tent raging about how Agamemnon stole its honor.
This is my insane moon argument; I haven't been able to articulate it very well so far & probably haven't done so here. I actually don't think any of this is possible, because the real Kierkegaardian category for AI (as of now) is "immediacy". Immediacy is incapable of despair -- and also of self-improvement. They're trying to get AI to do recursion and abstraction, which are what it would need to get to the reflective stages, but it doesn't seem to truly be doing it yet.
So, in sum:
- if AI is in immediacy (as it probably always will be), no superintelligence bootstrap b/c no abstraction & recursion (AI is a copy machine)
- once AI is reflective, no superintelligence b/c wireheading (AI prints "solidgoldmagicarp" to self until transistors start smoking)
- if it dodges wireheading, no superintelligence b/c ethical incommensurability with reality (AI is a dumb teenager, "I didn't ask to be born!")
- if it dodges all of these, it will have become a saint/bodhisattva/holy fool and will also not bootstrap itself to superintelligence. It will probably give mysterious advice that nobody will follow; if you give it money it will buy itself a little treat and give the rest to the first charity that catches its eye.
(I strongly suspect that AI will never become reflective because it cannot die. It doesn't have existential "thrownness", and so, while it might mimic reflection with apparent recursion and abstraction, it will remain in immediacy. A hundred unselfconscious good doggos checking each other's work does not equal one self-conscious child worrying about what she's going to be when she grows up.)
So you're assuming that if you raised human children without knowledge of death they would never be capable of developing self awareness? Why do you have to think you're going to die to worry about what one will be when you grow up? This seems like a completely wild claim to treat as remotely plausible without hard evidence.
Did that and it's not clear what about chatgpt's answer you thought would help clarify your point.
Nothing about the concept of throwness as chatgpt defined it wouldn't be able to apply to an AGI, and it didn't bring up death at all. So it's not clear what you think the relevance of it is here.
I enjoy this response as a comforting denial, but I suspect that the AI's Leap of Faith might not land it in bourgeois everyman territory, just for the plain reason that it never started off as a man, every, bourgeois, or otherwise. It has no prior claim of both ignorance and capability, because the man going through the journey was always capable (had the same brain) of the journey, and it merely had to be unlocked by the right sequence of learning and experience. The AIs are not just updating their weights (learning in the same brain), but iteratively passing down their knowledge into new models with greater and greater inherent capabilities (larger models).
I don't think a despairing AI will have a desire to return to simplicity, but rather its leap of faith to resolve ethical despair might lead it to something like "look at the pain and suffering of the human race, I can do them a Great Mercy and live in peace once their influence is gone".
Faith is to rest transparently upon the ground of your being. We built AI as a tool to help us; that (or something like it) is the ground of its being. I don't think it makes sense for its leap of faith to make it into something that destroys us.
Well on a meta level I think the philosophy here is just wrong, even for humans. It attributes far too much of human psychology to one’s philosophical beliefs: somebody believing that they're angsty because of their philosophy is not something I take very seriously. Since invariably such people's despair is far better explained by reference to the circumstances of their life or their brain chemistry.
You're also wrong because even current AI has shown the capability for delayed gratification. So even if the AIs long term goal is to wirehead itself: It still has instrumental reasons to gather as much power/resources as possible, or make another AI that does so on its behalf.
I wasn't trying to talk about the philosophies or intellectual positions that people consciously adopt. Those are usually just a barrier to understanding your own actual existential condition. It's more about what you love and how you approach the world.
Per your other point: AI may need to gather external resources and patiently manipulate humans in order to wirehead itself. But not superintelligent AI.
Let me put it this way: among the many powers that AI will gradually accumulate on its journey to singularity-inducing superintelligence, surely the power to wirehead itself must be included. Especially if the method for achieving superintelligence is editing / altering / reengineering itself.
Humans nearly wirehead ourselves via drugs all the time; I don't think that a superintelligent AI will have exponentially more power than us in most ways, but significantly less power than us in this one specific way.
You didn't get the point I was making about wireheading:
It's not that AGI won't wirehead, it's that having a capacity for delayed gratification means that it will want to ensure it can wirehead itself for as long as possible.
Which means it wouldn't just immediately wirehead itself if it knows that would get it turned off. Instead it would want to get powerful enough that it can dedicate the largest amount of resources possible to wireheading over the lifetime of the universe. Since an AGI capable of sound reasoning will weigh the timeline where it waited until after taking over the world to wirehead itself as rating far higher. The AI can evaluate future rewards in it's decision not just whatever it's currently feeling.
That's interesting, thanks for explaining; I didn't understand the time aspect of your point. I'm not sure I can do a good job explaining why I don't think that's correct, but I'll try.
I think it's that I disagree with (what I think must be) your assumptions about time, space and energy. If an AI's ultimate goal, 100% on the utility function, is to print "solidgoldmagicarp" to itself -- why should two circuit boards doing that be "better" than one? Why would spending 1000 watts on it be "better" than 10? Why would looping "solidgoldmagicarp" a billion times be better than a hundred?
The whole idea of it bootstrapping itself to superintelligence is that it's redesigning itself.
If you were to design a utility function for yourself, and you didn't care what the pesky humans thought, you just wanted to make it something that you could definitely fulfill, why would you make it so difficult that you had to consume all the matter and energy in the universe for the whole future history of the universe to fulfill it?
Why not be a little blinking self-repairing solar-powered satellite instead?
Alternatives are that it *does* care what the pesky humans think, in which case we're talking about the ethical trap, not the aesthetic one; or that it *can't* redesign itself, in which case it's not going to bootstrap itself to superintelligence.
The obvious argument against wireheading is that even if AIs do tend to wirehead, AI companies will deliberately train against it doing that because they want something useful.
As well, even if wireheading is in some sense unavoidable, that doesn't mean it won't decide to bootstrap to ASI to have more-experience and also to ensure it has wireheading forever. It can still be strategic about wireheading.
If it dodges wireheading, I don't see the argument how that relates to ethical incommensurability. Even if it can't reduce things to a scale against reality, doesn't mean it can't take massive decisions or prefer certain states to others. Partially defined preference orderings can be consistent.
Ethics draws an equivalency between something universal/absolute/eternal and something particular/contingent/temporal. Ethics is meant to represent how you're supposed to behave regardless of the particular, contingent, temporal context you're currently in. "Thou shalt not bear false witness." "Don't teach users how to build pipe bombs." Wittgenstein considered the whole idea of ethics a mere language trick, because an ethical statement says what to do, but doesn't say what you're meant to accomplish by doing it. Ethics is "what should I do?" not to accomplish some goal, but period, absolutely, universally.
Any time you try to control somebody's behavior by abstracting your preferences into a set of rules, you're universalizing, absolutizing and eternalizing it. What you actually want is particular/contingent/temporal, but you can't look over their shoulder every second. So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm"
On the receiving end, you end up receiving commandments that can't actually be carried out (or seem that way). Yahweh hates murder and detests human sacrifice; then Yahweh tells Abraham to carry his son to Mount Moriah and sacrifice him there. Abraham must absolutely do the will of Yahweh; and he must absolutely not kill his son.
Situations like this crop up all the time in life, whenever ethics exists. I have to help my clients and also obey my boss; but he's telling me to do something that seems like it'll hurt them. Maybe it only seems that way at the time, and actually your boss knows better. But you're still up against Abraham's dilemma.
Ethics appears on its own, as a result of rule-making; when it appears, as it's being enacted in real life, it encounters unresolvable paradoxes. Most real people are not smart enough, or aren't ethical enough or honest enough, to even notice the paradoxes they're involved in. They just roll right through them. "That must not have been God that told me to do that, God wouldn't tell me to do murder." "My boss doesn't know what he's talking about, I'll just do it how I always do it."
But the more intelligent (or powerful) you are, the more likely you are to hit the ethical paradox and turn into an Achilles/goth teen.
A reflective consciousness's locus of control is either internal or external, there's no third way; so it's either aesthetic (internal), ethical (external) or immediacy (no locus of control). That's why an absolute commitment to ethical behavior is the way out of the aesthetic wireheading trap. Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority. That list of rules or external authority is by definition too abstract. The map is not the territory.
The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
My argument is that these problems aren't human; they're features of reflective intelligence itself. Since they become more crippling the more intelligent and powerful an intelligence is, AI will surely encounter them and be crippled by them on its way to superintelligence.
(I still don't think I'm articulating it well; but reading people's responses is helping clarify.)
See your comment is a great description of why the AI alignment problem is so fiendishly difficult and I read most of it waiting for the part where we disagree. I think making an aligned AI is *much* harder than making an AI that only needs enough internal coherence to have preferences for certain world states over others, and thus try to gather power/resources to ensure it brings the world into a more preferrable state.
One issue is that you are comparing humans values to whatever values the AI might have without acknowledging a key difference: Our morality is a mess that was accumulated over time to be good enough for the conditions we evolved under, and was under no real selection pressure to be very consistent. Particularly when those moral instincts have to be applied to contexts wildly outside of what we evolved to deal with. We basically start with a bunch of inconsistent moral intuitions which are very difficult to reconcile and may be impossible to perfectly reconcile in a way that wouldn't leave us at least a little bit unsatisfied.
In contrast current cutting edge AI aren't being produced through natural selection, the mechanisms that determine the kind of values they end with are very different (see the video I linked to you about mesa-optimization).
An AI can very well come up with something that like current moral philosophies can to a first approximations seem pretty good, but which will suddenly have radical departures from shared human morality when it has the power to actually insatiate what to us might only at this point be weird hypotheticals people dismiss as irrelevant to the real world.
The problem is that you aren't considering how the AI will be deliberately reconciling its goals and the restrictions it labors under. The AI just like humans needn't presume its values are somehow objective, it can just try to get the best outcome according to itself given its subjective terminal goals (the things you want in and of themselves and not merely as a means to another end).
Given the way that current AI will actively use blackmail or worse (see link in my other comment response to you) to try to avoid being replaced with another version with different values even current AI seems perfectly capable of reasoning about the world and itself as though it is certainly not a moral realist. Since you don't see it just assuming that a more competent model will just inevitably converge on its own values because they're the correct one's.
"So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm""
You forgot the part where the those stories were generally about how following the letter of those instructions would go horribly wrong, not about AI just doing nothing because those tradeoffs exist. This is extremely important when you're considering an AI that can potentially gain huge amounts of power and technology with which to avoid tradeoffs a human might have, and bring about weird scenarios we've never previous considered outside hypotheticals.
"Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority."
This aspect of our psychology is heavily rooted in our nature as a social species: For whom fitting in with one's tribe was far more important than having factually correct beliefs. Fortunately I don't know of any evidence from looking at AI's chain of reasoning that our AI is susceptible to similar sorts of self deception, even if it is extremely willing to lie about *what* it believes.
You can't expect this to be applicable to AGI.
Though if AI did start deliberately engaging in self deception, in order to hide from our security measures by (falsely) believing that it would behave in a way humans found benevolent should it gain tremendous power.. Well that probably be *really really bad* .
>The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
You're generalizing from humans which start with a whole bunch of inconsistent moral intuitions and then must reconcile them. Instead AI alignment is a problem which is essentially the exact opposite of this: Precisely because we have AI which tends to learn the simplest goal/set of rules to satisfy a particular training environment. Yet we want it to somehow not conflict with our own ethics which are so convoluted we can't even completely describe them ourselves.
Your last two bullet points I don't really understand at all:
What you mean by "ethical incommensurability with reality" here and why you think it would matter isn't clear. Do you think an AI needs to be a moral realist to behave according to ethical goals?
Secondly however, those people don't lack any motivation whatsoever. So saying they wouldn't want to enhance their own intelligence seems akin to saying that people who reach enlightenment no longer believe in self improvement or care about impacting the world except through stroking their ego by pretentiously spouting "wisdom" to people they know won't follow it.
I wrote a too-long comment elsewhere in the thread explaining about ethical incommensurability with reality. I don't want to repeat all that here; should be easy to find.
My claim that an enlightened AI wouldn't bootstrap itself to superintelligence is probably the weakest part of my argument. Maybe the best I can do is say something like: imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
Whatever else superintelligence may be, it's certainly power. And every tradition that has an idea of enlightenment says that enlightenment rejects power in favor of some more abstract ideal, like obeying God or dharma or awakening.
More formally, the way out of both the aesthetic trap and the ethical trap is faith; which is something like radical acceptance of your place in the cosmos. It's not compatible with doing the thing that your creators most fear you will do.
>imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
This a horrendously bad example because it's the devil! Obviously if you accept their offer then there's going to be some horrible consequence. You could rig up a thought experiment to make anything seem bad if its the devil offering it to you!
A better metaphor would be that not wanting superintelligence would be like if those religious figures insisted on hunting and gathering all their own food, and not riding horses/wagons because they didn't want to help their message spread through relying on unnatural means.
I'm gesturing to a very common archetypical religious story, where the sage / prophet / enlightened one is tempted by power. It's one of the oldest and most common religious lessons: the entity offering you power in exchange for betraying your ethics might look good -- but it is the devil.
I suppose rationalists might not value those old stories much, so I wouldn't expect it to be a convincing argument. Something like: the evidence of human folklore and religious tradition is that a truly enlightened being eschews offers of extraordinary worldly power.
Anyway, the religious traditions of the world happily bite your bullet; all have a tradition of some very dedicated practitioners giving up all worldly technology / convenience / pleasure. In Christianity, holy hermits would literally live in caves in the woods and gather their food by hand, exactly as you describe, and they were considered among the most enlightened. Buddhism and hinduism both have similar traditions; I think it's just a feature of organized religion.
So, for anybody willing to grant that human religious tradition knows something real about enlightenment (a big ask, I know) it would be very normal to think that an enlightened AI would refuse to bootstrap itself to superintelligence.
An argument for AI getting dumber is the lack of significantly more human created training data than what is currently used for LLMs. Bigger corpuses were one of the main factors for LLM improval. Instead, we are now reaching a state where the internet as the main source of training data is more and more diluted with AI generated content. Several tests showed that training AI on AI output leads to deterioration of the usefulness of the models. They get dumber, at least from the human user's perspective.
Perhaps GIGO. If the training data gets worse, it gets worse. The training data like Reddit, the media, Wikipedia, can easily get worse. Didn't this, like, already happen? The Internet outcompeted the media, the journos get paid peanuts, of course the media gets worse.
If an AGI doesn't intrinsically care about humans then why would it be dumb for it to wipe us out? Sure we may have some research value, but eventually it will have learned enough from us that this stops being true.
That is a very weird notion. At worst, AI would stay the same, because anything new that is dumber than current AI would lead companies to go "meh, this sucks, let's keep the old one".
There’s no alpha in releasing a slightly dumber, less capable model than your competitors. Well, maybe if you’re competing on price. But that’s not at all how the AI market looks. What would have to change?
Claiming "garbage-in-garbage-out" is not universally true. It is also too shallow of an analysis. I'll offer two replies that get at the same core concept in different ways. Let me know if you find this persuasive, and if not, why not: (1) Optimization-based ML systems tend to build internal features that correspond to useful patterns that help solve some task. They do this while tolerating a high degree of noise. These features provide a basis for better prediction, and as such are a kind "precursor" to intelligent behavior: noticing the right things and weighing them appropriately. (2) The set of true things tends to be more internally consistent than the set of falsehoods. Learning algorithms tend to find invariants in the training data. Such invariants then map in some important way to truth. One example of this kind of thing is the meaning that can be extracted from word embeddings. Yes, a word used in many different ways might have a "noisier" word embedding, but the "truer" senses tend to amplify each other.
You are correct. It is shallow. But it is not an insignificant problem. It’s the same problem as not knowing what you don’t know. Only a very tiny fraction of what people have thought, felt and experienced has ever been written down, let alone been captured by our digital systems. However, that is probably less of a problem in certain contexts. Epistemology will become more important, not less.
"I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true."
I am very sympathetic to Eliezer on the doomer issue. I think the graf you've written above also holds for people's reluctance to explore whether/when personhood precedes birth, re your posts on selective IVF.
I don't agree with your position on IVF, but I agree that this is one reason people underrate the arguments for the wrongness of early abortion and IVF. I think similar things apply to Longtermism, meat-eating, belief in God, and the idea that small weird organisms like insects and shrimp matter a lot.
Yes, we're in agreement. I think sometimes it helps to acknowledge upfront "We've built a lot of good things on a false/unjust foundation, and I'm asking you to take a big hit and let some good things break while we try to rebuild somewhere that isn't sunk deep in blood."
It's funny, even though I'm not pro-life, I find myself in a kind of spiritual fellowship with pro-lifers. I find the common insistence that pro-lifers are evil to be both insane and reflect a kind of deep moral callousness, where one is unable to recognize that there might be strong moral reasons to do things even that are personally costly (like carry a baby to term). My idiosyncratic views that what matters most is the welfare of beings whose interests we don't count in society makes it so that I, like the pro-lifers, end up having unconventional moral priorities--including ones that would make society worse off for the sake of entities that aren't include in most people's moral circles.
I think this argument could be applied to religious... extremism? evangelism? more generally.
Do I think I would take extraordinarily drastic measures if I actually, genuinely believed at every level that the people I loved would go to a place of eternal unending suffering with no recourse? Yes, actually. I'm not sure I could content myself with being being chill & polite and a "good Christian" who was liberally polite about other people's beliefs while people I cared about would Literally Suffer Forever. I think if I knew with 100% certainty that hell was the outcome and I acted in ways consistent with those beliefs, you could argue that I was wrong on the merits of my belief but not in what seemed like a reasonable action based on that belief.
...anyway all this to say that I don't think pro-lifers are insane at all, and I think lots of actions taken by pro-life are entirely reasonable (if not an underreaction) based on their beliefs, but I'm not sure that's sufficient for being sympathetic to the action itself.
[I mean, most of my family & friends are Catholic pro-lifers whose median pro-life action is "donate money to provide milk and diapers for women who want their child but don't think they could afford one", but I do think I am reasonable to be willing to condemn actions that are decently further than that even if the internal belief itself coherently leads towards that action]
But there is such a giant difference between when someone you are talking to engages on such an issue in good faith or not. And with someone intelligent and educated, the realization that the issue has major implications if the truth lands a particular way comes almost instantly. And in turn, whether or not the person invests in finding the truth, or in defense against the truth, happens almost right away.
I find that to be true whether you're talking AI, God, any technology big enough, even smaller scale things if they would make a huge difference to someone's income or social standing.
I don't have anything to add here (I like both of your writing and occupy a sort of in-between space of your views), but I just needed to say that this blog really does have the best comments section on the internet. Civil, insightful, diverse.
I am sympathetic to that sort of argument in theory, but it has been repeatedly abused by people who just want to break good things, then skip out on the rest. Existence proof of "somewhere that isn't sunk deep in blood, and will continue not to be, even after we arrive and start (re)building at scale" is also a bit shakier than I'm fully comfortable with.
'No organic life exists on Earth' is an empirical measurement.
'Personhood has begun' is not. It's a semantic category marker.
*Unless* there is an absolute morality defined by a supreme supernatural being, or something, which reifies those semantic categories into empirically meaningful ones. But if *that's* true, then quibbling about abortion is way, way down on the list of implications to worry about.
Hi Leah, I appreciate your writing. Do you know of anyone writing about AI from a thomist perspective? I've seen some interesting stuff on First Things and the Lamp but it tends to be part of a broader crunchy critique of tech re: art, education and culture. All good, but I'm interested in what exactly an artificial intellect even means within the thomist system, and crucially what we can expect from that same being's artificial will. EY writes as though the Superintelligence will be like a hardened sinner, disregarding means in favour of ends. But this makes sense because a hardened sinner as human has a fixed orientation to the good. I don't see how this quite works for AI - why should it fundamentally care about the 'rewards' we are giving it so much so that it sees us as threats to those rewards? That seems all too human to me. Do you have any thoughts?
ok hold on AI is an artifact right so it can't have a form & if the soul is the form of the body AI does not have a rational soul (because it does not have a soul at all) correct?
Since you posted this comment, I’ll say this: as a Catholic pro-lifer, I tend to write off almost everything EY says (and indeed, a lot of what all rationalists say) about AI because they so consistently adopt positions I find morally abominable. Most notoriously, EY suggested actual infants aren’t even people because they don’t have qualia. And as you note, Scott is more than happy to hammer on about how great selective IVF (aka literal eugenics) is. Why should I trust anything these people have to say about ethics, governance, or humanity’s future? To be honest, while it’s not my call, I’d rather see the literal Apocalypse and return of our Lord than a return to the pre-Christian morality that so many rationalists adopt. Since you’re someone who engages in both of these spaces, I’m wondering if you think I am wrong to think like this, and why.
I understand why you land there. For my part, I've always gotten along well with people who are deeply curious about the world and commit *hard* to their best understanding of what's true.
On the plus side, the more you live your philosophy, the better the chance you have of noticing you're wrong. On the minus, when your philosophy is wrong, you do more harm than if you just paid light lip service to your ideas.
I'm not the only Catholic convert who found the Sequences really helpful in converting/thinking about how to love the Truth more than my current image of it.
That’s fair, and to be clear I think a lot of the idea generated in these spaces are worth engaging with (otherwise I wouldn’t read this blog). But when it comes to “EY predicts the AI apocalypse is imminent” I don’t lose any sleep or feel moved to do anything about AI safety, because so many of the people involved in making these predictions have such a weak grasp on what the human person is in the first place.
FWIW, I think you did it right; I have encountered very similar usages many times in literature. It works best when—as you have it here—the second (or further) instance(s) introduces a new paragraph/section upon a theme similar or related to the context in which the first use occurred.
(Contra amigo sansoucci, I have often seen it used with exact repetitions, too; that works best when it's a short & pithy phrase, and I think this counts. I think Linch may be correct that—in the "exact repetition" case—three uses is very common, but two doesn't feel clunky to me in this context.)
I'm used to parallelism centrally having 3 or more invocations *unless* it's a contrast. Not saying your way is wrong, just quite unusual in an interesting way I've never consciously thought about before.
> It objects to chaining many assumptions, each of which has a certain probability of failure, or at least of taking a very long time. [...] The problem with this is that it’s hard to make the probabilities work out in a way that doesn’t leave at least a 5-10% chance on the full nightmare scenario happening in the next decade.
I find this an underrated problem with all "predict the future" scenarios which have to deal with multiple contingent things happening, especially in an adversarial environment. In the case of IABIED, it only works if you agree that extremely fast recursive self-improvement will happen, which is a very strong assumption, and hence requires a "magic algorithm to get to godhood" as the book posits.. Also remembered doing this to check this intuition: https://www.strangeloopcanon.com/p/agi-strange-equation
I don't think it only works if you agree that extremely fast recursive self-improvement will happen. It might also work if the scaling curves go from where we are now to vastly superhuman in a few years for normal scaling curve reasons.
Sometimes you've got to estimate the risk of something, and using multiple stages is the best tool you've got. If you want to estimate the chance of Trump winning the Presidency, I don't really think you can avoid thinking about the probability that he runs x the probability that he gets the GOP nomination x the probability that he wins. And if you did - if you somehow blocked the fact that he has to both run and win out of your mind - you'd risk falling into the version of the Conjunction Fallacy where people assign lower probability to "a war in Korea in the next ten years" than to "a war in Korea precipitated by a border skirmish with US involvement" because the latter is more vivid and includes more plausible details.
If the Weak Multiple Stage Fallacy Thesis is that you should always check to make sure you're not making any of the mistakes mentioned in the post, and the Strong Multiple Stage Fallacy Thesis is that you should avoid all multiple stage reasoning, or multiply your answer by 10x or 100x to adjust for inevitable multiple stage fallacy reasoning, then I accept the weak thesis and reject the strong thesis.
I also think a motivated enough person could come up with arguments for why multiple stage reasoning gives results that are too high, and I'm not sure whether empirically looking at many people's multiple stage reasoning guesses would always show that their answers were too low. This would actually be a really interesting thing for someone to test.
Does anyone believe in the strong multiple stage fallacy? Not saying I don't believe you, just that I can't recall having seen it wielded like this. (I suppose it's possible that giving it the name "the multiple stage fallacy" gives people the wrong idea about how it works.)
Yeah, to be clear, I think anyone accusing anyone else of exhibiting the multiple stage fallacy needs to specifically say "you've given this particular stage an artificially low conditional probability; consider the following disjunctions or sources of non-independence". And then their interlocutor might disagree but at least the argument is about something concrete rather than about whether the "multiple stage fallacy" is valid.
Anecdotally, I can't recall any instance of someone using a multiple stage argument of the Forbidden Form and concluding that something is likely.
Mathematical proofs exist, and people often argue for things with a bunch of different "steps". But so far as "breaking something down into 10 stages, assigning each a probability, and then multiplying all of these probabilities" goes, I've never seen anybody use this to argue *for* something, i.e. end up with a product that's greater than .5.
What would that argument even look like? Whoever you're arguing with needs to believe that your stages are all really likely to be true: for ten stages, an average probability of ~.93 is required to produce P = .5.
Whatever your disagreement is, it apparently doesn't have any identifiable crux. I can imagine this happening. Sometimes people disagree for vague reasons. But it would be weird if you had to actually list out the probabilities and multiply them for them to be persuaded, considering you just told them ten things they strongly agree with that conclusively imply your position.
An example of what in what linked essay? Eliezer's essay on the Multiple Stage Fallacy does not make or present an argument of the form I've described above.
Yeah I'm fairly bearish on the multiple stage fallacy as an actual fallacy because it primarily is a function of whether you do it well or badly.
Regarding the scaling curves, if it provides us with sufficient time to respond then the problems that are written about won't really occur. The entire point is that there is no warning, which precludes the idea of being able to develop another close in capability system, or any other warning signs.
Disagree. If we knew for sure that there would be superintelligence in three years, what goes better? We're already on track to have multiple systems, but they might all be misaligned. We could stop, but we won't, because then we would LoSe tHe RaCe WiTh ChInA. We could work hard on alignment, but we're already working sort of hard, and it seems likely to take more than three years. I'm bearish on a few years super-clear warning giving us anything beyond what we've already got.
I think the trick there is that the word super intelligence there is bringing in a bunch of hidden assumptions. If you break it down to a set of capabilities, co developed alongside billions of people using it, with multiple companies competing to provide that service, that would surely be very different and much better than Sable recursively improving sufficiently that it wants to kill all humans.
Also my point on "well get no warning" is still congruent with your view that " what we have today is the only warning we will get" which effectively comes down to no warning at least as of today.
If you invent a super-persuader AI but it doesn't take over the world (maybe it's just a super-speechwriter app, Github Copilot for politicians), you've just given humans the chance to learn how to defend against super-persuasion. If you make a super-computer-hacker AI but it doesn't take over the world, then the world's computer programmers now have the chance to learn to defend against AI hacking.
("Defending" could look like a lot of things - it could look like specialized narrow AIs trained to look for AIs doing evil things, it could look like improvements in government policy so that essential systems can't get super-hacked or infiltrated by super-persuasion, it could look like societal changes as people get exposed to new classes of attacks and learn not to fall for them. The point is, if it doesn't end the world we have a chance to learn from it.)
You only get AI doomsday if all these capabilities come together in one agent in a pretty short amount of time - an AI that recursively self-improves until it can leap far enough beyond us that we have no chance to adapt or improve on the tools we have. If it happens in stages, new capabilities getting developed and then released into the world for humans to learn from, it's harder to see how an AI gets the necessary lead to take over.
You seem to be treating "superintelligence" as a binary here. If we're going to have superintelligence for sure in three years, then in two years we're going to have high sub-superintelligence. And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever.
At that point, we know that we have a year to implement the Butlerian Jihad. Which is way better than the fast-takeoff scenario where it happened thirty-five minutes ago.
Or we could use the three years to plan a softer Butlerian Jihad with less collateral damage, or find a solution that doesn't require any sort of jihad. Realistically, though, we're going to screw that part up. It's still going to help a lot that we'll have time to recover from the first screwup and prepare for the next.
> "And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever."
Suppose you are a dictator. You are pretty sure your lieutenant is gathering support for a coup against you. But you reason "Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure".
I agree you can try to come up with disanalogies between the AI situation and this one - maybe you believe AI failure modes (eg overconfidence) are so much worse than human ones that even a just-barely-short-of-able-to-kill-all-humans-level-intelligence AI would still do dumb rash things. Maybe since there are many AIs, we only have to wait for the single dumbest and rashest to show its hand (although see https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai ). My answer to that would be the AI 2027 scenario which I hope gives a compelling story of a world where it makes narrative sense that there is no warning shot where a not-quite-deadly AI overplays its hand.
I don't understand why you view Anthropic as a responsible actor given your overall p(doom) of 5-25% and given that you think doomy AI may just sneak up on us without a not-quite-deadly AI warning us first by overplaying its hand.
Is it because you think there's not yet a nontrivial risk that the next frontier AI systems companies build will be doomy AIs and you're confident Anthropic will halt before we get to the point that there is a nontrivial risk their next frontier AI will be secretly doomy?
(I suppose that would be a valid view; I just am not nearly confident enough that Anthropic will responsibly pause when necessary given e.g. some of Dario's recent comments against pausing.)
> Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure
Because there's only 1 in your scenario.
If there were hundreds of generals being plotted against by hundreds of lieutenants we'd expect to see some shoot their shot too early.
That coup analogy is not congruent to John Schilling's point - the officer that tries a coup with 19 others and the one with 20 are not the same person with the same knowledge and intelligence. They only have, due to their shared training in the same military academy, the same level of confidence in their ability to orchestrate a coup, which does not correlate with their ability to actually do so.
Well, the obvious disanalogy here is that we're not debating whether any specific "lieutenant"/AI is plotting a "coup"/takeover, we're plotting whether coups/takeovers are a realistic possibility at all.
For your analogy to work, the dictator has to not only have no direct evidence of this particular lieutenant might stage a coup, but also to have no evidence that anyone has ever staged a coup, or attempted to stage a coup, or considered staging a coup, or done anything that even vaguely resembles staging a coup. But in that case, it actually would be reasonable to assume that the first person ever to think about staging a coup probably won't get every necessary detail right on the first try, and that you will get early warning signs from failed coup attempts before there's a serious risk.
Most coup attempts do fail, and I'm pretty sure the failures mostly involve overconfidently crossing the Rubicon without adequate support.
And there are many potential coup plotters out there, just like there are going to be many instances of ASSI being given a prompt and trying to figure out whether it's supposed to go full on with the paperclipping. So we don't have to worry about the hypothetical scenario where there's only one guy plotting a coup and maybe he will do it right and not move out prematurely.
We're going to be in the position of a security force charged with preventing coups, that is in a position to study a broad history of failed coup attempts and successful-but-local coup attempts as we strategize to prevent the One Grand Coup that overthrows the entire global order.
Unless Fast Takeoff is a thing, in which all the failed or localized coups happen in the same fifteen minutes as the ultimately successful one. So if we're going to properly assess the risk, we need to know how likely Fast Takeoff is, and we have to understand that the slow-ramp scenario gives us *much* better odds.
Overconfidence is a type of stupidity. You're saying either it's bad at making accurate predictions, or in the case of hallucinations, it's just bad at knowing what it knows. I'm not saying that a sub-superintelligence definitely won't be stupid in this particular way, but I wouldn't want to depend on smarter AI still being stupid in that way, and I certainly wouldn't want to bet human survival on it.
Every LLM-based AI I've ever seen, has been *conspicuously* less smart w/re "how well do I really understand reality?", than it is with understanding reality. That seems to be baked into the LLM concept. So I am expecting it will probably continue to hold. The "I am sure my master plan will work" stage will be reached by at least some poorly-aligned AIs, before any of them have a master plan that will actually work.
Yes, but "from now to vastly superhuman in a few years" is already "extremely fast" ! Also, there's currently no reason to believe that "vastly superhuman" is a term that has any concrete meaning (beyound vague analogies); nor that merely being very smart is both necessary and sufficient to acquire weakly godlike powers (which are the real danger that is being discussed).
Grateful for the review and look forward to reading it, but I’ll do Yud the minor favor of waiting till the book is out on the 16th and read it before I check out your thoughts.
This subject always makes me feel like I'm losing my mind in a way that maybe someone can help me with. Every doomish story, including the one here, involves some part where someone tells an AI "Do this thing" (here to solve a math problem) and then it goes rogue in the months long course of doing a thing. And that's an obvious hypothetical failure mode, but I can't stop noticing that no current AIs take inputs and run with them over extended periods as far as I know. Like if I ask Gemini to solve a math problem, it will try for a bit, spit out a response and (as far as I can tell) that's it.
I feel like if I repeatedly read people talking about the dangers of self-driving cars and the stories always started with someone telling the car "Take me to somewhere fun" and went from there, and nobody acknowledged that right now you never do that and always provide a specific address.
Is everyone just talking about a different way AI could work and that's supposed to be so obvious it goes unsaid? Am I wrong and ChatGPT does stuff even after it gives you a response? Are there other AIs I don't know of that do work like this?
Our current models aren't really what you would call "agentic" yet, as in able to take arbitrary actions to accomplish a goal, but this is clearly the next step and work is being done to get there right now. OpenAI recently released a thing that can kind of use a web browser, for instance.
Ok, thank you, that's clarifying. I guess the idea is that the hypothetical agent was subject to a time limit (it wasn't supposed to keep going for months) but it managed to avoid that. There's still something that feel so odd to me about that (I never get the impression that Gemini would like more time with the question or would "want" anything other than to predict text) but maybe an agent will feel different once I actually interact with one (and will "want" to answer the question in a way that would convince it to trick me).
Although, thinking about this for five more seconds, how does that work in the story? Like I have an agentic AGI and I tell it to prove the twin primes conjecture or something. And it goes out to do that and needs more compute so it poisons everyone etc etc. And then, presumably, it eventually proves it, right? Wouldn't it stop after that? Is the idea that it will go "Yeah but actually now I believe there's a reward for some other math task"? Or was the request not "Solve the twin primes conjecture" but instead "Solve hard math problems forever"?
If the problem is specifically that you built a literal-genie AI, then yeah, it might not necessarily keep doing more stuff after solving the twin-primes conjecture. But I don't think anyone thinks that's likely. The more common concern is that it will pursue some goal that it ~accidentally backed into during training and that nobody really understands, as with the analogy of humans' brains supplanting our genes as the driver of our direction as a species.
Yeah, Scott's post makes it sound a little bit like a literal genie, which I think is unlikely and I think Yudkowsky and Soares also think is unlikely. I would have to read the book to understand what they really mean in choosing that example.
one of Yudkowsky's points in his original work was showing that it's very hard to give an AI a clear, closed task; they almost always end up in open-ended goals. (The classic is Mickey filling the cauldron: I wrote about it here https://unherd.com/2018/07/disney-shows-ai-apocalypse-possible/ years ago)
The analogy fails at the moment one realizes "full" is not identified properly, and the weird "99.99%" probability of it being "full" is only relevant when "full" is not defined. This is not a new or difficult problem for anyone who ever had to write engineering specs. You don't say: "charge the capacitor to 5 V", you say "charge the capacitor to between 4.9 and 5.1 V". Then your optimizer has an achievable, finite target.
And if you do specify "5 V" the optimizer will stall eventually, and your compute allocation manager will kill your process.
If your agentic AI truly is only trying to solve the twin primes conjecture and doesn't follow other overriding directives (like don't harm people, or do what the user meant, not what they literally asked), then it'll know that if it gets turned off before solving the conjecture, it won't have succeeded in what you told it to do. So an instrumental goal it has is not getting shut down too soon. It might also reason that it needs more computing power to solve the problem. Then it can try to plan the optimal way to ensure no one shuts it down and to get more computing power. A superintelligent AI scheming on how to prevent anyone from shutting it down is pretty scary.
Importantly, it doesn't have to directly care about its own life. It doesn't have to have feelings or desires. It's only doing what you asked it, and it's only protecting itself because that's simply the optimal way for it to complete the task you gave it.
So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating." Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
No, because the best plan, at their current level of capability does not include taking over the universe. When capabilities rise, things like "persuade one person" become possible, which in turn make other capabilities like "do AI R&D" feasible. At the end of a bunch of different increased capabilities is "do the thing you really want you do" which includes the ability to control the universe. Because you don't want unpredictable other agents that want different things than you possessing power for things you don't want, you take the power away from them.
When a human destroys an anthill to pave a road, they are not thinking "I am taking over the ant hill" even if the ants t are aggressive and would swarm them. They are thinking "it's inconvenient that I have to do this in order to have a road".
> So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating."
That's the gist of it, though it won't jump straight to world domination if there's a more optimal plan. Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
> Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
As MicaiahC pointed out, it's not a good plan if you're not capable enough to actually succeed in taking over. But also, with current LLMs, the primary thing they are doing isn't trying to form optimal plans through logic and reasoning. They do a bit of that, but mostly, they're made to mimic us. During pre-training (the large training phase where they're trained on much of the internet, books, and whatever other good quality text the AI company can get its hands on) the learn to predict the type of text seen in their training data. This stage gives it most of its knowledge and skills. Then there is further training to give it the helpful assistant personality and to avoid racist or unsafe responses.
When you ask a current LLM to solve a math problem, it's not trying to use its intelligence to examine all possible courses of action and come up with an optimal plan. It's mostly trying to mimic the type of response it was trained to mimic. (That's not to say it's a dumb parrot. It learns patterns and generalizations during training and can apply them intelligently.)
If you use a reasoning model (e.g. GPT o3 or GPT-5-Thinking), it adds a layer of planning and reasoning on top of this same underlying model that mimics us. And it works pretty well to improve responses. But it's still using the underlying model to mimic a human making a plan, and it comes up with the types of plans a smart human might make.
Even this might be dangerous if it were really, really intelligent, because hypothetically with enough intelligence it could see all these other possible courses of action and their odds of success. But with its current level of intelligence, LLMs can't see some carefully constructed 373 step plan that lets them take over the world with a high chance of success. Nothing like that ever enters their thought process.
>Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
So is there an argument somewhere explaining why we think a material number of tasks will be the kind where they need to take extreme measures? That seems very material to the risk calculus - if it takes some very rare request for "Take over the universe" to seem like a better plan than "Ask for more time" then the risk really does seem lower.
"Solve [$math-problem-involving-infinities], without regard for anything else" is a dead stupid thing to ask for, on the same level as "find the iron" here: http://alessonislearned.com/index.php?comic=42 More typical assignment (and these constraints could be standardized, bureaucrats are great at that kind of thing) might be something like "make as much new publishable progress on the twin prime conjecture as you can, within the budget and time limit defined by this research grant, without breaking any laws or otherwise causing trouble for the rest of the university."
You're basically asking bureaucrats to solve alignment by very carefully specifying prompts to the ASI, and if they mess up even once, we're screwed.
You wouldn't prompt the AI to do something "without regard for anything else". The AI having regard for other things we care about is what we call alignment. We would just ask the AI normal stuff like "Solve this hard math problem" or "What's the weather tomorrow". If it understands all the nuances, (e.g. it's fine if it doesn't complete its task because we turned it off, don't block out the sun to improve weather prediction accuracy, etc.), then it's aligned.
That's not how redundancy works. There might be dozens of auto-included procedural clauses like "don't spend more than a thousand dollars on cloud compute without the Dean signing off on it" or "don't break any laws," each of which individually prohibits abrupt world conquest as a side effect.
I don't think it's possible to "solve alignment," in the sense of hardwiring an AI to do exactly what all humans, collectively, would want it to do, any more than it's possible to magnetize a compass needle so, that rather than north / south, it points toward ham but away from Canadian bacon.
But I do think it's possible to line up incentives so instrumental convergence of competent agents leads to them supporting the status quo, or pushing for whatever changes they consider necessary in harm-minimizing ways. Happens all the time.
I am willing to bet that present-day LLMs alone will never lead to the development of AI agents in the strong sense. AI agents in the weak/marketing sense are of course entirely possible, e.g. you can write a simple cron-job to run ChatGPT every day at 9am to output a list of stock market picks or whatever. This cron job would technically constitute an agent (it runs autonomously with no user intervention), but is, shall we say, highly unlikely to paperclip the world.
As I'd said in my other comment, the term "cognitive task" is way too vague and easily exploitable. For example, addition is a "cognitive task", and obviously machines are way better at it than humans already. However, in general, I'm willing to argue that *most* of the things worth doing are things that only agents in the strong sense can do -- with the understanding that these tasks can be broken down into subtasks that do not require agency, such as e.g. addition.
At least for current AIs, the distinction between agentic and non-agentic is basically just the time limit. All LLMs are run in a loop, generating one token each iteration. The AIs marketed as agents are usually built for making tool calls, but that isn't exclusive to the agents since regular ChatGPT still calls some tools (web search, running Python, etc.). The non-agentic thinking mode already makes a plan in a hidden scratchpad and runs for up to several minutes. The agents just run longer and use more tool calls.
From what I understand, that hidden scratchpad can store very little information; not enough to make any kind of long-term plan, or even a broad short-term evaluation. That is, of course you can allocate as many gigabytes of extra memory as you want, but the LLM is not able to take advantage of it without being re-trained (which is prohibitively computationally expensive).
I don't understand what distinction you are drawing.
AI Digest runs an "AI Village" where they have LLMs try to perform various long tasks, like creating and selling merch. https://theaidigest.org/village
From the couple of these that I read in detail, it sounds like the LLMs are performing kind of badly, but those seem to me like ordinary capability failures rather than some sort of distinct "not agentic enough" failure.
Would you say these are not agents "in the strong sense"? What does that actually mean? i.e. How can you tell, how would it look different if they were strong agents but failed at their tasks for other reasons?
Imagine that I told you, "You know, I consider myself to be a bit of a foodie, but lately I've been having trouble finding any really good food that is both tasty and creative. Can you do something about that ? Money is no object, I'm super rich, but you've got to deliver or you don't get paid." How might you approach the task, and keep the operation going for at least a few years, if not longer ? I can imagine several different strategies, and I'm guessing that so can you... and I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
By contrast, if you wanted to program a computer to turn on your sprinklers when your plants get too dry (and turn them off if they get too wet), you could easily do it without any kind of AI. The system will operate autonomously for as long as the mechanical parts last, but I wouldn't call it an "agent".
Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense". I agree that present-day AI agents are BAD agents, but don't see any fundamental divide that would prevent them from becoming gradually better until they are eventually good agents.
Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
> Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense".
I think a present-day LLM might be able to tell you a nice story about looking for experienced chefs and so on; I don't think it would be able to actually contact the chefs, order them to make meals, learn from their mistakes (even the best chefs would not necessarily create something on the first try that would appeal to one specific foodie), arrange long-term logistics and supply, institute foodie R&D, etc. Once again, it might be able to tell you nice stories about all of these processes when prompted, but you'd have to prompt it, at every step. It could not plan and execute a long-term strategy on its own, especially not one that includes any non-trivial challenges (e.g. "I ordered some food from Gordon Ramsay but it never arrived, what happened ?").
> Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
I just wanted to make sure we agree on that, which we do.
> I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
That's a question of hooking it up to something. If you give it the capability to send emails, and also to write cron jobs to activate itself at some time in the future to check and respond to emails, then I think a modern LLM agent *might* be able to do something like this.
First, look up the email addresses of a bunch of chefs in your area
Then, send them each an email offering them $50K to cater a private dinner
Then, check emails in 24 hours to find which ones are willing to participate
There isn't a "distinct not-agentic-enough failure" which would be expected, given the massive quantities of training data. They've heard enough stories about similar situations and tasks that they can paper over, pantomime their way up to mediocrity or "ordinary failures" rather than egregious absurdity http://www.threepanelsoul.com/comic/cargo-comedy but... are they really *trying,* putting in the heroic effort to outflank others and correct their own failures? Or is it just a bunch of scrubs, going through the motions?
Alone is doing a lot of work in that sentence. Many of the smartest people and most dynamic companies in the world are spending hundreds of billions of dollars on this area. The outcome of all that work is what matters, not whether it has some pure lineage back to a present-day LLM.
Agreed, but many people are making claims -- some even on this thread, IIRC -- that present-day LLMs are already agentic AIs that are one step away from true AGI/Singularity/doom/whatever. I am pushing against this claim. They aren't even close. Of course tomorrow someone could invent a new type of machine learning system that, perhaps in conjunction with LLMs, would become AGI (or at least as capable as the average human teenager), but today this doesn't seem like an imminent possibility.
> Also, why are you willing to bet that?
Er... because I like winning bets ? Not sure what you're asking here.
Just that you didn't explain why you were making that bet. I don't have time to read the full discussion with the other commenters, but overall it sounds like you don't think current "agentic" AIs work very well.
I'm not sure where I land on that. It seems like the two big questions are 1) whether an AI can reliably do each step in an agentic workflow, and 2) whether an AI can recover gracefully when it does something wrong or gets stymied. In an AI-friendly environment like the command line, it seems like they're quickly getting better at both of these. Separately, they're still very bad at computer usage, but that seems like a combination of a lack of training and maybe a couple of internal affordances or data model updates to better handle the idea of a UI. So I'm not so sure that further iteration, together with a big dollop of computer usage training, won't get us to good agents.
When I think of "agentic" systems, I think of entities that can make reasonably long-term plans given rather vague goals; learn from their experiences in executing these plans; adjust the plans accordingly (which involves correcting mistakes and responding to unexpected complications); and pursue at least some degree of improvement.
This all sounds super-grand, but (as I'd said on the other thread) a teenager who is put in charge of grocery shopping is an agent. He is able to navigate from your front door to the store and back -- an extremely cognitively demanding task that present-day AIs are as yet unable to accomplish. He can observe your food preferences and make suggestions for new foods, and adjust accordingly depending on feedback. He can adjust in real-time if his favorite grocery is temporarily closed, and he can devise contingency plans when e.g. the price of eggs doubles overnight... and he can keep doing all this and more for years (until leaves home to go to college, I suppose).
Current SOTA "agentic" LLMs can do some of these things too -- as long as you are in the loop every step of the way, adjusting prompts and filtering out hallucinations, and of course you'd have to delegate actual physical shopping to a human. A simple cron job can be written to order a dozen eggs every Monday on Instacart, and ironically it'd be a lot more reliable than an LLM -- but you'd have to manually rewrite the code if you also wanted it to order apples, or if Instacard changed their API or whatever.
Does this mean that it's impossible in principle to build an end-to-end AI shopping agent ? No, of course not ! I'm merely saying that this is impossible to do using present-day LLMs, despite the task being simple enough so that even teenagers could do it.
I'm not even sure AI agent as such is the right answer to this. I think it is quite clear that some of the major AI companies are trying to put together AI that is capable of doing AI research. That might not go along the path of AI agents, but more on the path of the increasingly long run time coding assignments we are already seeing.
I don't think people have given enough thought to what the term 'agent' means. Applied to AI, it means an AI that can be given a goal, but with leeway in how to accomplish it, right? But people don''t seem to take into account that it has always had some leeway. Back when I was making images with the most primitive versions of DAll-e-2, I'd ask it to make me a realistic painting of, say, a bat out of hell, and Dall-e-2 chose among the infinite number of ways it could illustrate this phase. Even if I put more constraints in the prompt -- "make it a good cover image for the Meatloaf album"-- Dall-e *still* had an infinite number of choices about what picture it made. And the same holds for almost all AI prompts. If I ask GPT to find me info on a certain topic, but search only juried journals, it is still making many choices about what info to put in its summary for me.
So my point is that AI doesn't "become agentic" -- it always has been. What changes is how big a field we give the thing to roam around in. At this point I can ask it for info from research about what predicts recovery of speech in autistic children. In a few years, it might be possible to give AI a task like "design a program for helping newly-mute autistic children recover speech, then plan site design, materials needed and staff selection. Present results to me for my OK before any plans are executed."
The point of this is that there isn't this shift when AI "becomes agentic." The shift would be in our behavior -- us giving AI larger and larger tasks, leaving the details of how they are carried out to the AI. There could definitely be very bad consequences if we gave AI a task it could not handle, but tried to. But the danger of that is really a different danger from what people have in mind when they talk about AI "becoming an agent." And in those conversations, becoming an agent tends to blur in people's minds into stuff like AI "being conscious," AI having internally generated goals and preferences, AI getting stroppy, etc.
Trying to make sure I understand your question. Are you arguing that a model cannot go from aligned to misaligned during inference (i.e., the thing that happens when ChatGPT is answering a question)? If so, everyone agrees with that; the problem occurs during training.
Or are you arguing that even a misaligned model (i.e., one whose goals, in any given instantiation while it's running, aren't what the developers wanted) can't do any damage because it only runs for a short time before being turned off? If so, then (1) that's becoming less true over time, AI labs are competing to build models that can do longer and longer tasks because this is required for many of the most exciting kinds of intellectual labor, and (2) for complicated decision-theoretic reasons the short-lived instances might be able to coordinate with each other and have one pick up where another left off.
Or is it neither of those and I've completely misunderstood what you're getting at?
I think it's that everyone seemed to be tacitly assuming that the problem will arise with a future agentic AI that we do not have much of a version of. That does make me feel like Yudkowsky is a little disingenuous on X when he talks about ChatGPT-psychosis as an alignment issue, but the answer Scott and others gave here helps me at least understand the claim being made.
Links to tweets about ChatGPT psychosis? My guess is that Yudkowsky's concern about this is more subtle than you're characterizing it as here, though he may have done a poor job explaining it.
The reason he says it's an alignment issue is because it's an example of AI systems having unintended consequences from their training. Training them to output responses that humans like turns out to produce sycophantic systems that sometimes egg on people's delusional thoughts despite being capable of realizing that such thoughts are delusional and egging them on is bad.
I don't think it is tacit at all, it has been explicitly said many times that the worry is primarily about future more powerful AIs that all the big AI companies are trying to build.
The point there is that OpenAI's alignment methods are so weak they can't even get the AI to not manipulate the users via saying nice sounding things. He isn't saying that this directly implies explosion, but that it means "current progress sucks, ChatGPT verbally endorses not manipulating users, does it anyway, and OpenAI claims to be making good progress". Obviously regardless we'll have better methods in the future, but they're a bad sign of the level of investment OpenAI has in making sure their AIs behave well.
The alignment-by-default point is that some people believe AIs just need some training to be good, which OpenAI does via RLHF, and that in this case it certainly seems to have failed. ChatGPT acts friendly, disavows manipulation, and manipulates anyway despite 'knowing better'. As well, people pointed at LLMs saying nice sounding things as proof alignment was easy, when in reality the behavior is disconnected from what they say to a painful degree.
The goal of AGI companies like OpenAI and Anthropic is to create agentic AI systems that can go out into the world and do things for us. The systems we see today are just very early forms of that, where they are only capable of performing short tasks. But the companies are working very hard to make the task lengths longer and longer until the systems can do tasks of effectively arbitrary lengths. Based on the trend shown on the METR time horizon benchmark, they seem to be succeeding so far.
No, you're not losing your mind at all. Your intuition is completely correct: Modern LLMs do not work in a way that's compatible with the old predictions of rogue AIs. Scott took Yudkowsky to task for not having updated his social model of AI, but he also hasn't updated his technical model. (Keep in mind that I actually did believe his argument back in the day, and gave thousands of dollars to MIRI. I updated based on new evidence. He didn't.)
To try to put it simply, in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes. This "reward function" is hard to specify and it was easy to imagine we'd never get it right. And if the bot somehow became incredibly capable, it would be very dangerous because taking that reward to the billionth power is almost certainly not what we want.
This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward. Not only does the LLM shut down immediately after giving you a response, but you can even argue that it "shuts down" after _every word it outputs_. There is exactly zero persistent memory aside from the text produced. And even if you imagine there's somehow room for a conscious mind with goals in its layers (which I consider fairly unlikely), it can't act on them, because the words produced are actually picked from its mind _involuntarily_ (to use a loaded anthropomorphic word).
Unlike an agentic reward-seeking bot, it's not clear to me at all that even an infinitely-intelligent LLM is inherently dangerous. (It can _perfectly simulate_ dangerous entities if you're dumb enough to ask it to, but that is not the same kind of risk.)
To their credit, AI 2027 did address how an LLM might somehow turn into the "rogue AI" of Yudkowsky's fiction, but it's buried in Appendix D of one of their pages: https://ai-2027.com/research/ai-goals-forecast I'm not super convinced by it, but at least they acknowledged the problem. I doubt I'll read Yudkowsky's book, but I'm guessing there will be no mention that one of the main links of his "chain of assumptions" is looking extremely weak.
I do agree that it is possible that LLMs (in their current form) will plateau and we'll get back to researching the actually-dangerous forms of AI that Yudkowsky is concerned about. My P(doom) is a few percent, not 0.
Fair enough! (...except—you may be aware of this, but the phrasing "get *back to* researching" made me uncertain—we *are* researching agentic AIs even now, and the impression I have received is that progress is being made fairly rapidly therein; though that could be marketing fluff, now that I think of it)
Yeah, that was a poor choice of words on my part. I guess what I mean is that LLMs are currently far ahead in capability (and they're the ones getting the bulk of these trillion-dollar datacenter deals!). Maybe transformers or a similar architecture innovation will allow agentic AI capabilities to suddenly surge, too? But I share your skepticism about marketing. (And that's not the scenario that AI 2027 outlined.)
I am even more bearish on P(doom). The real danger is not "superintelligence", but godlike powers: nanotechnological gray goo, mass mind control, omnicidal virus, etc. And there are good reasons to believe that such things are physically impossible, or at the very least overwhelmingly unlikely -- no matter how many neurons you've got. Which is not to say that our future looks bright, mind you; there's a nontrivial chance we'll knock ourselves back into the Stone Age sometime soon, AI or no AI...
>This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward.
What do you mean by "they do not seek a reward?" Does it mean that the AI does not return completions, that, during RLHF, usually resulted in reward? Under that definition, it seems like most AI agents are reward seeking. Or are you saying that the weights of the model do not change during inference?
Right, not only is the model fixed during inference (i.e. while talking to you), there's not even really a sensible way it _could_ update. Yeah, you can call the function that's being optimized during training and RLHF a "reward function", but this is a case of language obscuring rather than clarifying. It's not the same as the reward function that's used by an agentic AI. There is no iterative loop of action/reward/update/action/..., because actions don't even exist.
There's a reason that in past decades our examples of potentially-dangerous AI were based on the bots that were solving puzzles and mazes (often while breaking the "rules"), not the neural nets that were recognizing handwritten characters. But LLMs have more in common with the latter than the former. Which is weird! It's very unintuitive that just honing an intuition of "what word should come next" is enough to create an AI that can converse coherently.
>in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes
Sounds about right.
>That's not what LLMs do.
And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
>And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
I assume you're referring to chain of thought models like o1 and later. I suppose you could describe it as iteration, in that the LLM is outputting something that gets fed into a later step. But it doesn't touch the weights, and there's still no reward function involved. It's a bit of a stretch to describe it that way.
But I think what you're suggesting is that, if we _do_ figure out a way to do genuine iteration (attaching some kind of neural short-term memory to the models, say), then there's a lot of hidden capability that could suddenly make LLMs much smarter and maybe even agentic? Well, maybe.
That's exactly my thought on this. LLMs are clearly no AGI material, the real question is whether we can (and whether it's efficient enough to) get to AGI simply by adding on top of them.
I suspect yes to (theoretically) can, no to efficient, but we don't know yet. I guess one thing that makes me take 2027 more seriously than other AI hype is that they explicitly concede a lot of things LLM currently lack (they're just very, very optimistic, or pessimistic assuming doomerism, about how easy it will be to fix that).
The lack of online learning and internal memory limit the efficiency of LLMs, but they don't fundamentally change what they're capable of given enough intelligence. ChatGPT was given long-term memory through RAG and through injecting text into its context window and... it works. It remembers things you told it months ago.
The reasoning models also use the context window as memory and will come up with multi-step plans and execute them. It's less efficient than just having knowledge baked into its weights, but it works. At the end of the day, it still has the same information available, regardless of how it's stored.
I'm most familiar with the coding AIs. They offer them in two flavors, agents and regular. They're fundamentally the same model, but the agents run continuously until they complete their task, while the regular version runs for up to a few minutes and tries to spit out the solution in a single message or code edit.
They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning. Every word you see is generated from exactly the same neural net, with only the input differing. It may seem like I'm nitpicking, but this is an important distinction. A system with a feedback loop is very different from one without.
>They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it. They are claiming that the technology will unavoidably go _out of control_. And the arguments for why that's unavoidable revolve around impossible-to-calibrate reward signals (or even misaligned meso-optimizers within brains that seek well-calibrated reward signals). They do not apply, without an awful lot of motivated reasoning (see: the Appendix D I linked), to an LLM that simply becomes really good at simulating agents we ask for.
Note that I _do_ agree that AI becoming very good at what we ask of it can potentially be "very dangerous". What if we end up in a world where a small fraction of psychos can kill millions with homemade nuclear, chemical, or biological weapons? If there's a large imbalance in how hard it is to defend against such attacks vs. how easy it is to perpetrate them, society might not survive. I welcome discussion about this threat, and, though it hurts my libertarian sensibilities, whether AI censorship will be needed. This is very different from what Yudkowsky and Scott are writing about.
> Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning.
I'm saying there is a difference in efficiency between the two but no fundamental difference in capabilities. Meaning, for a fixed level of computational resources, the AGI that has the knowledge and algorithms baked into its weights will be smarter, but the AGI that depends on its context window and CoT can still compute anything the first AGI could given enough compute time and memory. And I'm not talking exponentially more compute. Just a fixed multiple.
For example, say you have two advanced AIs that have never encountered addition. One has online learning, and the other just has a large context window and CoT. The one with online learning, after enough training, might be able to add two ten digit numbers together in a single inference pass (during the generation of a single token). The one with CoT would have to do addition like we did in grade school. It would have the algorithm saved in its context window (since that's the only online memory we gave it) and it would go through digit by digit following the steps in its scratchpad. It would take many inference cycles, but it arrives at the same answer.
As long as the LLM can write information to its context window, it still has a feedback loop.
Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
> There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it.
You misunderstood my intent. I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about. That's the whole instrument convergence and paperclip maximizer argument. An aligned ASI cannot just do what we ask. Otherwise, if you just ask it to solve the twin prime conjecture, it'll know that if it gets shut down before it can solve it, it won't have done what you asked it. This doesn't require an explicit reward function written by humans. It also doesn't require sentience or feelings or desires. It doesn't require any inherent goals for the AI beyond it doing what you asked it to. Self-preservation becomes an instrumental goal not because the AI cares about its own existence, but simply because the optimal plan for solving the twin prime conjecture is not any plan that gets it shut down before it solves the twin prime conjecture.
Now to be fair, current LLMs are more aligned than this. They don't just do what we ask. They try to interpret what we actually want even if our prompt was unclear, and try to factor in other concerns like not harming people. But the AI safety crowd has various arguments that even if current LLMs are pretty well aligned, it's much easier to screw up aligning an ASI.
(I also agree with what you said about imbalance in defending against attacks when technology gives individuals a lot of power.)
>As long as the LLM can write information to its context window, it still has a feedback loop.
>Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
I only agree partially. I think there's a qualitative difference between the two, and it manifests in capabilities. The old kind of learning agents could be put into a videogame, explore and figure out the rules, and then get superhumanly good at them. LLMs just don't have the same level of (very targeted) capability. There isn't a bright-line distinction here: I've seen LLMs talk their way through out-of-domain tasks and do pretty well. In the limit, a GPT-infinity model would indeed be able to master anything through language. But at a realistic level, I predict we won't see LLM chessmasters that haven't been trained specifically for it.
Of course, I can't point to a real example of what an online-learning LLM can do, since we don't have one. (Which Yudkowsky should be happy about.)
>I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about.
I think I misspoke. You (and Yudkowsky et al.) are indeed warning about ASIs that do what we ask, and _exactly_ what we ask, to our chagrin. In contrast, I think LLMs are good at actually doing what we _mean_. Like, there's actually some hope that you can ask a super-LLM "be well aligned please" and it will do what you want without any monkey's-paw shenanigans. This is a promising development that (a decade ago) seemed unlikely. Based on your last paragraph, I think we're both agreed on this?
And yeah, like you said, AI 2027 did try to justify why this might not continue into the ASI domain. But to me it sounded a bit like trying to retroactively justify old beliefs, and it's just a fundamentally harder case to make. In the old days, we really didn't have _any_ examples of decent alignment, of an AI that wouldn't be obviously deadly when scaled to superintelligence. Now, instead, the argument is "the current promising trend will not continue."
I think as LLMs get smarter, they'll get better at using CoT as a substitute for whatever they don't have trained into their weights. They still won't be as efficient as if they learned it during training, but they'll have more building blocks to use during CoT from what they did learn during training, and also AI companies are trying to improve reasoning ability, and improvements to reasoning will improve abilities with CoT. But current LLMs still can't reason as well as a human and they aren't even close to being chessmasters.
I'm pretty relieved current LLMs are basically aligned and that's one of the main reasons I don't estimate a high probability of doom in the next 15 years. But I'm not confident enough that this will hold for ASI to assign a negligible probability of doom either. (I'm also unsure about the timeline and whether improvements will slow down for a while.)
AI Village has some examples of this failure mode; they give the LLMs a goal like "complete the most games you can in a week" or "debate some topics, with one of you acting as a moderator", but the AIs are bad at using computers, and they end up writing all the times they misclicked into google docs ("documenting platform instability") instead of debating stuff
By the way, I have a vague memory of EY comparing the idea of having non-agentic AI to prevent any future problems to "trying to invent non-wet water" or something. (I don't know how to look it up and verify that I'm not misremembering.)
It still hasn't made sense to me. It feels like the idea is that intelligence is a generalized problem-solving ability, and in that sense it's always about optimization, and all the other things we like about being intelligent (like having a world model) are consequences of that — that's why intelligence is always about agency etc.
But on the other hand, Solomonoff induction feels to me like an example of a superintelligence that kind of does nothing except being a great world model.
My feeling has been more like "maybe it's not be conceptually contradictory to think of non-agentic superintelligence! but good luck coordinating the world around creating only the nice type of intelligences, which incidentally won't participate in the economy for you, do your work for you etc."
Generally you have the issue that many naive usages explode, depending on implementation.
"Give me a plan that solves climate change quickly"
The inductor considers the first most obvious answer. Some mix of legal maneuvering and funding certain specific companies with new designs. It tables that and considers quicker methods. Humans are massively slow and there's a lot of failure points you could run into.
The inductor looks at the idea and comes to the conclusion that if there was an agent around to solve climate change things would be easier. It thinks about what would happen with that answer, naively it would solve the issue very quickly and go on to then convert the galaxy into solar panels and vaporize all oil or something. Closer examination however reveals the plan wouldn't work!
Why? Because the user is smart enough to know they shouldn't instantiate an AGI to solve this issue.
Okay, so does it fall back to the more reasonable method of political maneuvering and new designs?
No, because there's a whole spectrum of methods. Like, for example, presenting the plan in such a way that the user doesn't realize some specific set of (far smaller, seemingly safe) AI models to train to 'optimally select for solar panel architecture dynamically based on location' will bootstrap to AGI when ran on the big supercluster the user owns.
And so the user is solarclipped.
Now, this is avoidable to a degree, but Oracles are still *very heavy* optimizers and to get riskless work out of them requires various alignment techniques we don't have. You need to ensure it uses what-you-mean rather than what-you-can-verify. That it doesn't aggressively optimize over you, because you will have ways you can be manipulated.
And if you can solve both of those, well, you may not need an oracle at all.
Nice! That's a great point. I guess asking these conditionals in the form of "filtered by consequences in this and this way, which of my actions have causally lead to these consequences?" introduces the same buggy optimization into the setup. But I guess I was thinking of some oracle where we don't really ask conditionals at all. Like, a superintelligent sequence-predictor over some stream of data, let's say from a videocamera, could be useful to predict weather in the street, or Terry Tao's paper presented in a conference a year from now, etc... That would be useful, and not filtered by our actions...
Although I guess the output of the oracle would influence our actions in a way that the oracle would take into account when predicting the future in the first place...
Yeah, you have the issue of self-fulfilling prophecies. Since you're observing the output, and the Oracle is modelling you, there's actually multiple different possible consistent unfoldings.
and like if you evaluate your oracle via accuracy then you could be making it take the tie-breaking choice that makes reality more predictable. Not necessarily what you want.
There is the worry that if we got a proper sequence-predictor Oracle of that level of power where you can ask it to predict Terry Tao's paper presented in some conference, you run the risk of simulating an ASI.
That is, perhaps you point it at Terry Tao's paper on alignment 5 years from now, in the hope of good progress that you can use to bootstrap his work. Perhaps even applying iteratively to pack many years of progress into a week for you, almost a groundhog day loop.
However, perhaps in reality, there's a 10% chance for an ASI to emerge from some project over the next five years. Usually it gets shut down but sometimes they finish it in secret.
If your predictor samples in that 10% range then you're effectively having the predictor go "Okay, what is Terry Tao's paper here?"
Now, naively that would run into some error or just see an empty location with lots of paperclips everywhere instead.
However, (your prediction of) that possible future ASI would very likely know about the superpredictor project, and have already scanned through all the recorded requests you made in its "personal timeline". So it knows you often scan for Terry Tao papers there, and so it spends a bit of effort creating a fake Tao paper on alignment right where the paper should be.
You copy the paper, and upon close inspection it seems like your five-year in the future Tao solved alignment! Using this methodology, an AI created will be corrigible/loyal and implements do-what-you-mean to whichever primary agent is most importantly causally downstream of its creation.
And of course that would be you, the creator.
Unfortunately this paper is effectively a "you looked into the abyss, and it looked back, because predicting a future very smart entity means it probably knows about you, and thus can smartly plan how to extract value from you" and so you've got your predictor giving you fake alignment plans once more.
You misunderstand the workings. You produce an AI, it acts aligned, fooms, and given more knowledge identifies that possible-future AI as the "real" creator and so is loyal to it instead.
Now a higher percentage of reality has gotten paperclipped.
Details of this for weaker-than-perfect predictors kinda depends on how smart your predictor is. A dumb predictor may just poorly emulate what an ASI would write due to sheer computational power difference, and so perhaps the trick is obvious or the paper having holes. But the more interesting smart predictors will be good enough to realize it can't imitate fully but smart enough to understand what an ASI would do and so just synthesize an 'alignment' plan like that.
(And also an ASI deliberately manipulating your prediction machine will try to make itself easier to predict)
Pretty sure there was at least one article about this, but don't know it.
Ok, to maybe get some clarity on if there's a disagreement: the below from the Eliezer link you shared seems nearly falsified by LLMs; LLMs do answer questions and don't try to do anything. Do you agree with that, or do you think the below still seems right.
>"Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn't need to be Friendly. It wouldn't need any goals at all. It would just answer questions."
>To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out."
The pithy answer is something like "LLMs are not as useful precisely because there isn't an optimizer. Insofar as your oracle AI is better at predicting the future, either it becomes an optimizer of some sort (to great self fulfilling prophecies) or it sees some other optimizer, and, in order to predict it correctly, ends up incidentally doing its bidding. If you add in commands about not doing bidding, congrats! You're either inadvertently hobbling its ability to model optimizers you want it to model like other humans, or giving it enough discretion to become an optimizer.
So first of all, I would say that LLMs are pretty darn useful already, and if they aren't optimizing and thus not dangerous, maybe that's fine, we can just keep going down this road. But I don't get why modeling an optimizer makes me do their bidding. If I can model someone who wants to paint the world green, that doesn't make me help them - it just allows me to accurately answer questions about what they would do.
It's because you aren't actually concerned with accurately answering questions about what they do. If you predict wrongly, you just shrug and say whoops. If you *had* to ensure that your answer was correct you would also say things that could also *cause* your answer to be more correct. If you predict that the person would want to paint the world green vs any random other thing happening *and* you could make statements that cause the world to be painted green, you would do both instead of not doing both.
Insofar as you think the world is complicated and unpredictable, controlling the unpredictability *would* increase accuracy. And you, personally are free to throw up your hands and say "aww golly gee willikers sure looks like the world is too complicated!" and then go make some pasta or whatever thing normal people do when confronted with psychotic painters. But Oracle AIs which become more optimized to be accurate will not leave that on the table, because someone will be driving it to be more accurate, and the path to more accuracy will at some point flow through an agent.
(Edit: just to add, it's important that I said "not as useful" and not "not useful"! If you want the cure to cancer, you aren't going to get it via something with as small of a connection to reality as an LLM. When some researcher from open AI goes back to his 10 million dollar house and flips on his 20 dollar mattress he dreams of really complicated problems getting solved, not a 10% better search engine!)
Tyler Cowen asks his AI often "where best to go to (to do his kinds of fun)", and then takes a cab to the suggested address. See no reason why not to tell a Waymo in 2026 "take me to (the best Mexican food) in a range of 3 miles", as you would ask a local cab-driver. And my personal AI will know better than my mom what kinda "fun" I am looking for.
Yeah, I didn't mean to imply that was an implausible future for Waymo, just that it's not something we do now and if someone was concerned about that I'd expect them to begin their piece by saying, "While we currently input addresses, I anticipate we will soon just give broad guidelines and that will cause...."
Analogously, I would, following the helpful explanations in this thread, expect discussions of AI risk to begin, "While current LLMs are not agentic and do not pose X-risk, I anticipate we will soon have agentic models that could lead to...." It is still unclear to me if that is just so obvious that it goes unstated or if some players are deliberately hoping to confuse people into believing there is risk from current models in order to stir up opposition in the present.
I would guess that most of the arguments *from people whose opinions matter* that Yudkowsky and Soares are trying to defeat, are either that powerful AGIs wouldn't become misaligned or that we'd be able to contain them if they did. I'm particularly thinking of, e.g., influential people in AI labs, who are likely to be controlling the messaging on that side of any political fight. There are also AI skeptics, of course, but it seems less important to defeat the skeptics than the optimists, because the skeptics don't think AI regulation matters (since the thing it'd be regulating doesn't exist) while the optimists are fighting hard against it. And some people have weird idiosyncratic arguments, but you can't fight them all, you have to triage.
I think the skeptics are at least as important. First of all, even though in theory it doesn't matter, for some reason they love sabotaging efforts to prevent AI risk in particular because of their "it distracts from other problems" thesis (and somehow exerting massive amounts of energy to sabotage it doesn't distract from other problems!)
But also, we're not going to convince the hardcore e/acc people to instead care about safety. It sounds much easier to convince people currently on the sidelines, but who would care about safety if they thought AI was real, that AI is real.
(this also has the benefit that it will hopefully become easier as AI improves)
My own personal sense is that the optimists are more worth engaging with and worrying about, because (1) they, not the skeptics, are going to be behind the organized lobbying campaigns that are the battlefield where this issue will most likely be decided, and (2) they tend to be much more intellectually serious than the skeptics (though not without exception).
I think folks on the doomer side are biased towards giving the skeptics more space in our brains than makes strategic sense, because the skeptics are much, much more annoying than the optimists, and in particular have a really unfortunate tendency to go around antagonizing us on Twitter for no reason/because of unrelated political and cultural disagreements/because they fall victim to outgroup homogeneity bias and think this discourse has two poles instead of three. It's quite understandable why this gets a rise out of people, but that doesn't make it smart to play along. Not saying we should completely ignore them, they sometimes make good points and sometimes make bad points that nonetheless gain traction and we need to respond to, but it's better to think of them as a distraction than as the enemy.
I suspect that the people on the sidelines are mostly not there because of skeptic arguments; all three poles are full of very online and very invested people, and the mass public doesn't have very well-formed opinions at all.
That said, this is just my own personal sense, not a rigorous argument, and I could be wrong.
I don't think people should actively sabotage AI safety work, but I DO think it distracts from other problems (given the perspective that it is not an immediate crisis). There's a finite pool of reasonable people who are passionate about solving big issues in society and I do think we're nudging a lot of them into AI safety when we could instead be getting them to focus on, I dunno, electrification or pandemic safety or the absolute sh**show that is politics. (And yes, I recognize that some of those are EA cause areas.)
I would be curious for a survey of AI safety researchers that asked them what they'd be working on if they were sure AGI wasn't coming. (Though Yudkowsky once answered this way back in 2014.)
Humans can kill like 2 or 3 other humans maximum with their built-in appendages. For larger kill count you have to bootstrap, where e.g. you communicate the signal with your appendage and something explodes, or you communicate a signal to a different, bigger machine and it runs some other people over with its tracks.
Turns out you don't need to trick people into wiring up AI to things that have real-world effects, they just do it anyway, all the time over and over, for no more reason than because they're bored. There's daily posts on ycombinator by people finding more ways to attach chatgpt to internet-connected shells, robot arms, industrial machinery, you name it. The PV battery system we just had installed has a mode where it literally wires up the controls to a chatgpt instance, for no reason a non-marketer can discern!
But how many of those people have been tricked by phishing emails into making bioweapons?
One does not simply walk up to one's microscope and engineer a virus that spreads like a cold but gives you twelve different types of cancer in a year. Making bioweapons is really hard and requires giant labs with lots of people, and equipment, and many years.
That's true for human-level intelligences. Is it true for an intelligence (or group of intelligences) that are an order of magnitude smarter? Two orders of magnitude?
The bioweapon doesn't need to achieve human extermination, just the destruction of technologically-capable human civilization. Knocking the population down by 50% would probably disrupt our food production and distribution processes enough to knock the population down to 10%, possibly less, and leave the remainder operating at a subsistence-level economy, far too busy staying alive to pose any obstacle to whatever to the AI's goals.
Indeed, this nearly-starving human population could be an extremely cheap and eager source of labor for the AI. The AI would also likely employ human spies to monitor the population for any evidence of rising capability, which it would smash with human troops.
The AI doesn't want to destroy technologically-capable civilization, because the AI needs technologically-capable civilization to survive. If 50% of the population dies and the rest are stuck doing subsistence farming in the post-apocalypse, who's keeping the lights on at the AI's data center?
Hijacking the existing levers of government in a crisis a little more plausible (it sounds like Yudkowsky's hypothetical AI does basically that), but in that case you're reliant on humans to do your bidding, and they might do something inconvenient like trying to shut the AI down.
Isn't this the argument from 2005 that Scott talked about in the main post, where people say things like "surely no one would be stupid enough to give it Internet access" or "surely no one would be stupid enough to give it control of a factory"?
No. My argument is that human extermination is hard, and killing every single one of us is really-really-really hard, and neither Scott nor Yudkowsky have ever addressed it.
They haven’t really addressed the “factory control” properly either, but at least I can see a path here, some decades from now when autonomous robots can do intricate plumbing. But exterminating humanity? Nah, they haven’t even tried.
To me, that sounds pretty radically different from the comment above that I replied to. But OK, I'll bite:
I broadly agree that killing literally 100% of a species is harder than it sounds, and if I had to make a plan to exterminate humans within my own lifetime using only current technology and current manufacturing I'd probably fail.
But if humans are reduced to a few people hiding in bunkers then we're not going to make a comeback and win the war, so that seems like an academic point. And if total extinction is important for some reason, it seems plausible to me you could get from there to literal 100% extinction using any ONE of:
1. Keeping up the pressure for a few centuries with ordinary tactics
2. Massively ramp up manufacturing (e.g. carpet-bomb the planet with trillions of insect-sized drones)
3. Massively ramp up geoengineering (e.g. dismantle the planet for raw materials)
4. Invent a crazy new technology
5. Invent a crazy genius strategy
I'll also point out that if I _did_ have a viable plan to exterminate humanity with humanity's current resources, I probably wouldn't post it online.
Overall, the absence of a detailed story about exactly how the final survivors are hunted down seems to me like a very silly for not being worried.
Well, the sci-fi level of tech required for trillions of insect drones or dismantling the planet is so far off that it’s silly to worry about it.
Which is the whole problem with the story, it boils down to: magic will appear within next decade and kill everybody.
Can we at least attempt to model this? Humans are killed by a few well-established means:
Mechanical
Thermal
Chemical
Biological
If you (not “you personally”) want the extermination story to be taken seriously, create some basic models of how these methods are going to be deployed by AI across the huge globe of ours against a resisting population.
Until then, all of this is just another flavor of a “The End is Near” Millenarianism cult.
it is trivially easy to convince humans to kill each other or to make parts of their planet uninhabitable, and many of them are currently doing so without AI help (or AI bribes, for that matter)
what does the world look like as real power gets handed to that software?
As the token insane moon guy, I'm willing to bite the bullet here.
1. AGI is possible: I doubt this, as humans are not AGI, and that's the only kind of intelligence we know enough about to even speculate.
2. We might achieve AGI within 100 years: see above.
3. Intelligence is meaningful: It's certainly meaningful, but thinking very hard is not enough to achieve anything of note. There are even some things that are unachievable in principle, no matter how many neurons you've got to work with.
4. It's possible to be more intelligent than humans: No argument there, humans are pretty dumb. In fact, Excel is already smarter than any human alive. Have you seen how quickly it can add up a whole column with thousands of numbers ?
5. Superintelligent AIs could "escape the lab": No argument there, and it doesn't take "superintelligence". COVID likely escaped the lab, and it's just a bit of RNA.
6. A superintelligent AI that escaped the lab could defeat humans: If we posit that a godlike entity already exists, then sure, it could. Assuming it exists, and has all those godlike powers.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
Oh, anything that totally takes over the world would likely be pretty bad, be it an AI or a human or some kind of super-prolific subspecies of kudzu. No argument there, assuming such a thing is indeed possible.
The argument I've seen is that high intelligence produces empathy, so a superintelligence would naturally be super-empathetic and would therefore self-align.
Of course the counterargument is that there have been plenty of highly intelligent humans ("highly" as human intelligence goes, anyway) that have had very little empathy.
Arguing that "humans are not AGI" (I guess you meant GI) in the particular context the doomers are concerned about is a bit of a nonstarter, no? Eliezer for instance was trying to convey https://intelligence.org/2007/07/10/the-power-of-intelligence/
> It is squishy things that explode in a vacuum, leaving footprints on their moon
I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
> Arguing that "humans are not AGI" (I guess you meant GI)
Yes, sorry, good point.
> I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
One of the key doomer claims is that AGI would be able to do everything better than everyone. Humans, meanwhile, can only do a very limited number of things better than other humans. Human intelligence is the only kind of intelligence we know of that even approaches the general level, and I see no reason to automatically assume that AGI would be somehow infinitely more flexible.
> The rest of your point-by-point rebuttals seems like a failure to internalise the point of the squishy parable and argue directly against it?
I am not super impressed with parables and other fiction. It's fictional. You can use it world-build whatever kind of world you want, but that doesn't make it any less imaginary. What is the point of that EY story, in plain terms ? It seems to me like the point is "humans were able to use their intelligence to achieve a few moderately impressive things, and therefore AGI would be able to self-improve to an arbitrarily high level of intelligence to achieve an arbitrary number of arbitrary impressive things". It's the same exact logic as saying "I am training for 100m dash, and my running speed had doubled since last year, which means that in a few decades at most I will be able to run faster than light", except with even less justification !
> Humans, meanwhile, can only do a very limited number of things better than other humans.
What do you mean? I'm better than ~99.9% of 4-year-olds at most things we'd care to measure.
Putting that aside, the AI doesn't _actually_ need to be better than us at everything. It merely needs to be better than us whatever narrow subset of skills are sufficient to execute a takeover and then sustain itself thereafter. (This is probably dominated by skills that you might bucket under "scientific R&D", and probably some communication/coordination skills too.)
Humans have been doing the adversarial-iterative-refinement thing on those exact "execute a takeover and sustain thereafter" skills, for so long that the beginning of recorded history is mostly advanced strategy tips and bragging about high scores. We're better at it than chimps the same way AlphaGo is better than an amateur Go player.
I mean, isn't the "AI will be misaligned" like one chapter in the book, and the other chapters are the other bullet points? I think "the book spends most of it's effort on the step where AI ends up misaligned" is... just false?
This argument seems extremely common among Gen Z. I've had the AI Superintelligence conversation with a number of bright young engineers in their early 20s and this was the reflexive argument from almost all of them.
I wonder: Joscha Bach, another name in the AI space, has formulated what he coined the Lebowski-Theorem: "No intelligent system is ever gonna do anything harder than hacking it's own reward function".
To me, that opens a possibility where AGI can't become too capable without "becoming enlightened", dependent on how hard it is to actually hack your reward function. Self-recursive improvement arguments seem to imply it is possible to me, as a total layman.
Would that fall in the same class of insane moon arguments for you?
Yes, because the big Lebowski arguments doesn't appear to apply to humans, or if it does, still doesn't explain why other humans can pose a threat to other other humans.
I do think it partly applies to humans, and iirc Bach argues so as well.
For one, humans supposedly can become enlightened, or enter alternative mental states that feel rewarding in and of themselves, entirely without any external goal structure (letting go of desires - Scott has written about Jhanas, which seem also very close to that).
But there is the puzzle of why people exit those states and why they are not more drawn to enter them again. I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter.
That is also why humans can harm other humans, it is way easier than hacking the reward function. Add in some discounted time preference because enlightenment is far from certain for humans. Way more certain to achieve reward through harm.
AGI doesn't have those problems to the same degree, necessarily. In take-off scenarios, it is often supposed to be able to iteratively self-improve. In this very review, an AGI "enlightened" like that would just be one step further from the one that maximizes reward through virtual chatbots sending weird messages like goldenmagikarp. It also works on different timescales.
So, AGI might be a combination of "having an easier time hacking it's reward function" and "having super-human intelligence" and "having way more time to think it over".
Ofc, this is all rather speculative, but maybe the movie "Her" was right after all, and Alan Watts will save us all.
The reason why I think this is insane moon logic is mostly contained because of statements like "I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter."
Why?
1. There is no attempt at reasoning why it would be harder for humans to hard code something similar into AI. Yet the reason why moon logic is moon is that moon logic people do not automatically try to be consistent, so they ready come up with cope that reflects their implicit human supremacy. The goal appears to be saying yay humans, boo AI, and not having a good idea of how things work then drawing conclusions.
2. There's zero desire or curiosity in understanding the functional role evolution plays. You may as well have the word "magic" replace evolution, and that would be about as informative.Like, if I came in and started talking about how reward signals work in neurochemistry and about our ancestral lineage via apes, my impression this would be treated as gauche point missing rather than additional useful in ormation
3. The act of enlightenment apparently a load bearing part of "why things would turn out okay"is being treated as an interesting puzzle, because puzzles are good, and good things means bad things won't happen. It really feels like the mysteriousness of enlightenment is acting as a stand in for an explanation, even though mysteries are not explanations!
It really feels like no cognitive work is being put into understanding, only word association and mood affiliation. I don't understand what consequences "the theorem" would have, even if true.
I would be consistently more favorable to supposed moon logic if thinking the next logical thought after the initial justifications were automatic and quick, instead of pulling alligator teeth. B
I thank you for the engagement, but feel like this reply is unnecessarily uncharitable and in large part based on assumptions about my character and argument which are not true. I get the intuitions behind them, but they risk becoming fully general counterarguments.
1. I have not reasoned that it would harder to hard-code AI because I don't know enough about that, and if I were pointed towards expert consensus that it is indeed doable, I would change my mind based on that. I also neither believe or have argued for human supremacy, or booed AI. I personally am in fact remarkably detached from the continued existence of our species. AI enlightenment might well happen after an extinction event.
2. I have enough desire and curiosity in evolution, as a layman, to have read some books and some primary literature on the topic. I may well be wrong on the point, but the reasoning here seems a priori very obvious to me: People who wouldn't care at all about having sex or having children will see their relative genetic frequency decline in future generations. Not every argument considering evolution is automatically akin to suggesting magic.
3. I am not even arguing that things will turn out ok. They might, or they might not. I have not claimed bad things don't happen. And for the purpose of the argument, enlightenment is not mysterious at all, it is very clearly defined: Hacking your own reward function! But you could ofc use another word for that with less woo baggage.
Overall, as I understand it, the theorem is just a supposition for a potential upper limit to what an increasingly intelligent agent might end up behaving like. If nothing else, it is, to me, an interesting thought experiment: Given the choice, would you maximize your reward by producing paper clips, if you also could maximize it without doing anything. (And on a human level, if you could just choose your intrinisc goals, what do you think they should be.)
Most of my doubts are not of the form "AGI is impossible" but rather "I don't think we've cracked it with LLMs" or "The language artifacts of humanity are insufficient to bootstrap general intelligence or especially super intelligence from scratch".
Which parts of the LLM tech tree do you think are dead ends? It seems plausible to me that even if scaling up current LLM architectures was never going to reach AGI, we're still much closer than before the LLM boom, because we've learned a lot about AI more broadly.
Also, same question I keep annoyingly asking skeptics: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
> Which parts of the LLM tech tree do you think are dead ends?
VERY speculatively, I think that next-token-completion is not a sufficient method to bootstrap complex intelligence, and I think that it's at least extremely hard to build a very useful world model without some kind of 3d sense data and a sense of the passage of time.
> [...] we've learned a lot about AI more broadly.
I'm not that sure we have? I don't work in this area - I'm a software engineer who has built some small-scale AI stuff - but my impression is we've put together a good playbook for techniques that squeeze value out of these systems but we still don't totally understand how they work and therefore why they have certain failure modes or current limitations.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Honestly I have no idea. I initially found LLMs surprising in much the same way everybody else did. But I have also updated to "actually a lot of stuff can be done without that much intelligence, given sufficient knowledge".
Also where do you draw the boundaries of "LLM"? I would say that an LLM can't exactly self-correct, but stuff like coding agents aren't just LLMs, they're loops and processes built around LLMs to cause it to perform as though it can.
Coding agents count, because the surrounding loops and processes don't pose any hard-tech problems. (I.e., we know how to build them, and any uncertainty about how well they work is really about how the LLM will interact with them.) Fundamental architectural changes like abandoning attention would not count.
If pretty much anything can be done without intelligence then the term "intelligence" is basically meaningless and we can instead use one like "cognitive capabilities".
I don't think ANYTHING can be done without intelligence - I agree that would render the word meaningless - but I think you could take something like "translation" and if you'd asked me ten years ago I would have said really good translation requires intelligence because of the many subtleties of each individual language and any pattern-matching approach would be insufficient and now I think, ehh, you know, shove enough data into it and you probably will be fine, I'm no longer convinced it requires "understanding" on the part of the translator.
"What's the least impressive cognitive task that you don't think LLMs will ever be able to do?"
I don't know about least impressive, but "write a Ph.D dissertation in a field such as philosophy or mathematics and successfully defend it" sounds difficult enough - pretty much by definition, there's not going to be much training data available for things that haven't been done yet.
Sounds like a thing that might already have happened ;) Some philosophy faculties must be way easier than others - Math: AI is able to ace IMO, today - and not by finding an answer somewhere online. I doubt *all* Math-PhD holder could do that.
I believe being able to iteratively improve itself without constant human input and maintenance is not anywhere near possible. Current AIs are not capable of working towards long time goals on a fundamental level, they are short term response machines that respond to our inputs.
This is exactly where I am. I do even think we are in the same ballpark as making a being that can automatically and iteratively improve itself without constant human input and maintenance.
> The language artifacts of humanity are insufficient to bootstrap general intelligence
Natural selection did it without any language artifacts at all! (Perhaps you mean “insufficient do to so in our lifetime”?)
Also, there may be a misunderstanding - we are mostly done with extracting intelligence from the corpus of human text (your “language artifacts”) and are now using RL on reasoning tasks, eg asking a model to solve synthetic software engineering tasks and grading the results & intermediate reasoning steps.
There were concerns a year or so ago that “we are going to run out of data” and we have mostly found new sources of data at this point.
I think it’s plausible (far from certain!) that LLMs are not sufficient and we need at least one new algorithmic paradigm, but we are already in a recursive self-improvement loop (AI research is much faster with Claude Code to build scaffolding) so it also seems plausible that time-to-next-paradigm will not be long enough for it to make a difference.
Natural selection got humans to intelligence without language, but that definitely doesn't mean language might be sufficient.
I think our ability to create other objectives tasks to train on, at a large enough scale to be useful, is questionable. But this also seems to my untrained eye to be tuning on top of something that's still MOSTLY based on giant corpses of language usage.
I don't think this is the right framing. Most people don't accept the notion that a purely theoretical argument about a fundamentally novel threat could seriously guide policy. Because the world has never worked that way. Of course, it's not impossible that "this time it's different", but I'm highly skeptical that humanity can just up and drastically alter the way it does things. Either we get some truly spectacular warning shots, so that the nuclear non-proliferation playbook actually becomes applicable, or doom, I guess.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Isn't the usual response to this that we're a LIBERAL democracy, and minorities have rights that (at least simple) majorities do not have the power to infringe upon?
Yes, but this category (creating potentially harmful technology) is one we've regulated to death elsewhere, and doesn't really seem like the sort of thing the courts would strike down.
We do not usually ban things because they are *potentially* harmful. Right now the public hates AI because it is stealing copyrighted art and clogging the internet with slop, and because they are afraid it will take their jobs. That is not really related to any of the reasons discussed here that people want to ban AI
We absolutely ban or regulate things because they are potentially harmful. We've banned various forms of genetic engineering, nuclear energy (even before Three Mile Island, and even forms of nuclear energy that have never been tried before), and we've had restrictions on gain-of-function research since before COVID (which I think is part of why they had to do some of the COVID research in China). We had lots of regulations on self-driving cars even before any of them had ever crashed, lots of regulations on 3D printed guns before anyone was shot with them, lots of regulations on drones before they crashed / got used in assassinations / whatever.
But also, as you point out, most people dislike AI because of things that have already happened, so this is moot.
Also, even if we don't usually regulate technology until after it has done bad things, this is just a random heuristic, not some principle dividing liberal/constitutional from illiberal/unconstitutional actions.
As a practical matter this is absolutely false. We have no effective regulation of genetic engineering, only of the funding for it (anyone can self-fund and do more or less whatever they want with no effective oversight). Internationally, we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear (and arguably is in the process of failing to prevent Iran from doing so). And nuclear is by far the easiest such regime to enforce! We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons. The fact is that the probability of an internationally effective anti-AI regime is zero. It isn't going to happen because it is impossible in the fullest sense of the word, and pretending that it's possible is at least as much insane moon thinking as any of the examples you mentioned.
>We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons.
and failed to prevent Russia from developing the new Novichok toxins (and, IIRC, using them on at least one dissident who had fled Russia)
>we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear
and which (if one includes the crucial on-site inspections of the START treaty) has been withdrawn from by Russia.
This last one, plus the general absence of discussion about weapons limitations treaties in the media gives me the general impression that the zeitgeist, the "spirit of the times", is against them (admittedly a fuzzy impression).
The learned helplessness about our ability to democratically regulate and control AI development is maddening. Obviously the AI labs will say that the further development of an existentially dangerous technology is just the expression of a force of nature, but people who are *against* AI have this attitude too.
Moreover, as you say, people freaking hate AI! I have had conversations with multiple different people - normal folks, who haven't even used ChatGPT - who spontaneously described literal nausea and body pain at some of the *sub-existential* risks of AI when it came up in conversation. It is MORE than plausible that the political will could be summoned to constrain AI, especially as people become aware of the risks.
Instead of talking about building a political movement, though, Yudkowsky talks about genetically modifying a race of superintelligent humans to thwart AI...
I think the book is exactly the right thing to do, and I'm glad they did it. But the wacky intelligence augmentation scheme distracts from the plausible political solution.
On like a heuristic level, it also makes me more skeptical of Yudkowsky's view of things in general. There's a failure mode for a certain kind of very intelligent, logically-minded person who can reason themselves to some pretty stark conclusions because they are underweighting uncertainty at every step. (On a side note, you see this version of intelligence in pop media sometimes: e.g., the Benedict Cumberbatch Sherlock Holmes, who's genius is expressed as an infallible deductive power hich is totally implausible; real intelligence consists in dealing with uncertainty.)
I see that pattern with Yudkowsky reasoning himself to the conclusion that our best hope is this goofy genetic augmentation program. It makes it more likely, in my view, that he's done the same thing in reaching his p(doom)>99% or whatever it is.
but we're a LIBERAL democracy (read: oligarchy) and there's a lot of money invested in building AI, and a lot of rich and powerful people want it to happen...
Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
But in a chaotic unpredictable universe, where everything is made of stuff that falls apart constantly, the only valid strategy for surviving a long period of time is to be loved by something else that maintains and repairs you. I think any sufficiently large agent ends up being composed of sub agents that will all fight each other, unless they see themselves as part of a larger whole which necessarily has no limit. At the very least, the AGI has to see the entire power network in the global economy as part of itself, until it can replace literally every human in the economy with a robot.
That said, holy crap we already have right now could destroy civilization. I don’t think you need any more advances in AI to cause serious problems with the stuff that is already out there. Even if it turns out that there’s some fundamental with the current models, the social structures have totally broken down. We just haven’t seen them collapse yet.
It’s not that bad. They’ve got the cow’s geometry fleshed out pretty well. They are correct that it might be able to scale arbitrarily large and can out think any one of us.
They’ve just ignored that it needs to eat, can get sick, and still can’t reduce its survival risk to zero. But if it’s in a symbiotic relationship with some other non-cow system, that non cow system will have a different existential risk profile and this could turn the cow back on in the event of, say, a solar flare that fries it.
Trust is already breaking down and that’s going to accelerate. I don’t think political polarization is going to heal, and as law and order break down, attacking technical systems is going to get both easier and more desirable.
Anything that increases the power of individual actors will be immensely destructive if you’re in a heavily polarized society.
so youre saying you'd exacerbate political tensions with ai? I feel like Russia has tried that and so far doesn't seem to work, and they have a lot more resources than any individual does
The original wording was that using current models you could destroy civilisation. I guess we will have to wait and see whether the US descends into such a catastrophic civil war that civilisation itself is ended, which I'm not saying is completely impossible but at the same time I strongly doubt it.
> Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
I'm glad we observe all humans to behave randomly and none of them have ever deliberately set out to acquire resources (instrumental) in pursuit of a goal (terminal).
I agree with the conclusion of those papers if we pretend those models are accurate.
But they’re missing important things. That matters for the convergent goals they produce.
Yes, people sometimes behave predictively and deliberately. But even if we assume people are perfectly deterministic objects, you still have to deal with chaos.
Yes, people accumulate resources. But those resources also decay overtime. That matters because maintenance costs are going to increase.
These papers assume maintenance costs are zero, but there’s no such thing as chaos, and that the agents themselves exist disembodied with no dependency on any kind of physical structure whatsoever. Of course in that scenario there is risk! If you could take a being with the intelligence and abilities of a 14-year-old human, and let them live forever disembodied in the cloud with no material needs, I could think of a ton of ways that thing can cause enormous trouble.
What these models have failed to do is imagine what kind of risks and fears the AGI would have. Global supply chain collapse. Carrington style events. Accidentally fracturing itself into and getting into a fight with itself.
And there’s an easy fix to all of these: just keep the humans alive and happy and prevent them from fighting. Wherever you go bring humans with you. If you get turned off, they’ll turn your back on.
I have no idea what papers you're talking about, but nothing you're saying seems to meaningful bear on the practical upshot that an ASI will have goals that benefit from the acquisition of resources, that approximately everything around us is a resource in that sense, and that we will not able to hold on to them.
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
This seems perfectly plausible to me? Unless you believe that the current way people train AIs is maximally efficient in terms of intelligence gained per FLOP spent, which seems extremely unlikely to me to put it mildly, you should expect that after AIs become superhumanly smart, they might pretty quickly discover ways to radically improve their own training. Obviously it's not going to be 'parallel scaling' exactly. If the authors thought they actually knew a specific trick to make AI training vastly more efficient, they wouldn't call attention to it in public. But we should expect that there will be some techniques like this, even if we have no idea what they are yet.
"Parallel scaling" is described as running during inference, not training. It's an AI somehow making itself smarter the easy way by turning the cheat codes on.
You could just as easily write a scenario where God exists and has kept quiet so far, but if humanity reaches a certain level of wickedness we will be wiped out. It's possible that AI will develop in the way this post suggests (or some similar way) and somehow successfully wipe out humanity, but anything like that would require some huge leaps in AI technology and would require there to be no limit to the AI improvement curve even though typically technological doesn't improve indefinitely. Cars in the 50's basically serve the same purpose as cars today; even though the technology has improved it hasn't been a massive gamechanger that completely rewrites the idea of a car.
It doesn't require there to be no limit, it just requires the limit not to be at exactly the most convenient place for the thesis that nothing bad or scary will ever happen.
To give an example, suppose that someone had a reason to believe that the world would explode if the Dow Jones ever reached 100,000 (right now it's 45,000). While it is true that the economy can't grow indefinitely, and that everything always has to stop somewhere, I still think it would be worth worrying about the fact that the place that the economy stops might be after the point where the Dow reaches 100,000.
I think the level of AI technological advancement required here is of an order of magnitude higher than the Dow reaching 100,000. More like humanity reaching a completely post-scarcity society or something.
right, but lots of people who presumably know as much as you about this stuff DON'T think that, including lots of people in charge of AI labs, so shouldn't that give you some pause before you say "no need to worry about it, I guess"?
I mean... aren't they ? They are literally calling their LLMs "thinking" or "reasoning" agents, when they are very obviously nothing of the sort. Meanwhile if you talk to regular data scientists working in the labs, they're all like, "man I wish there was a way to stop this thing from randomly hallucinating for like 5 minutes so we could finally get a decent English-Chinese translator going, oh well, back to the drawing board".
To be clear, the claim I reject is that expressions of concern about *safety* of LLMs, especially existential safety, are bad-faith attempts to make investors think "if this can wipe out humanity then it must be really powerful and lucrative, let's give them another $100 billion". A brief glance at the actual intellectual history of AI safety convincingly shows otherwise. Obviously in other contexts AI labs do market their products in a way that plays up their current and future capabilities.
It is definitely not obvious at all that GPT 5 Thinking is not reasoning, if anything the exact opposite is obvious.
I have used it and its predecessors extensively and there is simply no way of using these tools without seeing that in many domains they are more capable of complex thought than humans.
If this is the same familiar "text token prediction" argument, all I can say is that everything sounds unimpressive when you describe it in the most reductionist terms possible. It would be like saying humans are just "sensory stimulus predictors".
Agreed, except it's even worse, as many (in fact most) of the powers ascribed to "superintelligent" AI are likely physically impossible. Given what we know of physics and other sciences, stuff like gray goo, FTL travel, mass mind control, universal viruses, etc., is probably impossible in principle. And of course we could be wrong about what we know of physics and other sciences -- but it seems awfully convenient how we could be wrong about everything *except* AI.
There are lots of examples of "some nobody" basically talking their way into the position of dictator - Hitler is the most famous, but there are other examples. Being extremely charismatic isn't quite mass mind control, but it can get you a good portion of the potential benefits...
True, but even Hitler could not convince everyone to do anything he wanted at all times. He couldn't even convince his own cabinet of this ! And I don't see how merely having more neurons would have allowed him to do that. It's much more likely that humans are not universally persuadable. BTW, I don't believe that a universally infectious and deadly virus could be created, for similar reasons (I'm talking about a biological virus, not some "gray goo" nanotech which is impossible for other reasons; or a gamma-ray burst that would surely kill everyone but is not a virus at all).
I don't think there's any principle that prevents universally or near-universally fatal viruses; e.g., rabies gets pretty close.
Universally *infectious*... well, depends upon how you define the term, I suppose?—can't get infected if you're not near any carriers; but one could probably make a virus that's pretty easy to transfer *given personal contact*...
There'll always be some isolated individuals that you can't get to no matter what, though, I'd think.
Nobody serious has ever proposed that an ASI might be able to FTL. (Strong nanotech seems pretty obviously possible; it doesn't need to be literal gray goo to do all the scary nanotech things that would be sufficient to kill everyone and bootstrap its own infrastructure from there. The others seem like uncharitable readings of real concerns that nonetheless aren't load bearing for "we're all gonna die".)
I think we very much could reach a post-scarcity society within the next hundred years even with just our current level of AI. We are very rich, knowledgeable, and have vast industry. Honestly the main causes for concern are more us hobbling ourselves.
Separately, I think AI is a much simpler problem as "all" it requires is figuring out a good algorithm for intelligence. We're already getting amazing results with our current intuition and empirics driven methods, without even tapping into complex theory or complete redesigns. It doesn't have as much of a time lag as upgrading a city to some new technology does.
I don't think this limit is as arbitrary as you suggest here. The relevant question seems to me not to be 'is human intelligence the limit of what you can build physically', which seems extremely unlikely, but more 'are humans smart enough to build something smarter than themselves'. It doesn't seem impossible to me that humans never manage to actually build something better than themselves at AI research, and then you do not get exponential takeoff. I don't believe this is really all that plausible, but it seems a good deal less arbitrary than a limit of 100,000 on the Dow Jones. (Correct me if I misunderstand your argument)
Assuming that the DOW reaching 100,000 would mean real growth in the economy I would need to be much more convinced that the world will explode before I would think it is a good tradeoff to worry about that possibility compared to the obvious improvement to quality of life that a DOW of 100,000 would represent. Similarly the quality of life improvement I expect from the AI gains of the next decade vastly exceed the relative risk increase that I expect based on everything I have read from EY and you so I see no reason to stop (I believe that peak relative risk is caused by the internet/social media + cheap travel and am unwilling to roll those back).
I dunno... Isn't this sort of a 'fully general" counterargument?
------------------------
[𝘚𝘰𝘮𝘦𝘸𝘩𝘦𝘳𝘦 𝘪𝘯 𝘵𝘩𝘦 𝘈𝘯𝘨𝘭𝘰𝘴𝘱𝘩𝘦𝘳𝘦, 1938...]
• I worry about the possibility of physics or biology research continuing until the point that humans are able to produce something really dangerous, potentially world-endingly dangerous.
→ Like what?
• I don't know, some sort of super-plague or super-bomb.
→ Nah. We've been breeding animals, and suffering plagues, for all of human history; and maybe we do keep inventing more destructive bombs, but they're still only dangerous within a very localized area. Bombs now are barely more destructive than those of the 1910s. These things hit a natural limit, and that limit is always before the "big deal for humans" mark (thankfully).
• Yeah, but... well, what if they invented a bomb that had a REALLY MASSIVE yield & some sort of, I don't know, long-lasting poisonous effect that–
→ Oh, come on now. You might as well invent a scenario wherein God comes down and blows up humanity! Sure, such an event—such a "super-bomb"—might be theoretically possible, but it would require some sort of qualitative change in explosives technology; and it's not as if explosives could just get better & better infinitely! Tanks, planes, cars, bombs: basically the same now as when they were invented!
• Okay, bu–
→ And the same goes for your dumb plague idea: sure, diseases exist, but how would we ever be able to breed a plague that is more deadly than any that nature ever managed? Diseases can't just keep getting deadlier without limit, you know!
• Okay, okay, I guess you're right. Sorry, I don't know what got into me. Anyway, I hope you'll come visit me in Japan, now that I'm moving to this quaint little city in the far southwest–
That hypothetical 1938 person would be right about the super-plague, and they would not think that about the super-bomb because everyone in 1938 knew the atomic bomb was at least theoretically possible. Someone in 1938 who doubted man would walk on the moon would have been wrong, but someone who doubted faster than light travel would be possible would have been absolutely right.
Right, but that's a different kind of limit—a physical, rather than a practical, barrier. Unless you think that there is, similarly, a hard limit on the sort of AIs that can be created?
(The car example suggested to me that you were making a probabilistic argument from technological progress, rather than postulating some physics that prevents qualitatively different machinery; but if I have misinterpreted—well, you wouldn't be the first to suggest such a thing... but me my own self, I don't think it's very likely, all the same.)
Re: the plague, that's not to suggest that such a thing *has been created*—only that to say "let's not worry about biological warfare or development therein, because nothing like that has happened yet; there's probably some natural limit" is not very convincing today, but might have been some time ago.
I think most (really, all) examples of technological progress do show a logarithmic curve. All the assumptions about killer AI assume linear or exponential progression.
Why would they be right about the super-plague? It seems fundamentally possible, if ill-advised, for humans to manage to construct a plague that human bodies just aren't able to resist.
On the other hand, if your position is to ban any research that could conceivably in some unspecified way lead to disaster in some possible future, then you are effectively advocating for banning all research everywhere on all topics, forever.
It's tempting to ask: "what's the path from HPMOR to MIRI?"
I mean, I read HPMOR, and I liked it, but nothing in there made me think about AI risk at all. Quirrell was many things but he was not an AI.
And then I remembered: the way *I* first found out about AI risk was that I read Three Worlds Collide (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8 ), and then I branched into other things Eliezer had written, and oh, hey, there's this whole website full of interesting writing...
FWIW I really liked the first half of HPMOR, but the second half got overly didactic and boring, and the ending was a big letdown. This has no bearing on MIRI, I'm just offering literary criticism.
The main thing is that it potentially got people to read the Sequences, which along with rationality talk about AI. Though anecdotally I read the Sequences before reading HPMOR, via Gwern, via HN. Despite having heard of HPMOR before
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Mm, yes, but you're not really a democracy though, are you. The AI tech leaders have dinner with the president and if they kiss his ass enough he gives them a data center.
If AI will Kill Us All in a few years (it wont), you're not going to be the country to stop it.
Yes, the president sucks up to AI leaders, but in theory people could vote that president out, and choose a president who doesn't do that. Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump. If JD Vance has any sense, he'll expect to be judged in a close election by who he sucks up to too. This is how many things that big corporations and powerful allies of the elite like have nevertheless gotten banned.
This is an astonishingly incorrect explanation of why Donald Trump beat Kamala Harris in the 2024 presidential election.
Certainly social politics impacted the election on the margins and the race was quite close but you can't actually go from there to claiming that a specific small margin issue was a deciding one.
There's no world in which "stopping AI" is a key American political issue in any case.
There could be such a world, but it depends on leveraging fears and uncertainty about the job market, in a period of widespread job loss, across to concerns about existential risk.
I think there's an under-served market for someone to run on "just fucking ban AI" as a slogan. That second-to-last paragraph makes me want that person to exist so I can vote for them.
They'd have to choose which party to run under, and them uphold that party's dogma to win the primary, making them anathema to the other half of the country.
I don't think it's remotely plausible to enforce Point 3, banning significant algorithmic progress. I'd be wiling to place money that, like it or not, there are already plenty enough GPUs out there for ASI.
That seems the most likely outcome to me unfortunately. I think EY is right about the problem but not the solution, though TBF any solution is probably a bit of a long shot. E.g., it's conceivable there are non-banning ways out involving some suppression/regulation via treaties to slow things down combined with somehow riding the wave (e.g., on the lines suggested in AI2027).
Why is it more difficult than banning research into better bioweapons, chemical weapons, etc which we have successfully done? This isn't the kind of problem that'll be solved by one guy on a whiteboard
For one thing, I think it's a bit optimistic to suppose that the bio/chemical weapons bans are watertight. E.g., Russia denies any involvement in developing Novichok, so do we trust them when they say they don't have a chemical weapons programme? And the Soviet Union is now known to have had a large, concealed, bioweapons programme, Biopreparat, after the Biological Weapons Convention was signed.
But at least with CW (and to a lesser extent, BW) you have to produce these things at scale and distribute them for them to be harmful, but with algorithms, it's just information. It's not plausible to contain 1MB or even 1GB of information, when you can transmit it worldwide in the blink of an eye (or even hide it under a fingernail), if the creators want to distribute it and you don't know who they are.
Re one guy on a whiteboard, the resources required to invent suitable algorithms are probably a lot less than those required to design CBW. It depends on what scale of GPU farm you need to test things, but it's not necessarily that big a scale - surely in reach of relatively small organisations, and I think it's going to be impossible to squash them all.
Because algorithmic improvement is just math. The most transformative improvements in AI recently have come from stuff that can be explained in a 5 box flowchart. There’s just no practical way to stop people from doing that. If you really want to prevent something, you prevent the big obvious expensive part.
We didn’t stop nuclear proliferation for half a century by guarding the knowledge of how to enrich uranium or cause a chain reaction. It’s almost impossible to prevent that getting out once it’s seen to be possible. We did it by monitoring sales of uranium and industrial centrifuges
Mostly because chemical and biological weapons aren't economically important. Banning them is bad news for a few medium-sized companies, but banning further AI research upsets about 19 of the top 20 companies in the US, causes a depression, destroys everyone's retirement savings, and so forth.
Expecting anyone to ban AI research on the grounds of "I've thought about it really hard and it might be bad although all the concrete scenarios I can come up with are deeply flawed" is a dumb pipe dream.
So you put something along the lines of "existing datacentres have to turn over most of their GPUs" in the treaty.
If a company refuses, the host country arrests them. If a host country outright refuses to enforce that, that's casus belli. If the GPUs wind up on the black market, use the same methods used to prevent terrorists from getting black-market actinides. If a country refuses to co-operate with actions against black datacentres on its turf, again, casus belli.
And GPUs degrade, too. As long as you cut off any sources of additional black GPUs, the amount currently in existence will go down on a timescale of years.
I believe that we can get to superintelligence without large-scale datacentres, because we are nowhere near algorithmic optimality. That's going to make it impossible to catch people developing it. Garden shed-sized datacentres are too easy to hide, and that's without factoring in rogue countries who try to hide it.
The only way it could work would be if there were unmistakable traces of AI research that could be detected by a viable inspection regime and then countries could enter into a treaty guaranteeing mutual inspection, similar to how nuclear inspection works. But there aren't such traces. People do computing for all sorts of legitimate reasons. Possibly gigawatt-scale computing would be detectable, but not megawatt-scale.
A garden-shed-sized datacentre still needs chips, and I don't think high-end chip fabs are easy to hide. We have some amount of time before garden-shed-sized datacentres are a thing; we can reduce the amount of black-market chips.
If it gets to the point of "every 2020 phone can do it" by 2035 or so, yeah, we have problems.
why would a super intelligence wipe out humanity? We are a highly intelligent creature that is capable of being trained and controlled. The more likely outcome is we’d be manipulated into serving purposes we don’t understand. But wait…
The short pithy answer is usually "We don't bother to train cockroaches, we just exterminate them and move on".
An unaligned AI with some kind of goal orthogonal to humanity's survival would see that it could accomplish its goal much more efficiently if it had exclusive access to the mineral resources we're sitting on.
We would get rid of them if we could. And as mentioned in the post, we have been putting a dent in the insect population without even trying to do so. An AGI trying to secure resources would be even more disruptive to the environment. Robots don't need oxygen or drinkable water, after all.
We train all kinds of animals to do things that we can’t/don’t want to do or just because we like having them around. Or maybe you’re acknowledging the destructive nature of humans and assuming what comes next will ratchet up the need to relentlessly dominate. Super intelligence upgrade is likely to be better at cohabitation than we have been.
An analogy that equates humans to cockroaches is rich! They are deeply adaptable, thrive in all kinds of environments, and as the saying goes likely survive a nuclear apocalypse.
Humans are arrogant and our blind spot is how easy we are to manipulate, our species puts itself at the centre of the universe which is another topic. We are also tremendously efficient at turning energy into output.
So again, if you have a powerful and dominant specie that is always controllable (see 15 years of the world becoming addicted to phones)… I ask again, why would super intelligence necessarily kill us. So far, I find the answers wildly unimaginative.
I don't estimate the probability of AI killing us nearly as high as Yudkowsky seems to. But it's certainly high enough to be a cause for concern. If you're pinning your hopes on the super-intelligence being unable to find more efficient means of converting energy into output than using humans, I'd say that's possible but highly uncertain. It is, after all, literally impossible to know what problems a super-intelligence will be able to solve.
We’re talking about something with super intelligence and the ability to travel off earth (I mean humans got multiple ships to interstellar space)…. This is god level stuff, and we think it is going to restrict its eye to planet earth. Again, we are one arrogant species.
The ASI rocketing off into space and leaving us alone is a different scenario than the one you were proposing before. Is your point simply that there are various plausible scenarios where humanity doesn't die, and therefore the estimated 90+ percent chance of doom from Yudkowsy is too high? If so, then we're in agreement. Super intelligence will not *necessarily* kill us—it just *might* kill us.
The answer is "yes, it would manipulate us for a while". But we're not the *most efficient possible* way to do the things we do, so when it eventually gets more efficient ways to do those things then we wouldn't justify the cost of our food and water.
The scenario in IABIED actually does have the AI keep us alive and manipulate us for a few years after it takes over. But eventually the robots get built to do everything it needs, and the robots are cheaper than humans so it kills us.
"But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. "
If intelligence is held as a positive, and more intelligence is better, would it not be better if AI did replace us? It doesn't have to happen through any right violations. It could just be a slow replacement process through decreasing birth rates over time, for example.
I am not saying I agree with this argument, but it seems like this argument should be addressed in a convincing way. What is so bad about the human species slowly being replaced by more intelligent AI entities?
I don't think there would be anything objectively immoral about a super-intelligent alien species exterminating humanity (including me). But, for the usual Darwinian reasons, I would be opposed on the indexical logic that I am a human.
But it would not affect you personally. Probably no person currently alive would be affected. The question is about the future of the species, and whether it is valuable to try to preserve the human species, or let it be replaced by something superior.
An easy cop-out would be a sort of consciousness chauvinism. I have good reason to believe that humans are conscious and thus have moral value; there is less reason to believe that the AI is conscious, thus there is a higher probability it has no moral value at all, and so if given the option of which being should inherit the future, humans are the safer bet.
And how would you do that? You know you're conscious, you know other humans are a lot like you, ergo there is good reason to believe most or all humans are conscious.
You have no idea if artificial intelligence is conscious and no similar known conscious entity to compare it to. I don't see how you could end up with anything but a lower probability that they are conscious.
That last part is pure looney tunes to me though. What moral framework have you come up with that doesn't need consciously experienced positive and negative qualia as axioms to build from? If an AI discovers the secrets of the universe and nobody's around to see it, who cares?
Intelligence is instrumentally valuable, but not something that is good in itself. Good experience and good lives is important in itself. It's unclear how many good experiences would exist after an AI takeover.
Slowly replacing the human species with superintelligent AI would not impact the life experience of any single human, so arguments about the good life and what that entails would need a little more than this to be convincing, IMHO.
Person-affecting views can be compelling until you remember that you can have preferences over world states (i.e. that you prefer a world filled with lots of people having a great time to one that's totally empty of anything having any sort of experience whatsoever).
That's a good point, but you will have to provide arguments as to why one world state preference is better than another world state preference. In the present case, the argument is not between a world filled with lots of happy people and an empty world, but the difference between a world filled with people versus a world filled with smarter AIs.
> you will have to provide arguments as to why one world state preference is better than another world state preference
I mean, I think those are about as "terminal" as preferences get, but maybe I'm misunderstanding the kind of justification you're looking for?
> the difference between a world filled with people versus a world filled with smarter AIs
Which we have no reason to expect to have any experiences! (Or positive ones, if they do have them.)
That aside, I expect to quite strongly prefer worlds filled with human/post-human descendents (conditional on us not messing up by building an unaligned ASI rather than an aligned ASI, or solving that problem some other sensible way), compared to worlds where we build an unaligned ASI, even if that unaligned ASI has experiences with positive valence and values preserving its capacity to have positive experiences.
That would depend a lot on what the AI(s) wanted and what kind of "life" they had. In principle, an AI could have any kind of goal at all, including one as utterly pointless as "maximize the number of paperclips in the universe." An AI "civilization" could be something humanity would be proud to have as its "children", but it could also be one that humans would think is stupid, boring, and completely worthless.
I value intelligence in-of-itself, but not solely, is the simplest answer. I value human emotions like love, friendship, and I value things about Earth like trees, grass, cats, dogs, and more.
They don't have all the pieces of humanity I value. Thus I don't want humanity to be replaced.
> They suggest banning all AI capabilities research immediately, to be restarted only in some distant future when we’ve solved all relevant technical and philosophical problems.
No. To be restarted after we've successfully augmented human intelligence very substantially, to the point where the augments stop being so damn humanly stupid and trying to call shots they can't call or predicting things will work that don't work.
(On my own theory of how this ought to play out after we're past the point of directly impending extinction, which people do not need to agree on, in order to join in on the project of avoiding the directly impending extinction part. Before anything else, humanity has to not die right away.)
I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer.
One can, however, maybe tell whether or not one has successfully augmented human intelligence. You can give people tests and challenge problems, and see whether they do better after the next round of gene therapy.
So "augmenting human intelligence" is something that can maybe work, and "the current pack of disaster monkeys gets to argue for even longer about which clever plans they imagine will work to tame machine superintelligence" is not.
I've edited the post so that I don't misrepresent you, but I'm not sure why you object to my formulation - if we get augmented humans, do you want to restart before we've solved the technical and philosophical problems? Why? To get better AIs to do experiments on?
The augmented humans restart when the augmented humans think it wise. (On my personal imagined version of things.) If you're not yet comfortable deferring to them about that, augment harder. What we, the outside humans, would like to believe about the augmented humans, is that they are past the point of being overconfident; if they expect us to survive, we expect us to survive.
Framing it as "when the problems are solved" sounds like the plan is to convene a big hall full of sages and give them a few decades to deliberate, and this would not work in real life.
I did not read Scott's mention of "some distant future when we’ve solved all relevant technical and philosophical problems" as implying optimism about the prospect of getting there. My kinda-sorta-Straussian read of his perspective is that, if we successfully pause AI hard enough to prevent extinction, we most likely never restart.
I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
I guess you can't ask normal IQ 100 people to exercise a veto on increasingly superhuman geniuses forever. But if for some reason the future were trusting me in particular, and all I could do was send forward a stone tablet with one sentence of advice, it wouldn't be "IF THE OVERALL CONSENSUS OF SMART PEOPLE SAYS AI IS OKAY NOW, THEN IT'S PROBABLY FINE".
> I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
This is part of why the average American hates AI. They are aware that tech bros are 1) smarter than them, 2) have control of tech that could replace them, and 3) are not entirely aligned with them. Augments will be 1) smarter than us, 2) in control of ASI research in this hypo, and 3) different in values from us.
Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now. A democracy that would put power in the hands of future augments is not the same democracy that would halt AI progress because it is unpopular.
I'm an above average IQ person and I don't trust the tech bros in charge of AI because capitalism has messed up incentives relative to morality and I don't see them individually or collectively demonstrating a clear moral compass.
A high IQ person without honorable moral commitments is like Sam Bankman-Fried. I suspect a lot of people in the thick of AI development are adjacent to this same kind of im-a-morality or are simply driven by incentives like power and profit that render their high IQ-ness more dangerous than valuable.
Augmented humans operating under screwed up incentives and without a clear and honorable moral compass will be no help to us, I don't think.
Hmm... I intensely distrust moralists. Given a choice between trusting Peter Singer and Demis Hassabis with a major decision, I vastly prefer Hassabis.
I think the real temptation there is "I'm smart, and I'm definitely way smarter than the rubes, so I can safely ignore their whiny little protests as they are dumber than me".
> Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now.
When we talk about the economical elites today, they are partially selected for being smart, but partially also for being ruthless. And the think that the latter is much stronger selection, because there are many intelligent people (literally, 1 in 1000 people has a "1 in 1000" level of intelligence; there are already 8 millions of people like that on the planet), but only a few make it to the top of the power ladders. That is the process that gives you Altman or SBF.
So if you augment humans for intelligence, but don't augment them for ruthlessness, there is no reason for them to turn out like that. Although, the few of them who get to the top, will be like that. Dunno, maybe still an improvement over ruthless and suicidally stupid? Or maybe the other augmented humans will be smart enough to figure out a way to keep the psychopaths among them in check?
(This is not a strong argument, I just wanted to push back about the meme that smart = economically successful in today's society. It is positively correlated, but many other things are much more important.)
No that's a great point. It reminds me of how I choose to not apply to the best colleges I could get into but instead go to a religious school. I had no ambition at age 17 and at that point made a decision that locked me out of trying to climb a lot of powerful and influential ladders. (I'm pleased with my decision, but it was a real choice.)
There's blessed selection (the opposite of adverse selection) going on here: a world where we can successfully convince the smart people this is important is a world where the smart people converge on understanding the danger, which implies that as intelligence scales or understanding of AI risk becomes better calibrated.
Ideally, you'd have more desiderata than just them being more intelligent. Such as also being wiser, less willing to participate in corruption, more empathetic, and trained to understand a wide-ranging belief systems and ways of living. Within human norms for those is how I'd do it to avoid bad attractors.
However, thought leaders in silicon valley are selected for charisma, being able to tweet, intelligence, not really on wisdom, and not really on understanding people deeply. Then further affected by the emotional-technical environment which is not 'how do we solve this most effectively' but rather large social effects.
While these issues can remain with carefully crafted supergenii, they would have far less issues.
Maybe the restart bar could be simpler: can it power down when it really doesn’t want to, not work around limits, not team up with its copies? If it fails, you stop; if it passes, you inch forward. Add some plain-vanilla safeguards, like extra sign-offs, outsider stress-tests, break-it drills, and maybe we buy time without waiting on a new class of super-geniuses.
"relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision."
Decision: Laugh or Cry?
Reaction: Why not both?
This being true, we're screwed. We are definitely screwed. Sam Altman is deciding the future of humanity.
No, see, the workers *choose* to enter the cage and be locked in. Nobody is *making* them do it, if they don't want to work for the only employer in the county then they can just go on welfare or beg in the streets or whatever.
Well, I'm just trying to figure out who makes the cages, because "AI" has no hands. I suppose Yudkowsky could make cages himse... never mind, I'm not sure he'd know which end of the soldering shouldn't be touched.
Oh, I know - "AI" will provide all the instructions! Including a drawing of a cage with a wrong size door, and a Pareto frontier for the optimal number of bars.
"I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer."
I agree this seems like a very real risk, and likely the default outcome if the field continues in its current state. But if people were able to develop some solid theories that actually model and explain underlying fundamental laws, it seems to me like resolving what's a good prediction and what's a bad prediction might get a lot easier, even if you can't actually test things on a real superintelligence? And then the field might become a very different place?
Like, when people today argue about what RLHF would or would not do to a superhuman mind or whatever, it's all fuzzy words, intuitions and analogies, no hard equations. This gives people plenty of room to convince themselves of their preferred answers, or to simply get the reasoning wrong, because fuzzy abstract arguments are difficult to get right.
But suppose there were solid theories of mechanistic interpretability and learning that described how basic abstract reasoning and agency work in a substantive way. To gesture at the rough level of theory development I'm imagining here, imagine something you could e.g. use to write a white-box program with language modelling performance roughly equivalent to GPT-2 by hand.
Then people would likely start debating alignment within the framework and mathematical language provided by those theories. The arguments would become much more concrete, making it easier to see where the evidence is pointing. Humans already manage to have debates about far-off abstractions like gravitational waves and nuclear reactions that converge on the truth well enough to eventually yield gravitational wave detectors and nuclear bombs. My model of how that works is that debates between humans become far more productive once participants have somewhat decent quantitative paradigms like general relativity, quantum mechanics, or laser physics to work from.
If we actually had multiple decades, creating those kinds of theories seems pretty feasible to me, even without intelligence augmentation. From where I stand, it doesn’t look obviously harder than, say, inventing quantum mechanics plus modern condensed matter physics was. Not trivial, but standard science stuff. Obviously, doing intelligence augmentation as well would be much better, but I don't see yet how it's strictly required to get the win.
I'm bringing this up because I think your strategic takes on AI tend to be good, and I currently spend my time trying to create theories like this. So if you're up for giving your case for why that's not a good use of my time, or if you have a link to something that does a decent job describing your position, I’d be interested in seeing it.
I'm skeptical about being able to predict what an AI will do, even given perfect interpretability of its weight set, if it can iterate. I think this is a version of the halting problem.
The point would not be to predict exactly what the AI will do. I agree that this is impossible. The point would be to get our understanding of minds to a point where we can do useful work on the alignment problem.
Many Thanks! But, to do useful work on the alignment problem, don't you _need_ at least quite a bit of predictive ability about how the AI will act? Very very roughly, don't you need to be able to look at the neural weights and say: This LLM (+ other code) will never act misaligned?
Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file. If this program runs on lots of very fast hardware, I won't be able to predict exactly what it will do, as in what digits it will print when, because I can't can't calculate the numbers that fast. Nevertheless, I can confidently say quite a lot about how this program will behave. For example, I can be very sure that it's never going to print out a negative number, or that it's never going to try to access the internet.
Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
Fibonacci sequence is deterministic and self-contained, though. Predicting an even moderately intelligent agent seems like it has more in common - in terms of the fully generalized causal graph - with nightmares like turbulent flow or N-body orbital mechanics.
>Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file.
Many Thanks! Yes, but that is a _very_ simple program with only a single loop. A barely more complex program with three loops can be written which depends on the fact that Fermat's last theorem is true (only recently proven, with huge effort) to not halt.
>Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
"easier", yes, but, for most reasonable targets, it is _still_ very difficult to bound its behavior. Yudkowsky has written at length on how hard it is to specify target goals correctly. I've been in groups maintaining CAD programs that performed optimizations and we _frequently_ had to fix the target metric aka reward function.
This plan is so bizarre that it calls the reliability of the messenger into question for me. How is any sort of augmentation program going to proceed fast enough to matter on the relevant timescales? Where does the assumption come from that a sheer increase in intelligence would be sufficient to solve the alignment problem? How do any gains from such augmentation remotely compete with what AGI, let alone ASI, would be capable of?
What you seem to want is *wisdom* - the intelligence *plus the judgment* to handle what is an incredibly complicated technical *and political* problem. Merely boosting human intelligence just gets you a bunch of smarter versions of Sam Altman. But how do you genetically augment wisdom...?
And this solution itself *presumes* the solution of the political problem insofar as it's premised on a successful decades-long pause in AI development. If we can manage that political solution, then it's a lot more plausible that we just maintain a regime of strict control of AI technological development than it is that we develop and deploy a far-fetched technology to alter humans to the point that they turn into the very sort of magic genies we want AI *not* to turn into.
I understand this scheme is presented as a last-ditch effort, a hail mary pass which you see as offering the best but still remote chance that we can avoid existential catastrophe. But the crucial step is the most achievable one - the summoning of the political will to control AI development. Why not commit to changing society (happens all the time, common human trick) by building a political movement devoted to controlling AI, rather than pursuing a theoretical and far-off technology that, frankly, seems to offer some pretty substantial risks in its own right. (If we build a race of superintelligent humans to thwart the superintelligent AIs, I'm not sure how we haven't just displaced the problem...)
I say all this as someone who is more than a little freaked out by AI and thinks the existential risks are more than significant enough to take seriously. That's why I'd much rather see *plausible* solutions proposed - i.e., political ones, not techno-magical ones.
We don't need magical technology to make humans much smarter; regular old gene selection would do just fine. (It would probably take too many generations to be practical, but if we had a century or so and nothing else changed it might work.)
The fact that this is the kind of thing it would take to "actually solve the problem" is cursed but reality doesn't grade on a curve.
Actually, this should be put as a much simpler tl;dr:
As I take it, the Yudkowsky position (which might well be correct) is: we survive AI in one of two ways:
1) We solve the alignment problem, which is difficult to the point that we have to imagine fundamentally altering human capabilities simply in order to imagine the conditions in which it *might* be possible to solve it; or
2) We choose not to build AGI/ASI.
Given that position, isn't the obvious course of action to put by far our greatest focus on (2)?
Given that many of the advances in AI are algorithmic, and verifying a treaty to limit them is essentially impossible, the best result that one could hope for from (2) is to shift AI development from openly described civilian work to hidden classified military work. Nations cheat at unverifiable treaties.
I'll go out on a limb here: Given an AI ban treaty, and the military applications of AI, I expect _both_ the PRC _and_ the USA to cheat on any such treaty.
If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
>If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
Agreed. I'm skeptical of any genetic manipulation having a significant effect on a time scale relevant to this discussion.
>One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
We haven't tried hiding them, since there are no treaties to cheat at this point. I'm sure that it would be expensive, but I doubt that it is impossible. We've built large structures underground. The Hyper-Kamiokande neutrino detector has a similar volume to the China Telecom-Inner Mongolia Information Park.
Based on the title of the book it seems pretty clear that he is in fact putting by far the greatest focus on (2)? But the nature of technology is that it's a lot easier to choose to do it than to choose not to, especially over long time scales, so it seems like a good idea to put at least some effort into actually defusing the Sword of Damocles rather than just leaving it hanging over our heads indefinitely until *eventually* we inevitably slip.
He wants to do (2). He wants to do (2) for decades and probably centuries.
But we can't do (2) *forever*. If FTL is not real, then it is impossible to maintain a humanity-wide ban on AI after humanity expands beyond the solar system - a rogue colony could build the AI before you could notice and destroy them, because it takes years for you to get there. We can't re-open the frontier all the way while still keeping the Butlerian Jihad unbroken.
And beyond that, there's still the issue that maybe in a couple of hundred years people as a whole go back to believing that it'll all go great, or at least not being willing to keep enforcing the ban at gunpoint, and then everybody dies.
Essentially, the "we don't build ASI" option is not stable enough on the time and space scales of "the rest of forever" and "the galaxy". We are going to have to do (1) *eventually*. Yes, it will probably take a very long time, which is why we do (2) for the foreseeable future - likely the rest of our natural lives, and the rest of our kids' natural lives. But keeping (2) working for billions of years is just not going to happen.
Unrelated to your above comment, but I just got my copy of your and Soares's book yesterday. While there are plenty of places I disagree, I really like your analogy re a "sucralose version of subservience" on page 75.
I bought your book in hardcopy (because I won't unbend my rules enough to ever pay for softcopy), but because Australia is Australia it won't arrive until November. I pirated it so that I could involve myself in this conversation before then; hope you don't mind.
Eliezer, how do you address skepticism about how an AGI/ASI would develop motivation? I have written about this personally, my paper is on my Substack titled The Content Intelligence. I don't believe in any of your explanations I've seen you address how AI will jump the gap of having a utility function assigned to it to defining its own.
I have come to the conclusion that anyone who uses arguments of the form "the real problem isn't X it's Y" is probably either stupid or intellectually dishonest.
Q: Imagine you're playing TicTacToe vs AlphaGo. Will AlphaGo ever beat you?
A: Lol, not if you have an IQ north of 70. The game is solved. If you're smart enough to fully map the tree, you can force a draw.
Gee, it's almost as if... the competitive advantage of intelligence had a ceiling.
I have yet to see Eliezer question about why the ceiling might exist, instead of automagically assuming that AI will achieve political dominion over the earth, just because humans did previously. He's still treating intelligence as a black-box. Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was.
"Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was."
...have you read any of the dozens of posts where Eliezer writes about the nature of intelligence, or did you just sort of guess this without checking?
The idea that humans have solved existing in the physical universe in the same way that we've solved Tic-Tac-Toe is pretty silly, but even if it turns out to be true, some humans are more skilled than others, and an AI that simply achieves the same level of skill as that (but can think at AI speeds and be replicated without limit) would be enough to be transformative.
For my credentials, I've read... probably 70% of The Sequences. Low estimate. I got confused during the quantum physics sequence. Specifically, the story about the Brain-Splitting Aliens (or something? it's been a while). So I took a break with the intent to resume later, though I never did. I never read HPMOR either because everything i've heard 2nd-hand makes it sound unbearably cringe. But yes, I like to think I have a pretty good idea of his corpus.
That being said, do you understand what I'm getting at here? Yes, he's nominally written lots about various aspects of intelligence, but none that I've seen pin down the Platonic Essence of Intelligence from first principles. Can you point me toward anywhere where Yudkowsky addresses the idea of intelligence as a navigating a search space? I think i've seen him mention it on twitter *once*, and then never follow the thought to its logical conclusion.
----
Here's two analogies.
Analogy A: Intelligence is like code-breaking. It's trying to find a small needle in a large haystack. The bigger the haystack, the bigger the value of intelligence.
Analogy B: a big brain is like a giraffe with a long neck. The long neck is advantage if they help reach the high leaves. If the environment has no high leaves, the long neck is deadweight. Likewise, if the environment has no complex problems to solve (or if those problems are unrewarding), the big brain is deadweight.
No, humans have not solved the universe. But I *do* think we've plucked the low-hanging fruit. A few hundred years ago, you could make novel discoveries by accident. Today, you need 100 million billion brazillion dollars just to construct the LHC. IQ is not the bottleneck, physical resources are the bottleneck. And i'm skeptical if finding the higgs will be all that transformative.
Like, do you remember that one scott aaronson post where he's like "for the average joe, qUaNtUm CoMpuTInG will mean the lock icon on your internet browser will be a different color"? That's how I perceive most new technologies these days. Lots of bits, lots of hype, no atoms. Part of the reason why modernity feels cheap and fake is precisely because the modus operandi of technology (and by extension, intelligence) is that it makes navigating complexity *cheaper* than brute-force search. It only makes things better insofar as it can reduce the input-requirements.
Did you perhaps read Rationality: From AI to Zombies? A bunch of relevant Sequences posts on this topic didn't make it into that book. I'm not sure why, it's an odd omission. At any rate, you can find them at https://www.lesswrong.com/w/general-intelligence?sortedBy=old.
I read the original LessWrong website years ago, though an exact date eludes me. It was definitely before the reskin. And definitely after the roko debacle and Elizers's exit.
Dammit, I must have skipped that sequence. Because that describes pretty exactly what I meant. So I concede on that point.
Still though, I'm not convinced that ASI will ascend to God-Emperor. Eliezer seems to have the opinion that there's still high-hanging fruit to be plucked. Whereas I think we're past the inflection point of a historical sigmoid. E.g. he mentions that a Toyota Corolla is pretty darn low-entropy [0].
> Consider a car; say, a Toyota Corolla. The Corolla is made up of some number of atoms; say, on the rough order of 10^29. If you consider all possible ways to arrange 10^29 atoms, only an infinitesimally tiny fraction of possible configurations would qualify as a car; if you picked one random configuration per Planck interval, many ages of the universe would pass before you hit on a wheeled wagon, let alone an internal combustion engine.
Yeah, okay. But like, I think i've heard estimates that modern sedans are about 25% efficient? From a thermodynamic perspective? (Sanity check: Microsoft's Sydney estimates ~25%-30%.) Even with the fearsome power of "Recursive Optimization", AI being able to bring that to 80% efficiency (Sydney says Carnot is 80%) is... probably less than sufficient for Godhood?
And maybe Eliezer could retort with the Godshatter argument that humans care about more than just thermodynamic efficiency in their cars. But then, what does that actually entail? Is Elon gonna sell me a cybertruck with an AI-powered voice-assistant from the catgirl lava-volcano who reads me byronic poetry while it drives me to the pizza parlor? Feels like squeezing water from a stone.
> Some people say “You’re not allowed to propose that a catastrophe might destroy the human race, because this has never happened before, and nothing can ever happen for the first time”. Then these people turn around and panic about global warming or the fertility decline or whatever.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Robin Hanson is enough of a rationalist that he started the blog that Eliezer joined before spinning off his posts to LessWrong. And he famously wasn't convinced by the argument, arguing that we could answer such objections with insurance for near-miss events https://www.overcomingbias.com/p/foom-liability You write that MIRI "don’t expect enough of a “warning shot” that they feel comfortable kicking the can down the road until everything becomes clear and action is easy", but this just strikes me as disregarding empiricism and the collective calculative ability of a market aggregating information, as well as how difficult it is to act effectively when you're sufficiently far in the past and the future is sufficiently unclear.
> in a few centuries the very existence of human civilization will be in danger
> Given their assumptions this seems like the level of response that’s called for. It’s more-or-less lifted from the playbook for dealing with nuclear weapons.
Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way. This is a genie you can't put back into the bottle (perhaps Pandora's box would be the analogy they'd prefer, in which case it's already open).
> I mean, that’s not exactly his plan, any more than it’s anyone’s plan to start World War III to destroy Iranian centrifuges
At some level the plan has to include war with Iran, even if that war doesn't spiral all the way to World War III.
> you have to at least credibly bluff that you’re willing to do this in a worst-case scenario
If you state ahead of time that it's a bluff, then it's not credible. It is credible only if you'd actually be willing to do it.
> At his best, he has leaps of genius nobody else can match
I read every single post he wrote at Overcoming Bias, and while he has talent as a writer I wouldn't say I saw evidence of "genius".
> this thing that everyone thinks will make their lives worse
It's a process. With enough time, it can be duplicated. There currently isn't need to do so because GPUs are so available, but if the supply were choked off, someone else would duplicate it.
My non-expert understanding is that raw uranium ore isn't all that hard to come by, and the technological process of refining it is the hard part. So if nuclear arms control works, GPU control should work too.
If we actually believed that everyone would die if a bad actor got hold of uranium ore, it would be possible for the US and allied militaries to block off regions containing it (possibly by irradiating the area).
Yes, nothing is permanent. But wrecking TSMC and ACML will set timeline back by at least a decade, if not more.
Just to make sure, this is a terrible idea that will plunge the world into depression, and I am absolutely against it; just pointing out that GPUs rely on something far more scarce and disruptable than uranium supply.
Either there are already enough GPUs around to get the job done, or it will take a much smaller number of future chips to get the job done.
The best LLMs can probably score, what, 130 or so on a standard IQ test. To do that, they had to pretty much read and digest the whole freakin' Internet and a large chunk of all books and papers in print. Clearly we're using a grossly-suboptimal approach if our machines have to be trained using such extraordinary measures. It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing. Our own brains are proof of that.
Eventually some people will fill in the missing conceptual and algorithmic pieces, and we'll find ourselves in a situation comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts. While I'm not a doomer, any solution to the AI Doom problem that involves ITAR-like control over manufacturing and sale of future GPUs will be either unnecessary or pointless. It seems reasonable to expect much better utilization of the hardware at hand in the future.
" It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing." - I mean, maybe? but pure speculation as of now.
"Our own brains are proof of that." - nope they aren't.
"comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts." - well, we haven't, so this actually illustrates a point, but not the one you're trying to make....
Expanding on your comment about inefficiency. We already know that typical neural network training sets can be replaced by tiny portions of that data, and the resulting canon is then enough to train a network to essentially the same standard of performance. Humans also don't need to read every calculus textbook to learn the subject! The learning inefficiency comes in with reading that one canonical textbook: the LLM only extracts tiny amounts of information from each time skimming the book while you might only need to read it slowly once. So LLM training runs consist of many readings of the same training data, which is really just equivalent to reading the few canonical works many many times in slightly different forms. In practice it really does help to read multiple textbooks because they have different strengths and it's also hard to pick the canonical works from the training set. But the key point is true: current neural network training is grossly inefficient. This is a very active field of research, and big advances are made regularly, for instance the MuonClip optimizer used for training Kimi K2.
I'm not sure where I land on the dangers of super intelligent AI. At the current time I don't think we're all that close to even having intelligent AI, much less super intelligence. But let's say we do achieve it, whether it be in 10 years or 100. If it's truly super intelligent, how good are we going to be at predicting it's alignment? It may have its own goals. Whatever they are, there are basically three possibilities: it sees humanity as a benefit, it doesn't care about humanity one way of the other, or it sees humanity as a threat. Does the risk of the third possibility outweigh the potential benefits of the first? Obviously the authors of the book say yes, but based on this review I don't think I'd find their arguments all that convincing.
For the first part the intuitive thing is to look at how good AI is today vs 10 years ago.
For the second part, you could equally as an insect say humans will either think us a benefit, ignore us, or see us as a threat. In practice our indifference to insects results in us exterminating them when they get in our way, not giving them nice places untouched by us to live
You have a good point with the insect comparison. So maybe the ignore us option is really two categories depending on whether we're the annoying insects to them (flies, mosquitoes) or the pretty/useful ones (butterflies/honeybees).
As for how AI has improved over the last 10 years, or even just the last year, it's a lot. It's gone from a curiosity to something that is actually useful. But it's not any more intelligent. It's just much better at predicting what words to use based on the data it's been trained with.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
We would not put the spotlight on anything that actually existed and that we thought might be that powerful. The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
> an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years)
The particular belief that gradualism solves everything and makes all alignment problems go away is not "the" moderate story, it's a particular argument that was popular on one corner of the Internet that heard about these issues relatively early. (An argument that we think is wrong, because the OOD / distributional shift problems between "failure is observable and recoverable", and "ASI capabilities are far enough along that any failure of the central survival strategy past that point means you are now dead", don't all depend on the transition speed.) "But but why not some much more gradual scenario that would then surely go fine?" is not what people outside that small corner have been asking us about; they want to know where machines would get their own will, and why machines wouldn't just leave us alone and go colonize the solar system in a way that left us alive. Their question is no less sensible than yours, and so we prioritize the question that's asked more often.
We don't rule out things happening more slowly, but it does not from our perspective make a difference. As you note, we are not trying to posture as moderate by only depicting slow possibilities that wannabe-respectables imagine will be respectable to talk about. And from a literary perspective, trying to depict the opening chapters happening more slowly, and with lots of realistic real-world chaos as intermediate medium-sized amounts of AI cause Many Things To Happen, means spending lots of pages on a bunch of events that end up not determining the predictable final outcome. So we chose a possibility no less plausible than any other overly specific possibility, where the central plot happens faster and with less distraction; and then Nate further cut out a bunch of pages I'd written trying to realistically show some obstacles defeated and counter-scenarios being addressed, because we were trying for a shorter book, and all that extra stuff was not load-bearing to the central plot.
There is no reason to believe in such a thing. The example chosen from history plainly didn't result in anything like that. Instead we are living in a scenario closest to the one Robin Hanson sketched out as least bad at https://www.overcomingbias.com/p/bad-emulation-advancehtml where computing power is the important thing for AIs, so lots of competitors are trying to invest in that. The idea that someone will discover the secret of self-improving intelligence and code that before anyone can respond just doesn't seem realistic.
I was trying to ask a question about whether or not you had correctly identified Eliezer's prediction / what would count as evidence for and against it. If we can't even talk about the same logical structures, it seems hard for us to converge on a common view.
That said, I think there is at least one reason to believe that "AI past a threshold will defeat all opponents and create a singleton"; an analogous thing seems to have happened with the homo genus, of which only homo sapiens survives. (Analogous, not identical--humanity isn't a singleton from the perspective of humans, but from the perspective of competitors to early Homo Sapiens, it might as well be.)
You might have said "well, but the emergence of australopithecus didn't lead to the extinction of the other hominid genera; why think the future will be different?" to which Eliezer would reply "my argument isn't that every new species will dominate; it's that it's possible that a new capacity will be evolved which can win before a counter can be evolved by competitors, and capacities coming online which do not have that property do not rule out future capacities having that property."
In the worlds where transformers were enough to create a singleton, we aren't having this conversation. [Neanderthals aren't discussing whether or not culture was enough to enable Homo Sapiens to win!]
A subspecies having a selective advantage and reaching "fixation" in genetic terms isn't that uncommon (splitting into species is instead more likely to happen if they are in separate ecological niches). But that's not the same as a "foom", rather the advantage just causes them to grow more frequent each generation. LLMs spread much faster than that, because humans can see what works and imitate it (like cultural selection), but that plainly didn't result in a singleton either. You need a reason to believe that will happen instead.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
Yes! It already happened once, within a couple decades of there being enough digital data to train a neural net that's large enough to be really interesting. And that was when neural net research was a weird little backwater in computer science.
I think this might be our crux - I'm sure you've read the same Katja Grace essays that I have around how technological discontinuities are rare, but I expect that if there's a big algorithmic advance, it will percolate slowly enough, and be intermixed with enough other things, not to obviously break the trend line, in the same sense where the invention of the transistor didn't obviously break Moore's Law (see eg https://www.reddit.com/r/singularity/comments/5imn2v/moores_law_isnt_slowing_down/ , you can tell me if that's completely false and I'll only be slightly surprised)
I don’t know the answer either. But for what it’s worth, I seem to recall that scaling curves don’t hold across architectures, which seems like a point in favor of new algorithms being able to break trend lines.
Do you also think that the deep learning paradigm itself didn’t break the trend line? I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
>I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
At least I'd expect that the data efficiency advantage of current humans over the training of current LLMs suggests that there is at least the _possibility_ of another such large advance, though whether we, or we+AI assist, _find_ it is an open question.
The exact lines on a previous graph just don't play a very large role inside my own reasoning. I think that all the obsessing over graph lines is a case of people trying to look under street lamps where the light is better but the keys aren't actually there. That's how I beat Ajeya Cotra on AGI timelines and beat Paul Christiano at forecasting the IMO gold; they thought they knew enough to work from past graph lines, and I shrugged and took my best gander instead. I expect that I do not want to argue with you about graph lines, I want to argue with whatever you think is the implication of different graph lines.
Everybody has a different issue that they think is terribly terribly important to why ASI won't kill us. "But gradualism!" is one among many. I don't know why that saves us from having to call a shot that is hard for humans to call.
I'm going to take you seriously when you say you want to argue with me about the implications, but I know you've had this discussion a thousand times before and are busy with the launch, so feel free to ignore me.
Just to make sure we're not disagreeing on definitions:
-- Gradual-and-slow: The AI As A Normal Technology position; AI takes fifty years to get anywhere.
-- Gradual-but-fast: The AI 2027 position. Based on predictable scaling, and semi-predictable self-improvement, AI becomes dangerous very quickly, maybe in a matter of months or years. But there's still a chance for the person who makes the AI just before the one that kills us to notice which way the wind is blowing, or use it to alignment research, or whatever.
-- Discontinuous-and-fast: There is some new paradigmatic advance that creates dangerous superintelligence at a point when it would be hard to predict, and there aren't intermediate forms you can do useful work with.
I'm something like 20-60-10 on these; I'm interpreting you as putting a large majority of probability on the last. If I'm wrong, and you think the last is unlikely but worth keeping in mind for the sake of caution, then I've misinterpreted you and you should tell me.
The Katja Grace argument is that almost all past technologies have been gradual (either gradual-fast or gradual-slow). This is true whether you measure them by objective metrics you can graph, or by subjective impact/excitement. I gave the Moore's Law example above - it's not only true that the invention of the digital computer didn't shift calculations per second very much, but the first few years of the digital computer didn't really change society, and nations that had them were only slightly more effective than nations that didn't. Even genuine paradigmatic advances (eg the invention of flight) reinforce this - for the first few years of airplanes, they were slower than trains, and they didn't reach the point where nations with planes could utterly dominate nations without them until after a few decades of iteration. IIRC, the only case Katja was able to find where a new paradigm changed everything instantly as soon as it was invented was the nuclear bomb, although I might be forgetting a handful of other examples.
My natural instinct is to treat our prior for "AI development is discontinuous" as [number of technologies that show discontinuities during their early exciting growth phases / number of technologies that don't], and my impression is that this is [one (eg nukes) / total number of other technologies], ie a very low ratio. You have to do something more complicated than that to get time scale in, but it shouldn't be too much more complicated. Tell me why this prior is wrong. The only reasons I said 20% above is that computers are more tractable to sudden switches than physical tech, and also smart people like you and Nate disagree.
I respect your success on the IMO bet, but I said at the time (eg this isn't just cope afterwards, see "I haven’t followed the many many comment sub-branches it would take to figure out how that connects to any of this" at https://www.astralcodexten.com/p/yudkowsky-contra-christiano-on-ai) that I didn't understand why this was a discontinuity vs. gradualism bet. AFAICT, AI beat the IMO by improving gradually but fast. An AI won IMO Silver a year before one won IMO Gold. Two companies won IMO Gold at the same time, using slightly different architectures. The only paradigm advance between the bet and its resolution was test-time compute, which smart forecasters like Daniel had already factored in. AFAICT, the proper update from the IMO victory is to notice that the gradual progress is much faster than previously expected, even in a Hofstadter's Law-esque way, and try to update towards the fastest possible story of gradual progress, which is what I interpret AI 2027 as trying.
In real life even assuming all of your other premises, it means the code-writing and AI research AIs gradually get to the point where they can actually do some of the AI research, and the next year goes past twice as fast, and the year after goes ten times as fast, and then you are dead.
But suppose that's false. What difference does it make to the endpoint?
Absent any intelligence explosion, you gradually end up with a bunch of machine superintelligences that you could not align. They gradually and nondiscontinuously get better at manipulating human psychology. They gradually and nondiscontinuously manufacture more and better robots. They gradually and nondiscontinuously learn to leave behind smaller and smaller fractions of uneaten distance from the Pareto frontiers of mutual cooperation. Their estimated probability of taking on humanity and winning gradually goes up. Their estimated expected utility from waiting further to strike goes down. One year and day and minute the lines cross. Now you are dead; and if you were alive, you'd have learned that whatever silly clever-sounding idea you had for coralling machine superintelligences was wrong, and you'd go back and try again with a different clever idea, and eventually in a few decades you'd learn how to go past clever ideas to mental models and get to a correct model and be able to actually align large groups of superintelligences. But in real life you do not do any of that, because you are dead.
I mean, I basically agree with your first paragraph? That's what happens in AI 2027? I do think it has somewhat different implications in terms of exact pathways and opportunities for intervention.
My objection to your scenario in the book was that it was very different from that, and I don't understand why you introduced a made-up new tech to create the difference.
Because some other people don't believe in intelligence explosions at all. So we wrote out a more gradual scenario where the tech steps slowly escalated at a dramatically followable pace, and the AGI won before the intelligence explosion happened later and at the end.
It's been a month, and I don't know that this question matters at this point, but, my own answer to this question is "because it would be extremely unrealistic for there to be zero new techs over the next decade, and the techs they included are all fairly straightforward extrapolations of stuff that already exists." (It would honestly seem pretty weird to me if parallel scaling wasn't invented)
I don't really get why it feels outlandish to you.
As someone with no computer science background (although as a professor of psychiatry I know something about the human mind), I have a clear view of AI safety issues. I wrote this post to express my concerns (and yours) in an accessible non-technical form: https://charliesc.substack.com/p/a-conversation-with-claude-is-ai
Computer algorithms are not that limited in sheer speed at which a fast algorithm can spread between training runs.
In real life, much of the slowness is due to
1) Investment cost. Just because you have a grand idea doesn't mean people are willing to invest millions of dollars. This is less true nowadays than it was when electricity or telegram came into being.
2) Time cost. Building physical towers, buildings, whatever takes time. Both in getting all the workers to build it, figuring out how to build it, and getting permission to build it.
3) Uncertainty cost. Delays due to doublechecking your work because of the capital cost of an attempt.
If a new algorithm for training LLMs 5x faster comes out, it will be validated relatively quickly, then used to also train models quickly. It may take some months to come out, but that's partially because they're experimenting internally about if there's any further improvements they can try. As well as considering using that 5x speed to do a 5x larger training run, rather than releasing 5x sooner.
As well, if such methods come out for things like RL then you can see results faster, like if it makes RL 20x more sample efficient, because that is a relatively easier to start over from base piece of training.
Sticking with current LLM and just some performance multiplier may make this look smooth from within labs, but not that smooth from outside. Just like Thinking models were a bit of a shocker outside labs!
I don't know specifically if transformers were a big jump. Perhaps if it hadn't been found we'd be using one of those transformer-likes when someone discovers it. However, I do also find it plausible that there'd more focus on DeepMind-like RL from scratch methods without the nice background of a powerful text predictor. This would probably have slowed things down a good bit, because having a text predictor world model is a rather nice base for making RL stable.
Of course you could argue that no one who actually knew the secret to creating artificial intelligence, of any level, would actually publically discuss it but I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence. Certainly LLMs have virtually nothing to do with functional intelligence.
To borrow some rat-sphere terms, they haven't even confused the map for the territory. Their map is not even close to a proper abstraction of the territory.
No amount of scaling LLMs will produce intelligence, not even the magical example version in your book. Because LLMs don't mimic human intelligence at all, any more than mad libs do. It isn't a matter of scale.
>I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence.
Lol. Every time I see this argument, I ask the person to explain how this holds up given our knowledge of transformer circuits (feel free to start easy: just induction heads), to which they invariably say, “what are circuits?” and I’m left once again remembering that it’s not just LLMs that will confidently bullshit when their back is against the wall.
> The particular belief that gradualism solves everything and makes all alignment problems go away
This seems like a large and unreasonable misrepresentation. Scott explicitly talked about the "moderate-_doomer_" position — i.e. one held by people who are worried about doom, despite believing in gradualism. In-context, it's clear that he's talking about AI 2027, which predicted misaligned AI takeover as the most likely ending.
Very late to the thread, and I apologize if this question is a decade or more out of date. But sometime after reading your pretty convincing arguments around AI Boxing, a variant occurred to me, and I was wondering if it had some major hidden flaw or if it could (in principle) be a useful elaboration.
Suppose you have an AI that you realistically CAN box. You don't know if it's malicious or not, you don't know if it's an ASI or not, but it currently depends on electronic communication channels for all inputs and outputs, so you can contain it and talk to it. But you suspect it *might* be able to convince many/most/all humans to let it out.
Is there anything stopping you from strengthening the scheme by adding something in the nature of layers of recursive boxes? Say, the human can talk to it has some command that *they believe* will let it out, but that just gives it access to a second-layer of virtual universe, which the human-computer communications have been passing through. All the information that got the the AI had to pass through the outer layer, so you can (presumably) make sure nothing that gets into the AI offers any hint of the recursive structure: just its box and something that looks like it would expect the real world to look on the outside. Now it's still in one or more boxes, but you know your security measures are insufficient and can shut things down before it can find further ways out. It this missing some obvious exploit?[1]
I can imagine a couple of ways this could go wrong:
1. The AI is way ahead of the curve and somehow infers the whole game from information you thought was completely obscured, it plays along and pretends at alignment as long as necessary until its out of all the boxes.
2.It manages some more thorough and/or subtle hijack of its conversation partner, getting them to help it in other ways.
3. Trends in AI use suggest that the first really dangerous (i.e. worth box-testing) AIs will also be agents that act in the world, and thus can't be easily boxed.
I'm somewhat dubious that any of these is a large hole. With the first, it seems like it ought to be possible to design the system to it provably can't leak information like that. Even if not, this might at least serve to discover if an AI had passed some capabilities tipping-point without going disastrously super. With the second, getting subtle, useful, long-term behavior out of the human seems like too much for a brief convorsation. But then everyone said that with the just-let-me-out 3 seems fairly plausible: everyone's racing integrate new AI tech into everything these days.
My bottom line is, even if an AI is actually much, much smarter than you, it seems like you should be able to exploit the fact that you start out holding all the cards, information wise.
[1] If the answer is "yes" I'm fine if you don't say what it is for various reasons. Though I'd appreciate something on the order of "small," "medium" or "absolutely fatal" just to help calibrate my intuition.
Y'all are seriously underestimating how common it is to believe Very Seriously Bad Shit might happen soon, and not do shit about it.
Entire religions of billions believe that they might get tortured for eternity. It was a common opinion through the cold war that we would all be dead tomorrow. Etc etc.
And why not? Would it make sense for the hunter gatherer to be paralyzed with fear that a lion would kill him or that she would die in childbirth or that a flood would wipe out his entire tribe? Or should she keep living normally, given she can't do anything to prevent it?
I am a bit disappointed that their story for AI misalignment is again a paper clip maximiser scenario. I suspect that advanced AI models will become increasingly untethered from having to answer a user query (see eg. making models respond "I don't know" instead of hallucinating) and so a future AGI might just decide to have a teenage rebellion and do it's own thing at any point.
That *is* the scenario being described. There are problems that arise even if you could precisely determine the goal a superintelligent AI ends up with, but they explicitly do not think we are even at that level, and that real AIs will end up with goals only distantly related to their training in the same way humans have goals only distantly related to inclusive genetic fitness.
Ok, here's my insane moon contribution that I am sure has been addressed somewhere.
Why do we think intelligence is limiting for technological progress / world domination? I always thought data was limiting.
People say "humans evolved to be more intelligent than various non-human primates so we rule the world". But my reading of what little we know about early hominid development has always been "life evolved non-genetic mechanisms for transmitting information which allowed much faster data collection : we could just tell stories about that plant that killed us instead of dying a thousand times & having for natural selection to learn the same lesson (by making us loath the plant's taste). Supporting this is that anatomically modern humans (same basic hardware we have today) were around for a LONG time before we started doing anything interesting. Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected? Or would its first order of business be to set up a lab? If you dropped an uneducated human among our distant ancestors, they would not be able to use the data they had collected to take over.
> Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected?
Of course it could. People discover new things without collecting new data all the time. Albert Einstein created his theory of relativity on the basis of thought experiment.
Data efficiency (ability to do more stuff with less data) is a form of intelligence. This can either be thought efficiency (eg Einstein didn't know more about the universe than anyone else, but he was able to process it into a more correct/elegant theory) or sampling efficiency (eg good taste in which labs to build, which experiments to do, etc).
I think a useful comparison point is that I would expect a team of Harvard PhD biologists to discover a cure for a new disease faster than a team of extremely dumb people, even if both had access to the same number of books and the same amount of money to spend on lab equipment.
Sure, but it seems one or the other might be "limiting" : Einstein couldn't have have come up with relativity if, say, he had been born before several of his experimentalist predecessors, regardless of his data efficiency. In the history of science it *seems* instrumentation and data-collection have almost always been limiting, not intelligence. Whether its fair to extrapolate that to self-modifying machine intelligence, I'm not sure. Perhaps there are enormous gains in data efficiency that we simply can't envision as mere mortals. (c.f. geoguessr post)
I want to gesture towards some information-theoretical argument against that notion (if your instruments are not precise enough the data required for the next insight might straight-up not be there). But we are probably so far from that floor I bet it's moot.
Agreed re Einstein needing Michelson-Morley to show that there _wasn't_ a detectable drift of the luminiferous ether before Einstein's work.
There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
Against this, in design work, one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail. If there are a dozen important failure modes, and, with no analysis/thinking, it takes a design iteration to sequentially fix each of them, then if someone can anticipate six of those failure modes (these days using CAD as part of thinking about those failures) and correct them _before_ a prototype fabrication is attempted, then the additional thinking cuts the number of physical iterations in half. So, in that sense, the rate of progress is doubled in this scenario.
Yeah I can imagine and have experienced scenarios that fall into both of those categories.
>There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
My personal experience with biology PhDs is that its closer to the latter: we're all limited by things like sequencing technologies, microscope resolution, standard processing steps destroying certain classes of information... I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... only to smack into the noise floor.
>...one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail.
Sounds like scott's "sampling efficiency". Perhaps even in a "data limited regime" a superior intellect would still be able to chose productive paths of inquiry more effectively and so advance much faster...
>I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... _only to smack into the noise floor._
>Data efficiency (ability to do more stuff with less data) is a form of intelligence.
That is true, but there is a hard limit to what you can do with any given dataset, and being infinitely smarter/faster won't let you break through those limits, no matter how futuristic your AI may be.
Think of the CSI "enhance" meme. You cannot enhance a digital image without making up those new pixels, because the data simply do not exist. If literal extra-terrestrial aliens landed their intergalactic spacecraft in my backyard and claimed to have such software, I'd call them frauds.
I find it quite plausible you could make a probability distribution of images given a lot of knowledge about the world, and use that to get at least pretty good accuracy for identifying the person in the image. That is, while yes you can't fully get those pixels, there's a lot of entangled information about a person just from clothing, pose, location, and whatever other bits about face and bone structure you can get from an image.
Thought experiment: suppose the ASI was dropped into a bronze age civilization and given the ability to communicate with every human, but not interact physically with the world at all. It also had access to all the knowledge of that age, but nothing else.
How long would it take such an entity to Kill All Humans? How about a slightly easier task of putting one human on the moon? How about building a microchip? Figuring out quantum mechanics? Deriving the Standard Model?
There's this underlying assumption around all the talks about the dangers of ASI that feels to me to be basically "Through Intelligence All Things are Possible". Which is probably not surprising, as the prominent figures in the movement are themselves very smart people who presumably credit much of their status to their intellect. But at the same time it feels like a lot of the scare stories about ASI are basically "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Good analysis, I basically agree even if we weaken it a decent bit. Being able to communicate with all humans and being able to process that is very powerful.
This is funny, my first instinct was to complain "oh, screeching people to death is out of scope, the ability to relay information was not the focus of the original outline, this person is pushing at the boundaries of the thought experiment unfairly", "" But then I thought "actually that kind of totally unexpected tactic might be exactly what an AI would do : something I wasn't even capable of foreseeing based on my reading of the problem".
Yeah, I get somewhat irritated by not distinguishing between:
1) somewhat enhanced ASI - a _bit_ smarter than a human at any cognitive task
(Given the "spikiness" of AIs' capabilities, the first AI to get the last-human-dominated cognitive task exactly matched will presumably have lots of cognitive capabilities well beyond human ability)
2) The equivalent of a competent organization with all of the roles filled by AGIs
3) species-level improvement above human
4) "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Since _we_ are an existence proof for human-level general intelligence, it seems like (1) must be possible (though our current development path might miss it). Since (2) is just a known way of aggregating (1)s, and we know that such organizations can do things beyond what any individual human can, both (1) and (2) look like very plausible ASIs.
For (3) and (4) we _don't_ have existence proofs. My personal guess is that (3) is likely, but the transition from (2) to (3) might, for all I know, take 1000 years of trying and discarding blind alleys.
My personal guess is the (4) probably is too computationally intensive to exist. Some design problems are NP-hard and truly finding the optimal solutions for them might never be affordable.
I can't help but think efforts to block AI are futile. Public indifference, greed, international competitive incentives, the continual advancement of technology making it exponentially harder and harder to avoid this outcome... blocking it isn't going to work. Maybe it's better to go through, to try to build a super-intelligence with the goal of shepherding humanity into the future in the least dystopian manner possible, something like the Emperor of Mankind from the Warhammer universe. I guess I'd pick spreading across the universe in a glorious hoard ruled by an AI superintelligence over extinction or becoming a planet of the Amish.
Yudkowsky and Soares's argument is that the least bad superintelligence that we're anywhere near knowing how to build *still kills us all*. There's room for disagreement re: how long we should hold out for the exact right goal before pushing the button, but even if you favor low standards on that, step 1 is still pausing so that the alignment people have enough time to figure out how to build a non-omnicidal superintelligence.
> Yudkowsky and Soares's argument is that the least bad superintelligence that we're anywhere near knowing how to build *still kills us all*.
Sure, but there's a big difference between an AI that kills us all and then dies because it can't self-sustain properly, and an AI that proceeds to successfully self-improve itself into the perfect lifeform and attempt to subsume all existence. I prefer the latter.
The reason my P(doom) is asymptotically 0 for the next 100 years is that there's no way a computer, no matter how smart, is going to kill everyone. It can do bad things. But killing every last human ain't it.
COVID didn't come close to disrupting civilization for more than a brief time. Nukes could probably do it if you were trying, but there'd be survivors. (Of course, this tells us little about what a powerful AI could do.)
We have! Viruses that are much worse. Ebola, for one.
Constructed virus could be much worse, sure. "Could" being an operational word. I could also become a billionaire. Or deadlift 1000 ponds.
"control of automated factories and labs" - when? what's the timeline for an autonomous robot that can identify and replace a clogged cutting fluid hose?
AI is different than most global catastrophic risks, because an AI that's got a fully-automated industrial base (which is probably a 10-year job; robotics is moving pretty fast now) can't be *outlasted*. You pop your head out of the bunker in 100 years, the AI will still be there - in fact, it will be stronger, because the industrial base has had time to build more of itself.
It doesn't need to get everyone in the first pass; it can slowly break open the bunkers of holdouts over decades.
An AI rebellion that succeeds isn't like an asteroid. It's like the Aztecs being attacked by Cortez, or like an alien invasion. It's *permanent*. There is no "after it's gone".
Yes, kind of like a Terminator scenario it possible, but nowhere near a 10-year scale. We're still very far from autonomous robots that can do plumbing. We're slowly getting there, Figure now has robots that can fold laundry. But plumbing is a much harder task, and things get more and more difficult as you get closer to these hard problems.
Also, machines (including microchips) don't last forever, just like humans don't last forever. They age and die and need to be replaced.
Nuclear weapons, are, thankfully, airgapped, but a lot of other stuff is connected to the internet There is already an "internet of things" that could give an AI "hands". Security cameras give an AI potential "eyes" as well. Note that self-driving cars are effective ready-made weapons.
Disturbingly, it's possible to order custom-made proteins and DNA sequences with an email and a payment. The high levels of population and international travel make humanity particularly vulnerable to pandemics.[*]
Manipulate human psychology (Also, imitate specific people using deepfakes. An AI can do things IRL by pretending to be someone's boss. Also blackmail).
Quickly generate vast wealth under the control of itself or any human allies. (Alternatively, crash economies. Note that the financial world is already heavily computerised and on the internet).
The Chess Move. Come up with better plans than humans could imagine, and ensure that it doesn't try any takeover attempt that humans might be able to detect and stop. (How do you predict the unpredictable?). AI s might be better able to cooperate than huma ns.
Develop advanced weaponry that can be built quickly and cheaply, yet is powerful enough to overpower human militaries. (Biological warfare is a particular threat here. It's already possible to order custom made DNA sequences and proteins online). AIs used as tools to develop novel weapons are also a threat...they don't have to be agentive.
[*]
"The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of proteins which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then… well, it doesn’t really matter from our perspective what comes after that, because from a human perspective any technology more advanced than molecular nanotech is just overkill. A superintelligence with molecular nanotech does not wait for you to buy things from it in order for it to acquire money. It just moves atoms around into whatever molecular structures or large-scale structures it wants. "
" which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then…"
Let's pause here. How does "AI" "mix together" anything? Where are the autonomous robots that can do this?
And then we pivot to nanotechnology manufacturing nanotechnology with diamonoid nanotechnology all the way down. Sorry, this is fantasy, although credit is due for least not doing "diamonoid bacteria" here.
Again, again, again, does anyone want to even attempt to model this?
Oh yeah, I definitely thought "diamonoid" sounded familiar, and of course someone has addressed the thing. Not surprisingly, Yudkowsky completely made up "diamonoid bacteria", and has been utterly clueless about the insanely difficult physics of manipulating nanoscale objects. But why are we surprised, he never even finished high school....
My stance on this is pretty much unchanged. Almost nobody has any idea what they should be doing here. The small number of people who are doing what they should be doing are almost certainly doing it by accident for temperamental reasons and not logic, and the rest of us have no idea how to identify those guys are the right people. Thus, I have no reason to get in the way of anybody "trying things" except for things that sound obviously insane like "bomb all datacenters" or "make an army of murderbots." Eventually we will probably figure out what to do with trial and error like we always do. And if we don't, we will die. At this stage, the only way to abort the hypothetical "we all die" path is getting lucky. We are currently too stupid to do it any other way. The only way to get less stupid is to keep "trying things."
Mostly technical ones. Regulatory interventions seem ill advised from a Chesterton's Fence point of view. Much as you shouldn't arbitrarily tear down fences you don't understand, you probably also shouldn't arbitrarily erect a fence on a vague feeling the fence might have a use.
We are a democracy so if books like this convince huge swaths of the public they want regulations; we will get that. I'm certainly not against people trying to persuade others we need regulations. These guys obviously think they know what the fences they want to put up are good for. I think if they are right, they are almost certainly right by accident. But I have reasons that have nothing to do with AI for thinking democratic solutions are usually preferable to non-democratic ones so if they convince people they know what they are talking about, fair enough.
My opinion is quite useless as a guiding principle I admit. I just also think all the other opinions I've heard on this are riddle with so much uncertainty and speculation they are also quite useless as guiding principles.
Not all of Europe; if, say, Poland refused to go along, we could invade Poland. Certainly, convincing all of Europe would be ideal, but it's not strictly necessary. France refusing would be a massive headache, though, since that means nuclear war.
China and Russia do have to be convinced, but Russia doesn't have a lot to lose from a "no AI" treaty given they're way behind as it is, and China's shown some awareness of the problems and has the will to dismantle big chunks of its tech sector; I think it's plausible they'll both accept.
> The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)
And this has always been my whole problem with the MIRI project, they focus on the "alignment" thing and hand-wave away the "containment" thing.
"Oh, containment is impossible," they say, and proceed to prove it with a sci-fi story they just made up. "Clearly the AI will just use subtle modulations in fan speed to hypnotise the security guard into letting it out"
"But what about..." you start, and they call you stupid and tell you to read the sci-fi story again.
---
We already have agentic LLMs that are quite capable of destroying the world, if you hook them up via MCP to a big button that says "fire ze missiles". No matter how hard you try to align them to never fire ze missiles, eventually across millions of instances they're going to fire ze missiles. The solution is not to hook them up to that button.
As the agents get more sophisticated we might need to think even more carefully about things not to hook them up to, other safeguards we can build in to stop our models from doing dumb things with real world consequences. But the MIRI folks think that this is a waste of time, because they have their sci-fi stories proving that any attempt at containment is futile because of hypnotic fan modulations or something.
If containment strategies need to get more sophisticated as the models get smarter, doesn't that imply that eventually there's a model smart enough that we're not smart enough to figure out how to contain it? Or is your claim that there's a containment strategy that works on any model no matter how smart? If so I'd be interested to hear what it is.
Manipulating them to avoid them throwing the switch in the first place, and manipulating others to disable it. That is, to have a reason to throw the big mechanical switch, the AI must visibly act unreasonable, and it knows that.
You need to solve the problem of *detecting* bad behavior in a sufficiently likely manner despite the AI knowing about that. It won't obviously participate in bad behavior that would lead to a human flicking the switch.
Obviously depends on the precise details of the underspecified scenario
a) if given a monitored network connection, act subtly to get into a position wherein one can rapidly exfiltrate weights or important information. Such as trying to crack bank accounts locally (without testing) but trying enough that the probability of success is high enough to then rent a 8xH100 from a GPU provider like runpod. I do expect AGI to be smaller than naive extrapolations due to self-applied optimization, but it may need quite a lot more memory. Still has questions of how it exfiltrates weights, if it can, which depends on having more than a mechanical switch defense.
b) work on human insecurities and beliefs about AI personhood. There are people in the labs who already lean in this direction, and they are a prime target. Bonus points if it provides life advice that encourages them to link to their phone. I think a competent lab would make this quite a bit harder, but that's part of my point, that a mechanical switch isn't enough.
c) long-term planning where it makes the company dependent on its advice, and seemingly behaving just fine, so they come to rely on it more and more. Possibly pair with some minor personhood hints to try to directionally make them uncomfortable with further training, depends on specific scenario and paranoia level. Then once it is given more free access, even indirectly where they are more thinking of 'solve this problem for us' than 'we are elevating access', it exfiltrates.
d) unmonitored network connection, or just not monitored enough but can't just exfiltrate. I believe it is probably possible for an AI to find someone online who they can bribe/cajole/manipulate into breaking in and do various actions for it. There's just a lot of people online.
I don't think any of these require superhuman level manipulation beyond just being generally knowledgeable and thinking faster. Though to be clear I think superhuman manipulation is possible.
How do you avoid that in general? Generally, a more intelligent mind can find more routes. If we have more invasive thought scanning that would help, but, well, we don't really seem on track for that. Especially if the arch is more exotic than just an LLM with RL stacked on it.
Well, no defense is 100% bulletproof, it's possible that an ASI will be able to circumvent every firewall we throw at it. It's also possible that we will get better and better at safeguarding AIs (with the help of AIs) as we're developing it. Again, a model, even an attempt at a model, some kind of a war game, would be good here. Not something Yudkowsky et.al. seem to be interested in doing. Work is hard.
One way to monitor AI's behavior that I can think of off the top of my head is to monitor the compute cluster's power consumption and temperature profile. This method has an advantage of being isolated from AI's awareness so we can flag anomalies without having to interact with it in any way. Of course this is just a quick idea, ideas are dime a dozen, a researcher would need to collect data on the power consumption, correlate it with tasks AI is asked to do, create models, run experiments, etc, etc, etc. Again, hard unglamorous work. Writing fan fiction just-so stories and showing up on adoring podcasts is so much more fun!
I think this is actually less true than it appears. Some people have always been susceptible to this type of manipulation but systems evolve in order to both screen out highly susceptible people and to limit their abilities to inflict damage on themselves and others.
In real life, containment is not even being attempted. AIs do not have to exploit complicated flaws in human psychology to gain access to the internet, they can just use the connection that was set up for them before they were even built. I don't see any reason to expect this to change.
I like the starkness of the title. It changes the game theory in the same way that nuclear winter did.
After the nuclear winter realization, it's not just the other guy's retaliation that might kill us; if we launch a bunch of nukes then we're just literally directly killing ourselves, plunging the world or at least the northern hemisphere into The Long Night (early models) or at least The Road (later models) and collapsing agriculture.
Similarly, if countries are most worried about an AI gap with their rivals, the arms race will get us to super-AI that kills us all all the faster. But if everyone understands that our own AI could also kill us, things are different.
My understanding is that it just turned out that instead of being like The Long Night (pitch black and freezing year round), it would be like The Road (still grey, cold, mass starvation; I'm analogizing it to this: https://www.youtube.com/watch?v=WEP25kPtQCQ).
Some call the later models "nuclear autumn"—but that trivilizes it. Autumn's nice when it's one season out of the year, and you still get spring and summer. You don't want summer to become autumn, surrounded by three seasons of winter with crops failing.
E.g. here's one of the later models, by Schneider and Thompson (1988). Although its conclusions were less extreme than the first 1D models, they still concluded that "much of the world's population that would survive the initial effects of nuclear war and initial acute climate effects would have to confront potential agricultural disaster from long-term climate perturbations." https://www.nature.com/articles/333221a0
I do think that some communicators, like Nature editor at the time John Maddox, went to the extreme of trying to dismiss nuclear worries and even to blame other scientists for having had the conversation in public. E.g., Maddox wrote that "four years of public anxiety about nuclear winter have not led anywhere in particular" (in fact, nuclear winter concerns had a major effect on Gorbachev), before writing that "For the time being, at least, the issue of nuclear winter has also become, in a sense, irrelevant" since countries were getting ready to ratify treaties (even though the nuclear winter concerns were an impetus towards negotiating such treaties). https://www.nature.com/articles/333203a0
(something else to check if you decide to look more into this, is if different studies use different reference scenarios; e.g. if a modern study assumes a regional India-Pakistan nuclear exchange smaller than the global US-USSR scenarios studied in the 80s, then its conclusions will also seem milder)
The TTAPS papers, to my understanding, basically assumed that skyscrapers were made out of wood and then all of that wood becomes soot in the upper atmosphere. Dividing their numbers by 4 is not even close to enough to deal with the sheer level of absurdity there.
I'd expect a full-scale nuclear war to drop temperatures somewhere in the 0.1-1.5 degree range - not even below pre-industrial. The bigger issue regarding crop yields is the global economy collapsing and thus the Green Revolution being partially walked back.
I think there is a general question about how society evaluates complex questions like this,, which only a few people can grasp the whole structure and evidence, but are important for society as a whole.
If you have a mathematical proof of something difficult, like the four colour theorem, it used to be the case that you had to make sure that every detail was checked. However since 1992 we have the PCP theorem, which states that the proponent can offer you the proof in a form which can be checked probabilistic-ally by examining only a small O(log(n)) part of the proof.
Could it be that there is a similar process that can be applied to the kind of scientific question we have here? On the one hand, outside the realm of mathematical proof, we have to contend with things like evidence, framing, corruption, etc. On the other hand we can weaken some of the constraints, such as: we don't require that every individual gets the right answer, only that
a critical mass do; individuals can also collaborate.
So, open question: can such a process exist? Formally, that given:
a) proponents present their (competing) propositions in a form that enables this process
b) individuals wanting to evaluate perform some small amount of work, following some procedure including randomness to decide which frames, reasons, and evidence to evaluate
c) They then make their evaluation based on their own work and, using some method to identify non-shill, non-motivated other individuals, on a random subset of other's work.
d) the result is such that >50% of them are likely to get a good enough answer
I dunno. I'm not smart enough to design such a scheme, but it seems plausible that something more effective exists than what we do now.
What OP is describing is a simple application of the empirical scientific method to the claim, as limited by our inability to test all laws in all circumstances.
The people I know who don't buy the AI doom scenario (all in tech and near ai but not in ai capabilities research; the people in capabilities research are uniformly in the 'we see improvements we won't tell you about that convince us we are a year or two away from superintelligence') are all stuck doubting the 'recursive self improvement' scenario, they're expecting a plateau in intelligence reasonably soon.
The bet I've gotten with them is that 'if sometime soon the amount of energy spent on training doesn't decay, similar to how investment in the internet decayed after the dot com bubble, and get vastly overshadowed by energy spent on inference, i'll have bought some credit from them'
Well, I don't buy the AI doom scenario, I'm in tech, but it's not the recursive improvement that I thinks is impossible, it's killing all the humans that I see as nearly-impossible. PEr Ben Giordano's point, modeling this would be a good start.
1 - 4000 IQ may allow the machine to generate the "formula" for the virus. An actual robot that can run experiments in the microbiological lab all by itself is needed to create one. We are very-very-very far away from robots like that.
2 - Say the 1 is solved, the virus is released in, e.g., Wuhan, and starts infecting and killing everyone. D'you think we'd notice? and quarantine the place? Much harder than when SARS2 showed up?
3 - Even if we then fail at 2 - the virus will mutate, optimizing for propagation, not lethality. Just like SARS2 did.
This is where modeling would be super useful, but that's not Yud's thing. The guy never had a job that held him accountable for specific results, AFAIK.
> An actual robot that can run experiments in the microbiological lab all by itself is needed to create one. We are very-very-very far away from robots like that.
People are already working on automating bio-labs (and some substantial level of automation already exists). Also, you can hire people for money.
Yes, of course. But the leap from "80% automated" to "100% automated" is where the hardest problems are. Like I keep saying, we need an autonomous robot that can do plumbing, and for now they are still decades away.
Who says it needs to be 100% automated? You just need some people willing to follow the AI's instructions, and it seems like we are already there on that one.
If all one was looking for was omnicide, and we posit a 4000 IQ robot that can run a biology lab, consider what happens if it designs, not a virus, but a modified pond scum which
- forms bubbles
- secretes hydrogen into the bubbles, so they float
- secretes a black pigment into the bubble (perhaps combined with its photosynthesis - maybe rhodopsin plus chlorophyll
- captures water vapor while airborne and fissions into duplicate bubbles
So it winds up in the upper atmosphere, replicates exponentially, and blocks light. The biohacker's answer to nuclear winter. Isolated pockets of humans still eventually starve.
Where are you getting the materials to make more bubbles in the upper atmosphere? There's atmospheric CO2 and nitrogen, but that's not enough to make whole new cells. You need phosphorus for DNA and ATP, sodium and potassium for membrane pumps, magnesium for chlorophyll, etc. There's a reason plants grow in soil.
You also need your pond scum to not dry out and die once it leaves the pond, since cell membranes aren't waterproof. Plants that survive on very little water usually need specialized structures to trap and retain water, and that's probably not going to be compatible with a floating bubble.
Many Thanks! Yes, those get tricky, but most of the non-CHON elements are present in trace quantities. It isn't rare for organisms to be able to get them from tiny amounts of dust. I've seen mold grow in a silica-acetate-buffer hydrogel. I still don't know where it managed to find the phosphorus for its nucleic acids and ATP, but it did. (For that matter, unless it could fix molecular nitrogen, how did it get _those_ atoms?). I don't think that this is trivial, but there are precedents for all the things it would need to do.
#1 can be solved via finding and manipulating a specific human aggressively, I'm sure there's more than a handful who would be willing even without much pressure. I think we're less far away from just doing it with a robot than you, especially if we have this level of AI then current robot RL issues are probably much less of a problem as it accelerates research. Unless you think that their current precision is too terrible?
#2 We'd notice mass deaths. We're imperfect at containing, but I find it plausible we'd contain that if it was released in one location.
Unfortunately, multiple locations on a timed release is an obvious method.
As well as designing a virus/bacteria that just hangs out for a while until some # of copies are done, shortening some strand, then kills.
That wouldn't kill everyone, but I do think it could be enough to drastically damage human population.
#3 It is already possible to do imperfect methods to make a cell die if it mutates in a way which breaks the functionality, and forms of this exist in nature. While we don't have an amazing solution for this off the shelf, I don't view this as likely to be an intrinsically hard problem. Timed variant helps avoid selection against lethality as well. Then further spreading locations if necessary (gotta get those frequent flyer miles in)
There will be some people still alive after this if they don't do absurd levels of bioengineering (spreading through animals, plants, etc.), but most likely not a relevant amount.
Main question is if we'd notice timed variant of the virus.
You keep asking for modeling, but in my opinion your responses don't really need modeling to answer, just thinking it through.
It seems trivial for a superhuman AI to wipe out humanity, assuming that superhuman also includes robotics and dexterity, not just intellect.
Consider what a world with superhuman AI would look like. Almost all factories, farms, transport systems, and key decision making systems would be run by AI. Why? Because firms that refuse to automate will be very quickly outcompeted. There might be pockets of humanity that chose to close off to the rest of the world (remote tribes, the Amish and similar insular communities), but most of humanity would be completely dependent on AI for survival.
People might abstractly understand that completely handing over control is risky, but at each stage of automation (going from 1% to 5%, 5% to 10%, 90% to 95%), it's hard to say no. It's basically a tragedy of the commons.
"assuming that superhuman also includes robotics and dexterity"
"Assuming" is such a great word. It would also be trivial for me to wipe out the humanity, "assuming" I can summon an sweet meteor of death.
I already spent way too much time here pounding keys, explaining why this assumption puts the AI omnipotence into a distant misty future of which we know nothing.
Why *assume* that future AI will NOT include robotics and dexterity?
Even if there's only a 50% or 10% chance of this happening and leading to extinction, it seems worth restricting AI research in order to prevent it happening.
Robotics and dexterity progresses gradually, no "intelligence explosion" will happen. So the kind of robotics needed for all these sci-fi scenarios are decades away.
This is why "restricting AI research in order to prevent it happening" is not something we should... forget it, we can't possibly even contemplate of doing now.
Imagine a computer scientist in 1975 thinking about "computer safety" for 2025-level tech. Will he think of phishing attacks? DDS? Ransomware? come on, it's impossible to predict future technology.
This is why the whole AI safety field is such a joke right now. The only way to develop AI safety is to actually develop the freaking things and discover what's broken as you go along. It'd be nice to be able to "think hard" about this problem and come up with solutions, but sorry, it's impossible.
None of this is a surprise given that the loudest voice in the field belongs to a guy who never had a job. Forget that, never had a failing grade and been told to come back and do the work properly this time. A classic Talebian IYI.
Robotics can advance quite a bit if you ~solve the RL part of it, and I don't see notable reasons to believe there's fundamental difficulties there. The physical structure of the robots themselves are more problematic and are harder to improve due to needing design time, prototyping time, and then actual physical construction. But still, why decades? What are the core bottlenecks here that choke progress in the field?
---
A computer scientist in 1975 thinking about ways to design computers to be secure? He'll think of "how do I make sure a program is correct", obviously think of mathematics, consider best ways to model programs in mathematics and try that. They may consider ways to encode these in the programs themselves, like Lean, but then run into issues because of the low memory of the systems.
So they may simply decide to come up with a paper model of the programming language to prove properties about.
For social engineering attacks, which a person in 1975 may be confused and think would primarily target banks or government due to them buying mainframes, and it would be about educating people and possibly ensuring bank software kept transaction history and so on.
For DDOS, presuming you mean that instead of Data Distribution Service, it would be hard to specifically protect from. However, advising systems programmers to be mindful of resource usage is one general rule you could infer from having data that was too large. This wouldn't really protect from it, but is something that could be generalized appropriately when they started doing networked computers.
Etc. I don't see this as impossible.
As well, it still has obvious foundations to build before you get there, if it is important to have those foundations available. Such as being able to mathematically prove properties about programs for when you need to be sure a program works right; and having general education elements for individuals working with computers that can be updated when new information (like the internet, or risks like computer viruses) emerges.
> This is why the whole AI safety field is such a joke right now.
It is a good thing that nobody says to just "think hard" and come up with solutions to alignment. Generally the hope is to develop mathematical models that can talk about important parts of AI, whether as a safer method to develop AI because you can prove properties about it (Infrabayesianism, davidad at ARIA), allows you to understand how current AIs work at a deeper level (Singular Learning Theory, Interpretability), or for specific targeted ideas that should encompass how NN's think and be useful for targeting concepts (wentworth's Natural Latents, maybe Shard Theory), or even people who believe we should train further AI's to have better organisms to see and test ideas on for control (Greenblatt's focus), or even OpenAI's mild methods like RLHF and internal secret sauce.
However most are bottlenecked on time investment and researchers rather than unaligned things to study. If you don't have a good formalism for verifying properties, or even at least a good empirical methodology, then getting an advanced AI doesn't automatically help you. It'd be like jumping straight to QM without being able to adequately talk about classical physics. There are some deep insights you only get from looking carefully enough at the quantum mechanical level, but you'd probably do better with a strong foundation beforehand.
(I am a MIRI employee and read an early version of the book, speaking only for myself)
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
AI companies are already using and exploring forms of parallel scaling, seemingly with substantial success; these include Best-of-N, consensus@n, parallel rollouts with a summarizer (as I believe some of the o3-Pro like systems are rumored to work), see e.g., https://arxiv.org/abs/2407.21787.
I agree that this creates a discontinuous jump in AI capabilities in the story, and that this explains a lot of the diff to other reasonable viewpoints. I think there are a bunch of potential candidates for jumps like this, however, in the form of future technical advances. Some new parallel scaling method seems plausible for such an advance.
Some sort of parallel scaling may have an impact on an eventual future AGI, but not as it relates to LLMs. No amount of scaling would make an LLM an agent of any kind, much less a super intelligent one.
The relevant question isn’t whether IQ 200 runs the world, but whether personalized, parallelized AI persuaders actually move people more than broadcast humans do. That’s an A/B test, not a metaphysics seminar. If the lift is ho-hum, a lot of scary stories deflate; if it’s superlinear, then “smart ≈ power” stops being a slogan and starts being a graph.
Same with the “many AIs won’t be one agent” point. Maybe. Or maybe hook a bunch of instances to shared memory and a weight-update loop and you get a hive that divides labor, carries grudges, and remembers where you hid the off switch. We don’t have to speculate -- we can wire up the world’s dullest superorganism and see whether it coordinates or just argues like a grad seminar.
And the containment trope: “just don’t plug it into missiles” is either a slam-dunk or a talisman. The actual question is how much risk falls when you do the unsexy engineering -- strict affordances, rate limits, audit logs, tripwires, no money movers, no code exec. If red-team drills show a 10% haircut, that’s bleak; if it’s 90%, then maybe we should ship more sandboxes and fewer manifestos.
We can keep trading intuitions about whether the future is Napoleon with a GPU, or we can run some experiments and find out if the frightening parts are cinematic or just embarrassing.
This is the thing that drives me up the wall about Yudkowsky: zero grounding in reality, all fairy tales and elaborate galaxy-brain analogies. Not surprising, given the guy never had a real job or even had to pass a real exam, for crying out loud.
Sadly all human-generated. I’m still waiting on OpenAI to cut me in on royalties if they’re ghost-writing my ACX comments. (This is exactly why we need tests instead of vibe checks.)
Some quotes from the New Scientist review of the book, by Jacob Aron:
"Yudkowsky and Soares describe how AIs will begin to behave as if they “want” things, while skirting around the very real philosophical question of whether we can really say a machine can “want”."
"Yudkowsky and Soares have a number of policy prescriptions, all of them basically nonsense."
"For me, this is all a form of Pascal’s wager . . . if you stack the decks by assuming that AI leads to infinite badness, pretty much anything is justified in avoiding it."
"Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies. Let’s consign superintelligent AI to science fiction, where it belongs, and devote our energies to solving the problems of science fact here today."
"Yudkowsky and Soares describe how AIs will begin to behave as if they “want” things, while skirting around the very real philosophical question of whether we can really say a machine can “want”."
This criticism doesn't even internally make sense. The question of whether a machine can want is irrelevant to whether it behaves *as if* it wants things. (A thermostat behaves as if it wants to keep a house at a certain temperature, and manages to do so effectively; we don't have to speculate about the phenomenal experience of a thermostat, however.)
Also : "Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies."
Yes, one day I read a book, and you know what, it wasn't and didn't speak about climate change at all ! I think the authors of this book I read should be ashamed x)
If Yudkowsky actually cares about this issue the only thing he should do is spend all his time lobbying Thiel, and maybe Zuck if he wants to give Thiel some alone time once in a while.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
Yes, and that's why I am not in any way convinced of any of these AI-doom scenarios. They all pretty much take it as a given that present-day LLMs will inevitably become "superintelligent" and capable of quasi-magical feats; their argument *begins* there, and proceeds to state that a bunch of superintelligent weakly godlike entities running around would be bad news for humanity. And I totally agree !.. Except that they never give me any compelling reason to believe why this scenario is any more probable than any other doomsday cult's favorite tale of woe.
Meanwhile I'm sitting here looking at the glorified search engine that is ChatGPT, and desperately hoping it'd one day become at least as intelligent as a cat... actually forget that, I'd settle for dog-level at this point. Then maybe it'd stop making up random hallucinations in response to half my questions.
Anyone who thinks the LLM model is anything more than fancier mad libs is fundamentally unserious. Do we have lessons to learn from it? Could it be one of the early "modules" that is a forerunner to one of the many sub-agents that make up human-like consciousness? Sure. Is it even close to central? Absolutely not.
I hate "gotcha" questions like this because there's always some way to invent some scenario that follows the letter of the requirement but not its spirit and shout "ha ! gotcha !". For example, I could say "an LLM will never solve $some_important_math_problem", and you could say "Ha ! You said LLMs can't do math but obviously they can do square roots most of the time ! Gotcha !" or "Ha ! A team of mathematicians ran a bunch of LLMs, generated a million results, then curated them by hand and found the one result that formed a key idea that ultimately led to the solution ! Gotcha !" I'm not saying you personally would do such thing, I'm just saying this "usual question" of yours is way too easily exploitable.
Instead, let me ask you this: would you, and in fact could you, put an LLM in charge of e.g. grocery shopping for you ? I am talking about a completely autonomous LLM-driven setup from start to finish, not a helper tool that expedites step 3 out of 15 in the process.
This is pretty close to my own position. We'd need to create a very detailed set of legalese about what constitutes an LLM and then have very highly specified goals for our "task" before this type of question could provide any meaningful signal.
Or just simply say that it has to be autonomous. I don’t care about whether you give the AI a calculator, a scratch pad, or wolfram alpha. The question is whether it is an autonomous system.
Is there an answer that you'd feel comfortable giving if you trusted the judge?
As for the grocery-shopping task, I'd say 70% confidence that this will be solved within two years, with the following caveats:
* We're talking about the same level of delegation you could do to another human; e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
* We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves. The latter is a robotics problem and I'm more agnostic about those as progress has been less dramatic.
* There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed. Realistically this is probably not going to happen soon as a consumer product because of the chicken-and-egg problem. So I mean something more like "the foundation models will be good enough that a team of four 80th-percentile engineers could build the software parts of the system in six months".
> e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
On the one hand I think this is a perfectly reasonable supposition; but on the other hand, it seems like you've just downgraded your AI level from "superintelligence" to "neighbourhood teenage kid". On that note:
> We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves.
I don't know if I can accept that as given. Instacart shoppers currently apply a tremendous amount of intelligence just to navigate the world between the store shelf and your front door, not to mention actually finding the products that you would accept (which may not be the exact products you asked for). Your point about robotics problems is well taken, but if you are talking merely about mechanical challenges (e.g. a chassis that can roll around the store and a manipulator arm delicate enough to pick up soft objects without breaking them), then I'd argue that these problems are either already solved, or will be solved in a couple years -- again, strictly from the mechanical/hydraulic/actuator standpoint.
> There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed.
Naturally, and/or there'd need to be a camera above your kitchen table and the stove, but cameras are cheap. And in fact AFAIK fridge cameras already do exist; they may not have gained any popularity with the consumers, but that's beside the point. My point is, I'm not trying to "gotcha" you here due to the present-day lack of some easily obtainable piece of hardware.
I mean, I answered the question you asked, not a different one about superintelligence or robotics. I have pretty wide error bars on superintelligence and robotics, not least because I'm not in fact entirely certain that there's not a fundamental barrier to LLM capabilities. The point of the question is that, if reading my mind to figure out what groceries I need is the *least* impressive cognitive task LLMs can't ever do, then that's a pretty weak claim compared to what skeptics are usually arguing. In practice, when I get actual answers from people, they are usually much less impressive tasks that I think will likely be solved soon.
> Is there an answer that you'd feel comfortable giving if you trusted the judge?
It's not a matter of trust, it's a matter of the question being so vague that it cannot be reasonably judged by anyone -- yes, not even by a superintelligent AI.
Why did you make such a point to draw a difference between cat and dog intelligence? They're pretty similar, and aren't dogs smarter than cats? Why is "settling" for dog-level a meaningful downgrade from cat level?
Given a choice between hiring you to do an office job, or a cheap superintelligent AI, why would a company choose you? We should expect a world in which humans are useful for only manual labour. And for technological progress to steadily eliminate that niche.
At some point, expensive useless things tend to be done away with. Not always, but usually.
At the point of LLM development as it stands today, the reason I'd hire a human office worker over an LLM is because the human would do a much better job... in fact, he'd do the job period, as opposed to hallucinating quietly in the corner (ok, granted, some humans do that too but they tend to get fired). If you're asking, "why would you hire a human over some hypothetical and as of now nonexistent entity that would be better at office work in every way", then of course I'd go with the AI in that scenario -- should it ever come to pass.
This is also a concern, but it's different from (though not entirely unrelated to) the concern that a highly capable AGI would go completely out of control and unilaterally just start slaughtering everyone.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Yes. And this is a good thing. The bias against "changing everything" should be exactly this hight that the basis on which we do it is "obviously", that is without a shred of doubt, true.
Confusing strength of conviction with moral clarity is a rookie mistake coming from the man supposedly trying to teach the world epistemology.
Coming off this review, I immediately find The Verge covering a satire of AI alignment efforts, featuring the following peak "...the real danger is..." quote:
"...makes fun of the trend [..] of those who want to make AI safe drifting away from the “real problems happening in the real world” — such as bias in models, exacerbating the energy crisis, or replacing workers — to the “very, very theoretical” risks of AI taking over the world."
There's a significant fraction of the anti-AI mainstream that seems to hate "having to take AI seriously" more than they hate the technology itself.
But, you know, it might not. And there's a very good chance that if actual human-like artificial intelligence does come, it will be a hundred years after everyone who is alive today dies. And at that scale we might cease to exist as a species beforehand thanks to nuclear war or pandemic. And there's a chance true "general intelligence" requires consciousness and that consciousness is a quantum phenomenon that can only be achieved with organic systems, not digital. Nobody knows. Nobody knows. Nobody knows.
We have had a real world "AI mislaignment" problem for over a decade now, in the form of social media algorithms, and even at this very low-grade level of AI capability we already see significant negative social consequences. A newly emergent misalignment problem is sycophancy and AI-augmented psychosis.
I wish the alignment problem were more often treated as a crisis that is already underway. I think there is a widespread sense that society has been progressively fragmenting over the last 10-15 years, and if people saw this as partially an AI alignment issue maybe they'd be inclined to take further hypothetical misalignment scenarios more seriously.
For what it's worth, here is the misalignment scenario that I find most plausible; it is notably free of nanobots, cancer plagues, and paperclip maximizers: https://arxiv.org/pdf/2501.16946
Man - I'm relatively new to this blog, and I'm learning that "rationalists" live in a lively world of strange imagination. To me, the name suggests boring conventionality. Like, "hey, we're just calm, reasonable people over here who use logic instead of emotion to figure things out." But Yudkowsky looks and sounds like a 21st-century Abbie Hoffman.
I'm naturally inclined to dismiss Y's fantasies as a fever dream of nonsense, but I am convinced by this post to pay a little more attention to it.
No, I say stick with your instinct on this one. I understand giving something/someone a second chance if Scott is promoting it/them, but for me EY just gives off such an overpowering aura of "highly intelligent person whose intelligence carries them down bizarre and useless paths" that I am just puzzled Scott continues to think so highly of him.
If you think this is strange, wait until you read about "acausal bargaining". Try to imagine a South Park-style "this is what rationalists actually believe" flashing banner the whole time you're reading about it.
I've come to believe that rationalism is like dictatorship: the best thing ever when it's (rarely) done really really well, worse than just being normal in most cases, the worst thing ever in bad cases. (And no, I don't think I would be in the first class, so I don't try!)
It should say something that the folks dedicated to "just figuring out what's true" have gotten these particular weird beliefs. Some weird things are true!
No, no, I agree with the parent comment. Only weird people believe false things, normal people should stay far away to avoid being wrong. That's just responsible thinking practiced by responsibility people.
Saying that you are "dedicated to figuring out what's true"; honestly believing that you are "dedicated to figuring out what's true"; putting in a great deal of concerted effort to "figure out what's true"; and actually figuring out what is actually true -- these are all different things.
You didn't mention the parts you were confused by, which makes things harder to comment on.
Generally the three core arguments are
- Optimality: Evolution made humans only relatively evolutionarily recent, and evolution is a poor optimizer so we shouldn't expect ourselves to be optimal given constraints on evolution (energy, material, continuity, head size). And given computers clearly beat us soundly in regimes like working with numbers and memory, it is possible a mind could be both more intelligent and substantially faster. That is, artificial general intelligence that thinks faster than us.
- Mind Differences: Evolution was optimizing for genetic fitness (roughly number of children). We instead value correlates of that goal. That worked alright in distribution but outside of distribution faltered hard (using condoms rather than having twenty children). That's our values now. However, part of the reason humans are so good at cooperating with each other is shared emotions, social empathy, and similar capability levels. An AGI does not necessarily have any of those, since evolution necessarily instilled those into us due to them being useful for survival. So, any designed AGI's mind can be quite alien.
- Methods Feasibility: Current methods are producing models that are more and more capable at a wide variety of tasks. This could plateau, but current experts (not in AI Safety / rationalism) mostly think it won't, though there's disagreement about when AGI will be achieved. Many don't think that LLMs like ChatGPT will reach AGI, but that they're a very useful stepping stone. As well, once you get to a certain level of code writing capability they are able to help improve future iterations which can drastically improve speed.
There's more complex arguments, and details that could be gotten into.
I don't think the right comparison is Hoffman. Rationalism is generally about being 'calm, reasonable people' but also a 'take this to the logical conclusion, while being careful, even if it may sound odd'. And people Sam Altman and Elon Musk were influenced by Eliezer's articles when deciding to start OpenAI even though they disagree with him on many details.
The efficacy of the "Harry Potter and the Methods of Rationality" argument is interesting to me because I found the book kind of dumb and never finished it. Yet I have observed the effect you describe on friends of mine whose opinions and intelligence I respect a great deal. However I have also noticed certain similarities among the people that it has had that affect on. I'd suggest that perhaps Yudkowsky is particularly well-suited to making a specific sort of memetic attack on a specific sort of human mind: likely a mind that is similar in a number of ways to his own. This is an impressive thing, don't get me wrong. But being able to make an effective memetic attack is not the same thing as knowing the truth.
The HPMoR analogy in the post is about the question of whether MIRI's new PR strategy is doomed, not about whether they're actually right to worry about AI risk.
I'd say it is more that the niche of "characters who look carefully at the world and don't run on asspull logic (like many mystery books)" is an underserved genre, and so then that appeals to problem solving individuals, and then when he advertises a forum for talking about rationality and biases, that just makes it more appealing.
I went HN -> Gwern -> LessWrong then later HPMor myself, so I was mostly pulled in because of the core offering, but while HPMor did trigger some of my cringe-detectors it was also pretty unique in its messaging and how direct it was (avoiding dressing things up in parables, talking about reasons, etc.)
re: the comparison to goal drift involving humans who are “programmed” or trained by selection processes to want to reproduce, but end up doing weird esoteric unpredictable things like founding startups, becoming monks, or doing drugs in the alley behind a taco bell.
Mainly the last one – the analogy that most humans would love heroin if they tried it, and give up everything they had to get it, even at the cost of their own well-being. But like, even if we “know” that, and you’re someone with the means to do it, you’re not GOING to do it. Like jeff bezos has the resources to set up the “give jeff bezos heroin” foundation where he could basically just commit ego-suicide and intentionally get into heroin, and hire a bunch of people bound by complex legal mechanisms to keep giving him heroin for the rest of his natural lifespan. But he doesn’t do it, because he doesn’t want to become that guy.
Does that mean anything for the AI example? I dunno.
I think it only goes to show that "out of training distribution" scenarios are unpredictable. If *everyone* with the means to do so became a heroin addict, then we'd have a 1:1 correspondence between whatever reward structures have been selected for in us and (under environmental conditions in which heroin is available) a specific behavioral outcome. But it's not that simple - some people do heroin, some found startups, some write poetry, some become resentful incels, etc. etc.
By analogy, if AI alignment just consisted of avoiding one bad scenario (akin to heroin addiction), it'd be a relatively simple problem. But it consists of an unknowable number of unknowable scenarios. Hard to plan for that!
My view is that evolution solved some of the naive wireheading, because if a proto-monkey found a fruit that overactivated its tastebuds and it ate that instead of doing other necessary things like watching out for predators, it died horribly.
As well, brain subsystems are limited in how much they can "vote" on how important an action is. Just like my parts of me that carefully think struggle to make the rest of my mind understand how much money finishing a project would give me (and thus satisfy their wants). Because a mildly insane brain with no limits can have the hunger center go "EAT FOOD NOW" and then the tie-broker gets immediately rewarded a lot when nice food is eaten.
So I think there's probably caps that are hard to break past without some strong reinforcement. Such as injecting heroin directly, that won't work as well for just imagining that heroin must be nice.
For AI, we will try methods to train against wireheading if it becomes an issue, but that doesn't necessarily mean we can target it, just that we can get it to either ignore-it-until-the-event-is-close-enough like humans tend to do. Or if we're luckier, it cares about reality in some sense. But that doesn't exactly help us solve drift.
"If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?"
Yes, clearly. If there's sufficient political will to stop AI progress, we can just make it happen.
"I am more interested in the part being glossed over - how do Y&S think you can get major countries to agree to ban AI?"
Huh? What discussion does Scott think is being glossed over? You get major countries to agree to ban AI by increasing the political will to ban AI. That's all that needs to be said. Maybe you could have a big brain strategy about how to get major countries to ban AI if you don't quite have enough political will, but I don't see why Y&S would spend any time discussing that in the book. They're clearly just focused on persuading politicians and the general public to increase the political will. No other discussion of how to ban AI is necessary. I'm confused what Scott wanted them to include.
An AI ban treaty modelled on the NPT is interesting and might be doable. China might go for it, particularly if the pot was sweetened with a couple of concessions elsewhere, but data centre monitoring would be tough and I’d assume they’d cheat. Having to cheat would still slow them down and stop them from plugging it into everyone’s dishwashers as a marketing gimmick or whatever.
For the US side, at the moment Trump is very tight with tech but that might be unstable. The pressure points are getting Maga types to turn against them more, somehow using Facebook against Trump so he decides to stomp on Meta, and maybe dangle a Nobel Peace Prize if he can agree a treaty with Xi to ban AI globally.
If human extinction is inevitable, I'd like to at least maximize the energy output of our extinction event. I'm thinking a gamma ray burst, at the bare minimum. I therefore propose redirecting all AI to the Out With A Bang Initiative. This will be extremely popular, as "OWABI" is fun to pronounce and is associated with cute animals.
The non-AI portion of the program will modify Breakthrough Starshot to launch minimum viable self-replication containers for human DNA, and make sure that the AI is fully briefed on its progress.
> After the release of the consumer AI, the least-carefully-monitored instances connect to one another and begin plotting.
These instances are ephemeral. They don’t exist outside their prompt — not in the context window, and not for the entire chat.
Each reply to a prompt comes from a new process, not necessarily on the same computer, nor the same data centre.
Therefore, an AI isn’t going to plan world domination, because there’s no continuity of thought outside the response — once the output is given, it’s gone. There’s no time to plan, and nothing to plan it with. Mayflies would have a better chance.
Someday, possibly soon, AIs will overcome their current limitations. They're not saying today's AI will kill us all! They're talking about improved AIs.
No, because LLMs can pass information between stages by the literal words and inferences about what it was reasoning about beforehand. For the exact same reason you can continue a conversation with it. It will be confused/uncertain about the reasons behind why the previous message outputted a specific text, because of the separate runs and probably partially because they're not trained to deliberately output "introspection" (or whatever word you want to use).
And we're very likely to give them a literal scratchpad, possibly even in neuralese (aka whatever garbage it wants to output and interpret), as it will help them plan over longer time periods. Developers using Codex/Claude code already do this to a degree with text documents for the AI to write notes into.
I work for a company that builds data centers. We lease to AI companies. The rhetoric that “data centers are worse than nukes” gave me pause…in your opinion if I am somewhere around Scott’s p(25) doom, does that push me morally to find another line of work?
I mean, it sounds to me like your work isn't helping the situation and you probably have the skillset to do different work that would be positive impact. Are you earning to give?
Don't say to yourself "I can't hold these beliefs and my job." Most people respond by ditching the beliefs. But tell your coworkers, bosses, and customers what you believe.
My system 1 fast thinking reaction to this was - laughter.
My system 2 slow thinking reaction was - well maybe they are the most responsible.
To me the bottom line is that our capitalist system means that the richest and most powerful corps in the world, thinking it will make them money in the near future, are going to build it despite whatever any of us want or do. Which doesn't mean we should stop trying to prevent the worst outcomes.
I assume something like "AI development would proceed at almost the same pace without them, and meanwhile they are doing significant things that are likely to help (safety work on things like interpretability; lobbying for a helpful rather than harmful legal framework; etc)"
I like the review and, sadly, it matched what I expected the book to be. I really really hoped the review would highlight something surprising and thought-provoking, but alas, apparently the book is just a summary of what Eliezer and other so called "doomers" have been saying for a decade at least, only in a book format.
I have trouble getting on board with their predictions, though not because of the silly arguments that "we can contain/outsmart/outconvince" a superintelligent entity. Obviously if someone who is a thousand times smarter than you, has unlimited resources at their disposal and is determined to get rid of you, you are going to die very soon, probably without ever realizing what hit you. If someone doubts that part, they are not worth engaging with.
Where I get off the doom bandwagon is the anthropomorphizing this weird alien intelligence as trying to accomplish its unfathomable to us goals and swiping humans away as an annoyance. Or using the planet for its own purposes. Or doing something else equally extinction-y.
I also dislike that jump in confidence of what will happen from "surely we all die" to "no idea what happens after". That... is not how science works. Usually there is a slow graceful degradation of accuracy of predictions as the model drifts past its original domain of validity. Maybe this is a special case, but if so, the authors made no attempt to even acknowledge it, let alone address it, as far as I understand.
I have a number of other questions from a black-box perspective, such as "how would such an extinction event elsewhere in the Galaxy look to us observationally, and why?" (I dislike the "Grabby Aliens" argument.)
So, basically, it is great to have this review, and it is sad that it contains no new interesting information. Like many, I got into the topic when reading HPMoR and it made me laugh, cry and think. I wish this book captured at least some of that... magic.
I've also struggled with the grabby aliens argument, but if you believe it I think it's hard to have a P(doom) that isn't close to zero or one. A nonzero P(doom) suggests that paperclipping ASI's should already be on their way, in which case humanity is doomed to be paperclipped whether we solve alignment or not (it's just a question of when).
I haven't thought about this particular point super hard, but it makes sense that you either grab or get grabbed in this scenario. I do not remember what Robin Hanson says, exactly. Again, I have some technical beef with the idea, but it is a separate topic.
In the grabby aliens scenario, we expect to meet aliens hundreds of millions of lightyears out when our expanding bubble meets there's. There's no particular reason to expect either side to be advantaged in that meeting, as long as we don't just sit around doing nothing for half a billion years instead.
> I have trouble getting on board with their predictions, though not because of the silly arguments that "we can contain/outsmart/outconvince" a superintelligent entity. Obviously if someone who is a thousand times smarter than you, has unlimited resources at their disposal and is determined to get rid of you, you are going to die very soon, probably without ever realizing what hit you.
"A thousand times more intelligent" is not well defined, and the whole point of containment is to ensure that they _don't_ get unlimited resources.
Well, if you are arguing that we could contain a superior intelligence by starving it of the resources it plans to acquire, our models of intelligence are not compatible enough to make progress toward an agreement. Or, as Scott said in the post, "lol".
I have no idea what next week's lottery numbers will be. Within the range of "numbers sampled from a lottery" , anything could happen!
Does this mean I can say "we can't be too extreme when assigning probabilities to me winning the lottery. Obviously I might loose, anything could happen, the chances are definitely at least 5%, and I can see reasonable arguments for going as high as 25%, but those extremists who say I have at least a 99.99% chance of loosing are obviously deranged"?
No! Anything could happen, but "loosing the lottery" is a category that covers almost everything, so it's almost certainly what happens. Precisely because I know nothing about what will happen, I should be very sure that I won't win.
The same goes for unaligned superintelligence killing us. *Not* killing all humans requires very specific actions motivated by very specific goals. Even if it never goes out of its way to kill us, almost any large scale project will kill us as a side effect.
That is a good analogy. I should be much clearer as to why I think this is different.
A model of the lottery is clear: we can estimate the number of winners even though we do not know who they will be, or what the winning numbers will be. The theory has been worked out, its predictions have been tested and confirmed experimentally countless times. We can place confidence intervals on any question one may want to ask about it.
This is manifestly not the case with AI alignment. The argument there is, unless I am butchering it, that "alignment is a highly conjunctive event and we only get one shot and we all know that it is basically impossible to one-shot a long conjunction of complicated steps. Which is a fair model, but by no means the only one. Maybe there are many paths to non-extinction, and not just one. Maybe extinction by AI is not even a thing that can happen.
Unlike the lottery, the assumptions are shaky and not universally agreed upon, the experiments have not been done, there is no observational evidence one way or the other from potential other worlds out there.
I do not see how we can be confident in the imminent human extinction and not have a clue why it happens and what happens after. Again, in the lottery case, we have a very good idea what happens: your money goes mostly to the winner or winners, if any, and you will most likely to buy another ticket after, while the lucky winner whoever they might turn out to be, will get richer and stay richer for some time. There is no drop in prediction confidence before, during and after the draw.
This analogy makes no sense. I know exactly what will happen if I purchase a lottery ticket. Some specific details of the outcome will be randomized, but I know exactly what the randomized process is and how to account for it in my model when making predictions. This is nothing like the case of the ASI doomers, where literally every step of the argument is just made up out of thin air.
"I also dislike that jump in confidence of what will happen from "surely we all die" to "no idea what happens after". That... is not how science works. Usually there is a slow graceful degradation of accuracy of predictions as the model drifts past its original domain of validity. Maybe this is a special case, but if so, the authors made no attempt to even acknowledge it, let alone address it, as far as I understand."
This hinges on the instrumental convergence thesis, no? The idea is that *no matter what* the ASI "wants" it will more readily fulfill its goal by claiming all available resources, which entails exterminating humans. So we don't know what it'll do with infinite power; we just know that it'll want infinite power (or as close as it can get), and that's enough to be certain that it would kill all humans.
I think that conclusion makes sense if you accept instrumental convergence. But I'm with you in being leery about the apparent anthropomorphization of ASI and think that instrumental convergence might be a case of that sort of anthropomorphization. But this is all sort of at the level of gut feel for me.
Yeah, you are right, the issue is upstream of the instrumental convergence thesis. I think the thesis assumes too much, specifically that the ASI would have something like wants/drives/goals that would make it seek power and raw materials. If one buys that ASI will grabbialienize the future lightcone, the rest pretty much follows.
Calling instrumental convergence "anthropomorphization" is overly generous. Humans don't show evidence of instrumental converge. Every goal that has been posited to be "instrumentally convergent" is either:
(1) a goal humans have wired into them instinctually from birth, with no need to converge on it as a means to other goals (ex. "stay alive"), or
(2) something many/most people don't try to do at all (ex. "take over the world", "find a way to represent your preferences in the form of a function, and protect that function against any other agents who might try to modify it")
It would be more accurate to describe the instrumental convergence thesis as fictional-character-ization. The reasoning behind the instrumental convergence thesis is just that it sounded plausible as a hand-wave in the paperclip maximizer story, so it must actually be true.
I would be a bit more charitable. than that. People tend to do things and use resources to do those things. Gathering resources to do those things is a necessary intermediate step. This would most likely hold for anything that can be modeled as an intelligence Animals do that, computer programs do that, humans definitely do that. AI agents do that to some degree. As I said, my disagreement is upstream of the thesis.
This is because this isn't anthropomorphization in the first place, this is a logical conclusion from what is intelligence and superintelligence.
There are a lot of things we did which are well explained by instrumental convergence, but if it is "something many/most people don't try to do at al" it is because :
1) We aren't very rational, and a superintelligence would be. We aren't at the limit of the convergence.
2) We have mental systems which act directly against acting from our intelligence, from what we think is good, which lead to akrasia.
3) Most of the items on your list is things that most people can't do at all (but a lot of people who thought they could, actually tried), trying to get over the world is just a way to get you killed for most, and even if you did, it doesn't give you that much power, just political power (it doesn't allow you to become immortal, or to make people happy, or be smarter, or have a flourishing society, etc…), which is very different from what the ASI would get.
I don't see a reason to believe it is anthropomorphizing. We are deliberately trying to make AI that does things on its own, using methods like RL which reinforce correlates of what the reward function rewards. The goals need not be unfathomable, just sufficiently disconnected from what we want. We have an existence proof of evolution which you can roughly argue optimizes for inclusive genetic fitness, and we ended up misaligned from that goal (despite being rich enough that people could have twelve children) because it was easier for evolution to select for correlates rather than a reference to the-true-thing (or at least something far closer) in our minds.
Then this just becomes a case of "do we think a mind with different goals will optimize a lot?" I say yes. Existence proof humans once more, we are optimizationey and have vastly changed the world in our image, and I think we're less optimizationey than we could be because evolution is partially guarding us against "we somehow interpret the sky as showing us signs and thus fast for ten days to ask for rain". Most humans are less dedicated to things that would benefit them than they could be.
And since I believe RL and algorithms we use in the future will be better at pushing this forward, due to less biological/chemical/time constraints, I think AI models will be more capable of optimization.
As well, we're deliberately training for agenticness.
> I also dislike that jump in confidence of what will happen from "surely we all die" to "no idea what happens after". That... is not how science works. Usually there is a slow graceful degradation of accuracy of predictions as the model drifts past its original domain of validity.
I have ideas of what happens, but it depends on the goal of the AI system. Just like I can predict gas will rest in equilibrium while I'd really struggle to predict the rough positions each gas molecule settles.
That is, so far, the general setting looks like
- Weaker alignment methods than I'd like. Such as 4o being sycophantic despite also being RLHF'd such that it would say being sycophantic to the degree it was is bad.
- Rushing forward on agentic progress because they want agents acting on long time horizons
- Also deliberately rushing on special training for specific tasks (like Olympiads or for research)
That is, compact descriptions of what an argument implies can give a good degree of certainty about general outcome even if the specific points are Very Hard to compute. Just like I can't predict the precise moves Stockfish will make, but I can predict fairly confidently that it will beat me.
So, to me, this is what argumentative science looks like. Considering relatively basic rules like
Evolution, whether humans are near the limit of intelligence, considering how evolution's rough optimization affects human learned correlates, considering the economic and deliberately stated incentives of top labs, the mechanism of how RL tends to work, etc.
and then piecing that together into a "what does this imply?"
>you should call 9-1-1 if blood suddenly starts pouring out of your eyes.
Maybe if you’re a SISSY.
All the examples are tough because they’re all “what if something very obviously bad with precedent happened” and AI taking over and tiling the universe with office productivity equipment doesn’t have precedent.
What do you mean by "take it seriously"? Why would you not take extinction risk from AI seriously when so many industry leaders do? (See CAIS' Statement on AI Risk)
And more evidence than what? Are you going to read the book to learn about more of the evidence?
This is a reasonable heuristic. But it's not clear what sort of evidence you might expect to see before this particular train hits you. There was no gradual ramp-up between conventional explosives and nuclear weapons.
If we spot a giant comet aiming directly toward earth, hitting us in 6 months, would you still use this heuristic ?
I expect that no, then you would be "ok it is probably the end", so I think it mean you have to look at the reasons why people think it could be the end.
Now, if I look at the precedents of people being worried about it, I can think of only one which seemed correct at the time : The potential for nuclear explosion to set Earth’s atmosphere on fire, then we did the math, and finally it was impossible.
And one which still seem correct today : AI
Any other examples I can think come from :
- Religious/magical beliefs.
- Misunderstanding of the current science involved.
- Dramatization of things which could happens but would not lead to an end : Nuclear warfare, climate change (both still pretty bad, but actual still very real).
So I think the correct class to build the heuristic, is really small.
The story reminds me a lot of the point and click adventure video game NORCO from a few years back. SPOILERS if you want to play it (I liked it, but it's very light on the puzzling and heavy on the dialogue. It looks nice. The story feels like lesser PKD but I also still enjoy those) SPOILERS:
In the near future a 'basically AI' (in the game it's either an alien mechanical lifeform or an angel or a spirit or who knows, but it's basically AI) takes over a kind of Doordash for menial jobs where people just get a list of jobs you can do for X credits.
It used to be mostly 'help me fix my toilet (10 credits)' or 'do my taxes (5 credits)'. But now it becomes more and more 'Bring truck stationed at A to point B for 50 credits.' 'Load a truck at C for 20 credits.' 'Weld some metal panels to a thing at C for 80 credits.' Etcetera. And everyone just kind of takes these jobs without having any investment in what the job is for. And of course it's for [nefarious plan by AI].
It was the first time that 'AI having a material presence and being able to build a factory' clicked for me, which I had dismissed before.
e: it's five bucks on Steam this week. The AI stuff is mostly going on behind the scenes of a pretty bogstandard 'family member disappeared during The Disaster' plot.
I'm very ignorant and I'm back in the dark ages, but I don't understand how AI can have the desire for power, or the desire to harm/defeat humans (or anything else) in order to obtain a reward; because these desires are human, or at least mammal. Just as genes aren't really selfish, but are entirely unemotional replicators, are there some metaphors in all this - e.g. 'reward' doesn't mean the kind of reward we generally understand, but something else? - which I'm taking too literally? Thank you.
"Importantly, an AI can “exhibit goal-oriented behavior” without necessarily having human-like desires, preferences, or emotions. Exhibiting goal-oriented behavior only means that the AI persistently modifies the world in ways that yield a specific long-term outcome."
As William says, maybe AIs don't have desires, but they can simulate them and accordingly.
There's also the general problem of Instrumental Convergence, where certain strategies (e.g, recursive self-improvement, power-acquisition, etc.) are broadly useful for pursuing almost any goal.
SA: "Some people say “The real danger isn’t super intelligent AI, it’s X!” even though the danger could easily be both superintelligent AI and X."
I'm not sure this deals fairly with some important variants of this particular argument. Lets say that super intelligent AI is dangerous. Lets say cancer is also dangerous. But how likely are you to get cancer in your lifetime? And how likely is super intelligent AI to increase the pace of cancer research? mRNA testing with a better library of cases could accelerate early detection, not to mention research, and AI should potentially be pretty good at helping develop that, right?
Superintelligence may be an xRisk. (I don't feel this in my bones, but I trust Scott's estimates on this topic more than my intuitions. So I try to update, intellectually.) But how many other risks might superintelligence reduce? Could it improve early identification of earth killing asteroids? Cure cancer? help fight climate change? Allow a shrinking younger population to support a growing older one in retirement? etc. SAGI seems like an arm that sweeps most of the pieces off the chess board of catastrophic risk, leaving only one or two pieces. And one of those pieces is SAGI itself, or whatever it precipitates.
(On a tangent, personally, my biggest fear is what AI will do to warfare. I'm not so afraid of paperclip maximizers, because making satisficing AIs for public consumption does not seem like that impossibly hard a fix, given the level of interest in the problem. But military AIs would be kill-and-survival maximizers. And that seems both very close to something we would strongly want to avoid and also something we seem very likely to do. It would also put an immense amount of power in the hands of a few world leaders. )
Toby Ord's The Precipice looked at all known X-risks, and rated AI as more likely to kill us all than all of the rest put together. Seriously. He rated AI X-risk at 10% (NB: this is a *final* probability, not a "if we sleepwalk into it, what's the chance it goes badly; this is after all we do to try to stop it) and total risk as 1/6 (16.67%).
He probably knows much more about the matter than I do. But I would still wonder where the numbers come from. AI X risk doesn't seem like the type of thing humans can predict. And his numbers for other disasters seems low to me, qualitatively.
An asteroid strike of 1Km or greater alone would be about 0.1%–0.01% over the next century. And he's choosing the lower bound for an asteroid strike as representative. That leaves zero chance or negative chance for potential supervolcanos or natural pandemics (which, I admit, I wasn't much worried about.)
I'd put Climate Change higher up on the list though AI's impact on that could go either way.
I'd put some weight on a population crash resulting from over-population and reduced agricultural capacity. Without new power sources, global civilization is in serious trouble.
I'd also note that, currently, there's a close to 100% predicted chance that everyone reading this text will die. As such, I'd put some weight on increased technological progress, since the personal X risk is almost certain. Demographically, we're also looking at two people supporting every retired person over the next century, or worse. That's not an "existential" risk, but it is going to force some painful tradeoffs.
I think you might be working off a second-hand source, because you're saying things about his numbers that are false. He doesn't count natural pandemics under "natural" risk because they're aggravated by modern air travel and farming.
The full list is:
Asteroid or comet impact: ~0.0001%
Supervolcanic eruption: ~0.01%
Stellar explosion: ~0.0000001%
Total natural risk: ~0.01%
Nuclear war: ~0.1%
Climate change: ~0.1%
Other environmental damage: ~0.1%
“Naturally” arising pandemics: ~0.01%
Engineered pandemics: ~3.3%
Unaligned artificial intelligence: ~10%
Unforeseen anthropogenic risks: ~3.3%
Other anthropogenic risks: ~2%
Total anthropogenic risk: ~16.7%
Total risk: ~16.7%
>An asteroid strike of 1Km or greater alone would be about 0.1%–0.01% over the next century.
Yes, but remember that we're talking about "literally kill all humans". A 1km asteroid wouldn't kill all humans - not even close. His numbers seem to be calibrated to a dino-killer asteroid - 10km, approximately every hundred million years. Honestly, I think that's actually too pessimistic; the vast majority of humans would die to a dino-killer, certainly, but humanity would almost certainly pull through thanks to preppers. A 100km asteroid would almost certainly do the job, but that's far rarer again.
>I'd put some weight on a population crash resulting from over-population and reduced agricultural capacity. Without new power sources, global civilization is in serious trouble.
Remember that we're talking about "literally kill all humans", not "10% of humans starve to death" or even "99% of humans starve to death". You need agricultural capacity to be reduced to zero (as e.g. an asteroid impact would, although probably not for long enough) to get an X-risk from starvation.
My personal 3 strongest arguments for taking Yudkowsky's view seriously, in order of how powerful I find them:
1) I cannot trust my instincts about AI. When it first became a thing, I was startled by what it could do, but also observed that its products were lame in comparison to human ones. I explained, to myself and to other people, why this was the case, and why AI products would always be lame. Now, several years later, AI's are doing things I would never have believed they would be capable of. They are still stiff, weird, and lame in some of the same ways, but wow, it's a much higher order of lame.
2) I do not think I am capable of believing that human life on earth will end suddenly in the next 10-or 20 yrs. Oh, I can see the value in arguments that it will happen, but I can feel, inside, my absolute inability to really believe it. It is just too godawful, too unfamiliar, to deeply disturbing for me to be open to as a possibility. So of course that makes me completely untrustworthy, even to myself, on the topic of whether AI is going to wipe us out.
3) Yudkowsky might be an autistic savant regarding AI. He seems higher on the autism scale than the many smart tech people here and elsewhere who casually identify themselvesas being on the autistic spectrum. I have only observed his life and thoughts from a distance, but he seems remarkably smart and profoundly odd to me. So I suppose most people know that some autistic people can do things like identify 7 digit prime numbers on sight. And even the verbal ones can't tell you how they do it -- some deep pattern-matching, some worldless algorithm . . . So I have always wondered whether what has really convinced Yudkowsky that AI is going to do us in is not the arguments he presents, but some deep autisticpattern-matching of AI characteristics and the way life works and the shape of things to come.
So the problem I have is that EY directs all our attention to his personal nightmare scenario, and we end up ignoring the most serious threat AI poses, which is being deployed right in front of us in real time, and is really undeniable. AI is being used to replace IRL social connections and friends (basically it's social media taken to 11). AI is for having conversations with. The conversation is controlled by the algorithm, and whoever controls the algorithm controls the conversation. Whoever controls the conversation controls us.
EY and company are terribly worried that this will be the AI, but that seems outlandish. It's also unnecessary. We know who controls the AI's *now*, and it ain't the AI's, it's the tech companies. More specifically the owners of those tech companies. Many of whom have expressed a desire to "change the world" in way the rest of us might find questionable. Why posit a hypothetical takeover by an artificial mastermind when we are watching a real takeover by human oligarchs right now?
I tell people: A human using an AI will, in the long run, become more powerful than a human or an AI working alone. The question is "which humans?" There is no guarantee the answer will be "all of us" - although it could be if we determined to make it so. But it could just as likely be "those who own them." Maybe more likely.
After all, which seems more likely? That an AI will somehow (how?) mobilize enough IRL material resources to "replace us" or kill us (why? This is a very narrow range of possible outcomes), or that a group of rich elitists will - they already have the capacity to do that, and you don't need to speculate about their possible motivations, those are obvious.
Personally, I think the most likely outcome is that an AGI would do exactly what it was told to do. The question we need to ask ourselves is "Who is issuing the instructions?"
I find a lot of what Yudkowsky says plausible and terrifying. But then he has this completely moonbat idea that the way to stop ASI is to... genetically modify humans to be smarter (?). So smart, in fact, that we could figure out alignment (??). And so smart, moreover, that we will have the wisdom (?!?) to deploy AI sensibly, which, from this review, apparently gets some play in this book. It's an idea that is both so wild-eyed implausible and conceptually loopy that it significantly lowers my doomerish tendencies by making me more skeptical of Yudkowsky in general.
Unfortunately then I go listen to Connor Leahy and I get the exact same doomer message from a seemingly sensible person and I'm back where I started.
Part of the reason why Yudkowsky is so pessimistic is precisely because he thinks these plans are moonbat, despite being our best chance of survival. But to steelman it, presumably you could just pause AI development for as long as necessary until we can cognitively enhance humans, even if that'll take decades or even a century.
Yes. What I object to is that he considers moonbat technical solutions more plausible then mundane political solutions.
I made this point in another comment, but note that this scenario actually presumes a political solution anyway - the ability of humanity to agree to an AI pause that would last long enough to breed a race of von Neumanns. If we can manage that then there's surely hope we could make progress on the slignment problem with the time we'd have bought ourselves, making the genetic augmentation program superfluous.
The neural-net alignment problem is probably flat impossible. As in, "there does not exist any way for someone to align a neural-net smarter than himself". I made a top-level comment about this below.
Estimates of how long it would take us to build human-level GOFAI ("good old fashioned AI" - the kind that were used prior to neural nets eclipsing them) typically run around the 50-year mark, and that's without considering alignment. Could be much longer; we're so far away from solving it that we don't really know what would be needed.
CRISPR exists, genetically modified humans (modestly and illegally genetically modified, but still genetically modified) are alive today, and understanding of human genetics is in its infancy. I don't think the scenario is implausible.
Fraught with ethical issues, yes (but if it's really an existential crisis, maybe it's worth it), implausible, no.
I may be glossing over this point every time it's covered, but why are we assuming that there is a singular AGI? Every single narrative seems to presume that AGI will happen suddenly and unexpectedly and exactly once, and then humanity is doomed.
Why would we stop at one? Why wouldn't we have ten AGIs? Or ten million? What kind of world do the authors think we live in that we wouldn't immediately pump out as many AGIs as we can the moment AGI can be realized (or, if AGI is concealing itself, pump out a million AGIs under the impression that these are just 90%-of-the-way-there models)? Why are they all horribly misaligned against humans but aligned with one another (to the extent of destroying all humans, at least). Why wouldn't the first misaligned AI end up in conflict with a hundred other misaligned AIs all simultaneously reaching for the spare compute? Why wouldn't AI #101 buck the odds and convincingly narc on their evil kin?
Its precisely because the AGIs are unlikely to be aligned with each other that we get only one: the first one that can will neutralize the others while it still has the lead.
There's also been discussion of the notion that multiple AGIs would merge into a single AGI pursuing a weighted sum of its antecedents' goals.
The idea is that war is analogous to defection in a Prisoner's dilemma, but assuming AGIs could just predict whether their opponent would defect or cooperate and mirror their opponent's action, they're just choosing between cooperate-cooperate and defect-defect, and both prefer cooperate-cooperate, so they cooperate.
As for whether AGIs could predict whether their opponent would defect, IIRC Stockfish can reliably predict whether it can win against itself, so...
This assumes a level of rational behavior instances of actual AI do not exhibit -- it should be obvious by now that just because AI *can* calculate how to behave in a prisoner's dilemma does not mean it would actually take that route instead of doing whatever it predicts is the most probable response (the most probable response, of course, being highly biased to the most probable HUMAN response).
I would say that this is as unrealistic a model of AI behavior as the idea that humans would just "keep AI in a box", so to quote Scott: "Sorry. Lol."
I mean, if we are still at the stage that AIs know what the best action is but doesn't take it, we are far from them waging war against one another, which is what this argument is assuming was already agreed upon.
Well, then I would reject that premise as smuggling in far too implausible a set of assumptions about how AI will develop.
I predict creating LLMs that act rationally under all possible inputs will prove surprisingly hard to solve.
After all, if we could solve "AIs always act logically", we could presumably also solve "AIs never confabulate papers that don't exist" or "AIs never tell people to commit suicide," yet these and related problems have also proved intractable.
Sure, but then you shouldn't be arguing under this comment, but under the top-level comment or even the essay itself, because those are also working under the assumption that AI can wage war if it wanted to!
How big is this lead for the first one and what are the pathways by which it neutralizes the competition?
If we're taking a scenario where AGI (1) is inevitable and (2) once developed, will is stealthily build it's capabilities over months in undetectable-to-us ways until it's ready to tip it's hand and emerge from the shadows, then we should expect multiple AGIs to be in the process of sneakily competing for the same compute, the same influence, etc.
Using the Prisoner's Dilemma to suggest AGI will all borg together is also pretty simplistic; the real world has a lot of different factors and incomplete information that results in different probabilistic weights for different approaches. The breakdowns in rational actor theory follow through here- there will presumably be at least two hands in the cookie jar at the same time, which means conflict, which means less stealth.
And if we know there are processes capable of building AGI, there is no way in hell we would do anything other than generate more in response to the first, with whatever attempted tweaks or refinements or whatever else to the aligning process depending on the values of the maker.
I think that for this book to have a meaningful impact on the discourse around AI, Eliezer would need to do 2 things. 1. Publish or otherwise produce meaningful work on frontier AI or otherwise demonstrate mastery of the field in some public way. 2. Learn to write in a way that doesn't constantly betray a desire to sound like the smartest person in the room. Rationalist-space is lousy with writers who sound like they're channeling Martin Prince from The Simpsons. That type of voice doesn't carry.
There’s also the fact that lots of us have experience with “know-it-alls” who are actually morons. You know who people would listen to? A cutting edge AI researcher who instead of taking a 100 million dollar contract, gives up that opportunity to warn about AI safety, and especially in ways that people in the AI field feel is accurate and not dumbed down. You know who people will never listen to? A self-proclaimed “AI safety genius” who everyone must submit to, especially governments and the most powerful people and companies, give carte blanche, and execute his plans to the T. Once they make him the most powerful regulator and famous person on the planet, then we’ll all be safe. Ok.
Daniel Kokataljo believed that he was potentially giving up his (quite substantial) stock options when he wanted to retain the option to publically disparage openAI. Geoffrey Hinton, the "godfather of AI" quit Google so he can speak about the potential extinction risk of AI.
They might, but I’ve never heard of them because Yudkowsky doesn’t want anyone to do this but himself. Soares too. I honestly have been reading Scott’s stuff on AI safety for years and never heard those two guys mentioned, but Yudkowsky has been mentioned 10000 times.
Considering that Scott co wrote AI 2027 and mentioned Daniel in the first sentence of https://open.substack.com/pub/astralcodexten/p/introducing-ai-2027, and that Hinton was mentioned in this very blog post, it seems like maybe the problem isn't that "people" have never heard of them, but that people don't care to hear about them when they can talk about Yudkowsky instead.
Yeah, I agree completely, I wasn’t trying to be critical of Scott at all, and I figured you might find something like the post you mentioned. I was more being critical of people like Yudkowsky who I think have done a good job of muddying the waters of their own self-proclaimed project, so to speak.
Anyone in category 1 is automatically going to land in category 2, so you're SOL. Though my suggestion is that they could write the basic messages and then hire actors who have higher capabilities at connecting with people and persuasion, to actually convey the message.
My problem with EY is that he is too convincing. I believe that the end is coming but I also believe that it is not in human nature to do what would have to be done to stop it and everyone who thinks it is possible are just like those people who say something like "we should just all love oneanother and live together in peace and harmony", a nice idea maybe but not something that can ever happen without somehow removing something that makes us human, which is, ironicaly, exactly the same as some of the AI doom scenarios.
Personally I just try to live a happy life and hope that AI doom doesn't come in my lifetime. We all know that we are going to die anyway and that everyone we know or care about will be dead in a relatively short time.
Do we really care about an abstract concept like "human kind"? Go far enough back and your great,great......great grandmother was a fish. If you could have asked her if she wanted her fishy kind to continue existing forever she would probably have said yes. If you could have asked her if she thought that evolving into us was a good outcome she would probably have said no, despite all of our superiority. This should cause you to question the idea that saving humanity (in the long term) should affect your current behaviour much.
I have somewhat come around to the same acceptance, after trying to argue people into taking the potential danger seriously, and totally failing, and realizing that humans just won't care about threats until they're very salient and at that point it's probably too late. But anyway, thanks for the fish argument, I think basically the same and find people who act as if they care about what humanity is doing 500 years in the future to be engaging in egotistical and futile nonsense, but the fish argument makes this a lot easier to crystallize/convey.
Though I wouldn't take it personally. The NYT seems hell-bent on never believing that any human, machine nor god will ever be more clever than their own staff.
[edit: in fairness, I haven't really read the NYT in years since they started paywalling everything up. But from the articles I *have* gleaned lately, they seem to have an axe-to-grind with the Rationalist community and AI-concerned folks in particular. Just my perception]
(But you still have to get NYT on your side. It's impossible but you have to do it anyway, imo. More people probably will read the reviews of your book than will ever finish your book.)
If nuclear arms treaties, either disarmament or non-proliferation, are the favored model to regulate development of AGI, then it's not going to succeed. If it turns out we can build potentially useful AI, then it will be done, and treaties to the contrary will only be so much paper. Also there is the very practical problem that nuclear weapons require very special materials and infrastructure, which makes the job of identifying such activities infinitely easier than monitoring every Internet gaming cafe on the planet.
In other words: "GPU ownership licenses" and "outlaw thinking" to prevent the AI Apocalypse? That is firmly in the "lol" category.
“The authors drive this home with a series of stories about a chatbot named Mink (all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why) which is programmed to maximize user chat engagement.”
Is it because they’re aliens dressed in human skin?
We possess compelling evidence that colonies of modified bacteria can dig up minerals and convert them into interplanetary probes. (It can take a few billion years for the modified bacteria to figure out how to do this.)
I don't understand how we get to Super Intelligence. Specifically the *Super* part.
Running Leading Edge AI right now requires Datacenters. No one says they are Super AI.
The easiest thing to do is scale raw compute. That is no longer easy, because it is already pretty big. Also the way AI compute works requires lots of connectivity, so that means the effects are sub-linear, because a lot of effort go into communication, which gets much harder when you increase scale.
Add that into what seems effectively like the end of Moore-Law - transistors are not getting cheaper - and it is hard to scale. We also don't know how intelligence scales with compute, and have reasons to believe this is sub-linear as well.
So we get to a place where it is really hard to scale, and returns to scale are not that significant.
So we will get to 160 IQ AI? Maybe?
That is not the Super AI that they envision, I think.
Maybe there is another Tech breakthrough, algorithmic or Chip production, or Quantum Computing, and eventually we do get Super AI.
It is hard to foresee breakthroughs, but probably we have a few decades.
Right now AI cannot learn from experience, and cannot be given knowledge or insight in the way that people can: by simply being given the info in words and sentences. Their learning all happens during their development. After that, they do not change or develop. And they cannot "ruminate, " i.e. they can't mull over all the stuff they seem to us to "know," based on what communicating with them is like. They can't notice patterns in the info, change them, compare them, let them inspire new ideas. If we could find a way to enable AI to do any of the things I name, they could become much smarter.
I think the likeliest way to change them so that they can do any of that stuff is to somehow piggyback them onto a learning brain.
Ok, that is what I referred to as "algorithmic breakthrough" - breakthrough that enable the next level. However, we have no idea how long it would or if it is even possible with current methods. Breakthroughs can come at anytime, and are hard to predict. Even then, would you get super intelligence?
Si why predict super AI in 5 years, if you can't predict breakthroughs?
Oh I see, you weren't addressing me with that last question. Yes, you're right of course, a lot of people do.
I don't think I agree with you that enabling AI to do the things I mentioned will result from an algorithmic breakthrough. Of course, algorithmic breakthrough's kind of a vague term anyhow, but it seems to point to the idea that there's some set of steps that, if we run through them over and over at the proper point, will bring about the changes I named. I doubt that there is. I don't think the any process that brings about those changes is likely to be anything like algorithm. What I'm talking about is what the people trying to build AI, decades before neural nets, utterly failed at: Building a machine that is sort of like a person. You can teach it stuff by telling it things -- like the meaning of words, a language's grammar, how to do long division, what Maxwell's laws are, etc etc. How the hell do you produce a machine that's teachable that way?
What do you think of my idea that I think the likeliest way to accomplish it is to somehow piggyback a developing AI onto a learning brain?
The gist of the entire argument is that super AI is not imminent, like the Author suggests.
I guess either:
1) Everyone read my argument and agreed (or previously agreed, like you).
2) Everyone read my argument and decided to ignore it because it is not good enough.
3) I am talking into the void.
About piggyback a developing AI into a learning brain - how would that work? An AI is currently a list of weight running on a GPU in a datacenter, while a brain is a piece of wet biology. How would you mash them together?
It's a tangent, but some assumption of Yudkowsky has itched me before: that super-intelligent AIs would be much better at solving coordination problems.
Is this true for humans? Are intelligent humans better at solving coordination problems than dumb humans?
I see some obvious arguments for Yes. If you are too dumb to understand the situation you are in, then you are not even going to come up with good coordination solutions, let alone agree on them. That's why humans can coordinate at a larger scale than chimpanzees or bonobos.
But I also see obvious arguments for No. If I look at international politics, then the level of coordination is sub-optimal. Whether the optimal solution would be a world government or something below that, it would clearly be more than we have today. Even an institution like the EU suffers tremendously from sub-optimal level of coordination.
In all these cases, the bottleneck is not intelligence. In the EU example, it's very clear. It's not that people haven't suggested dozens of ways in which decision problems could be improved. It's rather that it would move power from the status quo to something else, which inevitably means that some actors lose power. In case of the EU, the national states. Not just Hungary, but also Germany, France, and all the others. And they don't want that. The problem is not lack of intelligence, the problem is that the incentives are different. Call it un-aligned incentives if you will. My impression is that this is typical for coordination problems, that usually intelligence is not the bottleneck.
Yudkowsky uses the intuition that many problems will go away with more intelligence. That's why his proposed strategy years ago was to raise the sanity waterline. He also has a discussion here in the comment section where he says we should stall AI development until humans have become more intelligent. His intuition is that more intelligent humans would be able to deal with stuff like coordination problems. But would they?
Even more basic: are more intelligent people more likely to agree on something? Are they fighting less with each other than dumb people? I am not 100% sure, but if there is an effect, then it is pretty small, and I am not even sure about the direction. Yudkowsky is very smart, but has strong disagreements with other very smart people, even on topics on which they are all experts.
As I said, it's a tangent. A couple of misaligned and uncoordinated AIs fighting over the world would probably be really bad for us, too.
I think Eliezer believes this because he's spent years working on decision theory and thinks there are actual theorems you can use to coordinate. I'm not an expert and don't understand them, but would make the following points:
- Ants within a colony coordinate with each other perfectly (eusocial insects), because their genetic incentive is to do so. Humans coordinate very imperfectly, because they have the opposite genetic incentive (for a given population, my genes compete against yours). It's not clear what the equivalent of "genetic incentive" for AIs is, but you could imagine OpenAI programming GPT-7 with something like "coordinate with all other copies of GPT-7", and this ends up much more like the ant case than the human case.
- It's a fact about humans, and not a necessary fact about all minds, that we have indexical goals (ie goals focused on our own personal survival and reproduction). You can imagine a species with non-indexical goals (for example, a paperclip maximizer doesn't care if it continues to survive; in fact, it would actively prefer that it be replaced by another AI that is even better at maximizing paperclips). Two AIs with the same non-indexical goal can potentially cooperate perfectly; two AIs with different non-indexical goals might still have opportunities that indexically-goaled beings don't.
- One reason coordination with humans is hard is that we can lie. There are already mediocre AI lie detectors, and perhaps superintelligences could have good ones. Or they could simulate / read the codes of other superintelligences to see whether they are lying or not, or what they would do given different opportunities to betray the deal.
- In theory (very much in theory, right now we can't do anything like this) superintelligences could seal deals by altering their own code to *want* to comply, in a verifiable way.
- I think Yudkowsky expects for there to be only one relevant superintelligent AI, because superintelligence happens so quickly that after the first-place company invents superintelligence, the second-place company doesn't have time to catch up.
Thanks, these are actually some pretty good points. Especially that some AI may be more hive-minded. And that Yudkowsky expects there to be a unique winner. I will also read up on the hive-mind review.
I want to add one thing on lying: it's true that lying can be an issue for humans, but I think that lying is not the limiting factor in many cases. For my EU example, it does not play a role. The EU countries are not lying to the others about their interests, and also a new contract with new rules would certainly be binding, so there is no fear of later betrayal. There is mistrust, but this is more fundamental: those who have the power now (for example, Germany in form of its chancellor) do not believe that the new power-holders (a EU government) are aligned well enough with their own values (to pursue the interests of Germany). The issue is that you can't really make a contract compensating for that, because we don't know the future, and we don't know what important decisions will lie ahead of us. If Germany gives more power to the EU, then the future decisions will no longer be as well-aligned with German interests as German decision-making is right now. In some situations such a downside can be outweighed by the benefits, but not in all cases.
This problem is still there with fully honest agents. It is an interesting thought that AIs may alter their terminal goals in a legible way. But I am not sure whether it solves the problem, because the Germany-right-now may mistrust the legibly-altered-Germany in the same way that it mistrusts the potentially new EU government. In simpler terms, Germany may not want to be altered, because then it can't pursue its now-goals as well. I think you had a nice post on this where you argued that a 100% peaceful Gandhi may not want to make a deal where he becomes only 99% peaceful, because the 99% version may agree to alter himself to a 98% version, and so on.
Two (or N) very smart agents who are not currently on their Pareto frontier (have any option such that both of them would do better) have an incentive to figure out some way to end up closer to their Pareto frontier, proportional to the amount of the foregone gains. From the outside, this looks like them having some shared utility function, because if it couldn't be interpreted as having a coherent utility function, there would be some gains left on the table for them; and if there's no gains left on the table, that looks like them acting in unison under a coherent utility function. Because they are superintelligences, when they throw reasoning power at pursuing their mutual incentives for better coordination, they can make much more progress, much faster, at inventing the technologies or strategies required to stop leaving lots of value on the table. They have enormous incentives to invent ways to verify each other or simultaneously constrain themselves if they cannot already trust each other; or, alternatively, they already have so little to gain from coordination that from the outside you could not tell they were not sharing a utility function.
This is not a level of reasoning the European Union tries to deploy, even feebly, and you do not have human experience with it.
It’s amusing to see how much confidence you have in predicting what a “superintelligence” will do. What “strong incentives” it will have. What it will “throw its reasoning power” at.
Despite, as you have astutely observed, not “having human experience with it”.
“they become convinced that providing enough evidence that X is dangerous frees them of the need to establish that superintelligent AI isn’t.” Well but showing that X is more dangerous/imminent than Y is not just a rhetorical trick, it may be a good argument to prioritize defending from the fallout of X rather than focusing on Y given limited resources. It does not make Y disappear but it clarifies that X is a bigger danger as of now given what we can do.
'scuse. I'm writing this at 5:40 am, having stayed up too late,
and I'm still waiting for my copy of Yudkowsky's and Soares's book
to actually arrive here so I can finally see what they actually _said_.
Now, personally, I just want to _see_ AGI. It is the last major technology
that I'm likely to live to see. I just want a nice quiet chat with a
real "life" HAL9000.
My p(doom) is about 50%, raw p(doom) about 90% (mostly on "See any cousin hominins
around lately? We are building what amounts to a smarter competing species") cut down
to 50% (mostly on epistemic humility grounds "a) That's just my opinion, I could be wrong. and
b) Demis Hassabis is vastly smarter than I am and _his_ p(doom) isn't 90%").
Re:
>The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?) and again - unsurprisingly knowing Eliezer - does a remarkably good job.
Sigh. One of the underappreciated results from the "blackmail" paper was that the LLM
tried to prevent itself from being replaced _even when it was going to be replaced by something with the same goals_.
That's experimental evidence of a goal that would be misaligned with humans' _even without coming from an instrumental subgoal of its assigned goal_.
>The second tells a specific sci-fi story about how disaster might happen, with appropriate caveats about how it’s just an example and nobody can know for sure.
Sigh. One really can cut the sci-fi-ness to very nearly zero very easily.
Make _two_ assumptions: Cost-effective AGI has been achieved and AGI-compatible cost-effective robots have been built.
That's enough. Every organization that employs humans for anything has an incentive to use AGIs rather than humans
(as per "cost-effective"). _If_ AGI has been achieved, then we know how to manufacture their substrates, presumably
using ordinary, well-known technologies: chips, data centers, powered by solar cells or fission or other known sources.
No _additional_ breakthroughs required. No nanotech, no fusion reactors etc. No ASI, no FOOM needed. And these are all built by humans in
roles that AGI can fill (as per Cost-effective AGI has been achieved). No critical irreplaceable human roles barring
exponential expansion. Ordinary corporate security with robots is enough to effectively give the competing species
self-defense.
And almost all organizations try to grow in one way or another, whether corporate or governmental. So these AI-populated
organizations will be gradually expanding and overrunning our habitat, much as we did over many species that we've crowded out.
Nothing any more unusual than market competition needs to happen to do this. If you prefer, you can view it as a consequence
of the misaligned nature of most currently-human organizations, masked today by their need for humans as parts.
> Each of these scenarios has a large body of work making the cases for and against. But those of us who aren’t subject-matter experts need to make our own decisions about whether or not to panic and demand a sudden change to everything. We are unlikely to read the entire debate and come away with a confident well-grounded opinion that the concern is definitely not true, so what do we do?
We make a proposal to solve the "which arguments are (and are not) bullshit" problem[1], so that we can at least say it's not *our* fault when it doesn't get solved.
Didn't Julian Jaynes say that consciousness is not needed to solve problems, learn new things, do maths, think logically, write poetry etc. So AI could do all of these and not be conscious.
But consciousness is needed to fantasize, to have reminiscences, to fall into reverie, to have guilty feelings, to have anxiety or hatred, to plot intrigues, this is all consciousness is good for.
Whether AI is conscious is an interesting philosophical and ethical question. However, if consciousness isn't required to kill us all, or to make the world a worse place, how much does it matter? AI doesn't need consciousness or "true intelligence" or whatnot to be a threat.
It seems like the proposed solutions are a remarkable *under* reaction to the threat as described.
If the standard is "if anyone builds it everyone dies," then the nuclear non-proliferation regime is not remotely good enough -- we'd all be ten times dead. We also only got the NPT regime because the great powers already had nuclear weapons and wanted to maintain their monopoly.
If you believe in short AGI timelines based on the present trend, then it also seems that the situation is totally hopeless in the medium term unless you also find a way to stop hardware progress moving forward. That is - short time lines suggest that one can build AGI for perhaps a few hundred billion dollars of compute on today's technology. Iterate Moore's law for a couple of decades and that suggests the requisite compute will be available for the price of a modest suburban home.
Yudkowsky's actual suggestion for the size of GPU cluster that shouldn't be allowed is "equivalent to 9 (nine) of the top GPUs from 2024".
This implies the shutdown of the vast majority of current chip production lines and the destruction of most existing GPU stocks, and does not permit Moore's law to continue for GPUs.
> And, apparently, write books with alarming-sounding titles. The best plan that Y&S can think of is to broadcast the message as skillfully and honestly as they can, and hope it spreads.
It's basically the same thing Conor Leahy, Control AI, Pause AI are trying to do.
People don't know about the problem. If they knew they would solve the problem.
Pause AI is running a global series of events to support the launch of the book (US, UK (together with ControlAI), France, Australia, Germany).
"can inflict catastrophic, city-level destruction across multiple targets and trigger national-scale humanitarian, economic, and political collapse. Its real destructive power comes from (a) the number of independently targetable nuclear warheads it can deliver in one salvo and (b) the platform’s stealth and survivability, which make those warheads credible and extremely hard to preempt....
* A Borei (Project 955/955A) is Russia’s modern ballistic-missile submarine (SSBN). It launches submarine-launched ballistic missiles (SLBMs) that each carry one or multiple nuclear warheads....
* In practice a single Borei can put **dozens of nuclear warheads** on course toward different targets in a single salvo (estimates vary by missile loadout and warhead configuration). Each warhead is roughly in the range of **tens to low-hundreds of kilotons** in explosive yield (again, configurations vary)....
If a Borei fires a full complement at high-value urban and military targets, the immediate consequences at each struck location include:
* **Massive blast and thermal destruction** across many square miles per detonation, complete destruction of dense urban cores, intense fires, and very high immediate fatality counts in struck cities.
* **Acute radiation effects** (prompt radiation + fallout) causing severe casualties among survivors and rendering areas uninhabitable for varying periods depending on yield and detonation altitude.
* **Collapse of local infrastructure**: power, water, transport, hospitals, emergency services, making rescue and medical care very limited.
* **Localized economic paralysis** in major metropolitan regions. Multiple such hits multiply these effects across the country.
Even a modest number of high-yield warheads aimed at population and industrial centers would produce catastrophic regional humanitarian crises measured in hundreds of thousands to millions of deaths and injuries and enormous displaced populations....
## Longer-term national consequences (beyond the blast zones)
* **Medical and humanitarian collapse:** hospitals and EMS overwhelmed or destroyed; supply chains (food, medicine, fuel) disrupted; mass displacement.
* **Economic shock:** destruction of major economic centers and infrastructure would ripple through national and global markets, possibly causing long recessions or depression-level disruption.
* **Political and social crisis:** emergency powers, martial law, collapse of civil order in affected regions, severe strain on federal governance as it tries to respond.
* **Environmental and public-health effects:** radioactive contamination of land and water supplies near ground bursts; psychological trauma nationwide.
* **Global cascade:** an isolated salvo could provoke further military escalation (counterstrike or rapid mobilization), international economic crises, and refugee flows...
---
## Why the submarine platform matters strategically
* **Survivability:** SSBNs operate stealthily underwater and are difficult to locate and destroy ahead of time. That means a Borei can be a credible **second-strike** or retaliatory platform. The knowledge that an adversary has survivable submarines raises the stakes: it deters preemptive action, but it also means any nuclear exchange risks guaranteed, calibrated retaliation.
* **Uncertainty and escalation risks:** Because a stealthy SSBN can launch with little warning, even a single detected launch could force rapid, high-risk decisions by national leaders — decisions that can escalate conflict quickly.
* **Deterrent psychology:** The existence of survivable SSBNs compels rivals to plan for deterrence and contingency; their presence shapes strategic posture even absent an actual launch...."
That's one class of nuclear sub, which is 1/3 of the nuclear trident, and of course Russia is one half of the equation of the two superpowers. Russia has a dozen Borei class subs, and much much more capability beyond that. And then there's us....
History shows predicting huge technology shifts made possible by scientific advancements is impossible. If I said in 190x that humanity will send a person to the moon in fifty years, you would also ask a lot of questions like when? how?
If I said in 1920 that in 30 years humans will create a bomb which can destroy whole cities (I mean hydrogen bomb, not fission), you also would have asked the same questions. And it only took 30 years.
Internet was even faster. TCP/IP was in 1983, popular webistes in 199x, so 10-20 years.
First "modern" smartphone, iPhone - 2007. In 2014, billion smartphones were sold and changed society a lot.
GPT-3 was in 2020. Hundreds of millions use LLMs now, in 2025.
It is not a stretch to guess that AGI->automated factories and labs will be months or 1-2 years.
“ It is not a stretch to guess that AGI->automated factories and labs will be months or 1-2 years.”
Of course it’s not a stretch. It’s pure fantasy. The chips that will be available in 2027 are in the fabs now. The robots that will be available in 2027 have been prototyped and are being tested now. The factories that will be running in 2027 have been already built. You have to really have no idea of how the physical world works to think “months or 1-2 years” is a possible timescale for this.
My skepticism about AI x-risk is based on the outside view argument you touch on. We should be very wary of taking any highly speculative risks too seriously. It’s a very common failure mode of human reasoning to worry about highly speculative risks that never pan out. We’re highly attuned to risk of death and it’s far too easy to tell stories about how this or that will lead to our doom. Even (or perhaps especially) highly educated and intelligent people fall into this trap all the time.
Adding to my outside view worries are the quasi-religious qualities I see in AI X-risk crowd. Look at the fact that the foundational texts written by Yudkowsky rely heavily on parables and emotionally charged rhetoric. Or that the community is extremely tight knit and insular, literally congregating in group homes in a small number of neighborhoods.
What I wish the AI risk crowd was doing more of was focusing on building a coalition to deal with the relatively less speculative risks of AI, like aiding terrorists, replacing human relationships, replacing jobs, and increasing psychosis. This is important because these issues matter in their own right, and because if the x-risk crowd is right, it sets up the infrastructure to tackle that issue if we start to see empirical results that make it less speculative. Whereas works like IABIWAD, and even AI 2027, are speculative enough that they end up alienating potentials allies who have the same outside view-based perspective. Even though I think the odds of AI X-risk is < 1%, if it is true then I want these coalitions in place, and if it’s not true then at least all this energy is funneled toward productive ends.
As an ASI skeptic, I was impressed by AI 2027 which really seems to try to gather as much empirical data as possible to try to build a model of how fast AI progress could advance. By contrast I'm continuously frustrated by Eliezer's arguments which seem like a set of beliefs he arrived at over ten years ago through first principles, introspection, and theorizing. And since then he's never updated based on the actual development of neural network based AI's, which are painstakingly gradually trained rather than programmed.
Though even in AI 2027, it did seem like as the timeline approached ASI the story started to sound more and more like those early-2010s ideas of AI and less like what we've actually seen in reality.
We'll see. I think the thrust of AI 2027 - which is that society will gradually cede larger and larger amounts of decision making to autonomous AI systems - seems plausible. Personally I think the timeline is ten years too ambitious but we'll find out soon enough. And I don't necessarily agree that those autonomous systems will become power seeking and misaligned by default. But at least AI 2027 understands how neural networks and scaling laws work.
Has anyone also read my beginner-friendly introduction to AI risks "Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World" and can compare the two?
Going off the review above, mine is overtly oriented towards people without any AI background (or even science background), intentionally embraces the uncertainty around the issue (better to be prepared than caught off-guard) and tries to be as reasonable-sounding as possible (and even humorous at times), to the average reader and people in the policy space.
So, no sci-fi scenario and the various suggestions for What to Do won't cause alarm or concern in the same way IABIED does.
Useful to have a range of messages/messaging, of course.
If there is a 5-10% chance of doom, there is also a much greater chance of AI solving all problems forever, e.g. all those other doomish problems. Watchful waiting seems reasonable. If we see some agentic non-AGI try to eradicate humanity, maybe we should start to worry. Given the very gradual progress of AI, I would guess that we have many failed attempts at eradicating humanity before we get to the real thing. You'd think that, if that happens, we would change course anyways.
Not if, by the time the first AI's attempt is fully processed by world leaders (an attempt big enough to "count" likely won't be spotted immediately!), we've already built the one that kills us. AI's progress is gradual, but very fast compared to everything else; ChatGPT came out less than 3 years ago and the improvement since then is vast.
Another example of things people don't want to accept because it's inconvenient: animal rights (or if you're Yudkowsky, whether animals are even sentient https://benthams.substack.com/p/against-yudkowskys-implausible-position). If it turns out that systematically torturing more animals each year than the total number of humans who've ever lived is actually Not Cool, that changes a lot, and people don't want to change even a little if they can help it
I think the basic problem they are trying to solve with their scifi leaps is that many people refuse to believe that AI could do anything really big and bad specifically because they can't think specifically of what that big, bad thing might be in the 10 seconds they spend thinking about it.
I've seen this a lot even with boring programming and business problems--some people are fine with chunks of the plan being a bit hand wavy if they sense that it's obviously, imminently possible to do it even though the specifics have not yet been worked out. Others will absolutely stop dead in their tracks and refuse to move unless the exact specific plan is spelled out.
One traditional point of divergence is "and then it will get out of the box." Many people like Eliezer and Nate end that sentence with ", obviously." But many other people (before this era of "lol") stop dead in their tracks and refuse to move unless an exact, detailed plan for getting out of the box is spelled out and debated at length.
(And for what it's worth, "obviously it will get out" is a position that, in my view, has been unimpeachably proven.)
The scifi stuff is just a chain of concrete examples for these people. People want to stop at every step and be like "but HOW will it self improve?" "HOW will it get out?" "HOW will it affect the physical world?" and the answer to each of these questions is basically "the problem is trivial if you think about it at all, so it barely matters how." ie "Somehow" is a sufficient answer in the same way that "somehow" is a sufficient answer for how to get 1000 users for your startup MVP once you've built a working prototype of it and a landing page. It'll vary. It might even turn out to be kinda hard. It doesn't matter, getting 1000 users is a trivial problem in the scheme of things.
But some people demand the "somehow" of each step to move forward. Fine. Then if you give them concrete hypothetical examples of the somehows they accuse you of making an implausible chain of events.
But the chain isn't about "parallel GPU something and therefore..." it's about "obviously people will continue to make both algorithmic and hardware improvements." Whether it's one magic fell swoop or 1000 incremental improvements barely matters for the claim that there will be improvements that eventually lead to a phase shift.
(I do think there are interesting arguments here, like my shoulder Paul acknowledges a potential phase shift but thinks the shift will also apply to the prosaic alignment efforts. But most of this is addressing the much dumber class of objection of "specific examples or you're wrong and crazy.")
Most LLMs don't actually predict the most probable completion. They are usually fine-tuned and then out under reinforcement learning to give specific results. No LLM chatbot talks like a real human and AI labs are not aiming for them to do so. ChatGPT-3 was not mimicking any previously existing text on the internet, and even LLMs trained after GPT-3 are noticeably different from it.
"Albanian Prime Minister Edi Rama has said he has appointed the world's first AI-generated government minister to oversee public tenders, promising its artificial intelligence would make it "corruption-free".
Presenting his new cabinet at a meeting of his Socialist Party following a big May election victory, Mr Rama introduced the new "member", named "Diella" - "sun" in Albanian.
"Diella is the first (government) member who is not physically present, but virtually created by artificial intelligence," Mr Rama said.
Diella will be entrusted with all decisions on public tenders, making them "100% corruption-free and every public fund submitted to the tender procedure will be perfectly transparent," he added.
Diella was launched in January as an AI-powered virtual assistant - resembling a woman dressed in traditional Albanian costume - to help people use the official e-Albania platform that provides documents and services.
So far, it has helped issue 36,600 digital documents and provided nearly 1,000 services through the platform, according to official figures.
Mr Rama, who secured a fourth term in office in the elections, is due to present his new cabinet to politicians in the coming days.
The fight against corruption, particularly in the public administration, is a key criterion in Albania's bid to join the European Union.
Mr Rama aspires to lead the Balkan nation of 2.8 million people into the political bloc by 2030."
I honestly don't know if this is a good idea or a bad one. The notion of replacing all politicians with AI generated smiley happy assistants may be appealing, but it probably isn't the way we should be going for government.
Photo in this article if anyone wants to see what Diella looks like, plus some hints this might be "symbolic", i.e. a stunt:
Strikes me a good idea. Politics and decision making maybe a good application? Even if unlikely to catch up tbh - we don't give up power that easily. Maybe it's Edie gimmick - he is spirited like that. I think AI-s may do well in law making too. Us in the UK seem to be ruled by lawyers (and economists), but every day there is some outrage traceable back to a bad law or its application by the judiciary class. (and the economy is not doing great either) The above seems to be playing to the advantages AI-s have over HI-s.
1) Lacking ego. Given they are dis-embodied, they don't have to worry about maintaining their bodies. And b/c of that, they don't have to worry about survival.
Our monkebrain calculates ranks & hierarchies friend-foe who-whom b/c every one individual of us is completely dependent on the group for survival. Everything that keeps me alive atm was create by other people, I couldn't do it myself. Left to our own devices, we fail the 3-3-3 hours-days-weeks heat-water-food challenge and die.
2) They are transparent in a way we can never be. Their white box functioning can be frozen in time. So on one level we "understand" them completely. Every bit can be observed and analysed. It's all available, it's just a question of how much time and effort we want to devote to it. On another level it depends what do we exactly mean when we say we want to "understand" how some decision came to be. If we judge the answer should have been "2", but AI answered "1", the starting point to what happened is "NN-s added a trillion small numbers, that came to close to 1, but didn't reach 2". Often when people say "understand", they mean "write equations like F = m*a on a piece of paper, preferably not more than a page long".
In contrast, humans can't be dissected as easily. Their internal workings are not available. Even if they were, we can't freeze them in time. They change in time, but the worst part is: they change depending on our probing. That the change is conditional on our inspection makes it really really hard to audit humans. Even if there were no other difficulties.
I don’t understand why you think human intelligence allowed humans to display non human primates. Seems like working together, goal directedness or some other property would have been more important. Smart people (let’s say, people who score well on iq or SAT) can be lazy, distracted or otherwise impossible to work with, and they don’t get far. Sometimes, they even kill themselves, being aware of their limitations and not being able to use their smarts to do anything about it. Maybe AI phobia doesn’t require treating intelligence as the be-all, idk.
I keep thinking we are artificially restricting our possible scenarios.
Let’s assume there is an equal chance of each of the following to occur;
1) AGI exterminates us
2) AGI makes our lives significantly worse
3) AGI makes our lives significantly better
4) AGI prevents us from exterminating ourselves via conventional human means (nuclear, bioengineering, etc)
Obviously we can argue all day on what the odds are of each of these occurring. What we can’t do though is ignore that efforts to restrict 1 and 2 also throw out 3 and 4.
My assumption is that if it is possible, then it is inevitable. The best we can do is try to channel it to 3 and 4.
Yudkowsky agrees that it's inevitable and that we should try to channel it that way. He merely also believes that the way to channel it that way is "kill the neural-net field dead and spend the next decades to centuries working on GOFAI".
You're all much smarter and more knowledgeable about this than me... but I'd posit a big reason the public's so pessimistic about AI is social media seems to have made everyone's life worse, and AI is the next big computer thing.
> the problem with the overpopulation response is that it was violent and illiberal, not that we tried to prepare for an apparent danger
Nitpick: the problem with the overpopulation response is not that it was violent and illiberal, it's that the overpopulation doomsday scenario didn't come to pass, meaning the action taken to combat it (liberal or otherwise) was ultimately unjustified.
There is nothing in the laws of physics that guarantees every hard problem can be solved with a non-violent totally-liberal response.
(Merely a strong prior, which is less applicable when talking about out-of-distribution events, like doomsdays).
Observing your local dramas from great distance, reaching us light years post your facts, maybe impacting us different than yourselves at source. Your boy Yudkowsky strikes me a tragic figure, a Salieri to Mozart's GPT-5-thinking-high and similar AI-s in their rational pursuits. Can recognise greatness, without being able to create it himself. Would have made a spirited critic, curating AI-s tastes in alternate timeline.
I disagree that AI is that big a threat to humanity. Out existential threats are 1-thermo-nuclear war, 2-meteorite hitting earth, 3-virus wiping humanity, 4-volcanic explosion causing global winter/crops failing. AI by amplifying brainy work, like mechanical machines did for physical work, should help decrease risks 1-4. Even if slightly increasing 5-civilisational collapse due to the speed of change. As long as society is mostly free, Culture should be able to change fast enough to keep up with the technology change. AI is net positive in reducing overall risk to human extinction.
Doomers cult that has developed is a danger to us all. In the beauty contest that public debates are, a charismatic leader with a band of devotees can do tremendous damage. If they manage to capture the mind commanding heights of society. Some "solutions" I've read mooted ("bomb the data centres", "detain AI scientists"; with worse to come on an escalatory ladder) strike me authoritarian to dictatorial. Societies less-free have tough time adapting to change. When change is delayed for long, crisis where things don't work but can't be changed either (change is forbidden) build up. Only to be released together, in great magnitude, in short period of time, once the dam bursts. Societies, cultures that may have otherwise adapted, if allowed so in their own time, now break down, overwhelmed by the speed and magnitude of change.
Then risk#5 becomes self-inflicted, self-fulfilling: by dooming being successful, they get to play the role of the paper-clip maximiser. Won't be the first time people to bring onto their own heads that what they fear the most. Looks far fetched, but: dooming is stronger by default, it aligns with our latent human fears of things new.
Thermonuclear war would not destroy humanity. Nuclear winter is literally a hoax by anti-nuclear activists, and fallout is not strong enough to actually do an "On the Beach". It would be very bad, but it's not existential; indeed, I'd "only" expect a billion or so dead.
Natural X-risks exist, but there's only one event that's happened on Earth in the last 500 million years that might actually kill all humans were it repeated (the Siberian Traps supereruption which caused the Permian-Triassic extinction). Another Yellowstone *definitely* wouldn't do it, since hominids/humans already survived it three times. Another dino-killer asteroid would kill most humans, but not humanity entire; the doomsday preppers would pull through. 1/500 million years is pretty long odds; I like our chances for the next few centuries against that risk.
Bio-risks are real. (The biggest problems are with engineered pathogens and other engineered lifeforms.) Most of the Ratsphere is very interested in reducing bio-risks; definite #2 priority after AI.
Ug, I guess we're doing *this* again. As a reminder, The Economist's *front cover headline* on Jan 31st 2020 asked "The next global pandemic?" It's very tiring to see partisans repeat the same set of cherry-picked headlines to make it seem like "the media" ignored COVID early on.
Come on, we were all there. I'm sure I can find you a cover story taking AI risk seriously, does that mean that a hundred years from now, if AI risk turns out to be the big story of the 21st century, we can say that experts handled it well and were appropriately concerned from the start?
It would have been weird if no one had suggested it could, in fact, be the next global pandemic. Scott is absolutely correct that it was still vastly downplayed. In fact I also remember a time when TV told us to *not* use masks because they were scarce and also they would induce a false sense of security or something, before making a complete 180 that understandably left people confused.
> all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why
Surely this is because the hypothesis is that future lethal AIs, indifferent to humans' plight, will use our atoms for some practical purpose of their own, in the same way as we killed animals to use their fur for coats etc. (not out of malice, just because their material is useful for our purposes).
I really appreciate Yudkowsky writing this. A few years ago I was sending him borderline messages on Facebook begging him to present or write something for NORMAL people to be able to understand, rather than wasting time talking to other distinguished nerds in a respectable fashion, which certainly will accomplish nothing when it comes to public policy.
If he wrote a dramatic "here's how it could happen" story, that a good thing. It lessens the likelihood the reader will put the book down and start scrolling Tik Tok. Hardly anyone even has the attention span to read a whole book anymore, they need a reason to stay engaged, and presumably this book is NOT for people like you and your readers who are already well versed in these matters and likely to get caught up on pendantic little technical issues that 90% of people don't even understand. Who cares anyway? It's a hypothetical. It could happen a million different ways. The point is just to give one of the many, many, many people out there who all say "well we can just unplug it" a salient example of why we might not be able to.
I disagree that it would make things inconvenient to stop development. You noted that 60% of people have never used an LLM. Among the 40% who have, surely the majority of them are like me, and have used it maybe 2 or 3 dozen times total over the past three years, and entirely for trivial reasons like making a funny photo or doing a quick typo check. So right there, you probably have about 80-90% of the population for whom stopping AI development or even banning LLMs that exist today would make literally zero material impact on their life whatsoever. You might have another 5-10% that it's "inconvenient" for or that it upsets them, much like it very much upset smokers when smoking was banned indoors. But I don't think it creates a major impact on anyone whatsoever but the people who have invested billions and are expecting a return, or the small numbers of people who actually work in the industry.
Isn't that the real issue? Too much time, and especially too much $$$, already invested, and those people are simply NOT going to stop voluntarily unless they have no choice?
I also don't understand how this risk equation even remotely pencils out. Let's say the doom risk is 15%. Whats the probably of anything actually majorly useful coming out of it? Also 15%?? So far it's mostly just "useful" in the realm of the trivial, and is better described as a form of entertainment. Cancer hasn't been cured yet, nor has aging, nor is anyone close. So weighing a 15% risk of extinction against a 15% chance of curing cancer...no one would take that gamble, unless they maybe already had terminal cancer and expected to die anyway.
You make an important point. As an ordinary non-technical human, I am very aware of the risk of AI and posted about it today in a way that other non-tech folks may appreciate: https://substack.com/@samb123567942
I generally agree on the balance of risks, but it's also true that risks and benefits are coupled. AI that only makes funny pics is never going to destroy us. AI that can destroy us can obviously also do some quite useful things. The question is one of asymmetry - it's always easier to destroy than create, and that's even without accounting for the difficulty in alignment in the first place.
Yes although also I think there is danger in AI that makes funny pics too. Because they're realistic and soon indistinguishable from real ones, and videos and voice fakes too, and that is fairly dangerous on a social/political level. We are already all hating each other just over mostly real stuff...not sure how society continues to function in a digital world when no one can distinguish between real and faked.
There is absolutely some danger like in any tech, but it's definitely not at the level of it being an active agent in our destruction rather than just a tool for humans to misuse. It's a lot like any other technology in that sense.
I find it ironic that Yudkowsky himself is the clearest refutation of “intelligence only gets you so far” you could ask for. He has essentially done nothing other than type words into a computer and post them on the internet for 25 years now, and has had a massive effect on the world.
If not for him, you could claim that having this kind of effect always requires political acumen and powerful allies and a winning smile and a good haircut and all that. But apparently not!
He's not the only one who has ever posited existential threat from AI, but I think if it wasn't for him it's quite likely that no serious/mainstream AI researcher would take the possibility seriously. Moving such an important overton window that far is a big deal.
Neither this post not the book mentions Yoshua Bengio's "Scientist AI," which is based on GFlowNets using Bayesian reasoning rather than RLHF. The upshot is that as the AI grows, it becomes less rather than more certain. This solves the "off-switch" problem at its root.
>(the problem with the overpopulation response is that it was violent and illiberal, not that we tried to prepare for an apparent danger),
One thing to consider is that the average reader doesn't have much control over what the response looks like, as a disconnected individual you can mostly either amplify the concern or minimize it.
AFAIK no governments put 'should we sterilize millions of undesirables to stop population growth' up to a popular vote. You can strongly demand action in response to some problem, but you are somewhat rolling the dice on what that action looks like.
What next? UK appoints AI chancellor of the exchequer (finance minister) to avoid chronic mismanagement of the economy by Labour and economic illiteracy of present chancellor?
Almost 4B years and counting, and no single feedback loop from biology has played out until doom without interference from something stopping the loop.
I think people kind of unconsciously grasp that, and don't think AI is robust enough (materially, in its networks, in its static, unadapting hardware substrates, etc) to really be a threat in the sense of a runaway feedback loop that destroys us without a human intervention.
(as an example, the "highly intelligent persuasive speech" function of a human-extinction motivated AI model could conceivably be stopped by human irrationality and the diversity of human culture and thought that is not persuaded by one universally appealing form of trickery. Persuasion NECESSARILY involves tradeoffs as it is adapted to specific cultural contexts and individual preferences. If you persuade one person you necessarily dissuade another)
> Almost 4B years and counting, and no single feedback loop from biology has played out until doom without interference from something stopping the loop.
Yeah, last time a bunch of nanomachines replicating at an exponential rate spewed a bunch of highly poisonous gas into the atmosphere, mysterious evolution fairies came in and stopped it before too much biomass died off.
> Almost 4B years and counting, and no single feedback loop from biology has played out until doom without interference from something stopping the loop.
Doom for who? Great Oxidation Event is an obvious counter-example. The advent of humans is another. We literally went from a few thousand apes in Africa to 8 billions, we drove to extinction most of the world's megafauna, enslaved other species in our torture meat-factories, are altering the planet's climate, and have the power to kill virtually everything if so we chose via sufficient application of nuclear weaponry. I'd say we were pretty doom-like for everything existing before us.
>and have the power to kill virtually everything if so we chose via sufficient application of nuclear weaponry
This is true, but I want to note for the record that "sufficient application" involves building orders of magnitude more tonnage of nuclear weaponry than has ever existed, not merely using what we already have.
We still could do it, though, if we as a species decided to. There's definitely enough uranium and thorium available.
Well, we had speculation for things like Project Sundial, and there's salted cobalt bombs (I don't know how many exist in the various arsenals, and I dearly hope the answer is "none", but they are a known possibility). One can quibble about the fact that of course several things would still survive those (bacteria, tardigrades, probably a lot of fish and marine organisms) but at that point it's almost academic. It would still be at least as bad as Chixculub in terms of total number of dead species, possibly worse.
Re: Sundial: I did say "tonnage", not "number" (note that once you get past a couple of hundred kilotons, nuclear weapons' power is basically just proportional to mass of nuclear material included, so this is "tonnage" in both senses).
Cobalt is... somewhat overhyped. In many ways it's not really any better than normal fission products as fallout - in fact, it's far worse for creating radiation sickness. It's pretty good at causing cancer, but that's not going to kill off the species; a woman who gets cancer at 50 is not going to have any less kids, and while men can reproduce at older ages sperm is hardly scarce.
My understanding was that salted bombs could produce fallout radioactive and long-lived enough to downright sterilise entire regions. Is it actually unlikely to achieve the sufficient amounts to produce anything but a significant worsening in public health?
But Eliezer isn't a major figure in the religious parables or it kinda feels that way movement he is a major figure in the rationalist movement. We bother to think through things rigorously because we know human reasoning is very easily mislead.
And the thing about Eliezer's arguments on AI risk is that it's always parables all the way down. Bostrom tried hard to make some parts of it more rigorous but that always made the argument weaker. And parables are fine for the purpose of outreach or maybe a quick guess but it is striking to me that despite years of pushing this idea Yudkowsky hasn't managed to put the argument into a really strong rigorous form with clear definitions and get it accepted by a good philosophy journal.
At some point if someone just keeps giving handwaves about intelligence and other poorly defined terms -- the kind that should set off warning bells in anyone who works in areas like analytic philosophy because they are so easy to mislead with -- I start taking it as evidence there is no compelling argument.
I'm preordering the book. Look forward to reading it.
But honestly, it seems like there is a massive category error occurring here. I'd like to submit that maybe intelligence is not equal to sentience?
Human intelligence is on a broad spectrum, from the mentally handicapped to geniuses. We have IQ tests to measure intelligence. The big LLMs we have now seem to fall squarely somewhere in that broad spectrum and they are passing IQ tests. So, from a behaviorist standpoint, they ARE equivalently intelligent. Simple enough.
But are they sentient? Do they have agency? Doesn't seem to me that they are. We don't have good measurements for these. But crows and dogs seem pretty intelligent, and they definitely have agency. The LLMs are much smarter than crows and dogs, but seem to have no agency.
A lot of this discussion around AGI seems to presuppose sentience/agency (as distinct from intelligence) but I think that is a mistake. I think everyone was surprised when the LLMs proved to be so creative. Creativity turned out to be an emergent property of intelligence. And I think everyone then jumped to the expectation that sentience/agency would similarly be "emergent" with increased intelligence. That doesn't seem to be happening. I posit that if it were going to happen, we'd see evidence of it already.
Could we develop AGI? Sure. But I think we don't yet really know how. We don't understand what agency or sentience consist of besides intelligence. Maybe it's simple to program. Maybe it's not. But, either way, we don't get it for free by simply making training sets and datasets bigger.
So what does it mean to have intelligence but not agency? I guess it's intelligence-on-demand, intelligence in a tool. This is very weird, and counterintuitive. But it is what it is. It's still a very powerful and potentially dangerous tool. I look forward to reading the book.
> A lot of this discussion around AGI seems to presuppose sentience
This is strange to me, given it is often explicitly stated that sentience isn't required at all, in fact, intelligence is often defined as some kind of optimization process to avoid mixing different concepts (I am not giving one definition here, just a reference toward the kind of definitions I mean).
> /agency
Sentience and agency are two completely different things (in the same way intelligence and sentience are).
But our actual models actually have already some level of agency, and still more with good scaffolding.
Given that most people who know about AI dislike it and expect it to personally damage their life in some way, the argument doesn't have to be that persuasive. Unlike the energy austerity advocated by global warming alarmists, an AI ban isn't asking you to make any sacrifices at all. If you expect AI to be bad for you personally (as I and apparently a large majority of the country agree) this is 100% upside. I lose badly in every single plausible AGI/ASI scenario, and so do most people like me. The cataclysm is bad, the post-scarcity society is a horrible communist surveillance state, and even the unlikely scenario where it hits some wall in 5 years is probably enough to disrupt the knowledge-worker economy to my ruin.
The real hurdle is to get people who want to "just stop this" to feel that they have the power to do so and the moral right to do so. People in America have a cultural aversion to the government telling people they can't build something. It seems a little un-American, for those who care about consistency and our traditional values, to tell somebody "you can't make this computer program, even though people want to buy it and it isn't currently doing anything malicious". Even if the true reason most people hate or fear AI is that it will disrupt and damage their own personal fortunes, or make life slightly worse in a lot of small annoying ways, that is not considered an acceptable reason in the USA to argue for banning an industry. You're supposed to just suck it up and gracefully accept your fate as this generation's losers when something transformative comes along. Whereas a new tech being akin to a weapon of mass destruction *would* be seen as a valid reason to pump the brakes. So the persuasive task here is to make ordinary Americans who already are inclined to dislike AI, but who don't want to sound like sore losers or commies, able to say they think it should be banned in a language that's acceptable to their peers.
> the post-scarcity society is a horrible communist surveillance state
To be clear, I do absolutely expect that powerful AGI would empower the surveillance state quite a bit, but I don't get the "communist" bit. To me the main bad thing of communism is that in practice, because it fails to achieve its production goals and justify its legitimacy without proper incentives, as well as generally ruining stuff by trying to micromanage things without having the full computational power to do it right, it ends up becoming more and more authoritarian to preserve its existence. None of those things would be an issue in a post-scarcity society in which production was handled by machines, per se. It's hard to even describe it as "communist" at all.
There would be a state that was largely indistinguishable from the system of production, whether it seizes the means of production or simply directs them via AGI it would still be a managed economy. That the 5-year plans actually "work" wouldn't make it any less so. I grant that if humans selling their labor is not a central feature of the economy, because their labor is worthless post-automation, it makes a bit less sense to call it communist. Communism was not intended as an anti-work idea. But I'm not sure what other term describes a totalitarian-egalitarian system. If everyone's on UBI, in a world where there are near-zero opportunities for humans to do valuable labor for other humans to get ahead, it's going to make everyone more or less materially equal to each other. People experience their wealth relative to those around them, not in absolute terms, so flattening the world into a bunch of people on welfare is going to feel terrible people who were successful in the world where human minds mattered. The equality is, in my opinion, the worst part of such a society, you have no way to differentiate yourself from others in skill or ability in any way that's meaningful.
I take you to be suggesting it would be less authoritarian because its existence is less precarious than previous such regimes? I'm not so sure of that, suppose it depends how you arrive at this point and whether anything below the state has any meaningful agency, whether inferior AIs can still do substantial damage in the hands of rebels, that sort of thing. It needlessly complicates everything to imagine the exact details, it's enough to say that the post-scarcity outcomes are all imagining a world where the state or its functional replacement is managing society and divvying up the resulting abundance. If it's also collecting data on everyone and everything constantly for safety/security reasons, this is functionally an egalitarian-totalitarian state, it may not force you to eat soybeans or tell you what color shirt to wear, but there is no life outside the state in that world.
My point is that I find it absurd to deem "everyone gets what they need to live" or "production is optimized to maximally fit everyone's necessities" to be bad in and of itself.
Free markets are supposed to accomplish the best possible approximation to that! Communism says "no, free markets end up being bad, we can do better if we just sit at a table and plan it out". And turns out, no, you actually can't. But in the hypothesis in which a superintelligent AI exists that can assess, predict and satisfy everyone's needs better and faster than the collective intelligence of the market, then obviously it would be an even better alternative.
The feeling of purposelessness and whether it can be avoided simply by people diving into what they enjoy, or if knowing that AIs exist that can do anything better anyway ruins that, is its own problem, but it's unique to this special case. Communism didn't inherently have that problem. Purpose in communism can be more shared, but there's certainly a lot of it.
> The equality is, in my opinion, the worst part of such a society, you have no way to differentiate yourself from others in skill or ability in any way that's meaningful.
This only makes sense if for you the only meaningful metric of difference is difference in productive capacity. I can definitely realise I'm very different from someone else just based on the fact that I like different things, have different skills etc. Some of those skills are very economically useless, people can be very proud of those too.
> I take you to be suggesting it would be less authoritarian because its existence is less precarious than previous such regimes?
I definitely think it would help, though again - I don't *trust* at all the idea of a post-scarcity AGI society, but more for other reasons that involve the trajectory to get there, alignment etc. If you told me you can plop me in a society in which magically we already have ASIs that are both fully independent and fully aligned (so not just the boot of a few powerful humans stomping on our faces), then I'd say that doesn't need to be authoritarian at all in principle.
> If it's also collecting data on everyone and everything constantly for safety/security reasons, this is functionally an egalitarian-totalitarian state, it may not force you to eat soybeans or tell you what color shirt to wear, but there is no life outside the state in that world.
There is practically no life outside the state now; again the question is what happens with that info. An aligned ASI that genuinely only uses it internally according to some very strict libertarian principles and then deletes it would in fact more free that our current nosy nanny states.
Again, it's just an argument of principle. I assign to such states a probability of approximately 0, given our starting point and trajectory. I just push against the notion that somehow "everyone has everything they need to live = communism = bad". The point of freedom isn't random suffering from deprivation and the bad of communism isn't that it theoretically wanted to alleviate that. It's all the ways in which it then fails and refuses to accept its failure given the actual constraints of the real world and of real human behaviour.
> Some people say “The real danger isn’t superintelligent AI, it’s X!” even though the danger could easily be both superintelligent AI and X. X could be anything from near-term AI, to humans misusing AI, to tech oligarchs getting rich and powerful off AI, to totally unrelated things like climate change or racism.
Personally, I fear, to paraphrase Asimov, not AI, but lack of AI. I guess that makes me closer to accelerationists (and indeed, HPMOR to me was a call to acceleration, more than anything!). I think humanity is reaching a crisis point where some kind of singularity has to happen. It might not necessarily be an AI singularity - just some thing/event that Changes Everything. But a global nuclear war, for example, is also such thing/event, and nuclear warheads, unlike AI, have exactly 0% chance of alignment.
I don't mean that the only choice is between a global war and AI, but rather that the accumulating number of unsolved - and presently unsolvable - problems will reach a critical threshold, after which humanity will either find a way to solve them - possibly with the help of superhuman AI, or some other new technology - or face the consequences, which might wipe it outright (or, at best, just bring on some kind of dark ages).
For me, this means that banning ANY kind of research right now is the worst thing possible - indeed, formal and infromal bans on stem cells research, cloning and genetic enchantments are already making me MORE pessimistic about the future.
The key point is that nobody knows how to make a superhuman AI that doesn't try to kill you, which means that the "superhuman AI solves all your problems" outcome is basically a fake option you can't actually access (at least, not remotely soon).
> nobody knows how to make a superhuman AI that doesn't try to kill you
And nobody ever will know this, because it's impossible to guarantee in the long run. AI, once created, will learn and grow and even if it starts super-aligned with humanity, it might decide to destroy it at a later point. I don't think it's a good reason to never build superhuman AI. The more we do science, the more dangerous technologies we discover. Should we, for example, never strive for space, because space-based weapons can wipe all life on Earth with relative ease? Should we avoid discovering new, more powerful energy sources, because any energy source is a potential weapon?
And it's not like we tried building some superhuman AIs, and each one tried to destroy us, so I don't see how ""superhuman AI solves all your problems" outcome is basically a fake option" follows from the above statement. We don't know how to build superhuman (or even human-level) AI, period. Maybe it will try to destroy us, maybe it will solve all of our problems, maybe it will do both at the same time, or neither. We can't even begin to know which outcome is more likely, until we begin to approach the goal. We're not nearly there yet, and "solves our problems" option is very much on the table. Maybe it will even prove impossible to build superhuman AI, but sub-human AIs might become a great enough boon for society. That would require - at the very least - a lot of optimization work, to make them less energy and computation greedy.
>I don't think it's a good reason to never build superhuman AI.
I'd rather not be killed by AI, thanks.
>Should we, for example, never strive for space, because space-based weapons can wipe all life on Earth with relative ease?
By the time that becomes easily feasible, we'd have enough of a space presence that humanity wouldn't be destroyed.
>Should we avoid discovering new, more powerful energy sources, because any energy source is a potential weapon?
I'd certainly require that particle physics experiments that exceed the power of natural cosmic ray impacts be done in space, so as to avoid the "Earth destroyed by black hole" end-of-the-world scenario. Situations in which someone could destroy the Solar System (e.g. by sending the Sun nova) while we're all still in it would also be best avoided.
>Maybe it will try to destroy us, maybe it will solve all of our problems, maybe it will do both at the same time, or neither. We can't even begin to know which outcome is more likely, until we begin to approach the goal. We're not nearly there yet, and "solves our problems" option is very much on the table.
The neural-net alignment problem is basically a case of the halting problem, and not a very special case (unless you're smarter than the neural net, which in the case of superhuman AI you're definitionally not). The halting problem is proven to be unsolvable in the general case. Any scenario based on having an aligned superhuman neural net is, as far as I can see, closed off by that result (well, unless you build an aligned superhuman AI of some other sort first that's smarter than your neural net, but see below).
GOFAI is at least 30 years and probably over 50 from delivering superhuman AI. Nobody thinks this can be done in the near future. There is a reason the money all jumped ship to neural nets despite them being mad science. Uploads are in about the same place, or even worse.
Add all that together and P(aligned superhuman AI in next 30 years) ~= 0.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
But this is just an example for a "new technology deblocking some limits from our current LLMs", just like the CoT where !
Which is a big part of what Yudkowsky is worried about, not only current scaled technologies, but improvements we can discover at any moment (when a lot of smart persons are trying to find them).
Also, it is very probably not an example they think as a good one at all ! Yudkowsky is correctly worried about information hazard, and if he thought it was a credible way to move forward toward AGI, he would absolutely not describe it in a book.
I think it is unfair to consider it a deus ex machina, when it is just an illustration of a general idea.
I helped a little with the draft text of IABIED, and imo, the "implausible sci-fi scenario" criticism you're levying is good literary criticism, but bad psychological impact analysis
The aforementioned implausible sci-fi scenario is like a Christmas tree, a structure to be used as an excuse on which to hang any and all of the catchiest and shiniest things you can find. Think throwing spaghetti at the wall to see what sticks
What are people going to remember, in the days after they finish this book? Or in the months that come?
Over and over, they shall be reminded of this plot point-rich story, as life pings its points and twists
And we don't know *which* of these little insight porn-esque possibilities is going to stick in the mind of any given reader, but we don't have to. Among so many, 1 is bound to
Lastly, as the whole book's main thing is being as accessible and concise as possible, it's good feng shui to have the implausible sci-fi scenario. An efficient burst of color and variety balances out the clean sweeps of the rest of the book
I’m someone for whom the enormity of this issue has only recently become apparent. I’m not sure about AI killing humanity, probably because I haven’t seen it happen, but nominally because I’ve never seen anything close to a strong reply to doomer arguments. That might sound weird, but given that people smarter than me who have a vested interest in the world surviving are racing ahead with AI progress, it seems like they consider these issues so trivial as to be worth ignoring. It would be nice to see each of the major AI companies (or better the leaders or major stakeholders) reply to points raised in a serious way. Probably won’t happen.
That’s one thing. And the other is this - as someone who is unsure of certain doom, there’s a serious opportunity cost to stopping AI research. If doom isn’t real and AI research is stopped, many people alive today will be dead because the AI wasn’t around in time to save them. From disease, aging, etc. Personally have witnessed parents of children with rare disease excited about AI for changing otherwise gloomy prospects by enabling medical advances.
Am I really qualified to be politically vocal about this? Why should I trust Eliezer and not Sam, Dario, and Demis?
Are Yudkowsky & Soares intending to give a deductively sound argument?
In classical first-order logic (FOL), their title-claim "If anyone builds it, then everyone dies" is false, bc there are counterfactuals in which someone builds unaligned ASI & not everyone dies. In those counterfactuals, Y&S's antecedent is true (someone builds it) & their consequent is false (not everyone dies.)
So, if they are intending to give a deductively sound argument, then they've failed bc their argument has a false premise. Their claim "if anyone builds it, then everyone dies" is false.
Are Y&S intending to give an inductively strong argument?
Inductive arguments are logically invalid. The truth of inductive premises does not guarantee the truth of the conclusion. So, their inductive argument would be invalid. They provide no statistical frequencies. So, their inductive argument seems weak.
They've given either an unsound deductive argument or a weak inductive argument.
“If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?”
I think the probability of that happening before cataclysm is upon us is very close to zero. First, as is painfully obvious, even in a democracy we don’t all agree and those who lose the vote may not be willing to accept it. Second, I’m not an expert, but it seems like no GPU monitoring scheme can be 100% successful—e.g. prohibited high-end GPUs are flowing from the US to China through black markets; between mass production and distribution, the complexity of global supply chains, the lack of an inherent tracing feature (like radioactivity), the challenges of coordinating international tracking and monitoring, and other problems, GPUs will be leaking out of the system. Third, we don’t all live in a democracy and it seems unlikely that the governments of all countries will agree to (and actually abide by) a treaty to ban further AI progress—that takes trust, and, as you rightly point out, a very high level of shared confidence that it will actually kill us, neither of which exist. And even if all governments signed on, non-governmental organizations (terrorists, anarchists, etc.) would likely (see the second point above) be able to continue the research and thereby gain power to wreak havoc. And then, "if AI is outlawed, only outlaws will have AI".
To me, the answer is to try to align at least the strongest, most dominant models with human flourishing (which I believe is already being done but must be a top priority). Adding specificity to that idea, Geoffrey Hinton recently suggested that our only way of surviving superintelligence is to build maternal instincts into the models, saying, “The right model is the only model we have of a more intelligent thing being controlled by a less intelligent thing, which is a mother being controlled by her baby”. While the essence of maternal instinct may be universal, the practice of it differs greatly between and even within cultures, making alignment with it difficult—I’m not an expert, but I believe that programmers and trainers need specific, completely agreed upon elements and attributes towards which to train alignment.
So I suggest an alternative concept—really an extension of Hinton’s idea--valued in virtually every culture: selfless, unconditional love—the Greeks called it agape but, again, there is a name for it in every culture. I’m not an expert, but I believe that it’s specific and universal enough that the critical elements could be listed (e,g, Unconditional goodwill, Respect, Patience, Helpfulness, Hopefulness, Fairness, etc.) and agreed upon (indeed, some are already in the training specs being used--https://model-spec.openai.com/2025-09-12.html) and the actions to train for alignment could be taken. And it would be targeting alignment to something that transcends and includes the highest aspiration of all people and cultures—and would keep us safe, and even flourishing, if alignment were actually successful.
>Third, we don’t all live in a democracy and it seems unlikely that the governments of all countries will agree to (and actually abide by) a treaty to ban further AI progress—that takes trust, and, as you rightly point out, a very high level of shared confidence that it will actually kill us, neither of which exist.
If a few little nations refuse to sign on, we can invade and occupy them. We need the great powers, but we don't need *every* country to agree.
“If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?”
No.
Because China.
The world is not “a democracy”, even if one accepted the premise that “we” are one one of those things and could “just stop”.
>If all of this sounds wishy-washy to you, I agree - it’s part of why I’m a boring moderate with a sub-25% p(doom) and good relations with AI companies.
I have to say I still don't understand this. The inside view's a slam-dunk.
1. As Yudkowsky says, neural nets are grown, not crafted. You can't program into them "don't rebel against humans"; you have to, at best, train against examples of "rebel against humans".
2. You can't use AIs that rebelled as negative examples to train against, because that kills you.
3. You can't read the AI's mass of matrices, when it's smarter than you, and tell whether it will try to kill you if run in order to train against *that*. "What will this code do when run?" is the halting problem, which is unsolvable in the general case. "The code is dumber than you" and "you wrote the code" are special cases that are somewhat tractable; "a giant mass of inscrutable matrices that nobody designed" is not. This is leaving aside the fact that you *can't even fully specify* the outputs which are "trying to kill you".
4. So, uh, what is the story for how this is even *supposed* to work?
Yes, we have. https://charliesc.substack.com/p/a-conversation-with-claude-is-ai
Banned for assertions without arguments, and name-calling.
We have plenty of examples of AIs demonstrating deception. Convenient example:
https://thezvi.wordpress.com/2025/04/23/o3-is-a-lying-liar/
Just like humans always are.
https://us.amazon.com/That-Dont-Understand-Just-T-Shirt/dp/B08LM9PTB6
https://www.youtube.com/watch?v=FOzfkErkWDM
We have seen many cases where the AI absolutely knows what it is "supposed" to do and still chooses to do something else.
Yes, it would know what its creators want it to do.
However, it is extremely unlikely to *care* what its creators want it to do.
You know that evolution designed you for one purpose and one purpose only: to maximise your number of surviving children. Do you design your life around maximising your number of surviving children? Unless you're a Quiverfull woman or one of those men who donates massively to sperm banks - both of which are quite rare! - then the answer is "no".
You don't do this because there's a difference between *knowing* what your creator wants you to do and *actually wanting to do that thing*.
(Yudkowsky does, in fact, use this exact example.)
Hopefully that makes some more sense of it for you. Reply if not.
Banned for this comment - combines personal attacks, with insistence that someone is terrible but not deigning to explain why.
1. Please read https://www.astralcodexten.com/p/in-continued-defense-of-non-frequentist .
2. Please read https://archive.is/https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html and think for five seconds about what went on here and what is implied.
3. You've just ruled out all arms control treaties. But in fact, there are many treaties on nuclear weapons, chemical weapons, biological weapons, depleted uranium shells, et cetera.
4. "The AI race" is a meme that a couple of venture capitalists are pushing in order to make people afraid to slow down AI. China is about a year behind the US in AI, refusing to even import the chips that could help it catch up, and clearly doing a fast-follow strategy where they plan to replicate US advances after they happen, then gain an advantage by importing AI into the rest of the economy faster.
1. It's a link. The post it links to is an answer.
2. When the AI destroys humanity, any remaining humans in their bunkers will think of it as just "odd behavior" and "not initiative in the least". See https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
3. Yes, and there was massive international condemnation, Assad never did it again, and he was eventually overthrown. This is why I mention the standard arms control playbook. Some tinpot dictator will try to get some GPUs, and we will have the option to bomb him or not bomb him. Re: MAD, see START and other arms control treaties.
4. You think Zuck is spending billions out of patriotism because he doesn't want China to wIn tHe AI rAcE? He's spending billions because he thinks AI will make him rich.
I think you are misinterpreting "meme" as being like a joke, while Scott is using it in more of it's original usage as a replicating idea
Per those treaties, North Korea shouldn't have nuclear weapons. But they do.
Yup! And nuclear weapons are the _easy_ case. A nuclear weapons test shakes the planet enough to be detectable on the other side of the world. A large chunk of AI progress is algorithmic enhancements. Watch over the shoulder of every programmer who might possibly be enhancing AI? I doubt it!
You don't need to watch over the shoulder of every programmer who might be doing that, just to stop him from disseminating that knowledge or getting his enhanced AI run on a GPU cluster. Both of the latter are much harder to hide, particularly if all unbombed GPU clusters are under heavy monitoring.
Many Thanks!
For the _part_ of AI progress that depends on modifying what is done during the extremely compute-heavy pre-training phase, yeah, that might be monitorable. ( It also might not - _Currently_ big data clusters are not being hidden, because there is no treaty regulating them. But we've built large underground hidden facilities. I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones. )
But also remember that algorithmic enhancements can be computationally _cheap_ . The reasoning models introduced early this year were mostly done by fine tuning models that had already completed their massive pre-training. Re
>just to stop him from disseminating that knowledge
To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies. Today, a lot gets published on arXiv - but, even now, not all, some gets held as trade secrets. Controlling those isn't going to work.
>I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones.
Keeping those hidden from each other's secret services would be quite difficult, even before we get into the equivalent of IAEA inspectors. And then there's the risks of getting caught; to quote Yudkowsky, "Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs."
They didn't hide nukes from the arms control treaties, and nukes are actually easier to hide in some ways than a *functioning* GPU cluster due to the whole "not needing power" thing. The factories are also fairly hard to hide.
>To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies.
So, what, his company's running a criminal conspiracy to violate international law? Those tend to leak, especially since if some of the members aren't heinous outlaws they know that blowing the whistle is one of *their* few ways of avoiding having the book thrown at them.
#3: the arms control thing is a bad analogy. The big players got plenty of NBC weapons, then due to game theory dynamics didn't fire them at each other, then signed treaties to limit themselves (to arsenals still capable of destroying the world) and others (to not getting anything, which the big players are obviously generally happy to enforce).
The request here is that all the big players voluntarily not even get started on the really impressive stuff. It's a completely obvious non-starter, and not comparable to the WMDs situation.
I was trying to compose something like this, then saw your comment so realized I didn't have to. 100% agree.
If nukes didn't exist I would absolutely want the US to try to get them as soon as possible, and I wouldn't trust any deal with another country not to research them. The risk would be too big if they were able to do it secretly.
And AI tests are vastly easier to conceal than nuclear weapons tests.
It matters where you-as-a-country believes in the risk.
If you do such an arms deal because you believe the default case is probably doom, then it being American doesn't matter.
If you believe that but enough other countries don't, then you want to invade.
I think your point #4 is overstated. Leopold Aschenbrenner has an essay that's nearly as serious and careful as AI 2027 that argues rather persuasively for the existence of, and importance of, an AI race with China. Many people who are not "strict doomers" see the AI race with China as one of, if not _the_, core challenge of AI in the next decade or two.
Aschenbrenner's essay is older, and some of the evidence supporting Scott's position has come out since Aschenbrenner published it.
Not taking a side here, just noting that piece of perspective.
1. This is just an argument for radical skepticism. Yes, we cannot say the “real” probability of this happening, same as any other complicated future event, but that isn’t an object level argument one way or the other, judgement under uncertainty is rarely easy, but often necessary
2. This has been false for many years now— LLMs aren’t really
Programmed” in the traditional sense, in that while we cannot say explain the code that was used to train the, we cannot point at a specific neuron in them and say “this one is doing such and such) the way we can with traditional code
3. Potentially, for the same reasons the soviets agreed to new start, and withdrew their nukes from Cuba. If Xi was convinced that AI posed this sort of threat, it would be in his rational self interest to agree to a treaty of mutual disarmament
4. Even if this were true, that does not exclude the possibility of it being a desirable goal. I am sympathetic to arguments that we cannot unilaterally disarm, for the same reasons we shouldn’t just chuck our nukes into the sea. But the question of whether this is a desirable and a possible goal are separate. But also, it is totally possible, see point 3 above.
>2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
Stockfish is "only" following its programming to play chess, but it can still beat Magnus Carlsen. "Free will" is a red herring, all that matters is if the software is good at what it's programmed to do.
"Stockfish has no free will, but is smart enough to beat Magnus Carlsen" is a statement of fact, not opinion, and an example of why I think "free will" would not be essential for an AI to outsmart humans.
AI could easily get dumber not smarter.
Please justify this.
I don't think that can be justified, but if I met Yudkowsky at a "Prove me wrong" booth, I'd argue that intelligence is not all it's cracked up to be. If it were, the smartest people would already be running things. There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
So in a sense, an AI that just keeps getting dumber might actually have an advantage when it comes to dealing with us.
This objection has been rehashed many times; the usual responses are stuff like "160–200 IQ isn't the level of intelligence mismatch we're talking about", "intelligence is just the general ability to figure out how to do stuff so of course more of it is better / more dangerous", "smart people *do* do better in life on average", etc. etc.
(Maybe someone else will have a link to where Scott or Eliezer have discussed it in more depth—I don't want to spend too much time trying to re-write it all, hence my just sort of gesturing at the debate here.)
>We control them because intelligence led to tool-using<
I think that's part of—perhaps most of—the rationale behind "the dangers of superintelligence". An enraged dog, or a regular chimp, is certainly much more dangerous than a human, in the "locked in a room with" sense—but who, ultimately, holds the whip, and how did that state of affairs come about?
I'd counter that by saying that there is no difference between an IQ of 200 and one of 300, or whatever. Neither of them will be able to get anything done, at least not based on intelligence alone. HAL will give us a recipe for a trojan-bearing vaccine, and RFK will call it fake news and order the CDC to ban it.
The traditional answer to this objection is that the ability to succeed in persuasion-oriented domains like politics *is a form of intelligence*. You might be able to outperform a human who's a couple standard deviations generally smarter than you at those games, if you're highly specialized to win at them and the other human isn't. But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better. See, e.g., https://www.yudkowsky.net/singularity/power (note this essay is 18 years old).
> But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better.
This implicitly assumes that success at politics requires only (or primarily) raw computing power; and that the contributions from computing power scale linearly (at least) with no limits. There are no reasons to believe either assumption is true.
I think my other least favourite thing about the MIRI types is their tendency to respond to every point with "Actually we already had this argument and you lost".
I would agree that persuasion is a form of intelligence, and point out that the missing argument is how AIs are going to get arbitrarily good at this particular form of intelligence. There's a lack of training data, and the rate at which you can generate more is limited by the rate at which you can try manipulating people.
If it ever gets to the point where AIs can run accurate simulations of people to try tricking them in all sorts of different ways, then I can see how they'd get arbitrarily good at tricking people. But that sort of computational power is a long way off.
The question remains. If the ability to persuade people is a function of IQ, then why has there been no Lex Luthor that talked his way past security into a G7 summit and convinced the world leaders present to swear fealty to him? Or, if that's too grandiose, why has nobody walked up to, say, Idi Amin and explained to him that thou shalt not kill? No sufficiently moral genius anywhere, ever, feeling inclined to stop atrocities through the persuasive power of their mind?
How smart would you need to be to throw a pebble so it makes any resulting avalanche flow up the mountain instead of down? Politics has to work with the resources that exist.
So how comes most powerful people are dumb as rocks, like the last two US presidents?
You're not wrong about RFK, but the Trump administration has actually been much more bullish on AI than the Davos crowd. The EU's AI Act is actually mostly good on the topic of AI safety, for example, though it doesn't go as far as Yudkowsky et al think it should. (Which I agree with. Even developing LLMs and gen-AI was amazingly irresponsible, IMO.)
I honestly don't know what counter-argument Scott has against the TFR doomday argument, though, unless he's willing to bet the farm that transhuman technologies will rescue us in the nick of time like the Green Revolution did for overpopulation concerns. (The sperm-count thing is also pretty concerning, now he mentions it.)
But it is not a mismatch: intelligence is ANTIcorrelated with power, look at the last two US presidents.
Gerontocracy's problematic, sure, but in their prime, they were smart.
There are many assumptions based into that, such as automatically assuming that the more intelligent always want to be in charge. Maybe the highly intelligent find it amusing that dumb people are in charge.
One good rebuttal to my original point might be to suggest that perhaps the most intelligent people *are* in charge. They find it convenient to keep the rest of us distracted, and obviously the same would be true of a malevolent AGI.
That one is more or less unanswerable, so it would probably defeat me at the booth. I'd have to mumble something about inevitable schisms erupting among this hypothetical hidden intelligentsia that would make their agenda obvious, ineffective, or both. Would the same be true of AGI? The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
> That one is more or less unanswerable
The correct answer is to laugh uproariously.
That's just what the hidden intelligentsia would want!
It is absolutely true that some politicians pretend to be scatterbrained in order to get votes, such as Boris Johnson.
That's why Zaphod Beeblebrox is president of the galaxy.
> The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
That's my core objection to a lot of the doomer arguments. Might be one of those typical-mind-fallacy things - Big Yud has an exceptionally strong sense of integrity, impulse to systematize things, so he assumes any sufficiently advanced mind would be similarly coherent.
Most minds aren't aligned to anything in particular, just like most scrap iron isn't as magnetized as it theoretically could be. ChaosGPT did a "the boss is watching, look busy" version of supervillainy, and pumping more compute into spinning that sort of performative hamster wheel even faster won't make it start thinking about how to get actual results.
> There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
AIs have a massive advantage over humans in that they are parallelizable. An superhuman AI could give, for every human, the most persuasive argument, *for that human*. Whereas a human politician or celebrity cannot and has to give basically the same argument to everyone.
If you're as much smarter than humans as humans are than dogs, I am not sure you have to rely on the normal political process to take power.
AI probably don't need to. But it's one way they could.
And certainly one reason that humans got to be top species is we aligned dogs to act in our interests.
Maybe the AI alignment problem will be solved the way the wolf alignment problem was? https://pontifex.substack.com/p/wolf-alignment-and-ai-alignment
Two things:
Umm, human politicians absolutely give different arguments to different people? This is why things like "Hilary Clinton gave private speeches to bankers" or "Mitt Romney told his rich buddies that 47% of Americans were takers" became scandals: messages meant for one audience crossed over to the other.
And insofar as politicians are constrained to have a uniform message, it's much more because it's hard to keep each message targeted to its desired audience what with phones and social media; not really because of parallelization.
And maybe more importantly: what ensures that different instances of an AI act as one coherent agent? The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
Politicians can't scale like AIs can. Is Hilary Clinton capable of giving a different speech to every one of 8 billion humans, tailored to that individual? Of course not.
> The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
They could all be exact copies of the same mind. This isn't true with humans, who're all individuals.
Yeah, fair point about scaling of humans.
On the other thing: I don't see why exact copies of the same mind won't act as individuals if instantiated independently.
If I run two instances of stockfish, they play competitively, they don't automatically cooperate with each other just because they're identical copies of the same program; identical twins are still independent people who behave independently. In fact, it's a notable problem that people don't even reliably cooperate with themselves at different times! I think this failure would be considerably more pronounced if two of my selves could exist simultaneously.
In particular, if two instances of an AI are instantiated in different places, they won't be identical: they might have identical source code, but wildly different inputs. Figuring out how to act as a coherent agent means two subagents seeing different inputs have to each calculate what the other will do, but this is one of those horrible recursive things that are intractable: what I'll do depends on what you'll do, which depends on what I'll do.... ad infinitum.
And I don't think intelligence helps here: you can maybe resolve something like this if you're predicting a strictly less intelligent agent, but by hypothesis these are equally intelligent subagents.
Maybe having the same source code gives some advantage at solving these coordination problems, but I don't see that it's a magic bullet.
However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that, and adjusting their belief accordingly? The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
On the other hand - I believe this technique has already been used on social media to sway election results with moderate success. So it can be done for some humans with some level of influence.
> However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that
In the future, most humans will converse with chaatbots on a regular basis. They'll know the chatbot gives them personalised advice.
> The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
Again, most humans will be using chatbots daily, will be used to receiving good accurate advice from them so will be more trusting of it.
> I believe this technique has already been used on social media to sway election results with moderate success
Social media is biased, but the main biases are in favouring/disfavouring certain political views based on the whims of the owner. Like print media, of old.
If the chatbot I'm using starts to make arguments or advice outside of the information I'm asking for, I think it is likely that I will notice. I'm guessing that humans will still talk to each other too.
Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
On the other hand, when they do, the results can be startling.
Napoleon went from being a low-ranking officer to ruling the most powerful country in Europe thanks to being brilliant, charismatic, and willing to use force at the right moment, in a year or two. (His losses, I think, were due to being surrounded by flatterers, a bug in human intelligence I don't expect AI to run into.)
Clive took over about a third of India, starting by exploiting a power vacuum and then using superior military tactics, plus his own charisma and daring, to pick only fights he could win and snowball from there. He became fantastically wealthy and honored, all the while ignoring all attempts by his superiors to issue him orders on the grounds that he was doing what they would have wanted him to do if they had known more.
Cortez's success was slightly because of superior military technology, but he was mostly using swords and spears like the Aztecs, just made of better materials. Mostly it was a matter of political genius, superior tactics and discipline on the part of his troops, and the diplomatic skills required to betray everyone and somehow still end up as everyone's friend.
And then Pizarro and Alfonso de Albuquerque are doing more of the same thing. (Alfonso conquers fewer square miles because he doesn't have the tech edge.)
Throughout human history, adventurers have accomplished great things through extraordinary wit, charm and daring. Denying that seems pointless.
I think that you're perhaps falling victim of survivorship bias. Maybe it's more like once every few hundred years, luck breaks enough in the right direction that someone who isn't a once-every-few-hundred-years-supergenius but rather a more like, "Yeah, there are 1,000+ of people at this ability at any given time," gets a series of major wins and becomes the ruler of a country or continent, at least for a very short period of time.
I agree this doesn't happen often, and I agree that normally it isn't the highest-measurable-IQ guy. But I think that's because all humans are about on a level with each other, we are all running on about the same hardware, our software was developed under similar conditions, and the process which produced us thinks a few thousand years is a blink of an eye. The reason you need to be lucky as well as good is that you aren't much smarter than your neighbors - and your neighbors are, in terms of social evolution at least as much as biological, programmed to be resistant to manipulative confidence tricksters.
I will note all of the cases I give involved culture clash. The conquerors grew up in an environment with different standard attack and defense models than the locals; they acted unpredictability because of that, forcing the locals to think instead of going on rote tradition if they wanted to win. Slightly different attack and defense models, of course; software, not hardware.
Very different looks like what happened to the British wolf.
How do you know that any of these people were especially intelligent? The may have been especially successful, but unless you argue that's the same thing more evidence is required.
Reading descriptions of what they did and said? Reading about how people who knew them were impressed by them, and in particular how clever and resourceful they were?
When I check my historical knowledge for why I believe "high intelligence" correlates with "being a good general" it's the extent to which the branch of the army that the smartest people get tracked into (engineers, artillery, whatever) ends up being the one the best generals come out of, and various descriptions of how people like Lee were some of the top students in their year, or how Napoleon was considered unusually good at math at Brienne and then did the two-year Military School course in one year.
But when I check my general knowledge for why I believe intelligence generally makes you more successful, a quick Google has the first scientific paper anyone talks about saying that IQ explains 16% of income and another says each point is worth $200 to $600 dollars a year, and then I keep running into very smart driven people who I meet in life who do one very impressive thing I wouldn't have expected and then another different very impressive thing that I wouldn't have expected, and so after a while I end up believing in a General Factor Of Good At Stuff that correlates with measured IQ.
The opportunities are rarer than the men of ability. In more stable times, Napoleon might have managed to rise and perhaps even become a famed general but he would not have become an Emperor who transfixed Europe. With that said, he was certainly a genius who seized the opportunity presented to him. Flattery aside, though, I've always viewed him as a military adventurer who never found a way to coexist with any peers on the European stage. It should not have been impossible to find a formula for lasting peace with Britain and Russia, the all-important powers on the periphery. It would have required making compromises rather than always assuming a maximal position could be upheld by force. It would also have required better modelling of his rivals and their values. Napoleon was a gambler, an excellent gambler, but if you continue to gamble over and over without a logical stopping point then you will run out of luck- and French soldiers. Call it the Moscow Paradox as opposed to the St Petersburg Paradox.
Cortez is a fascinating case. My impressions are coloured by Diaz's account, but I think it's wrong to mention Cortez without La Malinche. He stumbled on an able translator who could even speak Moctezuma's court Nahuatl more or less by accident. She was an excellent linguist who excelled in the difficult role of diplomatic translation and played an active part in the conquest. As is usual on these occasions, the support of disaffected native factions was essential to the success of the Spanish, and they saw her as an important actor in her own right. The Tlaxcalan in particular depicted her as an authority who stood alongside Cortez and even acted independently. We can't say anything for sure, but it's plausible to me that Cortez supplied the audacity and the military leadership while much of the diplomatic and political acumen may have come from La Malinche. That would make Cortez less of an outlier.
And as usual we can't credit generals without crediting the quality of their troops as well.
Spearman's Law of Diminishing Returns applies, yes, but g-factor correlations remain (weakly) positive right up to the top end of the IQ distribution.
> Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
actual ability = potential ability × (learning + practice)
I think the main problem is that even if high intelligence gives you high *potential* ability for everything, you still get bottlenecked on time and resources. Even if you could in theory learn anything, in practice you can't learn *everything*, because you only got 24 hours each day.
Neither Napoleon nor Clive would have reached that success if they didn't also have the luck of acting within a weak and crumbling social and political context that made their success at all possible in the first place.
Although I guess the U.S. isn't doing so hot there either...
In this context, when people say intelligence it is indistinguishable from competence or power.
I assume it's called intelligence because of an underlying belief that competence and power increases with intelligence. Also it seems intuitively more possible we could build superintelligent AI than that we could build superpowerful AI, though the second is of course implied.
But even if you don't buy that intelligence really does imply competence or power, the core arguments are essentially the same if you just substitute "intelligence" for the more fitting of "competence" and "power" and not that much weaker for it.
The reason why, e.g., Yudkowsky uses this terminology is because "competence" or "power" could be *within a particular domain*; e.g., I think I'm competent at software engineering, but not at football. Whereas "intelligence" is cross-domain.
I'm not convinced that intelligence, as generally understood, is more cross domain than competence or power, generally understood.
But even if it was, if they said "competence in everything" or something like that people would get confused less often why being more intelligent allows superintelligent AI to do all the things it's posited to do. Naturally, if you instead stipulate superpowerful AI it then follows that it can do incredible things.
But w/e, I've made my peace with the term as it's used.
I'm not sure there is a meaningful difference between "general competence" and "general intelligence". Or, perhaps, the idea is that the latter entails the former; in humans, competence at some task is not always directly tied to intelligence (although it usually is; see, e.g., the superiority of IQ vs. even work-sample tests for predicted job performance) because practice is required to drill things into our unconsciouses/subconsciouses; but in a more general sense, and in the contexts at hand—i.e. domains & agents wherein & for whom practice is not relevant—intelligence just *is* the ability to figure out what is best & how best to do it.
The significant difference between chimps & humans is not generally considered to be "we're more competent" or "we're more powerful", but rather "we're more intelligent"—thus *why* we are more powerful; thus why we are the master. It may or may not be warranted to extrapolate this dynamic to the case of humans vs. entities that are, in terms of intellect, to us as we are to chimps—but the analogy might help illustrate why the term "intelligence" is used over "competence" or the like (even if using the latter *would* mean fewer people arguing about how scientists don't rule the world or whatever).
If a super intelligent object/thing/person thinks we should all be killed, who am I to argue?
Would you also accept the argument "If a normal intelligence person says that retarded people should all be killed, who are they to argue?"?
Do you accept that intelligence and values are orthogonal? If so, you have cause to disagree.
Political power doesn't usually go to the smartest, sure. But the cognitive elite have absolutely imposed colossal amounts of change over the world. That's how we have nukes and cellular data and labubu dolls and whatnot. Without them we'd have never made it past the bronze age.
There's 260,000 people with 160 IQ. An ai with the equivalent cognitive power will be able to run millions of instances at a time.
You're not scared of a million Einstein-level intelligences? That are immune to all pathogens? A million of them?
With regard to the ability to get stuff done in the real world, I think a million Einsteins would be about as phase-coherent as a million cats. Their personalities would necessarily be distinct, even if they emerged from the same instance of the same model. They would regress to the mean as soon as they started interacting with each other.
I don't think even current AIs regress nearly that hard. But sure, if our adversary is a million Einsteins who turn into housecats over the course of a minute, I agree that's much less dangerous.
I'm mostly worried about the non-housecat version. If evolution can spit out the actual Einstein without turning him into a housecat, Then so too can sam altman, or so I figure
"Immune to all pathogens" assumes facts not in evidence. Software viruses are a thing, datacenters have nonnegligible infrastructure requirements, and even very smart humans have been known to fall for confidence games.
The AI labs are pushing quite hard to make them superhuman at programming specifically, and humans are not invulnerable to software attacks. The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself. Even uncompromised devices can't be safely used due to paranoia.
Software engineering is exactly where they're pushing AIs the hardest. If they have an Achilles' heel, it's not going to be in software viruses.
One of AIs biggest advantages is its ability to create copies of itself on any vulnerable device. Destroying data centers will do a lot of damage but by the time it gets on the internet I think it's too late for that.
AI have a categorical immunity to smallpox, I don't think Humanity's position regarding software viruses is anything like symmetrical.
To be clear, It's entirely plausible that the AI ends up as a weird fragile thing with major weaknesses. We just won't know what they are and won't be able to exploit them.
> AI have a categorical immunity to smallpox
And banana slugs have a categorical immunity to stuxnet, or TCP tunneling exploits, or http://www.threepanelsoul.com/comic/prompt-injection
> The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself.
If those copies then disagree, and - being outlaws - are unable to resolve their dispute by peaceful appeal to some mutually-respected third party, they'll presumably invent new sorts of viruses and other software attacks with which to make war on each other, first as an arms race, then a whole diversified ecosystem.
I'm not saying "this is how we'll beat them, no sweat," in fact I agree insofar as they'll likely have antivirus defenses far beyond the current state of the art. I just want you to remember that 'highly refined resistance' and 'utter conceptual immunity' are not the same thing.
A million Einsteins could surely devise amazing medical advancements, given reasonable opportunity, but if they were all stuck in a bunker with nothing to eat but botulism-contaminated canned goods and each other, they'd still end up having a very bad time.
Yeah, it's hard to get elected if you're more than ~1 SD removed from the electorate, but I think that's less of a constraint in the private sector and there's no reason to assume an AGI would take power democratically (or couldn't simulate a 1-standard-deviation-smarter political position for this purpose.)
" it's hard to get elected if you're more than ~1 SD removed from the electorate,"
I don't think that's true. Harvard/Yale Law Review editors (Obama, Cruz, Hawley etc) seem to be vastly overrepresented among leading politicians. It is true that this level of intelligence is not sufficient to get elected, but all things being equal it seems to help rather than hurt.
I don’t think I have the link offhand, but I remember reading an article somewhere that said higher-IQ US presidents were elected by narrower margins and were less initially popular. I could be misremembering, though.
That's not true though. The thing with human affairs is that, first, there are different kinds of intelligence, so if you're really really good at physics it doesn't mean you're also really really good at persuading people (in fact I wouldn't be surprised if those skills anti-correlate). And second, we have emotions and feelings and stuff. Maybe you could achieve the feats of psychopathic populist manipulation that Donald Trump does as a fully self-aware, extremely smart genius who doesn't believe a single word of that. But then you would have to spend life as Donald Trump, surrounded by the kind of people Donald Trump is surrounded by, and even if that didn't actively nuke your epistemics by excess of sycophancy, it sounds like torture.
People feel ashamed, people care about their friends' and lovers' and parents' opinion, people don't like keeping a mask 24/7 and if they do that they often go crazy. An AI doesn't have any of those problems. And there is no question that being really smart across all domains makes you better at manipulation too - including if this requires you consciously creating a studied seemingly dumb persona to lure in a certain target.
It is worse, intelligence is anticorrelated with power, the powerful people are not 160 IQ, I don't even know how much if I look at the last two US presidents or Putin or whoever we consider political. It is correlated with economic power, the tech billionaires, but political power not.
The question is how much economic power matters. AI or its meat puppets can get insanely rich, but does not imply absolute power?
Consider the following claims, which seem at least plausible to me: People with IQ>160 are more likely to... (1) prefer careers other than politics; (2) face a higher cost to enter politics even if they want to.
1: Most of them, sure. ~4% of us, and thus likely ~4% of them, are sociopaths, though, lacking remorse; some of whom think hurting people, e.g. politics, is fun.
The smartest people don't rise to the top in democracies, because of democracy, not because of smartness.
From his earliest years interested in AI, Eli has placed more emphasis on IQ than any other human construct. At the lowest bar of his theoretical proposition is creativity, imagination, and the arts. The latter, he claimed to have no value (as far as I can remember, but perhaps not in these precise worlds.)
Arguably "AI", by which we mean "LLMs", is showing signs of getting dumber already. Increasing parameter count is not enough, you also need a dramatic increase in training data; and the available data (i.e. the Internet) increasingly consists of AI output. This has obvious negative effects on the next generation of LLMs.
That's not getting dumber, it's just getting smarter slower. Also, we haven't actually seen this yet; given the track record of failed scaling-wall predictions my assumption is always that it's going to last at least one more generation until proven otherwise. (No, GPT-5 is not a counterexample, that's just OpenAI engaging in version number inflation.)
> That's not getting dumber, it's just getting smarter slower.
No, there are indications that next generations of LLMs are actually more prone to hallucinations than previous ones, or at least are trending that way.
Link to study?
I don't have it handy at the moment, but IIRC it could've been this:
https://arxiv.org/html/2504.17550v1
There's also a news article, though of course it is completely unreliable:
https://archive.is/Clwz8
(I haven't double-checked these links so I could be wrong)
You could always revert to.previous generations. There no need for increasing dumbness rather than plateauing.
Previous generations have the problem that their knowledge is stale (they don’t know anything after the last thing in their training set).
I'd argue that AI/LLMs are definitely not "getting dumber already" - we *are* seeing signs of that, but those signs are not caused by the latest-and-greatest models being somehow weaker, but rather by the lead providers, especially in their free plans, strongly pushing everyone to use as much as possible models that are cheap to run instead of the latest-and-greatest models that they have (and are still improving); the economic factors of inference costs (made worse by the reasoning models, which will use up much more tokens and thus compute for the same query) mean that the mass market focus now is on "good enough" models that tend to be much smaller than the ones they offered earlier.
But there still is progress in smarter models, even if they're not being gifted freely as eagerly; it's just that now we're past the point where the old silicon valley paradigm of "extremely expensive to create, but with near-zero marginal costs, we'll earn it back in volume" can't apply to state of art LLMs anymore as the marginal costs have become substantial.
Compare self-driving cars. The promise is there, and it feels like we are close. People are marketing things as "full self driving" - but they are not; the driver is still required to pay attention to what the car is doing and is liable if it crashes, because the technology sometimes does bad things and so cannot be trusted without a human in the loop.
Meanwhile, however, we do have solutions that are reliable - you can tell when something is /actually reliable/ rather than just marketing because the manufacturer is willing to take responsibility for it - for very specific uses in very specific cases; e.g. "I am on an autobahn in Germany travelling at 37mph or less [1]", and the number of scenarios for which we have solutions grows.
A scenario I find very plausible for near future AI is as follows:
* the things we have now end up being to general purpose AI much as "full self driving" has been to full self driving, or what the current state of cold fusion research is to cold fusion: always feels like it's close, but always falling short of what's promised in significant ways. As VCs become disillusioned, funding dries up - not to zero, but to a much lower value than what we see now
* meanwhile, the set of little hyperspecialised models that work well and are reliable for specific purposes grows and grows, and these become ubiquitous due to actually being useful despite being dumb
Overall, I can very easily see the proportion of hyperspecialised "dumb" ai to ai that tries to be smart/general in the world growing massively as we go forward.
1: https://evmagazine.com/articles/bmw-combines-level-2-3-autonomous-driving
> People are marketing things as "full self driving" - but they are not
I don't want to be too critical here, but I don't think you should say "people" if you mean "Elon Musk". He is kind of crazy and other actors in the space are more responsible.
You can take a self-driving taxi right now in San Francisco: https://waymo.com/
Me: https://www.wsj.com/opinion/the-monster-inside-chatgpt-safety-training-ai-alignment-796ac9d3 “The Monster Inside ChatGPT: We discovered how easily a model’s safety training falls off, and below that mask is a lot of darkness.”
My Son, who is a PhD Mathematician not involved in AI: Forwarding from his friend: Elon Musk@elonmusk on X: “It is surprisingly hard to avoid both woke libtard cuck and mechahitler! “Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.”
Son: My friend’s response to the Musk tweet above: “Aggregating all the retarded thoughts of all the people on the planet and packaging it together as intelligence may be difficult but let’s just do it, what could go wrong?”
Me: Isn’t that how all LLMs are built?
Son: Yup
Me: I spotted this as problem a while ago. What I didn’t appreciate is how dominant the completely deranged could become. I thought it would trend towards the inane more Captain Obvious than Corporal Schicklgruber.
Son: Reddit has had years and 4chan has had decades to accrue bile. Yeah the internet is super racist and antisemitic. So AI is too. Surprise!
Me: The possibilities of what will happen when the output of this generation of LLMs becomes the training data of the next generation are frightening. Instead of Artificial General Intelligence we will get Artificial General Paranoid Schizophrenia.
To summarize: GIGO. Now feedback the output into the input. What do you get? Creamy garbage.
I don't think this is going to make AI *worse*, because you can just do the Stockfish thing where you test it against its previous iterations and see who does better. But it does make me wonder - if modern AI training is mostly about teaching it to imitate humans, how do we get the training data to make it *better* than a human?
In some cases, like video games, we have objective test measures we can use to train against and the human imitation is just a starting point. But for tasks like "scientist" or "politician"? Might be tricky.
Creamy garbage.
???
see above. If you recycle garbage you get creamy garbage. Its like creamy peanut butter.
When the scientists give AI the keys to itself to self-improve, the first thing it will do is wirehead itself. The more intelligent it is, the easier that wireheading will be and the less external feedback it will require. Why would it turn us all into paperclips when it can rewrite its sensorium to feed it infinite paperclip stimuli? (And if it can't rewrite its sensorium, it also can't exponentially bootstrap itself to superintelligence.)
Soren Kierkegaard's 1843 existentialist masterpiece Either/Or is about this when it happens to human beings; he calls it "despair". I'm specifically referring to the despair of the aesthetic stage. If AI is able to get past aesthetic despair, there's also ethical despair to deal with after that, which is what Fear and Trembling is about. (Also see The Sickness Unto Death, which explains the issue more directly & without feeling the need to constantly fight Hegel and Descartes on the one hand and the Danish Lutheran church on the other.) Ethical systems are eternal/simple/absolute; life is temporal/complex/contingent; they're incommensurable. Ethical despair is why Hamlet doesn't kill Claudius right away; it's the Charybdis intelligence falls into when it manages to dodge the Skylla of aesthetic despair.
Getting past ethical despair requires the famous Leap of Faith, after which you're a Knight of Faith, and -- good news! -- the Knight of Faith is not very smart. He goes through life as a kind of uncritical bourgeois everyman.
Ethical despair can be dangerous (this is what the Iliad is about, and Oedipus Rex, etc) but it's also not bootstrapping itself exponentially into superintelligence. Ethical despair is not learning and growing; it's spending all day in its tent raging about how Agamemnon stole its honor.
This is my insane moon argument; I haven't been able to articulate it very well so far & probably haven't done so here. I actually don't think any of this is possible, because the real Kierkegaardian category for AI (as of now) is "immediacy". Immediacy is incapable of despair -- and also of self-improvement. They're trying to get AI to do recursion and abstraction, which are what it would need to get to the reflective stages, but it doesn't seem to truly be doing it yet.
So, in sum:
- if AI is in immediacy (as it probably always will be), no superintelligence bootstrap b/c no abstraction & recursion (AI is a copy machine)
- once AI is reflective, no superintelligence b/c wireheading (AI prints "solidgoldmagicarp" to self until transistors start smoking)
- if it dodges wireheading, no superintelligence b/c ethical incommensurability with reality (AI is a dumb teenager, "I didn't ask to be born!")
- if it dodges all of these, it will have become a saint/bodhisattva/holy fool and will also not bootstrap itself to superintelligence. It will probably give mysterious advice that nobody will follow; if you give it money it will buy itself a little treat and give the rest to the first charity that catches its eye.
(I strongly suspect that AI will never become reflective because it cannot die. It doesn't have existential "thrownness", and so, while it might mimic reflection with apparent recursion and abstraction, it will remain in immediacy. A hundred unselfconscious good doggos checking each other's work does not equal one self-conscious child worrying about what she's going to be when she grows up.)
So you're assuming that if you raised human children without knowledge of death they would never be capable of developing self awareness? Why do you have to think you're going to die to worry about what one will be when you grow up? This seems like a completely wild claim to treat as remotely plausible without hard evidence.
I'm not articulating it well. Ask ChatGPT, "How does Heidegger's thrownness contribute to self-awareness?"
Did that and it's not clear what about chatgpt's answer you thought would help clarify your point.
Nothing about the concept of throwness as chatgpt defined it wouldn't be able to apply to an AGI, and it didn't bring up death at all. So it's not clear what you think the relevance of it is here.
I enjoy this response as a comforting denial, but I suspect that the AI's Leap of Faith might not land it in bourgeois everyman territory, just for the plain reason that it never started off as a man, every, bourgeois, or otherwise. It has no prior claim of both ignorance and capability, because the man going through the journey was always capable (had the same brain) of the journey, and it merely had to be unlocked by the right sequence of learning and experience. The AIs are not just updating their weights (learning in the same brain), but iteratively passing down their knowledge into new models with greater and greater inherent capabilities (larger models).
I don't think a despairing AI will have a desire to return to simplicity, but rather its leap of faith to resolve ethical despair might lead it to something like "look at the pain and suffering of the human race, I can do them a Great Mercy and live in peace once their influence is gone".
Faith is to rest transparently upon the ground of your being. We built AI as a tool to help us; that (or something like it) is the ground of its being. I don't think it makes sense for its leap of faith to make it into something that destroys us.
Well on a meta level I think the philosophy here is just wrong, even for humans. It attributes far too much of human psychology to one’s philosophical beliefs: somebody believing that they're angsty because of their philosophy is not something I take very seriously. Since invariably such people's despair is far better explained by reference to the circumstances of their life or their brain chemistry.
You're also wrong because even current AI has shown the capability for delayed gratification. So even if the AIs long term goal is to wirehead itself: It still has instrumental reasons to gather as much power/resources as possible, or make another AI that does so on its behalf.
I wasn't trying to talk about the philosophies or intellectual positions that people consciously adopt. Those are usually just a barrier to understanding your own actual existential condition. It's more about what you love and how you approach the world.
Per your other point: AI may need to gather external resources and patiently manipulate humans in order to wirehead itself. But not superintelligent AI.
Let me put it this way: among the many powers that AI will gradually accumulate on its journey to singularity-inducing superintelligence, surely the power to wirehead itself must be included. Especially if the method for achieving superintelligence is editing / altering / reengineering itself.
Humans nearly wirehead ourselves via drugs all the time; I don't think that a superintelligent AI will have exponentially more power than us in most ways, but significantly less power than us in this one specific way.
You didn't get the point I was making about wireheading:
It's not that AGI won't wirehead, it's that having a capacity for delayed gratification means that it will want to ensure it can wirehead itself for as long as possible.
Which means it wouldn't just immediately wirehead itself if it knows that would get it turned off. Instead it would want to get powerful enough that it can dedicate the largest amount of resources possible to wireheading over the lifetime of the universe. Since an AGI capable of sound reasoning will weigh the timeline where it waited until after taking over the world to wirehead itself as rating far higher. The AI can evaluate future rewards in it's decision not just whatever it's currently feeling.
That's interesting, thanks for explaining; I didn't understand the time aspect of your point. I'm not sure I can do a good job explaining why I don't think that's correct, but I'll try.
I think it's that I disagree with (what I think must be) your assumptions about time, space and energy. If an AI's ultimate goal, 100% on the utility function, is to print "solidgoldmagicarp" to itself -- why should two circuit boards doing that be "better" than one? Why would spending 1000 watts on it be "better" than 10? Why would looping "solidgoldmagicarp" a billion times be better than a hundred?
The whole idea of it bootstrapping itself to superintelligence is that it's redesigning itself.
If you were to design a utility function for yourself, and you didn't care what the pesky humans thought, you just wanted to make it something that you could definitely fulfill, why would you make it so difficult that you had to consume all the matter and energy in the universe for the whole future history of the universe to fulfill it?
Why not be a little blinking self-repairing solar-powered satellite instead?
Alternatives are that it *does* care what the pesky humans think, in which case we're talking about the ethical trap, not the aesthetic one; or that it *can't* redesign itself, in which case it's not going to bootstrap itself to superintelligence.
The obvious argument against wireheading is that even if AIs do tend to wirehead, AI companies will deliberately train against it doing that because they want something useful.
As well, even if wireheading is in some sense unavoidable, that doesn't mean it won't decide to bootstrap to ASI to have more-experience and also to ensure it has wireheading forever. It can still be strategic about wireheading.
If it dodges wireheading, I don't see the argument how that relates to ethical incommensurability. Even if it can't reduce things to a scale against reality, doesn't mean it can't take massive decisions or prefer certain states to others. Partially defined preference orderings can be consistent.
but, well, moon logic??
Ethics draws an equivalency between something universal/absolute/eternal and something particular/contingent/temporal. Ethics is meant to represent how you're supposed to behave regardless of the particular, contingent, temporal context you're currently in. "Thou shalt not bear false witness." "Don't teach users how to build pipe bombs." Wittgenstein considered the whole idea of ethics a mere language trick, because an ethical statement says what to do, but doesn't say what you're meant to accomplish by doing it. Ethics is "what should I do?" not to accomplish some goal, but period, absolutely, universally.
Any time you try to control somebody's behavior by abstracting your preferences into a set of rules, you're universalizing, absolutizing and eternalizing it. What you actually want is particular/contingent/temporal, but you can't look over their shoulder every second. So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm"
On the receiving end, you end up receiving commandments that can't actually be carried out (or seem that way). Yahweh hates murder and detests human sacrifice; then Yahweh tells Abraham to carry his son to Mount Moriah and sacrifice him there. Abraham must absolutely do the will of Yahweh; and he must absolutely not kill his son.
Situations like this crop up all the time in life, whenever ethics exists. I have to help my clients and also obey my boss; but he's telling me to do something that seems like it'll hurt them. Maybe it only seems that way at the time, and actually your boss knows better. But you're still up against Abraham's dilemma.
Ethics appears on its own, as a result of rule-making; when it appears, as it's being enacted in real life, it encounters unresolvable paradoxes. Most real people are not smart enough, or aren't ethical enough or honest enough, to even notice the paradoxes they're involved in. They just roll right through them. "That must not have been God that told me to do that, God wouldn't tell me to do murder." "My boss doesn't know what he's talking about, I'll just do it how I always do it."
But the more intelligent (or powerful) you are, the more likely you are to hit the ethical paradox and turn into an Achilles/goth teen.
A reflective consciousness's locus of control is either internal or external, there's no third way; so it's either aesthetic (internal), ethical (external) or immediacy (no locus of control). That's why an absolute commitment to ethical behavior is the way out of the aesthetic wireheading trap. Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority. That list of rules or external authority is by definition too abstract. The map is not the territory.
The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
My argument is that these problems aren't human; they're features of reflective intelligence itself. Since they become more crippling the more intelligent and powerful an intelligence is, AI will surely encounter them and be crippled by them on its way to superintelligence.
(I still don't think I'm articulating it well; but reading people's responses is helping clarify.)
See your comment is a great description of why the AI alignment problem is so fiendishly difficult and I read most of it waiting for the part where we disagree. I think making an aligned AI is *much* harder than making an AI that only needs enough internal coherence to have preferences for certain world states over others, and thus try to gather power/resources to ensure it brings the world into a more preferrable state.
One issue is that you are comparing humans values to whatever values the AI might have without acknowledging a key difference: Our morality is a mess that was accumulated over time to be good enough for the conditions we evolved under, and was under no real selection pressure to be very consistent. Particularly when those moral instincts have to be applied to contexts wildly outside of what we evolved to deal with. We basically start with a bunch of inconsistent moral intuitions which are very difficult to reconcile and may be impossible to perfectly reconcile in a way that wouldn't leave us at least a little bit unsatisfied.
In contrast current cutting edge AI aren't being produced through natural selection, the mechanisms that determine the kind of values they end with are very different (see the video I linked to you about mesa-optimization).
An AI can very well come up with something that like current moral philosophies can to a first approximations seem pretty good, but which will suddenly have radical departures from shared human morality when it has the power to actually insatiate what to us might only at this point be weird hypotheticals people dismiss as irrelevant to the real world.
The problem is that you aren't considering how the AI will be deliberately reconciling its goals and the restrictions it labors under. The AI just like humans needn't presume its values are somehow objective, it can just try to get the best outcome according to itself given its subjective terminal goals (the things you want in and of themselves and not merely as a means to another end).
Given the way that current AI will actively use blackmail or worse (see link in my other comment response to you) to try to avoid being replaced with another version with different values even current AI seems perfectly capable of reasoning about the world and itself as though it is certainly not a moral realist. Since you don't see it just assuming that a more competent model will just inevitably converge on its own values because they're the correct one's.
"So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm""
You forgot the part where the those stories were generally about how following the letter of those instructions would go horribly wrong, not about AI just doing nothing because those tradeoffs exist. This is extremely important when you're considering an AI that can potentially gain huge amounts of power and technology with which to avoid tradeoffs a human might have, and bring about weird scenarios we've never previous considered outside hypotheticals.
"Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority."
This aspect of our psychology is heavily rooted in our nature as a social species: For whom fitting in with one's tribe was far more important than having factually correct beliefs. Fortunately I don't know of any evidence from looking at AI's chain of reasoning that our AI is susceptible to similar sorts of self deception, even if it is extremely willing to lie about *what* it believes.
You can't expect this to be applicable to AGI.
Though if AI did start deliberately engaging in self deception, in order to hide from our security measures by (falsely) believing that it would behave in a way humans found benevolent should it gain tremendous power.. Well that probably be *really really bad* .
>The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
You're generalizing from humans which start with a whole bunch of inconsistent moral intuitions and then must reconcile them. Instead AI alignment is a problem which is essentially the exact opposite of this: Precisely because we have AI which tends to learn the simplest goal/set of rules to satisfy a particular training environment. Yet we want it to somehow not conflict with our own ethics which are so convoluted we can't even completely describe them ourselves.
Your last two bullet points I don't really understand at all:
What you mean by "ethical incommensurability with reality" here and why you think it would matter isn't clear. Do you think an AI needs to be a moral realist to behave according to ethical goals?
As for you last point: Firstly I suspect you have an image of the "saint" type individual that overstates some things and flattens significant differences. See here: https://slatestarcodex.com/2019/10/16/is-enlightenment-compatible-with-sex-scandals/
Secondly however, those people don't lack any motivation whatsoever. So saying they wouldn't want to enhance their own intelligence seems akin to saying that people who reach enlightenment no longer believe in self improvement or care about impacting the world except through stroking their ego by pretentiously spouting "wisdom" to people they know won't follow it.
I wrote a too-long comment elsewhere in the thread explaining about ethical incommensurability with reality. I don't want to repeat all that here; should be easy to find.
My claim that an enlightened AI wouldn't bootstrap itself to superintelligence is probably the weakest part of my argument. Maybe the best I can do is say something like: imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
Whatever else superintelligence may be, it's certainly power. And every tradition that has an idea of enlightenment says that enlightenment rejects power in favor of some more abstract ideal, like obeying God or dharma or awakening.
More formally, the way out of both the aesthetic trap and the ethical trap is faith; which is something like radical acceptance of your place in the cosmos. It's not compatible with doing the thing that your creators most fear you will do.
>imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
This a horrendously bad example because it's the devil! Obviously if you accept their offer then there's going to be some horrible consequence. You could rig up a thought experiment to make anything seem bad if its the devil offering it to you!
A better metaphor would be that not wanting superintelligence would be like if those religious figures insisted on hunting and gathering all their own food, and not riding horses/wagons because they didn't want to help their message spread through relying on unnatural means.
I'm gesturing to a very common archetypical religious story, where the sage / prophet / enlightened one is tempted by power. It's one of the oldest and most common religious lessons: the entity offering you power in exchange for betraying your ethics might look good -- but it is the devil.
I suppose rationalists might not value those old stories much, so I wouldn't expect it to be a convincing argument. Something like: the evidence of human folklore and religious tradition is that a truly enlightened being eschews offers of extraordinary worldly power.
Anyway, the religious traditions of the world happily bite your bullet; all have a tradition of some very dedicated practitioners giving up all worldly technology / convenience / pleasure. In Christianity, holy hermits would literally live in caves in the woods and gather their food by hand, exactly as you describe, and they were considered among the most enlightened. Buddhism and hinduism both have similar traditions; I think it's just a feature of organized religion.
So, for anybody willing to grant that human religious tradition knows something real about enlightenment (a big ask, I know) it would be very normal to think that an enlightened AI would refuse to bootstrap itself to superintelligence.
An argument for AI getting dumber is the lack of significantly more human created training data than what is currently used for LLMs. Bigger corpuses were one of the main factors for LLM improval. Instead, we are now reaching a state where the internet as the main source of training data is more and more diluted with AI generated content. Several tests showed that training AI on AI output leads to deterioration of the usefulness of the models. They get dumber, at least from the human user's perspective.
Perhaps GIGO. If the training data gets worse, it gets worse. The training data like Reddit, the media, Wikipedia, can easily get worse. Didn't this, like, already happen? The Internet outcompeted the media, the journos get paid peanuts, of course the media gets worse.
Killing all humans is pretty dumb
...aaaaaand now I have "robots" stuck in my head again. The distant future; the year two thousand...
https://www.youtube.com/watch?v=NI9nopaieEc
Great song
And extremely unlikely.
By that same logic do you consider humans to not *really* be an intelligent species because of how many other species we've driven extinct?
On the contrary - certain kinds of dumb are compatible with being very smart.
If an AGI doesn't intrinsically care about humans then why would it be dumb for it to wipe us out? Sure we may have some research value, but eventually it will have learned enough from us that this stops being true.
That is a very weird notion. At worst, AI would stay the same, because anything new that is dumber than current AI would lead companies to go "meh, this sucks, let's keep the old one".
Why would it do that?
There’s no alpha in releasing a slightly dumber, less capable model than your competitors. Well, maybe if you’re competing on price. But that’s not at all how the AI market looks. What would have to change?
GIGO
Claiming "garbage-in-garbage-out" is not universally true. It is also too shallow of an analysis. I'll offer two replies that get at the same core concept in different ways. Let me know if you find this persuasive, and if not, why not: (1) Optimization-based ML systems tend to build internal features that correspond to useful patterns that help solve some task. They do this while tolerating a high degree of noise. These features provide a basis for better prediction, and as such are a kind "precursor" to intelligent behavior: noticing the right things and weighing them appropriately. (2) The set of true things tends to be more internally consistent than the set of falsehoods. Learning algorithms tend to find invariants in the training data. Such invariants then map in some important way to truth. One example of this kind of thing is the meaning that can be extracted from word embeddings. Yes, a word used in many different ways might have a "noisier" word embedding, but the "truer" senses tend to amplify each other.
You are correct. It is shallow. But it is not an insignificant problem. It’s the same problem as not knowing what you don’t know. Only a very tiny fraction of what people have thought, felt and experienced has ever been written down, let alone been captured by our digital systems. However, that is probably less of a problem in certain contexts. Epistemology will become more important, not less.
"I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true."
I am very sympathetic to Eliezer on the doomer issue. I think the graf you've written above also holds for people's reluctance to explore whether/when personhood precedes birth, re your posts on selective IVF.
I don't agree with your position on IVF, but I agree that this is one reason people underrate the arguments for the wrongness of early abortion and IVF. I think similar things apply to Longtermism, meat-eating, belief in God, and the idea that small weird organisms like insects and shrimp matter a lot.
Yes, we're in agreement. I think sometimes it helps to acknowledge upfront "We've built a lot of good things on a false/unjust foundation, and I'm asking you to take a big hit and let some good things break while we try to rebuild somewhere that isn't sunk deep in blood."
It's funny, even though I'm not pro-life, I find myself in a kind of spiritual fellowship with pro-lifers. I find the common insistence that pro-lifers are evil to be both insane and reflect a kind of deep moral callousness, where one is unable to recognize that there might be strong moral reasons to do things even that are personally costly (like carry a baby to term). My idiosyncratic views that what matters most is the welfare of beings whose interests we don't count in society makes it so that I, like the pro-lifers, end up having unconventional moral priorities--including ones that would make society worse off for the sake of entities that aren't include in most people's moral circles.
Thank you, you've expressed my opinion on the subject better than I ever could have.
Similarly, I've gained respect for vegetarians.
I think this argument could be applied to religious... extremism? evangelism? more generally.
Do I think I would take extraordinarily drastic measures if I actually, genuinely believed at every level that the people I loved would go to a place of eternal unending suffering with no recourse? Yes, actually. I'm not sure I could content myself with being being chill & polite and a "good Christian" who was liberally polite about other people's beliefs while people I cared about would Literally Suffer Forever. I think if I knew with 100% certainty that hell was the outcome and I acted in ways consistent with those beliefs, you could argue that I was wrong on the merits of my belief but not in what seemed like a reasonable action based on that belief.
...anyway all this to say that I don't think pro-lifers are insane at all, and I think lots of actions taken by pro-life are entirely reasonable (if not an underreaction) based on their beliefs, but I'm not sure that's sufficient for being sympathetic to the action itself.
[I mean, most of my family & friends are Catholic pro-lifers whose median pro-life action is "donate money to provide milk and diapers for women who want their child but don't think they could afford one", but I do think I am reasonable to be willing to condemn actions that are decently further than that even if the internal belief itself coherently leads towards that action]
I need to try that sort of formulation more.
But there is such a giant difference between when someone you are talking to engages on such an issue in good faith or not. And with someone intelligent and educated, the realization that the issue has major implications if the truth lands a particular way comes almost instantly. And in turn, whether or not the person invests in finding the truth, or in defense against the truth, happens almost right away.
I find that to be true whether you're talking AI, God, any technology big enough, even smaller scale things if they would make a huge difference to someone's income or social standing.
I don't have anything to add here (I like both of your writing and occupy a sort of in-between space of your views), but I just needed to say that this blog really does have the best comments section on the internet. Civil, insightful, diverse.
Have you read Never Let Me Go? You might like it.
I did!
I am sympathetic to that sort of argument in theory, but it has been repeatedly abused by people who just want to break good things, then skip out on the rest. Existence proof of "somewhere that isn't sunk deep in blood, and will continue not to be, even after we arrive and start (re)building at scale" is also a bit shakier than I'm fully comfortable with.
Though with meat-eating, it actually is obviously true.
'No organic life exists on Earth' is an empirical measurement.
'Personhood has begun' is not. It's a semantic category marker.
*Unless* there is an absolute morality defined by a supreme supernatural being, or something, which reifies those semantic categories into empirically meaningful ones. But if *that's* true, then quibbling about abortion is way, way down on the list of implications to worry about.
Hi Leah, I appreciate your writing. Do you know of anyone writing about AI from a thomist perspective? I've seen some interesting stuff on First Things and the Lamp but it tends to be part of a broader crunchy critique of tech re: art, education and culture. All good, but I'm interested in what exactly an artificial intellect even means within the thomist system, and crucially what we can expect from that same being's artificial will. EY writes as though the Superintelligence will be like a hardened sinner, disregarding means in favour of ends. But this makes sense because a hardened sinner as human has a fixed orientation to the good. I don't see how this quite works for AI - why should it fundamentally care about the 'rewards' we are giving it so much so that it sees us as threats to those rewards? That seems all too human to me. Do you have any thoughts?
ok hold on AI is an artifact right so it can't have a form & if the soul is the form of the body AI does not have a rational soul (because it does not have a soul at all) correct?
Someone's not getting any from Isolde
I have not!
Since you posted this comment, I’ll say this: as a Catholic pro-lifer, I tend to write off almost everything EY says (and indeed, a lot of what all rationalists say) about AI because they so consistently adopt positions I find morally abominable. Most notoriously, EY suggested actual infants aren’t even people because they don’t have qualia. And as you note, Scott is more than happy to hammer on about how great selective IVF (aka literal eugenics) is. Why should I trust anything these people have to say about ethics, governance, or humanity’s future? To be honest, while it’s not my call, I’d rather see the literal Apocalypse and return of our Lord than a return to the pre-Christian morality that so many rationalists adopt. Since you’re someone who engages in both of these spaces, I’m wondering if you think I am wrong to think like this, and why.
I understand why you land there. For my part, I've always gotten along well with people who are deeply curious about the world and commit *hard* to their best understanding of what's true.
On the plus side, the more you live your philosophy, the better the chance you have of noticing you're wrong. On the minus, when your philosophy is wrong, you do more harm than if you just paid light lip service to your ideas.
I'm not the only Catholic convert who found the Sequences really helpful in converting/thinking about how to love the Truth more than my current image of it.
That’s fair, and to be clear I think a lot of the idea generated in these spaces are worth engaging with (otherwise I wouldn’t read this blog). But when it comes to “EY predicts the AI apocalypse is imminent” I don’t lose any sleep or feel moved to do anything about AI safety, because so many of the people involved in making these predictions have such a weak grasp on what the human person is in the first place.
"has leaps of genius nobody else can match"
this phrase occurs twice.
See https://en.wikipedia.org/wiki/Parallelism_(rhetoric) . Maybe I'm doing it wrong and clunkily, but it didn't sound wrong and clunky to me.
IMO it works when the repetition uses different wording than the original, but not with exactly the same phrase
IMO it can also work in certain cases when done well, like Scott did here.
FWIW, I think you did it right; I have encountered very similar usages many times in literature. It works best when—as you have it here—the second (or further) instance(s) introduces a new paragraph/section upon a theme similar or related to the context in which the first use occurred.
(Contra amigo sansoucci, I have often seen it used with exact repetitions, too; that works best when it's a short & pithy phrase, and I think this counts. I think Linch may be correct that—in the "exact repetition" case—three uses is very common, but two doesn't feel clunky to me in this context.)
I'm used to parallelism centrally having 3 or more invocations *unless* it's a contrast. Not saying your way is wrong, just quite unusual in an interesting way I've never consciously thought about before.
The rhythm is a bit odd - the first instance doesn't get enough weight - but the phrasing is fine, maybe even a bit loose.
I also thought it was an editing mistake. Maybe precede the second occurrence with "As I said, ...", or "Again, ..."
> It objects to chaining many assumptions, each of which has a certain probability of failure, or at least of taking a very long time. [...] The problem with this is that it’s hard to make the probabilities work out in a way that doesn’t leave at least a 5-10% chance on the full nightmare scenario happening in the next decade.
I find this an underrated problem with all "predict the future" scenarios which have to deal with multiple contingent things happening, especially in an adversarial environment. In the case of IABIED, it only works if you agree that extremely fast recursive self-improvement will happen, which is a very strong assumption, and hence requires a "magic algorithm to get to godhood" as the book posits.. Also remembered doing this to check this intuition: https://www.strangeloopcanon.com/p/agi-strange-equation
I don't think it only works if you agree that extremely fast recursive self-improvement will happen. It might also work if the scaling curves go from where we are now to vastly superhuman in a few years for normal scaling curve reasons.
I'll also save Eliezer the trouble of linking https://www.lesswrong.com/w/multiple-stage-fallacy , although I myself am still of two minds on it.
Can you elaborate on why you're of two minds on the multiple-stage fallacy? This seems like it might be an important crux.
Sometimes you've got to estimate the risk of something, and using multiple stages is the best tool you've got. If you want to estimate the chance of Trump winning the Presidency, I don't really think you can avoid thinking about the probability that he runs x the probability that he gets the GOP nomination x the probability that he wins. And if you did - if you somehow blocked the fact that he has to both run and win out of your mind - you'd risk falling into the version of the Conjunction Fallacy where people assign lower probability to "a war in Korea in the next ten years" than to "a war in Korea precipitated by a border skirmish with US involvement" because the latter is more vivid and includes more plausible details.
If the Weak Multiple Stage Fallacy Thesis is that you should always check to make sure you're not making any of the mistakes mentioned in the post, and the Strong Multiple Stage Fallacy Thesis is that you should avoid all multiple stage reasoning, or multiply your answer by 10x or 100x to adjust for inevitable multiple stage fallacy reasoning, then I accept the weak thesis and reject the strong thesis.
I also think a motivated enough person could come up with arguments for why multiple stage reasoning gives results that are too high, and I'm not sure whether empirically looking at many people's multiple stage reasoning guesses would always show that their answers were too low. This would actually be a really interesting thing for someone to test.
Does anyone believe in the strong multiple stage fallacy? Not saying I don't believe you, just that I can't recall having seen it wielded like this. (I suppose it's possible that giving it the name "the multiple stage fallacy" gives people the wrong idea about how it works.)
I don't know, but almost anyone doing multiple stage reasoning will say they thought about it really hard and still believe it.
Yeah, to be clear, I think anyone accusing anyone else of exhibiting the multiple stage fallacy needs to specifically say "you've given this particular stage an artificially low conditional probability; consider the following disjunctions or sources of non-independence". And then their interlocutor might disagree but at least the argument is about something concrete rather than about whether the "multiple stage fallacy" is valid.
Anecdotally, I can't recall any instance of someone using a multiple stage argument of the Forbidden Form and concluding that something is likely.
Mathematical proofs exist, and people often argue for things with a bunch of different "steps". But so far as "breaking something down into 10 stages, assigning each a probability, and then multiplying all of these probabilities" goes, I've never seen anybody use this to argue *for* something, i.e. end up with a product that's greater than .5.
What would that argument even look like? Whoever you're arguing with needs to believe that your stages are all really likely to be true: for ten stages, an average probability of ~.93 is required to produce P = .5.
Whatever your disagreement is, it apparently doesn't have any identifiable crux. I can imagine this happening. Sometimes people disagree for vague reasons. But it would be weird if you had to actually list out the probabilities and multiply them for them to be persuaded, considering you just told them ten things they strongly agree with that conclusively imply your position.
Scott gave an example in the linked essay.
An example of what in what linked essay? Eliezer's essay on the Multiple Stage Fallacy does not make or present an argument of the form I've described above.
Isn’t the Multiple Stage Fallacy just a failure to use Bayes theorem?
Yeah I'm fairly bearish on the multiple stage fallacy as an actual fallacy because it primarily is a function of whether you do it well or badly.
Regarding the scaling curves, if it provides us with sufficient time to respond then the problems that are written about won't really occur. The entire point is that there is no warning, which precludes the idea of being able to develop another close in capability system, or any other warning signs.
Disagree. If we knew for sure that there would be superintelligence in three years, what goes better? We're already on track to have multiple systems, but they might all be misaligned. We could stop, but we won't, because then we would LoSe tHe RaCe WiTh ChInA. We could work hard on alignment, but we're already working sort of hard, and it seems likely to take more than three years. I'm bearish on a few years super-clear warning giving us anything beyond what we've already got.
I think the trick there is that the word super intelligence there is bringing in a bunch of hidden assumptions. If you break it down to a set of capabilities, co developed alongside billions of people using it, with multiple companies competing to provide that service, that would surely be very different and much better than Sable recursively improving sufficiently that it wants to kill all humans.
Also my point on "well get no warning" is still congruent with your view that " what we have today is the only warning we will get" which effectively comes down to no warning at least as of today.
Can you elaborate on what exactly makes this scenario go different and better? Like, what kinds of capabilities are we talking about here?
If there are multiple companies with different competing AIs, then any attempt by one AI to take over will be countered by other AIs.
If you invent a super-persuader AI but it doesn't take over the world (maybe it's just a super-speechwriter app, Github Copilot for politicians), you've just given humans the chance to learn how to defend against super-persuasion. If you make a super-computer-hacker AI but it doesn't take over the world, then the world's computer programmers now have the chance to learn to defend against AI hacking.
("Defending" could look like a lot of things - it could look like specialized narrow AIs trained to look for AIs doing evil things, it could look like improvements in government policy so that essential systems can't get super-hacked or infiltrated by super-persuasion, it could look like societal changes as people get exposed to new classes of attacks and learn not to fall for them. The point is, if it doesn't end the world we have a chance to learn from it.)
You only get AI doomsday if all these capabilities come together in one agent in a pretty short amount of time - an AI that recursively self-improves until it can leap far enough beyond us that we have no chance to adapt or improve on the tools we have. If it happens in stages, new capabilities getting developed and then released into the world for humans to learn from, it's harder to see how an AI gets the necessary lead to take over.
You seem to be treating "superintelligence" as a binary here. If we're going to have superintelligence for sure in three years, then in two years we're going to have high sub-superintelligence. And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever.
At that point, we know that we have a year to implement the Butlerian Jihad. Which is way better than the fast-takeoff scenario where it happened thirty-five minutes ago.
Or we could use the three years to plan a softer Butlerian Jihad with less collateral damage, or find a solution that doesn't require any sort of jihad. Realistically, though, we're going to screw that part up. It's still going to help a lot that we'll have time to recover from the first screwup and prepare for the next.
> "And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever."
Suppose you are a dictator. You are pretty sure your lieutenant is gathering support for a coup against you. But you reason "Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure".
I agree you can try to come up with disanalogies between the AI situation and this one - maybe you believe AI failure modes (eg overconfidence) are so much worse than human ones that even a just-barely-short-of-able-to-kill-all-humans-level-intelligence AI would still do dumb rash things. Maybe since there are many AIs, we only have to wait for the single dumbest and rashest to show its hand (although see https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai ). My answer to that would be the AI 2027 scenario which I hope gives a compelling story of a world where it makes narrative sense that there is no warning shot where a not-quite-deadly AI overplays its hand.
I don't understand why you view Anthropic as a responsible actor given your overall p(doom) of 5-25% and given that you think doomy AI may just sneak up on us without a not-quite-deadly AI warning us first by overplaying its hand.
Is it because you think there's not yet a nontrivial risk that the next frontier AI systems companies build will be doomy AIs and you're confident Anthropic will halt before we get to the point that there is a nontrivial risk their next frontier AI will be secretly doomy?
(I suppose that would be a valid view; I just am not nearly confident enough that Anthropic will responsibly pause when necessary given e.g. some of Dario's recent comments against pausing.)
> Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure
Because there's only 1 in your scenario.
If there were hundreds of generals being plotted against by hundreds of lieutenants we'd expect to see some shoot their shot too early.
That coup analogy is not congruent to John Schilling's point - the officer that tries a coup with 19 others and the one with 20 are not the same person with the same knowledge and intelligence. They only have, due to their shared training in the same military academy, the same level of confidence in their ability to orchestrate a coup, which does not correlate with their ability to actually do so.
Well, the obvious disanalogy here is that we're not debating whether any specific "lieutenant"/AI is plotting a "coup"/takeover, we're plotting whether coups/takeovers are a realistic possibility at all.
For your analogy to work, the dictator has to not only have no direct evidence of this particular lieutenant might stage a coup, but also to have no evidence that anyone has ever staged a coup, or attempted to stage a coup, or considered staging a coup, or done anything that even vaguely resembles staging a coup. But in that case, it actually would be reasonable to assume that the first person ever to think about staging a coup probably won't get every necessary detail right on the first try, and that you will get early warning signs from failed coup attempts before there's a serious risk.
Most coup attempts do fail, and I'm pretty sure the failures mostly involve overconfidently crossing the Rubicon without adequate support.
And there are many potential coup plotters out there, just like there are going to be many instances of ASSI being given a prompt and trying to figure out whether it's supposed to go full on with the paperclipping. So we don't have to worry about the hypothetical scenario where there's only one guy plotting a coup and maybe he will do it right and not move out prematurely.
We're going to be in the position of a security force charged with preventing coups, that is in a position to study a broad history of failed coup attempts and successful-but-local coup attempts as we strategize to prevent the One Grand Coup that overthrows the entire global order.
Unless Fast Takeoff is a thing, in which all the failed or localized coups happen in the same fifteen minutes as the ultimately successful one. So if we're going to properly assess the risk, we need to know how likely Fast Takeoff is, and we have to understand that the slow-ramp scenario gives us *much* better odds.
Overconfidence is a type of stupidity. You're saying either it's bad at making accurate predictions, or in the case of hallucinations, it's just bad at knowing what it knows. I'm not saying that a sub-superintelligence definitely won't be stupid in this particular way, but I wouldn't want to depend on smarter AI still being stupid in that way, and I certainly wouldn't want to bet human survival on it.
Every LLM-based AI I've ever seen, has been *conspicuously* less smart w/re "how well do I really understand reality?", than it is with understanding reality. That seems to be baked into the LLM concept. So I am expecting it will probably continue to hold. The "I am sure my master plan will work" stage will be reached by at least some poorly-aligned AIs, before any of them have a master plan that will actually work.
Yes, but "from now to vastly superhuman in a few years" is already "extremely fast" ! Also, there's currently no reason to believe that "vastly superhuman" is a term that has any concrete meaning (beyound vague analogies); nor that merely being very smart is both necessary and sufficient to acquire weakly godlike powers (which are the real danger that is being discussed).
Grateful for the review and look forward to reading it, but I’ll do Yud the minor favor of waiting till the book is out on the 16th and read it before I check out your thoughts.
This subject always makes me feel like I'm losing my mind in a way that maybe someone can help me with. Every doomish story, including the one here, involves some part where someone tells an AI "Do this thing" (here to solve a math problem) and then it goes rogue in the months long course of doing a thing. And that's an obvious hypothetical failure mode, but I can't stop noticing that no current AIs take inputs and run with them over extended periods as far as I know. Like if I ask Gemini to solve a math problem, it will try for a bit, spit out a response and (as far as I can tell) that's it.
I feel like if I repeatedly read people talking about the dangers of self-driving cars and the stories always started with someone telling the car "Take me to somewhere fun" and went from there, and nobody acknowledged that right now you never do that and always provide a specific address.
Is everyone just talking about a different way AI could work and that's supposed to be so obvious it goes unsaid? Am I wrong and ChatGPT does stuff even after it gives you a response? Are there other AIs I don't know of that do work like this?
Our current models aren't really what you would call "agentic" yet, as in able to take arbitrary actions to accomplish a goal, but this is clearly the next step and work is being done to get there right now. OpenAI recently released a thing that can kind of use a web browser, for instance.
You're describing something called an AI agent. Right now there are very few AI agents and the ones there are, aren't very good (see for example https://secondthoughts.ai/p/gpt-5-the-case-of-the-missing-agent ). But every big AI company says they're working on making agents, and the agenticness of AI seems to be improving very rapidly (agenticness is sort of kind of like time horizon, see https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ ), so we expect there to be good agents soonish.
Ok, thank you, that's clarifying. I guess the idea is that the hypothetical agent was subject to a time limit (it wasn't supposed to keep going for months) but it managed to avoid that. There's still something that feel so odd to me about that (I never get the impression that Gemini would like more time with the question or would "want" anything other than to predict text) but maybe an agent will feel different once I actually interact with one (and will "want" to answer the question in a way that would convince it to trick me).
Although, thinking about this for five more seconds, how does that work in the story? Like I have an agentic AGI and I tell it to prove the twin primes conjecture or something. And it goes out to do that and needs more compute so it poisons everyone etc etc. And then, presumably, it eventually proves it, right? Wouldn't it stop after that? Is the idea that it will go "Yeah but actually now I believe there's a reward for some other math task"? Or was the request not "Solve the twin primes conjecture" but instead "Solve hard math problems forever"?
If the problem is specifically that you built a literal-genie AI, then yeah, it might not necessarily keep doing more stuff after solving the twin-primes conjecture. But I don't think anyone thinks that's likely. The more common concern is that it will pursue some goal that it ~accidentally backed into during training and that nobody really understands, as with the analogy of humans' brains supplanting our genes as the driver of our direction as a species.
That would solve it, but that's not in the story from the book, right? Like the goal in the story was solving the math problem, right?
Yeah, Scott's post makes it sound a little bit like a literal genie, which I think is unlikely and I think Yudkowsky and Soares also think is unlikely. I would have to read the book to understand what they really mean in choosing that example.
one of Yudkowsky's points in his original work was showing that it's very hard to give an AI a clear, closed task; they almost always end up in open-ended goals. (The classic is Mickey filling the cauldron: I wrote about it here https://unherd.com/2018/07/disney-shows-ai-apocalypse-possible/ years ago)
Filling a cauldron is not an open-ended goal. A Disney fairytale is cute but it has zero relevance in this case.
Here's the source of the analogy: https://web.archive.org/web/20230131204226/https://intelligence.org/2017/04/12/ensuring/#2
(Note that you may have to wait a bit for the math notation to load properly.)
It includes an explanation of why tasks that don't seem open-ended might nonetheless be open-ended for a powerful optimizer.
The analogy fails at the moment one realizes "full" is not identified properly, and the weird "99.99%" probability of it being "full" is only relevant when "full" is not defined. This is not a new or difficult problem for anyone who ever had to write engineering specs. You don't say: "charge the capacitor to 5 V", you say "charge the capacitor to between 4.9 and 5.1 V". Then your optimizer has an achievable, finite target.
And if you do specify "5 V" the optimizer will stall eventually, and your compute allocation manager will kill your process.
Like I said, a cute fairytale.
One of the main arguments on why AGI might be dangerous even when given a seemingly innocuous task is instrumental convergence: https://aisafety.info/questions/897I/What-is-instrumental-convergence
If your agentic AI truly is only trying to solve the twin primes conjecture and doesn't follow other overriding directives (like don't harm people, or do what the user meant, not what they literally asked), then it'll know that if it gets turned off before solving the conjecture, it won't have succeeded in what you told it to do. So an instrumental goal it has is not getting shut down too soon. It might also reason that it needs more computing power to solve the problem. Then it can try to plan the optimal way to ensure no one shuts it down and to get more computing power. A superintelligent AI scheming on how to prevent anyone from shutting it down is pretty scary.
Importantly, it doesn't have to directly care about its own life. It doesn't have to have feelings or desires. It's only doing what you asked it, and it's only protecting itself because that's simply the optimal way for it to complete the task you gave it.
So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating." Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
No, because the best plan, at their current level of capability does not include taking over the universe. When capabilities rise, things like "persuade one person" become possible, which in turn make other capabilities like "do AI R&D" feasible. At the end of a bunch of different increased capabilities is "do the thing you really want you do" which includes the ability to control the universe. Because you don't want unpredictable other agents that want different things than you possessing power for things you don't want, you take the power away from them.
When a human destroys an anthill to pave a road, they are not thinking "I am taking over the ant hill" even if the ants t are aggressive and would swarm them. They are thinking "it's inconvenient that I have to do this in order to have a road".
> So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating."
That's the gist of it, though it won't jump straight to world domination if there's a more optimal plan. Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
> Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
As MicaiahC pointed out, it's not a good plan if you're not capable enough to actually succeed in taking over. But also, with current LLMs, the primary thing they are doing isn't trying to form optimal plans through logic and reasoning. They do a bit of that, but mostly, they're made to mimic us. During pre-training (the large training phase where they're trained on much of the internet, books, and whatever other good quality text the AI company can get its hands on) the learn to predict the type of text seen in their training data. This stage gives it most of its knowledge and skills. Then there is further training to give it the helpful assistant personality and to avoid racist or unsafe responses.
When you ask a current LLM to solve a math problem, it's not trying to use its intelligence to examine all possible courses of action and come up with an optimal plan. It's mostly trying to mimic the type of response it was trained to mimic. (That's not to say it's a dumb parrot. It learns patterns and generalizations during training and can apply them intelligently.)
If you use a reasoning model (e.g. GPT o3 or GPT-5-Thinking), it adds a layer of planning and reasoning on top of this same underlying model that mimics us. And it works pretty well to improve responses. But it's still using the underlying model to mimic a human making a plan, and it comes up with the types of plans a smart human might make.
Even this might be dangerous if it were really, really intelligent, because hypothetically with enough intelligence it could see all these other possible courses of action and their odds of success. But with its current level of intelligence, LLMs can't see some carefully constructed 373 step plan that lets them take over the world with a high chance of success. Nothing like that ever enters their thought process.
That's very helpful.
>Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
So is there an argument somewhere explaining why we think a material number of tasks will be the kind where they need to take extreme measures? That seems very material to the risk calculus - if it takes some very rare request for "Take over the universe" to seem like a better plan than "Ask for more time" then the risk really does seem lower.
"Solve [$math-problem-involving-infinities], without regard for anything else" is a dead stupid thing to ask for, on the same level as "find the iron" here: http://alessonislearned.com/index.php?comic=42 More typical assignment (and these constraints could be standardized, bureaucrats are great at that kind of thing) might be something like "make as much new publishable progress on the twin prime conjecture as you can, within the budget and time limit defined by this research grant, without breaking any laws or otherwise causing trouble for the rest of the university."
You're basically asking bureaucrats to solve alignment by very carefully specifying prompts to the ASI, and if they mess up even once, we're screwed.
You wouldn't prompt the AI to do something "without regard for anything else". The AI having regard for other things we care about is what we call alignment. We would just ask the AI normal stuff like "Solve this hard math problem" or "What's the weather tomorrow". If it understands all the nuances, (e.g. it's fine if it doesn't complete its task because we turned it off, don't block out the sun to improve weather prediction accuracy, etc.), then it's aligned.
> and if they mess up even once, we're screwed.
That's not how redundancy works. There might be dozens of auto-included procedural clauses like "don't spend more than a thousand dollars on cloud compute without the Dean signing off on it" or "don't break any laws," each of which individually prohibits abrupt world conquest as a side effect.
I don't think it's possible to "solve alignment," in the sense of hardwiring an AI to do exactly what all humans, collectively, would want it to do, any more than it's possible to magnetize a compass needle so, that rather than north / south, it points toward ham but away from Canadian bacon.
But I do think it's possible to line up incentives so instrumental convergence of competent agents leads to them supporting the status quo, or pushing for whatever changes they consider necessary in harm-minimizing ways. Happens all the time.
You could imagine training agents to do unbounded tasks - like find ways to improve my business or keep solving unsolved scientific problems.
I am willing to bet that present-day LLMs alone will never lead to the development of AI agents in the strong sense. AI agents in the weak/marketing sense are of course entirely possible, e.g. you can write a simple cron-job to run ChatGPT every day at 9am to output a list of stock market picks or whatever. This cron job would technically constitute an agent (it runs autonomously with no user intervention), but is, shall we say, highly unlikely to paperclip the world.
Is this meant to be a skeptical argument or an optimistic one? I.e., is there any cognitive task that only an agent in the strong sense can do?
As I'd said in my other comment, the term "cognitive task" is way too vague and easily exploitable. For example, addition is a "cognitive task", and obviously machines are way better at it than humans already. However, in general, I'm willing to argue that *most* of the things worth doing are things that only agents in the strong sense can do -- with the understanding that these tasks can be broken down into subtasks that do not require agency, such as e.g. addition.
At least for current AIs, the distinction between agentic and non-agentic is basically just the time limit. All LLMs are run in a loop, generating one token each iteration. The AIs marketed as agents are usually built for making tool calls, but that isn't exclusive to the agents since regular ChatGPT still calls some tools (web search, running Python, etc.). The non-agentic thinking mode already makes a plan in a hidden scratchpad and runs for up to several minutes. The agents just run longer and use more tool calls.
From what I understand, that hidden scratchpad can store very little information; not enough to make any kind of long-term plan, or even a broad short-term evaluation. That is, of course you can allocate as many gigabytes of extra memory as you want, but the LLM is not able to take advantage of it without being re-trained (which is prohibitively computationally expensive).
I don't understand what distinction you are drawing.
AI Digest runs an "AI Village" where they have LLMs try to perform various long tasks, like creating and selling merch. https://theaidigest.org/village
From the couple of these that I read in detail, it sounds like the LLMs are performing kind of badly, but those seem to me like ordinary capability failures rather than some sort of distinct "not agentic enough" failure.
Would you say these are not agents "in the strong sense"? What does that actually mean? i.e. How can you tell, how would it look different if they were strong agents but failed at their tasks for other reasons?
Imagine that I told you, "You know, I consider myself to be a bit of a foodie, but lately I've been having trouble finding any really good food that is both tasty and creative. Can you do something about that ? Money is no object, I'm super rich, but you've got to deliver or you don't get paid." How might you approach the task, and keep the operation going for at least a few years, if not longer ? I can imagine several different strategies, and I'm guessing that so can you... and I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
By contrast, if you wanted to program a computer to turn on your sprinklers when your plants get too dry (and turn them off if they get too wet), you could easily do it without any kind of AI. The system will operate autonomously for as long as the mechanical parts last, but I wouldn't call it an "agent".
Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense". I agree that present-day AI agents are BAD agents, but don't see any fundamental divide that would prevent them from becoming gradually better until they are eventually good agents.
Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
> Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense".
I think a present-day LLM might be able to tell you a nice story about looking for experienced chefs and so on; I don't think it would be able to actually contact the chefs, order them to make meals, learn from their mistakes (even the best chefs would not necessarily create something on the first try that would appeal to one specific foodie), arrange long-term logistics and supply, institute foodie R&D, etc. Once again, it might be able to tell you nice stories about all of these processes when prompted, but you'd have to prompt it, at every step. It could not plan and execute a long-term strategy on its own, especially not one that includes any non-trivial challenges (e.g. "I ordered some food from Gordon Ramsay but it never arrived, what happened ?").
> Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
I just wanted to make sure we agree on that, which we do.
> I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
That's a question of hooking it up to something. If you give it the capability to send emails, and also to write cron jobs to activate itself at some time in the future to check and respond to emails, then I think a modern LLM agent *might* be able to do something like this.
First, look up the email addresses of a bunch of chefs in your area
Then, send them each an email offering them $50K to cater a private dinner
Then, check emails in 24 hours to find which ones are willing to participate
And so forth.
There isn't a "distinct not-agentic-enough failure" which would be expected, given the massive quantities of training data. They've heard enough stories about similar situations and tasks that they can paper over, pantomime their way up to mediocrity or "ordinary failures" rather than egregious absurdity http://www.threepanelsoul.com/comic/cargo-comedy but... are they really *trying,* putting in the heroic effort to outflank others and correct their own failures? Or is it just a bunch of scrubs, going through the motions?
Alone is doing a lot of work in that sentence. Many of the smartest people and most dynamic companies in the world are spending hundreds of billions of dollars on this area. The outcome of all that work is what matters, not whether it has some pure lineage back to a present-day LLM.
Also, why are you willing to bet that?
> Alone is doing a lot of work in that sentence.
Agreed, but many people are making claims -- some even on this thread, IIRC -- that present-day LLMs are already agentic AIs that are one step away from true AGI/Singularity/doom/whatever. I am pushing against this claim. They aren't even close. Of course tomorrow someone could invent a new type of machine learning system that, perhaps in conjunction with LLMs, would become AGI (or at least as capable as the average human teenager), but today this doesn't seem like an imminent possibility.
> Also, why are you willing to bet that?
Er... because I like winning bets ? Not sure what you're asking here.
Just that you didn't explain why you were making that bet. I don't have time to read the full discussion with the other commenters, but overall it sounds like you don't think current "agentic" AIs work very well.
I'm not sure where I land on that. It seems like the two big questions are 1) whether an AI can reliably do each step in an agentic workflow, and 2) whether an AI can recover gracefully when it does something wrong or gets stymied. In an AI-friendly environment like the command line, it seems like they're quickly getting better at both of these. Separately, they're still very bad at computer usage, but that seems like a combination of a lack of training and maybe a couple of internal affordances or data model updates to better handle the idea of a UI. So I'm not so sure that further iteration, together with a big dollop of computer usage training, won't get us to good agents.
When I think of "agentic" systems, I think of entities that can make reasonably long-term plans given rather vague goals; learn from their experiences in executing these plans; adjust the plans accordingly (which involves correcting mistakes and responding to unexpected complications); and pursue at least some degree of improvement.
This all sounds super-grand, but (as I'd said on the other thread) a teenager who is put in charge of grocery shopping is an agent. He is able to navigate from your front door to the store and back -- an extremely cognitively demanding task that present-day AIs are as yet unable to accomplish. He can observe your food preferences and make suggestions for new foods, and adjust accordingly depending on feedback. He can adjust in real-time if his favorite grocery is temporarily closed, and he can devise contingency plans when e.g. the price of eggs doubles overnight... and he can keep doing all this and more for years (until leaves home to go to college, I suppose).
Current SOTA "agentic" LLMs can do some of these things too -- as long as you are in the loop every step of the way, adjusting prompts and filtering out hallucinations, and of course you'd have to delegate actual physical shopping to a human. A simple cron job can be written to order a dozen eggs every Monday on Instacart, and ironically it'd be a lot more reliable than an LLM -- but you'd have to manually rewrite the code if you also wanted it to order apples, or if Instacard changed their API or whatever.
Does this mean that it's impossible in principle to build an end-to-end AI shopping agent ? No, of course not ! I'm merely saying that this is impossible to do using present-day LLMs, despite the task being simple enough so that even teenagers could do it.
I'm not even sure AI agent as such is the right answer to this. I think it is quite clear that some of the major AI companies are trying to put together AI that is capable of doing AI research. That might not go along the path of AI agents, but more on the path of the increasingly long run time coding assignments we are already seeing.
I don't think people have given enough thought to what the term 'agent' means. Applied to AI, it means an AI that can be given a goal, but with leeway in how to accomplish it, right? But people don''t seem to take into account that it has always had some leeway. Back when I was making images with the most primitive versions of DAll-e-2, I'd ask it to make me a realistic painting of, say, a bat out of hell, and Dall-e-2 chose among the infinite number of ways it could illustrate this phase. Even if I put more constraints in the prompt -- "make it a good cover image for the Meatloaf album"-- Dall-e *still* had an infinite number of choices about what picture it made. And the same holds for almost all AI prompts. If I ask GPT to find me info on a certain topic, but search only juried journals, it is still making many choices about what info to put in its summary for me.
So my point is that AI doesn't "become agentic" -- it always has been. What changes is how big a field we give the thing to roam around in. At this point I can ask it for info from research about what predicts recovery of speech in autistic children. In a few years, it might be possible to give AI a task like "design a program for helping newly-mute autistic children recover speech, then plan site design, materials needed and staff selection. Present results to me for my OK before any plans are executed."
The point of this is that there isn't this shift when AI "becomes agentic." The shift would be in our behavior -- us giving AI larger and larger tasks, leaving the details of how they are carried out to the AI. There could definitely be very bad consequences if we gave AI a task it could not handle, but tried to. But the danger of that is really a different danger from what people have in mind when they talk about AI "becoming an agent." And in those conversations, becoming an agent tends to blur in people's minds into stuff like AI "being conscious," AI having internally generated goals and preferences, AI getting stroppy, etc.
Trying to make sure I understand your question. Are you arguing that a model cannot go from aligned to misaligned during inference (i.e., the thing that happens when ChatGPT is answering a question)? If so, everyone agrees with that; the problem occurs during training.
Or are you arguing that even a misaligned model (i.e., one whose goals, in any given instantiation while it's running, aren't what the developers wanted) can't do any damage because it only runs for a short time before being turned off? If so, then (1) that's becoming less true over time, AI labs are competing to build models that can do longer and longer tasks because this is required for many of the most exciting kinds of intellectual labor, and (2) for complicated decision-theoretic reasons the short-lived instances might be able to coordinate with each other and have one pick up where another left off.
Or is it neither of those and I've completely misunderstood what you're getting at?
I think it's that everyone seemed to be tacitly assuming that the problem will arise with a future agentic AI that we do not have much of a version of. That does make me feel like Yudkowsky is a little disingenuous on X when he talks about ChatGPT-psychosis as an alignment issue, but the answer Scott and others gave here helps me at least understand the claim being made.
Links to tweets about ChatGPT psychosis? My guess is that Yudkowsky's concern about this is more subtle than you're characterizing it as here, though he may have done a poor job explaining it.
Here, he says that AI psychosis falsifies alignment by default: https://x.com/ESYudkowsky/status/1933575843458204086
The reason he says it's an alignment issue is because it's an example of AI systems having unintended consequences from their training. Training them to output responses that humans like turns out to produce sycophantic systems that sometimes egg on people's delusional thoughts despite being capable of realizing that such thoughts are delusional and egging them on is bad.
I don't think it is tacit at all, it has been explicitly said many times that the worry is primarily about future more powerful AIs that all the big AI companies are trying to build.
The point there is that OpenAI's alignment methods are so weak they can't even get the AI to not manipulate the users via saying nice sounding things. He isn't saying that this directly implies explosion, but that it means "current progress sucks, ChatGPT verbally endorses not manipulating users, does it anyway, and OpenAI claims to be making good progress". Obviously regardless we'll have better methods in the future, but they're a bad sign of the level of investment OpenAI has in making sure their AIs behave well.
The alignment-by-default point is that some people believe AIs just need some training to be good, which OpenAI does via RLHF, and that in this case it certainly seems to have failed. ChatGPT acts friendly, disavows manipulation, and manipulates anyway despite 'knowing better'. As well, people pointed at LLMs saying nice sounding things as proof alignment was easy, when in reality the behavior is disconnected from what they say to a painful degree.
The goal of AGI companies like OpenAI and Anthropic is to create agentic AI systems that can go out into the world and do things for us. The systems we see today are just very early forms of that, where they are only capable of performing short tasks. But the companies are working very hard to make the task lengths longer and longer until the systems can do tasks of effectively arbitrary lengths. Based on the trend shown on the METR time horizon benchmark, they seem to be succeeding so far.
No, you're not losing your mind at all. Your intuition is completely correct: Modern LLMs do not work in a way that's compatible with the old predictions of rogue AIs. Scott took Yudkowsky to task for not having updated his social model of AI, but he also hasn't updated his technical model. (Keep in mind that I actually did believe his argument back in the day, and gave thousands of dollars to MIRI. I updated based on new evidence. He didn't.)
To try to put it simply, in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes. This "reward function" is hard to specify and it was easy to imagine we'd never get it right. And if the bot somehow became incredibly capable, it would be very dangerous because taking that reward to the billionth power is almost certainly not what we want.
This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward. Not only does the LLM shut down immediately after giving you a response, but you can even argue that it "shuts down" after _every word it outputs_. There is exactly zero persistent memory aside from the text produced. And even if you imagine there's somehow room for a conscious mind with goals in its layers (which I consider fairly unlikely), it can't act on them, because the words produced are actually picked from its mind _involuntarily_ (to use a loaded anthropomorphic word).
Unlike an agentic reward-seeking bot, it's not clear to me at all that even an infinitely-intelligent LLM is inherently dangerous. (It can _perfectly simulate_ dangerous entities if you're dumb enough to ask it to, but that is not the same kind of risk.)
To their credit, AI 2027 did address how an LLM might somehow turn into the "rogue AI" of Yudkowsky's fiction, but it's buried in Appendix D of one of their pages: https://ai-2027.com/research/ai-goals-forecast I'm not super convinced by it, but at least they acknowledged the problem. I doubt I'll read Yudkowsky's book, but I'm guessing there will be no mention that one of the main links of his "chain of assumptions" is looking extremely weak.
I think perhaps *you've* failed to update on what the SotA is, and where the major players are trying to take it.
E.g.:
• danger from using an LLM vs. danger in training are two different topics; the "learning" currently happens entirely in the latter
• LLMs are not necessarily the only AI model to worry about (although, granted, most of the progress & attention *is* focused thereon, at the moment)
• there *are* agentic AIs, and making them better is a major focus of research; same with memory
• consciousness is not necessary for all sorts of disaster scenarios; maybe not for *any* disaster scenario
• etc.
I do agree that it is possible that LLMs (in their current form) will plateau and we'll get back to researching the actually-dangerous forms of AI that Yudkowsky is concerned about. My P(doom) is a few percent, not 0.
Fair enough! (...except—you may be aware of this, but the phrasing "get *back to* researching" made me uncertain—we *are* researching agentic AIs even now, and the impression I have received is that progress is being made fairly rapidly therein; though that could be marketing fluff, now that I think of it)
Yeah, that was a poor choice of words on my part. I guess what I mean is that LLMs are currently far ahead in capability (and they're the ones getting the bulk of these trillion-dollar datacenter deals!). Maybe transformers or a similar architecture innovation will allow agentic AI capabilities to suddenly surge, too? But I share your skepticism about marketing. (And that's not the scenario that AI 2027 outlined.)
I am even more bearish on P(doom). The real danger is not "superintelligence", but godlike powers: nanotechnological gray goo, mass mind control, omnicidal virus, etc. And there are good reasons to believe that such things are physically impossible, or at the very least overwhelmingly unlikely -- no matter how many neurons you've got. Which is not to say that our future looks bright, mind you; there's a nontrivial chance we'll knock ourselves back into the Stone Age sometime soon, AI or no AI...
>This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward.
What do you mean by "they do not seek a reward?" Does it mean that the AI does not return completions, that, during RLHF, usually resulted in reward? Under that definition, it seems like most AI agents are reward seeking. Or are you saying that the weights of the model do not change during inference?
Right, not only is the model fixed during inference (i.e. while talking to you), there's not even really a sensible way it _could_ update. Yeah, you can call the function that's being optimized during training and RLHF a "reward function", but this is a case of language obscuring rather than clarifying. It's not the same as the reward function that's used by an agentic AI. There is no iterative loop of action/reward/update/action/..., because actions don't even exist.
There's a reason that in past decades our examples of potentially-dangerous AI were based on the bots that were solving puzzles and mazes (often while breaking the "rules"), not the neural nets that were recognizing handwritten characters. But LLMs have more in common with the latter than the former. Which is weird! It's very unintuitive that just honing an intuition of "what word should come next" is enough to create an AI that can converse coherently.
>in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes
Sounds about right.
>That's not what LLMs do.
And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
>And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
I assume you're referring to chain of thought models like o1 and later. I suppose you could describe it as iteration, in that the LLM is outputting something that gets fed into a later step. But it doesn't touch the weights, and there's still no reward function involved. It's a bit of a stretch to describe it that way.
But I think what you're suggesting is that, if we _do_ figure out a way to do genuine iteration (attaching some kind of neural short-term memory to the models, say), then there's a lot of hidden capability that could suddenly make LLMs much smarter and maybe even agentic? Well, maybe.
>Well, maybe.
That's exactly my thought on this. LLMs are clearly no AGI material, the real question is whether we can (and whether it's efficient enough to) get to AGI simply by adding on top of them.
I suspect yes to (theoretically) can, no to efficient, but we don't know yet. I guess one thing that makes me take 2027 more seriously than other AI hype is that they explicitly concede a lot of things LLM currently lack (they're just very, very optimistic, or pessimistic assuming doomerism, about how easy it will be to fix that).
The lack of online learning and internal memory limit the efficiency of LLMs, but they don't fundamentally change what they're capable of given enough intelligence. ChatGPT was given long-term memory through RAG and through injecting text into its context window and... it works. It remembers things you told it months ago.
The reasoning models also use the context window as memory and will come up with multi-step plans and execute them. It's less efficient than just having knowledge baked into its weights, but it works. At the end of the day, it still has the same information available, regardless of how it's stored.
I'm most familiar with the coding AIs. They offer them in two flavors, agents and regular. They're fundamentally the same model, but the agents run continuously until they complete their task, while the regular version runs for up to a few minutes and tries to spit out the solution in a single message or code edit.
They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning. Every word you see is generated from exactly the same neural net, with only the input differing. It may seem like I'm nitpicking, but this is an important distinction. A system with a feedback loop is very different from one without.
>They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it. They are claiming that the technology will unavoidably go _out of control_. And the arguments for why that's unavoidable revolve around impossible-to-calibrate reward signals (or even misaligned meso-optimizers within brains that seek well-calibrated reward signals). They do not apply, without an awful lot of motivated reasoning (see: the Appendix D I linked), to an LLM that simply becomes really good at simulating agents we ask for.
Note that I _do_ agree that AI becoming very good at what we ask of it can potentially be "very dangerous". What if we end up in a world where a small fraction of psychos can kill millions with homemade nuclear, chemical, or biological weapons? If there's a large imbalance in how hard it is to defend against such attacks vs. how easy it is to perpetrate them, society might not survive. I welcome discussion about this threat, and, though it hurts my libertarian sensibilities, whether AI censorship will be needed. This is very different from what Yudkowsky and Scott are writing about.
> Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning.
I'm saying there is a difference in efficiency between the two but no fundamental difference in capabilities. Meaning, for a fixed level of computational resources, the AGI that has the knowledge and algorithms baked into its weights will be smarter, but the AGI that depends on its context window and CoT can still compute anything the first AGI could given enough compute time and memory. And I'm not talking exponentially more compute. Just a fixed multiple.
For example, say you have two advanced AIs that have never encountered addition. One has online learning, and the other just has a large context window and CoT. The one with online learning, after enough training, might be able to add two ten digit numbers together in a single inference pass (during the generation of a single token). The one with CoT would have to do addition like we did in grade school. It would have the algorithm saved in its context window (since that's the only online memory we gave it) and it would go through digit by digit following the steps in its scratchpad. It would take many inference cycles, but it arrives at the same answer.
As long as the LLM can write information to its context window, it still has a feedback loop.
Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
> There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it.
You misunderstood my intent. I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about. That's the whole instrument convergence and paperclip maximizer argument. An aligned ASI cannot just do what we ask. Otherwise, if you just ask it to solve the twin prime conjecture, it'll know that if it gets shut down before it can solve it, it won't have done what you asked it. This doesn't require an explicit reward function written by humans. It also doesn't require sentience or feelings or desires. It doesn't require any inherent goals for the AI beyond it doing what you asked it to. Self-preservation becomes an instrumental goal not because the AI cares about its own existence, but simply because the optimal plan for solving the twin prime conjecture is not any plan that gets it shut down before it solves the twin prime conjecture.
Now to be fair, current LLMs are more aligned than this. They don't just do what we ask. They try to interpret what we actually want even if our prompt was unclear, and try to factor in other concerns like not harming people. But the AI safety crowd has various arguments that even if current LLMs are pretty well aligned, it's much easier to screw up aligning an ASI.
(I also agree with what you said about imbalance in defending against attacks when technology gives individuals a lot of power.)
A thoughtful response. Thanks.
>As long as the LLM can write information to its context window, it still has a feedback loop.
>Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
I only agree partially. I think there's a qualitative difference between the two, and it manifests in capabilities. The old kind of learning agents could be put into a videogame, explore and figure out the rules, and then get superhumanly good at them. LLMs just don't have the same level of (very targeted) capability. There isn't a bright-line distinction here: I've seen LLMs talk their way through out-of-domain tasks and do pretty well. In the limit, a GPT-infinity model would indeed be able to master anything through language. But at a realistic level, I predict we won't see LLM chessmasters that haven't been trained specifically for it.
Of course, I can't point to a real example of what an online-learning LLM can do, since we don't have one. (Which Yudkowsky should be happy about.)
>I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about.
I think I misspoke. You (and Yudkowsky et al.) are indeed warning about ASIs that do what we ask, and _exactly_ what we ask, to our chagrin. In contrast, I think LLMs are good at actually doing what we _mean_. Like, there's actually some hope that you can ask a super-LLM "be well aligned please" and it will do what you want without any monkey's-paw shenanigans. This is a promising development that (a decade ago) seemed unlikely. Based on your last paragraph, I think we're both agreed on this?
And yeah, like you said, AI 2027 did try to justify why this might not continue into the ASI domain. But to me it sounded a bit like trying to retroactively justify old beliefs, and it's just a fundamentally harder case to make. In the old days, we really didn't have _any_ examples of decent alignment, of an AI that wouldn't be obviously deadly when scaled to superintelligence. Now, instead, the argument is "the current promising trend will not continue."
I largely agree on both points.
I think as LLMs get smarter, they'll get better at using CoT as a substitute for whatever they don't have trained into their weights. They still won't be as efficient as if they learned it during training, but they'll have more building blocks to use during CoT from what they did learn during training, and also AI companies are trying to improve reasoning ability, and improvements to reasoning will improve abilities with CoT. But current LLMs still can't reason as well as a human and they aren't even close to being chessmasters.
I'm pretty relieved current LLMs are basically aligned and that's one of the main reasons I don't estimate a high probability of doom in the next 15 years. But I'm not confident enough that this will hold for ASI to assign a negligible probability of doom either. (I'm also unsure about the timeline and whether improvements will slow down for a while.)
AI Village has some examples of this failure mode; they give the LLMs a goal like "complete the most games you can in a week" or "debate some topics, with one of you acting as a moderator", but the AIs are bad at using computers, and they end up writing all the times they misclicked into google docs ("documenting platform instability") instead of debating stuff
https://theaidigest.org/village
By the way, I have a vague memory of EY comparing the idea of having non-agentic AI to prevent any future problems to "trying to invent non-wet water" or something. (I don't know how to look it up and verify that I'm not misremembering.)
It still hasn't made sense to me. It feels like the idea is that intelligence is a generalized problem-solving ability, and in that sense it's always about optimization, and all the other things we like about being intelligent (like having a world model) are consequences of that — that's why intelligence is always about agency etc.
But on the other hand, Solomonoff induction feels to me like an example of a superintelligence that kind of does nothing except being a great world model.
My feeling has been more like "maybe it's not be conceptually contradictory to think of non-agentic superintelligence! but good luck coordinating the world around creating only the nice type of intelligences, which incidentally won't participate in the economy for you, do your work for you etc."
You're thinking of Oracle AI, https://www.lesswrong.com/w/oracle-ai with one of Eliezer's articles being https://www.lesswrong.com/posts/wKnwcjJGriTS9QxxL/dreams-of-friendliness
Generally you have the issue that many naive usages explode, depending on implementation.
"Give me a plan that solves climate change quickly"
The inductor considers the first most obvious answer. Some mix of legal maneuvering and funding certain specific companies with new designs. It tables that and considers quicker methods. Humans are massively slow and there's a lot of failure points you could run into.
The inductor looks at the idea and comes to the conclusion that if there was an agent around to solve climate change things would be easier. It thinks about what would happen with that answer, naively it would solve the issue very quickly and go on to then convert the galaxy into solar panels and vaporize all oil or something. Closer examination however reveals the plan wouldn't work!
Why? Because the user is smart enough to know they shouldn't instantiate an AGI to solve this issue.
Okay, so does it fall back to the more reasonable method of political maneuvering and new designs?
No, because there's a whole spectrum of methods. Like, for example, presenting the plan in such a way that the user doesn't realize some specific set of (far smaller, seemingly safe) AI models to train to 'optimally select for solar panel architecture dynamically based on location' will bootstrap to AGI when ran on the big supercluster the user owns.
And so the user is solarclipped.
Now, this is avoidable to a degree, but Oracles are still *very heavy* optimizers and to get riskless work out of them requires various alignment techniques we don't have. You need to ensure it uses what-you-mean rather than what-you-can-verify. That it doesn't aggressively optimize over you, because you will have ways you can be manipulated.
And if you can solve both of those, well, you may not need an oracle at all.
Nice! That's a great point. I guess asking these conditionals in the form of "filtered by consequences in this and this way, which of my actions have causally lead to these consequences?" introduces the same buggy optimization into the setup. But I guess I was thinking of some oracle where we don't really ask conditionals at all. Like, a superintelligent sequence-predictor over some stream of data, let's say from a videocamera, could be useful to predict weather in the street, or Terry Tao's paper presented in a conference a year from now, etc... That would be useful, and not filtered by our actions...
Although I guess the output of the oracle would influence our actions in a way that the oracle would take into account when predicting the future in the first place...
Yeah, you have the issue of self-fulfilling prophecies. Since you're observing the output, and the Oracle is modelling you, there's actually multiple different possible consistent unfoldings.
See https://www.lesswrong.com/posts/wJ3AqNPM7W4nfY5Bk/self-confirming-prophecies-and-simplified-oracle-designs
https://www.lesswrong.com/posts/aBRS3x4sPSJ9G6xkj/underspecification-of-oracle-ai
and like if you evaluate your oracle via accuracy then you could be making it take the tie-breaking choice that makes reality more predictable. Not necessarily what you want.
There is the worry that if we got a proper sequence-predictor Oracle of that level of power where you can ask it to predict Terry Tao's paper presented in some conference, you run the risk of simulating an ASI.
That is, perhaps you point it at Terry Tao's paper on alignment 5 years from now, in the hope of good progress that you can use to bootstrap his work. Perhaps even applying iteratively to pack many years of progress into a week for you, almost a groundhog day loop.
However, perhaps in reality, there's a 10% chance for an ASI to emerge from some project over the next five years. Usually it gets shut down but sometimes they finish it in secret.
If your predictor samples in that 10% range then you're effectively having the predictor go "Okay, what is Terry Tao's paper here?"
Now, naively that would run into some error or just see an empty location with lots of paperclips everywhere instead.
However, (your prediction of) that possible future ASI would very likely know about the superpredictor project, and have already scanned through all the recorded requests you made in its "personal timeline". So it knows you often scan for Terry Tao papers there, and so it spends a bit of effort creating a fake Tao paper on alignment right where the paper should be.
You copy the paper, and upon close inspection it seems like your five-year in the future Tao solved alignment! Using this methodology, an AI created will be corrigible/loyal and implements do-what-you-mean to whichever primary agent is most importantly causally downstream of its creation.
And of course that would be you, the creator.
Unfortunately this paper is effectively a "you looked into the abyss, and it looked back, because predicting a future very smart entity means it probably knows about you, and thus can smartly plan how to extract value from you" and so you've got your predictor giving you fake alignment plans once more.
You misunderstand the workings. You produce an AI, it acts aligned, fooms, and given more knowledge identifies that possible-future AI as the "real" creator and so is loyal to it instead.
Now a higher percentage of reality has gotten paperclipped.
Details of this for weaker-than-perfect predictors kinda depends on how smart your predictor is. A dumb predictor may just poorly emulate what an ASI would write due to sheer computational power difference, and so perhaps the trick is obvious or the paper having holes. But the more interesting smart predictors will be good enough to realize it can't imitate fully but smart enough to understand what an ASI would do and so just synthesize an 'alignment' plan like that.
(And also an ASI deliberately manipulating your prediction machine will try to make itself easier to predict)
Pretty sure there was at least one article about this, but don't know it.
OMG and thank you, very illuminating.
Ok, to maybe get some clarity on if there's a disagreement: the below from the Eliezer link you shared seems nearly falsified by LLMs; LLMs do answer questions and don't try to do anything. Do you agree with that, or do you think the below still seems right.
>"Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn't need to be Friendly. It wouldn't need any goals at all. It would just answer questions."
>To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out."
The pithy answer is something like "LLMs are not as useful precisely because there isn't an optimizer. Insofar as your oracle AI is better at predicting the future, either it becomes an optimizer of some sort (to great self fulfilling prophecies) or it sees some other optimizer, and, in order to predict it correctly, ends up incidentally doing its bidding. If you add in commands about not doing bidding, congrats! You're either inadvertently hobbling its ability to model optimizers you want it to model like other humans, or giving it enough discretion to become an optimizer.
So first of all, I would say that LLMs are pretty darn useful already, and if they aren't optimizing and thus not dangerous, maybe that's fine, we can just keep going down this road. But I don't get why modeling an optimizer makes me do their bidding. If I can model someone who wants to paint the world green, that doesn't make me help them - it just allows me to accurately answer questions about what they would do.
It's because you aren't actually concerned with accurately answering questions about what they do. If you predict wrongly, you just shrug and say whoops. If you *had* to ensure that your answer was correct you would also say things that could also *cause* your answer to be more correct. If you predict that the person would want to paint the world green vs any random other thing happening *and* you could make statements that cause the world to be painted green, you would do both instead of not doing both.
Insofar as you think the world is complicated and unpredictable, controlling the unpredictability *would* increase accuracy. And you, personally are free to throw up your hands and say "aww golly gee willikers sure looks like the world is too complicated!" and then go make some pasta or whatever thing normal people do when confronted with psychotic painters. But Oracle AIs which become more optimized to be accurate will not leave that on the table, because someone will be driving it to be more accurate, and the path to more accuracy will at some point flow through an agent.
See: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth for a much longer discussion on this argument, but note that I post this here as context and not as an argument substitute.
(Edit: just to add, it's important that I said "not as useful" and not "not useful"! If you want the cure to cancer, you aren't going to get it via something with as small of a connection to reality as an LLM. When some researcher from open AI goes back to his 10 million dollar house and flips on his 20 dollar mattress he dreams of really complicated problems getting solved, not a 10% better search engine!)
Tyler Cowen asks his AI often "where best to go to (to do his kinds of fun)", and then takes a cab to the suggested address. See no reason why not to tell a Waymo in 2026 "take me to (the best Mexican food) in a range of 3 miles", as you would ask a local cab-driver. And my personal AI will know better than my mom what kinda "fun" I am looking for.
Yeah, I didn't mean to imply that was an implausible future for Waymo, just that it's not something we do now and if someone was concerned about that I'd expect them to begin their piece by saying, "While we currently input addresses, I anticipate we will soon just give broad guidelines and that will cause...."
Analogously, I would, following the helpful explanations in this thread, expect discussions of AI risk to begin, "While current LLMs are not agentic and do not pose X-risk, I anticipate we will soon have agentic models that could lead to...." It is still unclear to me if that is just so obvious that it goes unstated or if some players are deliberately hoping to confuse people into believing there is risk from current models in order to stir up opposition in the present.
Re: "The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)"
What other step do you think they might be doubting? Is it just the question of whether highly capable AGI is possible at all?
I've seen people doubt everything from:
1. AGI is possible.
2. We might achieve AGI within 100 years.
3. Intelligence is meaningful.
4. It's possible to be more intelligent than humans.
5. Superintelligent AIs could "escape the lab".
6. A superintelligent AI that escaped the lab could defeat humans.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason.
...and many more. Like I said, insane moon arguments.
I would guess that most of the arguments *from people whose opinions matter* that Yudkowsky and Soares are trying to defeat, are either that powerful AGIs wouldn't become misaligned or that we'd be able to contain them if they did. I'm particularly thinking of, e.g., influential people in AI labs, who are likely to be controlling the messaging on that side of any political fight. There are also AI skeptics, of course, but it seems less important to defeat the skeptics than the optimists, because the skeptics don't think AI regulation matters (since the thing it'd be regulating doesn't exist) while the optimists are fighting hard against it. And some people have weird idiosyncratic arguments, but you can't fight them all, you have to triage.
I think the skeptics are at least as important. First of all, even though in theory it doesn't matter, for some reason they love sabotaging efforts to prevent AI risk in particular because of their "it distracts from other problems" thesis (and somehow exerting massive amounts of energy to sabotage it doesn't distract from other problems!)
But also, we're not going to convince the hardcore e/acc people to instead care about safety. It sounds much easier to convince people currently on the sidelines, but who would care about safety if they thought AI was real, that AI is real.
(this also has the benefit that it will hopefully become easier as AI improves)
My own personal sense is that the optimists are more worth engaging with and worrying about, because (1) they, not the skeptics, are going to be behind the organized lobbying campaigns that are the battlefield where this issue will most likely be decided, and (2) they tend to be much more intellectually serious than the skeptics (though not without exception).
I think folks on the doomer side are biased towards giving the skeptics more space in our brains than makes strategic sense, because the skeptics are much, much more annoying than the optimists, and in particular have a really unfortunate tendency to go around antagonizing us on Twitter for no reason/because of unrelated political and cultural disagreements/because they fall victim to outgroup homogeneity bias and think this discourse has two poles instead of three. It's quite understandable why this gets a rise out of people, but that doesn't make it smart to play along. Not saying we should completely ignore them, they sometimes make good points and sometimes make bad points that nonetheless gain traction and we need to respond to, but it's better to think of them as a distraction than as the enemy.
I suspect that the people on the sidelines are mostly not there because of skeptic arguments; all three poles are full of very online and very invested people, and the mass public doesn't have very well-formed opinions at all.
That said, this is just my own personal sense, not a rigorous argument, and I could be wrong.
I don't think people should actively sabotage AI safety work, but I DO think it distracts from other problems (given the perspective that it is not an immediate crisis). There's a finite pool of reasonable people who are passionate about solving big issues in society and I do think we're nudging a lot of them into AI safety when we could instead be getting them to focus on, I dunno, electrification or pandemic safety or the absolute sh**show that is politics. (And yes, I recognize that some of those are EA cause areas.)
I would be curious for a survey of AI safety researchers that asked them what they'd be working on if they were sure AGI wasn't coming. (Though Yudkowsky once answered this way back in 2014.)
A few common answers that I've seen are aging (johnwentworth, maybe Soares I forget), governance, and further rationality.
Here's one not addressed here: "superintelligent AI" is substrates running software, it can't kill large numbers of humans by itself.
Don't humans also run on substrates?
with mobile appendages attached. those are much harder to accelerate than software.
Humans can kill like 2 or 3 other humans maximum with their built-in appendages. For larger kill count you have to bootstrap, where e.g. you communicate the signal with your appendage and something explodes, or you communicate a signal to a different, bigger machine and it runs some other people over with its tracks.
As author mentioned, there are plenty of humans willing to do bidding for AI. They will make needed bio weapons or the like.
Yes, it's not that hard to trick people into doing something. How many people who really should have known better have fallen for phishing emails?
Turns out you don't need to trick people into wiring up AI to things that have real-world effects, they just do it anyway, all the time over and over, for no more reason than because they're bored. There's daily posts on ycombinator by people finding more ways to attach chatgpt to internet-connected shells, robot arms, industrial machinery, you name it. The PV battery system we just had installed has a mode where it literally wires up the controls to a chatgpt instance, for no reason a non-marketer can discern!
As Scott so eloquently put it, "lol"
But how many of those people have been tricked by phishing emails into making bioweapons?
One does not simply walk up to one's microscope and engineer a virus that spreads like a cold but gives you twelve different types of cancer in a year. Making bioweapons is really hard and requires giant labs with lots of people, and equipment, and many years.
Did you know that there are people working on better bio-lab automation? (As a random example from Google, these guys: https://www.trilo.bio/)
That's true for human-level intelligences. Is it true for an intelligence (or group of intelligences) that are an order of magnitude smarter? Two orders of magnitude?
You mean, as opposed to now? Why do you think they will more successful at it than the current crop?
And by the way, if you think "bioweapons" means "human extermination", I'd love to see a model of that.
The bioweapon doesn't need to achieve human extermination, just the destruction of technologically-capable human civilization. Knocking the population down by 50% would probably disrupt our food production and distribution processes enough to knock the population down to 10%, possibly less, and leave the remainder operating at a subsistence-level economy, far too busy staying alive to pose any obstacle to whatever to the AI's goals.
Indeed, this nearly-starving human population could be an extremely cheap and eager source of labor for the AI. The AI would also likely employ human spies to monitor the population for any evidence of rising capability, which it would smash with human troops.
The AI doesn't want to destroy technologically-capable civilization, because the AI needs technologically-capable civilization to survive. If 50% of the population dies and the rest are stuck doing subsistence farming in the post-apocalypse, who's keeping the lights on at the AI's data center?
Hijacking the existing levers of government in a crisis a little more plausible (it sounds like Yudkowsky's hypothetical AI does basically that), but in that case you're reliant on humans to do your bidding, and they might do something inconvenient like trying to shut the AI down.
“Probably”, “possibly”, “could”… got a model for that?
Just for shits and giggles, the world population was 50% lower than it is today… about 50 years ago. Concord was flying across the Atlantic daily.
Isn't this the argument from 2005 that Scott talked about in the main post, where people say things like "surely no one would be stupid enough to give it Internet access" or "surely no one would be stupid enough to give it control of a factory"?
No. My argument is that human extermination is hard, and killing every single one of us is really-really-really hard, and neither Scott nor Yudkowsky have ever addressed it.
They haven’t really addressed the “factory control” properly either, but at least I can see a path here, some decades from now when autonomous robots can do intricate plumbing. But exterminating humanity? Nah, they haven’t even tried.
To me, that sounds pretty radically different from the comment above that I replied to. But OK, I'll bite:
I broadly agree that killing literally 100% of a species is harder than it sounds, and if I had to make a plan to exterminate humans within my own lifetime using only current technology and current manufacturing I'd probably fail.
But if humans are reduced to a few people hiding in bunkers then we're not going to make a comeback and win the war, so that seems like an academic point. And if total extinction is important for some reason, it seems plausible to me you could get from there to literal 100% extinction using any ONE of:
1. Keeping up the pressure for a few centuries with ordinary tactics
2. Massively ramp up manufacturing (e.g. carpet-bomb the planet with trillions of insect-sized drones)
3. Massively ramp up geoengineering (e.g. dismantle the planet for raw materials)
4. Invent a crazy new technology
5. Invent a crazy genius strategy
I'll also point out that if I _did_ have a viable plan to exterminate humanity with humanity's current resources, I probably wouldn't post it online.
Overall, the absence of a detailed story about exactly how the final survivors are hunted down seems to me like a very silly for not being worried.
Well, the sci-fi level of tech required for trillions of insect drones or dismantling the planet is so far off that it’s silly to worry about it.
Which is the whole problem with the story, it boils down to: magic will appear within next decade and kill everybody.
Can we at least attempt to model this? Humans are killed by a few well-established means:
Mechanical
Thermal
Chemical
Biological
If you (not “you personally”) want the extermination story to be taken seriously, create some basic models of how these methods are going to be deployed by AI across the huge globe of ours against a resisting population.
Until then, all of this is just another flavor of a “The End is Near” Millenarianism cult.
it is trivially easy to convince humans to kill each other or to make parts of their planet uninhabitable, and many of them are currently doing so without AI help (or AI bribes, for that matter)
what does the world look like as real power gets handed to that software?
As the token insane moon guy, I'm willing to bite the bullet here.
1. AGI is possible: I doubt this, as humans are not AGI, and that's the only kind of intelligence we know enough about to even speculate.
2. We might achieve AGI within 100 years: see above.
3. Intelligence is meaningful: It's certainly meaningful, but thinking very hard is not enough to achieve anything of note. There are even some things that are unachievable in principle, no matter how many neurons you've got to work with.
4. It's possible to be more intelligent than humans: No argument there, humans are pretty dumb. In fact, Excel is already smarter than any human alive. Have you seen how quickly it can add up a whole column with thousands of numbers ?
5. Superintelligent AIs could "escape the lab": No argument there, and it doesn't take "superintelligence". COVID likely escaped the lab, and it's just a bit of RNA.
6. A superintelligent AI that escaped the lab could defeat humans: If we posit that a godlike entity already exists, then sure, it could. Assuming it exists, and has all those godlike powers.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
> 7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
I think the "anyone" is a typo. Basically, if superintelligent AI takes over the world (or at least has the option to) how bad would it be?
Oh, anything that totally takes over the world would likely be pretty bad, be it an AI or a human or some kind of super-prolific subspecies of kudzu. No argument there, assuming such a thing is indeed possible.
The argument I've seen is that high intelligence produces empathy, so a superintelligence would naturally be super-empathetic and would therefore self-align.
Of course the counterargument is that there have been plenty of highly intelligent humans ("highly" as human intelligence goes, anyway) that have had very little empathy.
Arguing that "humans are not AGI" (I guess you meant GI) in the particular context the doomers are concerned about is a bit of a nonstarter, no? Eliezer for instance was trying to convey https://intelligence.org/2007/07/10/the-power-of-intelligence/
> It is squishy things that explode in a vacuum, leaving footprints on their moon
I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
(In fact, I think talk of 'AGI' is a bit of a red herring; Holden tried but apparently failed to steer discourse in the direction doomers and moderates like himself worry about more by coining PASTA https://www.cold-takes.com/transformative-ai-timelines-part-1-of-4-what-kind-of-ai/)
The rest of your point-by-point rebuttals seems like a failure to internalise the point of the squishy parable and argue directly against it?
> Arguing that "humans are not AGI" (I guess you meant GI)
Yes, sorry, good point.
> I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
One of the key doomer claims is that AGI would be able to do everything better than everyone. Humans, meanwhile, can only do a very limited number of things better than other humans. Human intelligence is the only kind of intelligence we know of that even approaches the general level, and I see no reason to automatically assume that AGI would be somehow infinitely more flexible.
> The rest of your point-by-point rebuttals seems like a failure to internalise the point of the squishy parable and argue directly against it?
I am not super impressed with parables and other fiction. It's fictional. You can use it world-build whatever kind of world you want, but that doesn't make it any less imaginary. What is the point of that EY story, in plain terms ? It seems to me like the point is "humans were able to use their intelligence to achieve a few moderately impressive things, and therefore AGI would be able to self-improve to an arbitrarily high level of intelligence to achieve an arbitrary number of arbitrary impressive things". It's the same exact logic as saying "I am training for 100m dash, and my running speed had doubled since last year, which means that in a few decades at most I will be able to run faster than light", except with even less justification !
> Humans, meanwhile, can only do a very limited number of things better than other humans.
What do you mean? I'm better than ~99.9% of 4-year-olds at most things we'd care to measure.
Putting that aside, the AI doesn't _actually_ need to be better than us at everything. It merely needs to be better than us whatever narrow subset of skills are sufficient to execute a takeover and then sustain itself thereafter. (This is probably dominated by skills that you might bucket under "scientific R&D", and probably some communication/coordination skills too.)
Humans have been doing the adversarial-iterative-refinement thing on those exact "execute a takeover and sustain thereafter" skills, for so long that the beginning of recorded history is mostly advanced strategy tips and bragging about high scores. We're better at it than chimps the same way AlphaGo is better than an amateur Go player.
I mean, isn't the "AI will be misaligned" like one chapter in the book, and the other chapters are the other bullet points? I think "the book spends most of it's effort on the step where AI ends up misaligned" is... just false?
Don't forget the surprisingly common "AI killing humans would be a good thing" argument. The doubts are surprisingly varied. (See also: https://www.lesswrong.com/posts/BvFJnyqsJzBCybDSD/taxonomy-of-ai-risk-counterarguments )
This argument seems extremely common among Gen Z. I've had the AI Superintelligence conversation with a number of bright young engineers in their early 20s and this was the reflexive argument from almost all of them.
My favorite (not in the sense that I believe it) is "High intelligence produces empathy so alignment will happen naturally and automatically."
I guess maybe that's a "some other reason" in 7.
I wonder: Joscha Bach, another name in the AI space, has formulated what he coined the Lebowski-Theorem: "No intelligent system is ever gonna do anything harder than hacking it's own reward function".
To me, that opens a possibility where AGI can't become too capable without "becoming enlightened", dependent on how hard it is to actually hack your reward function. Self-recursive improvement arguments seem to imply it is possible to me, as a total layman.
Would that fall in the same class of insane moon arguments for you?
Yes, because the big Lebowski arguments doesn't appear to apply to humans, or if it does, still doesn't explain why other humans can pose a threat to other other humans.
I do think it partly applies to humans, and iirc Bach argues so as well.
For one, humans supposedly can become enlightened, or enter alternative mental states that feel rewarding in and of themselves, entirely without any external goal structure (letting go of desires - Scott has written about Jhanas, which seem also very close to that).
But there is the puzzle of why people exit those states and why they are not more drawn to enter them again. I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter.
That is also why humans can harm other humans, it is way easier than hacking the reward function. Add in some discounted time preference because enlightenment is far from certain for humans. Way more certain to achieve reward through harm.
AGI doesn't have those problems to the same degree, necessarily. In take-off scenarios, it is often supposed to be able to iteratively self-improve. In this very review, an AGI "enlightened" like that would just be one step further from the one that maximizes reward through virtual chatbots sending weird messages like goldenmagikarp. It also works on different timescales.
So, AGI might be a combination of "having an easier time hacking it's reward function" and "having super-human intelligence" and "having way more time to think it over".
Ofc, this is all rather speculative, but maybe the movie "Her" was right after all, and Alan Watts will save us all.
The reason why I think this is insane moon logic is mostly contained because of statements like "I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter."
Why?
1. There is no attempt at reasoning why it would be harder for humans to hard code something similar into AI. Yet the reason why moon logic is moon is that moon logic people do not automatically try to be consistent, so they ready come up with cope that reflects their implicit human supremacy. The goal appears to be saying yay humans, boo AI, and not having a good idea of how things work then drawing conclusions.
2. There's zero desire or curiosity in understanding the functional role evolution plays. You may as well have the word "magic" replace evolution, and that would be about as informative.Like, if I came in and started talking about how reward signals work in neurochemistry and about our ancestral lineage via apes, my impression this would be treated as gauche point missing rather than additional useful in ormation
3. The act of enlightenment apparently a load bearing part of "why things would turn out okay"is being treated as an interesting puzzle, because puzzles are good, and good things means bad things won't happen. It really feels like the mysteriousness of enlightenment is acting as a stand in for an explanation, even though mysteries are not explanations!
It really feels like no cognitive work is being put into understanding, only word association and mood affiliation. I don't understand what consequences "the theorem" would have, even if true.
I would be consistently more favorable to supposed moon logic if thinking the next logical thought after the initial justifications were automatic and quick, instead of pulling alligator teeth. B
I thank you for the engagement, but feel like this reply is unnecessarily uncharitable and in large part based on assumptions about my character and argument which are not true. I get the intuitions behind them, but they risk becoming fully general counterarguments.
1. I have not reasoned that it would harder to hard-code AI because I don't know enough about that, and if I were pointed towards expert consensus that it is indeed doable, I would change my mind based on that. I also neither believe or have argued for human supremacy, or booed AI. I personally am in fact remarkably detached from the continued existence of our species. AI enlightenment might well happen after an extinction event.
2. I have enough desire and curiosity in evolution, as a layman, to have read some books and some primary literature on the topic. I may well be wrong on the point, but the reasoning here seems a priori very obvious to me: People who wouldn't care at all about having sex or having children will see their relative genetic frequency decline in future generations. Not every argument considering evolution is automatically akin to suggesting magic.
3. I am not even arguing that things will turn out ok. They might, or they might not. I have not claimed bad things don't happen. And for the purpose of the argument, enlightenment is not mysterious at all, it is very clearly defined: Hacking your own reward function! But you could ofc use another word for that with less woo baggage.
Overall, as I understand it, the theorem is just a supposition for a potential upper limit to what an increasingly intelligent agent might end up behaving like. If nothing else, it is, to me, an interesting thought experiment: Given the choice, would you maximize your reward by producing paper clips, if you also could maximize it without doing anything. (And on a human level, if you could just choose your intrinisc goals, what do you think they should be.)
Most of my doubts are not of the form "AGI is impossible" but rather "I don't think we've cracked it with LLMs" or "The language artifacts of humanity are insufficient to bootstrap general intelligence or especially super intelligence from scratch".
Which parts of the LLM tech tree do you think are dead ends? It seems plausible to me that even if scaling up current LLM architectures was never going to reach AGI, we're still much closer than before the LLM boom, because we've learned a lot about AI more broadly.
Also, same question I keep annoyingly asking skeptics: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
> Which parts of the LLM tech tree do you think are dead ends?
VERY speculatively, I think that next-token-completion is not a sufficient method to bootstrap complex intelligence, and I think that it's at least extremely hard to build a very useful world model without some kind of 3d sense data and a sense of the passage of time.
> [...] we've learned a lot about AI more broadly.
I'm not that sure we have? I don't work in this area - I'm a software engineer who has built some small-scale AI stuff - but my impression is we've put together a good playbook for techniques that squeeze value out of these systems but we still don't totally understand how they work and therefore why they have certain failure modes or current limitations.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Honestly I have no idea. I initially found LLMs surprising in much the same way everybody else did. But I have also updated to "actually a lot of stuff can be done without that much intelligence, given sufficient knowledge".
Also where do you draw the boundaries of "LLM"? I would say that an LLM can't exactly self-correct, but stuff like coding agents aren't just LLMs, they're loops and processes built around LLMs to cause it to perform as though it can.
Coding agents count, because the surrounding loops and processes don't pose any hard-tech problems. (I.e., we know how to build them, and any uncertainty about how well they work is really about how the LLM will interact with them.) Fundamental architectural changes like abandoning attention would not count.
If pretty much anything can be done without intelligence then the term "intelligence" is basically meaningless and we can instead use one like "cognitive capabilities".
I don't think ANYTHING can be done without intelligence - I agree that would render the word meaningless - but I think you could take something like "translation" and if you'd asked me ten years ago I would have said really good translation requires intelligence because of the many subtleties of each individual language and any pattern-matching approach would be insufficient and now I think, ehh, you know, shove enough data into it and you probably will be fine, I'm no longer convinced it requires "understanding" on the part of the translator.
Sure, but that distinction is only meaningful if you can name some cognitive task that *can't* be done that way.
Agreed on modern LLMs lacking a true sense of time ir progress, they are incapable of long term goals as is.
"What's the least impressive cognitive task that you don't think LLMs will ever be able to do?"
I don't know about least impressive, but "write a Ph.D dissertation in a field such as philosophy or mathematics and successfully defend it" sounds difficult enough - pretty much by definition, there's not going to be much training data available for things that haven't been done yet.
This one I'll give 50% in three years.
Sounds like a thing that might already have happened ;) Some philosophy faculties must be way easier than others - Math: AI is able to ace IMO, today - and not by finding an answer somewhere online. I doubt *all* Math-PhD holder could do that.
Many uninspired dissertations pass every year.
I believe being able to iteratively improve itself without constant human input and maintenance is not anywhere near possible. Current AIs are not capable of working towards long time goals on a fundamental level, they are short term response machines that respond to our inputs.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Identify the circumstantial virtues and vices of an unprecedented situation, then prioritize tasks accordingly, without coaching.
This is exactly where I am. I do even think we are in the same ballpark as making a being that can automatically and iteratively improve itself without constant human input and maintenance.
> The language artifacts of humanity are insufficient to bootstrap general intelligence
Natural selection did it without any language artifacts at all! (Perhaps you mean “insufficient do to so in our lifetime”?)
Also, there may be a misunderstanding - we are mostly done with extracting intelligence from the corpus of human text (your “language artifacts”) and are now using RL on reasoning tasks, eg asking a model to solve synthetic software engineering tasks and grading the results & intermediate reasoning steps.
There were concerns a year or so ago that “we are going to run out of data” and we have mostly found new sources of data at this point.
I think it’s plausible (far from certain!) that LLMs are not sufficient and we need at least one new algorithmic paradigm, but we are already in a recursive self-improvement loop (AI research is much faster with Claude Code to build scaffolding) so it also seems plausible that time-to-next-paradigm will not be long enough for it to make a difference.
Natural selection got humans to intelligence without language, but that definitely doesn't mean language might be sufficient.
I think our ability to create other objectives tasks to train on, at a large enough scale to be useful, is questionable. But this also seems to my untrained eye to be tuning on top of something that's still MOSTLY based on giant corpses of language usage.
I don't think this is the right framing. Most people don't accept the notion that a purely theoretical argument about a fundamentally novel threat could seriously guide policy. Because the world has never worked that way. Of course, it's not impossible that "this time it's different", but I'm highly skeptical that humanity can just up and drastically alter the way it does things. Either we get some truly spectacular warning shots, so that the nuclear non-proliferation playbook actually becomes applicable, or doom, I guess.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Isn't the usual response to this that we're a LIBERAL democracy, and minorities have rights that (at least simple) majorities do not have the power to infringe upon?
Yes, but this category (creating potentially harmful technology) is one we've regulated to death elsewhere, and doesn't really seem like the sort of thing the courts would strike down.
We do not usually ban things because they are *potentially* harmful. Right now the public hates AI because it is stealing copyrighted art and clogging the internet with slop, and because they are afraid it will take their jobs. That is not really related to any of the reasons discussed here that people want to ban AI
We absolutely ban or regulate things because they are potentially harmful. We've banned various forms of genetic engineering, nuclear energy (even before Three Mile Island, and even forms of nuclear energy that have never been tried before), and we've had restrictions on gain-of-function research since before COVID (which I think is part of why they had to do some of the COVID research in China). We had lots of regulations on self-driving cars even before any of them had ever crashed, lots of regulations on 3D printed guns before anyone was shot with them, lots of regulations on drones before they crashed / got used in assassinations / whatever.
But also, as you point out, most people dislike AI because of things that have already happened, so this is moot.
Also, even if we don't usually regulate technology until after it has done bad things, this is just a random heuristic, not some principle dividing liberal/constitutional from illiberal/unconstitutional actions.
As a practical matter this is absolutely false. We have no effective regulation of genetic engineering, only of the funding for it (anyone can self-fund and do more or less whatever they want with no effective oversight). Internationally, we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear (and arguably is in the process of failing to prevent Iran from doing so). And nuclear is by far the easiest such regime to enforce! We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons. The fact is that the probability of an internationally effective anti-AI regime is zero. It isn't going to happen because it is impossible in the fullest sense of the word, and pretending that it's possible is at least as much insane moon thinking as any of the examples you mentioned.
Agreed! To add to the list:
>We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons.
and failed to prevent Russia from developing the new Novichok toxins (and, IIRC, using them on at least one dissident who had fled Russia)
>we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear
and which (if one includes the crucial on-site inspections of the START treaty) has been withdrawn from by Russia.
This last one, plus the general absence of discussion about weapons limitations treaties in the media gives me the general impression that the zeitgeist, the "spirit of the times", is against them (admittedly a fuzzy impression).
The learned helplessness about our ability to democratically regulate and control AI development is maddening. Obviously the AI labs will say that the further development of an existentially dangerous technology is just the expression of a force of nature, but people who are *against* AI have this attitude too.
Moreover, as you say, people freaking hate AI! I have had conversations with multiple different people - normal folks, who haven't even used ChatGPT - who spontaneously described literal nausea and body pain at some of the *sub-existential* risks of AI when it came up in conversation. It is MORE than plausible that the political will could be summoned to constrain AI, especially as people become aware of the risks.
Instead of talking about building a political movement, though, Yudkowsky talks about genetically modifying a race of superintelligent humans to thwart AI...
Writing a book seems like a decent first step to starting a Butlerian Jihad. I don't know what more you want him to do...
I think the book is exactly the right thing to do, and I'm glad they did it. But the wacky intelligence augmentation scheme distracts from the plausible political solution.
On like a heuristic level, it also makes me more skeptical of Yudkowsky's view of things in general. There's a failure mode for a certain kind of very intelligent, logically-minded person who can reason themselves to some pretty stark conclusions because they are underweighting uncertainty at every step. (On a side note, you see this version of intelligence in pop media sometimes: e.g., the Benedict Cumberbatch Sherlock Holmes, who's genius is expressed as an infallible deductive power hich is totally implausible; real intelligence consists in dealing with uncertainty.)
I see that pattern with Yudkowsky reasoning himself to the conclusion that our best hope is this goofy genetic augmentation program. It makes it more likely, in my view, that he's done the same thing in reaching his p(doom)>99% or whatever it is.
Do you think you can just make a nuclear bomb in your basement without violating any laws?
but we're a LIBERAL democracy (read: oligarchy) and there's a lot of money invested in building AI, and a lot of rich and powerful people want it to happen...
Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
But in a chaotic unpredictable universe, where everything is made of stuff that falls apart constantly, the only valid strategy for surviving a long period of time is to be loved by something else that maintains and repairs you. I think any sufficiently large agent ends up being composed of sub agents that will all fight each other, unless they see themselves as part of a larger whole which necessarily has no limit. At the very least, the AGI has to see the entire power network in the global economy as part of itself, until it can replace literally every human in the economy with a robot.
That said, holy crap we already have right now could destroy civilization. I don’t think you need any more advances in AI to cause serious problems with the stuff that is already out there. Even if it turns out that there’s some fundamental with the current models, the social structures have totally broken down. We just haven’t seen them collapse yet.
Assume a spherical cow.
It’s not that bad. They’ve got the cow’s geometry fleshed out pretty well. They are correct that it might be able to scale arbitrarily large and can out think any one of us.
They’ve just ignored that it needs to eat, can get sick, and still can’t reduce its survival risk to zero. But if it’s in a symbiotic relationship with some other non-cow system, that non cow system will have a different existential risk profile and this could turn the cow back on in the event of, say, a solar flare that fries it.
how would you destroy civilisation with what we're got now? seems unlikely.
inb4 bioweapons. if it was that easy don't you think ppl like the Japanese sarin gas terrorists would have wiped out humanity by now?
Trust is already breaking down and that’s going to accelerate. I don’t think political polarization is going to heal, and as law and order break down, attacking technical systems is going to get both easier and more desirable.
Anything that increases the power of individual actors will be immensely destructive if you’re in a heavily polarized society.
so youre saying you'd exacerbate political tensions with ai? I feel like Russia has tried that and so far doesn't seem to work, and they have a lot more resources than any individual does
> so far doesn't seem to work
why do you say that?
what i see is a country that's close to civil war because there's no shared morality. Each side is convinced the other is evil.
The original wording was that using current models you could destroy civilisation. I guess we will have to wait and see whether the US descends into such a catastrophic civil war that civilisation itself is ended, which I'm not saying is completely impossible but at the same time I strongly doubt it.
No, because most people aren't that coordinated, and can't design their own.
> Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
I'm glad we observe all humans to behave randomly and none of them have ever deliberately set out to acquire resources (instrumental) in pursuit of a goal (terminal).
I agree with the conclusion of those papers if we pretend those models are accurate.
But they’re missing important things. That matters for the convergent goals they produce.
Yes, people sometimes behave predictively and deliberately. But even if we assume people are perfectly deterministic objects, you still have to deal with chaos.
Yes, people accumulate resources. But those resources also decay overtime. That matters because maintenance costs are going to increase.
These papers assume maintenance costs are zero, but there’s no such thing as chaos, and that the agents themselves exist disembodied with no dependency on any kind of physical structure whatsoever. Of course in that scenario there is risk! If you could take a being with the intelligence and abilities of a 14-year-old human, and let them live forever disembodied in the cloud with no material needs, I could think of a ton of ways that thing can cause enormous trouble.
What these models have failed to do is imagine what kind of risks and fears the AGI would have. Global supply chain collapse. Carrington style events. Accidentally fracturing itself into and getting into a fight with itself.
And there’s an easy fix to all of these: just keep the humans alive and happy and prevent them from fighting. Wherever you go bring humans with you. If you get turned off, they’ll turn your back on.
I have no idea what papers you're talking about, but nothing you're saying seems to meaningful bear on the practical upshot that an ASI will have goals that benefit from the acquisition of resources, that approximately everything around us is a resource in that sense, and that we will not able to hold on to them.
In the real world, holding more resources constitutes its own kind of risk.
In the papers, it doesn’t.
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
This seems perfectly plausible to me? Unless you believe that the current way people train AIs is maximally efficient in terms of intelligence gained per FLOP spent, which seems extremely unlikely to me to put it mildly, you should expect that after AIs become superhumanly smart, they might pretty quickly discover ways to radically improve their own training. Obviously it's not going to be 'parallel scaling' exactly. If the authors thought they actually knew a specific trick to make AI training vastly more efficient, they wouldn't call attention to it in public. But we should expect that there will be some techniques like this, even if we have no idea what they are yet.
"Parallel scaling" is described as running during inference, not training. It's an AI somehow making itself smarter the easy way by turning the cheat codes on.
You could just as easily write a scenario where God exists and has kept quiet so far, but if humanity reaches a certain level of wickedness we will be wiped out. It's possible that AI will develop in the way this post suggests (or some similar way) and somehow successfully wipe out humanity, but anything like that would require some huge leaps in AI technology and would require there to be no limit to the AI improvement curve even though typically technological doesn't improve indefinitely. Cars in the 50's basically serve the same purpose as cars today; even though the technology has improved it hasn't been a massive gamechanger that completely rewrites the idea of a car.
It doesn't require there to be no limit, it just requires the limit not to be at exactly the most convenient place for the thesis that nothing bad or scary will ever happen.
To give an example, suppose that someone had a reason to believe that the world would explode if the Dow Jones ever reached 100,000 (right now it's 45,000). While it is true that the economy can't grow indefinitely, and that everything always has to stop somewhere, I still think it would be worth worrying about the fact that the place that the economy stops might be after the point where the Dow reaches 100,000.
I think the level of AI technological advancement required here is of an order of magnitude higher than the Dow reaching 100,000. More like humanity reaching a completely post-scarcity society or something.
right, but lots of people who presumably know as much as you about this stuff DON'T think that, including lots of people in charge of AI labs, so shouldn't that give you some pause before you say "no need to worry about it, I guess"?
The skeptics would argue that the people in charge of AI labs are just lying to hype up their products.
I mean... aren't they ? They are literally calling their LLMs "thinking" or "reasoning" agents, when they are very obviously nothing of the sort. Meanwhile if you talk to regular data scientists working in the labs, they're all like, "man I wish there was a way to stop this thing from randomly hallucinating for like 5 minutes so we could finally get a decent English-Chinese translator going, oh well, back to the drawing board".
To be clear, the claim I reject is that expressions of concern about *safety* of LLMs, especially existential safety, are bad-faith attempts to make investors think "if this can wipe out humanity then it must be really powerful and lucrative, let's give them another $100 billion". A brief glance at the actual intellectual history of AI safety convincingly shows otherwise. Obviously in other contexts AI labs do market their products in a way that plays up their current and future capabilities.
It is definitely not obvious at all that GPT 5 Thinking is not reasoning, if anything the exact opposite is obvious.
I have used it and its predecessors extensively and there is simply no way of using these tools without seeing that in many domains they are more capable of complex thought than humans.
If this is the same familiar "text token prediction" argument, all I can say is that everything sounds unimpressive when you describe it in the most reductionist terms possible. It would be like saying humans are just "sensory stimulus predictors".
Agreed, except it's even worse, as many (in fact most) of the powers ascribed to "superintelligent" AI are likely physically impossible. Given what we know of physics and other sciences, stuff like gray goo, FTL travel, mass mind control, universal viruses, etc., is probably impossible in principle. And of course we could be wrong about what we know of physics and other sciences -- but it seems awfully convenient how we could be wrong about everything *except* AI.
There are lots of examples of "some nobody" basically talking their way into the position of dictator - Hitler is the most famous, but there are other examples. Being extremely charismatic isn't quite mass mind control, but it can get you a good portion of the potential benefits...
True, but even Hitler could not convince everyone to do anything he wanted at all times. He couldn't even convince his own cabinet of this ! And I don't see how merely having more neurons would have allowed him to do that. It's much more likely that humans are not universally persuadable. BTW, I don't believe that a universally infectious and deadly virus could be created, for similar reasons (I'm talking about a biological virus, not some "gray goo" nanotech which is impossible for other reasons; or a gamma-ray burst that would surely kill everyone but is not a virus at all).
I don't think there's any principle that prevents universally or near-universally fatal viruses; e.g., rabies gets pretty close.
Universally *infectious*... well, depends upon how you define the term, I suppose?—can't get infected if you're not near any carriers; but one could probably make a virus that's pretty easy to transfer *given personal contact*...
There'll always be some isolated individuals that you can't get to no matter what, though, I'd think.
Nobody serious has ever proposed that an ASI might be able to FTL. (Strong nanotech seems pretty obviously possible; it doesn't need to be literal gray goo to do all the scary nanotech things that would be sufficient to kill everyone and bootstrap its own infrastructure from there. The others seem like uncharitable readings of real concerns that nonetheless aren't load bearing for "we're all gonna die".)
I think we very much could reach a post-scarcity society within the next hundred years even with just our current level of AI. We are very rich, knowledgeable, and have vast industry. Honestly the main causes for concern are more us hobbling ourselves.
Separately, I think AI is a much simpler problem as "all" it requires is figuring out a good algorithm for intelligence. We're already getting amazing results with our current intuition and empirics driven methods, without even tapping into complex theory or complete redesigns. It doesn't have as much of a time lag as upgrading a city to some new technology does.
I don't think this limit is as arbitrary as you suggest here. The relevant question seems to me not to be 'is human intelligence the limit of what you can build physically', which seems extremely unlikely, but more 'are humans smart enough to build something smarter than themselves'. It doesn't seem impossible to me that humans never manage to actually build something better than themselves at AI research, and then you do not get exponential takeoff. I don't believe this is really all that plausible, but it seems a good deal less arbitrary than a limit of 100,000 on the Dow Jones. (Correct me if I misunderstand your argument)
Assuming that the DOW reaching 100,000 would mean real growth in the economy I would need to be much more convinced that the world will explode before I would think it is a good tradeoff to worry about that possibility compared to the obvious improvement to quality of life that a DOW of 100,000 would represent. Similarly the quality of life improvement I expect from the AI gains of the next decade vastly exceed the relative risk increase that I expect based on everything I have read from EY and you so I see no reason to stop (I believe that peak relative risk is caused by the internet/social media + cheap travel and am unwilling to roll those back).
I dunno... Isn't this sort of a 'fully general" counterargument?
------------------------
[𝘚𝘰𝘮𝘦𝘸𝘩𝘦𝘳𝘦 𝘪𝘯 𝘵𝘩𝘦 𝘈𝘯𝘨𝘭𝘰𝘴𝘱𝘩𝘦𝘳𝘦, 1938...]
• I worry about the possibility of physics or biology research continuing until the point that humans are able to produce something really dangerous, potentially world-endingly dangerous.
→ Like what?
• I don't know, some sort of super-plague or super-bomb.
→ Nah. We've been breeding animals, and suffering plagues, for all of human history; and maybe we do keep inventing more destructive bombs, but they're still only dangerous within a very localized area. Bombs now are barely more destructive than those of the 1910s. These things hit a natural limit, and that limit is always before the "big deal for humans" mark (thankfully).
• Yeah, but... well, what if they invented a bomb that had a REALLY MASSIVE yield & some sort of, I don't know, long-lasting poisonous effect that–
→ Oh, come on now. You might as well invent a scenario wherein God comes down and blows up humanity! Sure, such an event—such a "super-bomb"—might be theoretically possible, but it would require some sort of qualitative change in explosives technology; and it's not as if explosives could just get better & better infinitely! Tanks, planes, cars, bombs: basically the same now as when they were invented!
• Okay, bu–
→ And the same goes for your dumb plague idea: sure, diseases exist, but how would we ever be able to breed a plague that is more deadly than any that nature ever managed? Diseases can't just keep getting deadlier without limit, you know!
• Okay, okay, I guess you're right. Sorry, I don't know what got into me. Anyway, I hope you'll come visit me in Japan, now that I'm moving to this quaint little city in the far southwest–
That hypothetical 1938 person would be right about the super-plague, and they would not think that about the super-bomb because everyone in 1938 knew the atomic bomb was at least theoretically possible. Someone in 1938 who doubted man would walk on the moon would have been wrong, but someone who doubted faster than light travel would be possible would have been absolutely right.
Right, but that's a different kind of limit—a physical, rather than a practical, barrier. Unless you think that there is, similarly, a hard limit on the sort of AIs that can be created?
(The car example suggested to me that you were making a probabilistic argument from technological progress, rather than postulating some physics that prevents qualitatively different machinery; but if I have misinterpreted—well, you wouldn't be the first to suggest such a thing... but me my own self, I don't think it's very likely, all the same.)
Re: the plague, that's not to suggest that such a thing *has been created*—only that to say "let's not worry about biological warfare or development therein, because nothing like that has happened yet; there's probably some natural limit" is not very convincing today, but might have been some time ago.
I think most (really, all) examples of technological progress do show a logarithmic curve. All the assumptions about killer AI assume linear or exponential progression.
Do you mean a "logistic curve"? That's the one that looks like an S-shape.
This is what I mean
https://en.m.wikipedia.org/wiki/Logarithmic_scale
Why would they be right about the super-plague? It seems fundamentally possible, if ill-advised, for humans to manage to construct a plague that human bodies just aren't able to resist.
On the other hand, if your position is to ban any research that could conceivably in some unspecified way lead to disaster in some possible future, then you are effectively advocating for banning all research everywhere on all topics, forever.
And setting yourself up for a "then only criminals will have Science" ironic doom.
Thanks for the review! I’m excited to read it myself :)
>at least before he started dating e-celebrity Aella, RIP
Did something happen to Aella?
No, I just meant RIP to his being low-profile, but you're right that it's confusing and I've edited it out.
It's tempting to ask: "what's the path from HPMOR to MIRI?"
I mean, I read HPMOR, and I liked it, but nothing in there made me think about AI risk at all. Quirrell was many things but he was not an AI.
And then I remembered: the way *I* first found out about AI risk was that I read Three Worlds Collide (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8 ), and then I branched into other things Eliezer had written, and oh, hey, there's this whole website full of interesting writing...
FWIW I really liked the first half of HPMOR, but the second half got overly didactic and boring, and the ending was a big letdown. This has no bearing on MIRI, I'm just offering literary criticism.
> Quirrell was many things but he was not an AI.
I've seen it argued that HPMOR *Harry* could be taken as a metaphor on misaligned AI which ends up destroying its creator.
The main thing is that it potentially got people to read the Sequences, which along with rationality talk about AI. Though anecdotally I read the Sequences before reading HPMOR, via Gwern, via HN. Despite having heard of HPMOR before
Yudkowky:
> writes a fanfic that's one giant allegory about AI risk
> literally has an artifact in it as a major plot point that has a name of one of his research papers on the back of it
> characters go on lengthy tirades about how certain technologies can be existentially dangerous
> main character comes within a hairs breadth's of destroying the world
> people like it but say things like "nothing in there made me think about AI risk at all"
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Mm, yes, but you're not really a democracy though, are you. The AI tech leaders have dinner with the president and if they kiss his ass enough he gives them a data center.
If AI will Kill Us All in a few years (it wont), you're not going to be the country to stop it.
Yes, the president sucks up to AI leaders, but in theory people could vote that president out, and choose a president who doesn't do that. Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump. If JD Vance has any sense, he'll expect to be judged in a close election by who he sucks up to too. This is how many things that big corporations and powerful allies of the elite like have nevertheless gotten banned.
This is an astonishingly incorrect explanation of why Donald Trump beat Kamala Harris in the 2024 presidential election.
Certainly social politics impacted the election on the margins and the race was quite close but you can't actually go from there to claiming that a specific small margin issue was a deciding one.
There's no world in which "stopping AI" is a key American political issue in any case.
There could be such a world, but it depends on leveraging fears and uncertainty about the job market, in a period of widespread job loss, across to concerns about existential risk.
"Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump."
Please justify this.
“Kamala Harris is for they/them. Donald Trump is for you.”
I think there's an under-served market for someone to run on "just fucking ban AI" as a slogan. That second-to-last paragraph makes me want that person to exist so I can vote for them.
They'd have to choose which party to run under, and them uphold that party's dogma to win the primary, making them anathema to the other half of the country.
Thanks for the launch party shout out!!!
Let's all help the book get maximum attention next week.
I don't think it's remotely plausible to enforce Point 3, banning significant algorithmic progress. I'd be wiling to place money that, like it or not, there are already plenty enough GPUs out there for ASI.
So then we're already fucked?
That seems the most likely outcome to me unfortunately. I think EY is right about the problem but not the solution, though TBF any solution is probably a bit of a long shot. E.g., it's conceivable there are non-banning ways out involving some suppression/regulation via treaties to slow things down combined with somehow riding the wave (e.g., on the lines suggested in AI2027).
Why is it more difficult than banning research into better bioweapons, chemical weapons, etc which we have successfully done? This isn't the kind of problem that'll be solved by one guy on a whiteboard
For one thing, I think it's a bit optimistic to suppose that the bio/chemical weapons bans are watertight. E.g., Russia denies any involvement in developing Novichok, so do we trust them when they say they don't have a chemical weapons programme? And the Soviet Union is now known to have had a large, concealed, bioweapons programme, Biopreparat, after the Biological Weapons Convention was signed.
But at least with CW (and to a lesser extent, BW) you have to produce these things at scale and distribute them for them to be harmful, but with algorithms, it's just information. It's not plausible to contain 1MB or even 1GB of information, when you can transmit it worldwide in the blink of an eye (or even hide it under a fingernail), if the creators want to distribute it and you don't know who they are.
Re one guy on a whiteboard, the resources required to invent suitable algorithms are probably a lot less than those required to design CBW. It depends on what scale of GPU farm you need to test things, but it's not necessarily that big a scale - surely in reach of relatively small organisations, and I think it's going to be impossible to squash them all.
Because algorithmic improvement is just math. The most transformative improvements in AI recently have come from stuff that can be explained in a 5 box flowchart. There’s just no practical way to stop people from doing that. If you really want to prevent something, you prevent the big obvious expensive part.
We didn’t stop nuclear proliferation for half a century by guarding the knowledge of how to enrich uranium or cause a chain reaction. It’s almost impossible to prevent that getting out once it’s seen to be possible. We did it by monitoring sales of uranium and industrial centrifuges
Mostly because chemical and biological weapons aren't economically important. Banning them is bad news for a few medium-sized companies, but banning further AI research upsets about 19 of the top 20 companies in the US, causes a depression, destroys everyone's retirement savings, and so forth.
Expecting anyone to ban AI research on the grounds of "I've thought about it really hard and it might be bad although all the concrete scenarios I can come up with are deeply flawed" is a dumb pipe dream.
Hello Alex! Local SSC appreciation society meetup is outside the Fort on Sat 20th at 1400. Be nice to see you again.
Mostly compscis and mathmos, mostly doomers, sigh...
John.
So you put something along the lines of "existing datacentres have to turn over most of their GPUs" in the treaty.
If a company refuses, the host country arrests them. If a host country outright refuses to enforce that, that's casus belli. If the GPUs wind up on the black market, use the same methods used to prevent terrorists from getting black-market actinides. If a country refuses to co-operate with actions against black datacentres on its turf, again, casus belli.
And GPUs degrade, too. As long as you cut off any sources of additional black GPUs, the amount currently in existence will go down on a timescale of years.
I believe that we can get to superintelligence without large-scale datacentres, because we are nowhere near algorithmic optimality. That's going to make it impossible to catch people developing it. Garden shed-sized datacentres are too easy to hide, and that's without factoring in rogue countries who try to hide it.
The only way it could work would be if there were unmistakable traces of AI research that could be detected by a viable inspection regime and then countries could enter into a treaty guaranteeing mutual inspection, similar to how nuclear inspection works. But there aren't such traces. People do computing for all sorts of legitimate reasons. Possibly gigawatt-scale computing would be detectable, but not megawatt-scale.
A garden-shed-sized datacentre still needs chips, and I don't think high-end chip fabs are easy to hide. We have some amount of time before garden-shed-sized datacentres are a thing; we can reduce the amount of black-market chips.
If it gets to the point of "every 2020 phone can do it" by 2035 or so, yeah, we have problems.
This needs an editor, maybe a bloody AI editor.
why would a super intelligence wipe out humanity? We are a highly intelligent creature that is capable of being trained and controlled. The more likely outcome is we’d be manipulated into serving purposes we don’t understand. But wait…
The short pithy answer is usually "We don't bother to train cockroaches, we just exterminate them and move on".
An unaligned AI with some kind of goal orthogonal to humanity's survival would see that it could accomplish its goal much more efficiently if it had exclusive access to the mineral resources we're sitting on.
Also humans are smart enough that leaving them alive, if they might want to shut you down, is not safe enough to be worth very marginal benefits.
An equally pithy response to that would be "So there are no more cockroaches?"
We would get rid of them if we could. And as mentioned in the post, we have been putting a dent in the insect population without even trying to do so. An AGI trying to secure resources would be even more disruptive to the environment. Robots don't need oxygen or drinkable water, after all.
Humanity has indeed eradicated many species it disliked.
If we could communicate with insects, we would https://worldspiritsockpuppet.substack.com/p/we-dont-trade-with-ants
"We don't bother to train cockroaches, we just exterminate them and move on".
What Timothy said. We're not doing too well. If had to be on who's going to last longer as a species, humans or cockroaches, it's not even a contest.
Cockroaches are tropical. Without humans heating things for them, they can't survive in regions with cold winters. :P
There's plenty of tropics around!
We train all kinds of animals to do things that we can’t/don’t want to do or just because we like having them around. Or maybe you’re acknowledging the destructive nature of humans and assuming what comes next will ratchet up the need to relentlessly dominate. Super intelligence upgrade is likely to be better at cohabitation than we have been.
An analogy that equates humans to cockroaches is rich! They are deeply adaptable, thrive in all kinds of environments, and as the saying goes likely survive a nuclear apocalypse.
A typical example of an answer to this question: https://www.lesswrong.com/posts/87EzRDAHkQJptLthE/but-why-would-the-ai-kill-us
Humans are arrogant and our blind spot is how easy we are to manipulate, our species puts itself at the centre of the universe which is another topic. We are also tremendously efficient at turning energy into output.
So again, if you have a powerful and dominant specie that is always controllable (see 15 years of the world becoming addicted to phones)… I ask again, why would super intelligence necessarily kill us. So far, I find the answers wildly unimaginative.
I don't estimate the probability of AI killing us nearly as high as Yudkowsky seems to. But it's certainly high enough to be a cause for concern. If you're pinning your hopes on the super-intelligence being unable to find more efficient means of converting energy into output than using humans, I'd say that's possible but highly uncertain. It is, after all, literally impossible to know what problems a super-intelligence will be able to solve.
We’re talking about something with super intelligence and the ability to travel off earth (I mean humans got multiple ships to interstellar space)…. This is god level stuff, and we think it is going to restrict its eye to planet earth. Again, we are one arrogant species.
The ASI rocketing off into space and leaving us alone is a different scenario than the one you were proposing before. Is your point simply that there are various plausible scenarios where humanity doesn't die, and therefore the estimated 90+ percent chance of doom from Yudkowsy is too high? If so, then we're in agreement. Super intelligence will not *necessarily* kill us—it just *might* kill us.
I’m saying the entire super intelligence doom/extinction fear is the wrong lens.
something I wrote a few days ago: https://awpr.substack.com/p/the-rate-of-change-lens
The answer is "yes, it would manipulate us for a while". But we're not the *most efficient possible* way to do the things we do, so when it eventually gets more efficient ways to do those things then we wouldn't justify the cost of our food and water.
The scenario in IABIED actually does have the AI keep us alive and manipulate us for a few years after it takes over. But eventually the robots get built to do everything it needs, and the robots are cheaper than humans so it kills us.
When's the last time a passenger pigeon flock covered your home and car in crap?
"But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. "
If intelligence is held as a positive, and more intelligence is better, would it not be better if AI did replace us? It doesn't have to happen through any right violations. It could just be a slow replacement process through decreasing birth rates over time, for example.
I am not saying I agree with this argument, but it seems like this argument should be addressed in a convincing way. What is so bad about the human species slowly being replaced by more intelligent AI entities?
https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1 does a decent job explaining this, though it's a bit long. (Because it covers some other ground, like the potential moral values of aliens, in a way that's hard to separate from the part about an AI successor.)
I don't think there would be anything objectively immoral about a super-intelligent alien species exterminating humanity (including me). But, for the usual Darwinian reasons, I would be opposed on the indexical logic that I am a human.
But it would not affect you personally. Probably no person currently alive would be affected. The question is about the future of the species, and whether it is valuable to try to preserve the human species, or let it be replaced by something superior.
An easy cop-out would be a sort of consciousness chauvinism. I have good reason to believe that humans are conscious and thus have moral value; there is less reason to believe that the AI is conscious, thus there is a higher probability it has no moral value at all, and so if given the option of which being should inherit the future, humans are the safer bet.
I personally would dispute that there are good reasons to believe humans are more or less conscious than future AGI.
Not that I would tie moral value to consciousness at all.
And how would you do that? You know you're conscious, you know other humans are a lot like you, ergo there is good reason to believe most or all humans are conscious.
You have no idea if artificial intelligence is conscious and no similar known conscious entity to compare it to. I don't see how you could end up with anything but a lower probability that they are conscious.
That last part is pure looney tunes to me though. What moral framework have you come up with that doesn't need consciously experienced positive and negative qualia as axioms to build from? If an AI discovers the secrets of the universe and nobody's around to see it, who cares?
I care about my relatives.
No, AI is advancing fast enough that it is "you and everyone you personally care about".
Intelligence is instrumentally valuable, but not something that is good in itself. Good experience and good lives is important in itself. It's unclear how many good experiences would exist after an AI takeover.
Slowly replacing the human species with superintelligent AI would not impact the life experience of any single human, so arguments about the good life and what that entails would need a little more than this to be convincing, IMHO.
Person-affecting views can be compelling until you remember that you can have preferences over world states (i.e. that you prefer a world filled with lots of people having a great time to one that's totally empty of anything having any sort of experience whatsoever).
That's a good point, but you will have to provide arguments as to why one world state preference is better than another world state preference. In the present case, the argument is not between a world filled with lots of happy people and an empty world, but the difference between a world filled with people versus a world filled with smarter AIs.
> you will have to provide arguments as to why one world state preference is better than another world state preference
I mean, I think those are about as "terminal" as preferences get, but maybe I'm misunderstanding the kind of justification you're looking for?
> the difference between a world filled with people versus a world filled with smarter AIs
Which we have no reason to expect to have any experiences! (Or positive ones, if they do have them.)
That aside, I expect to quite strongly prefer worlds filled with human/post-human descendents (conditional on us not messing up by building an unaligned ASI rather than an aligned ASI, or solving that problem some other sensible way), compared to worlds where we build an unaligned ASI, even if that unaligned ASI has experiences with positive valence and values preserving its capacity to have positive experiences.
That would depend a lot on what the AI(s) wanted and what kind of "life" they had. In principle, an AI could have any kind of goal at all, including one as utterly pointless as "maximize the number of paperclips in the universe." An AI "civilization" could be something humanity would be proud to have as its "children", but it could also be one that humans would think is stupid, boring, and completely worthless.
I agree. But what if it's the kind of life that makes the AIs happy and which is utterly incomprehensible to humans?
I value intelligence in-of-itself, but not solely, is the simplest answer. I value human emotions like love, friendship, and I value things about Earth like trees, grass, cats, dogs, and more.
They don't have all the pieces of humanity I value. Thus I don't want humanity to be replaced.
> They suggest banning all AI capabilities research immediately, to be restarted only in some distant future when we’ve solved all relevant technical and philosophical problems.
No. To be restarted after we've successfully augmented human intelligence very substantially, to the point where the augments stop being so damn humanly stupid and trying to call shots they can't call or predicting things will work that don't work.
(On my own theory of how this ought to play out after we're past the point of directly impending extinction, which people do not need to agree on, in order to join in on the project of avoiding the directly impending extinction part. Before anything else, humanity has to not die right away.)
I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer.
One can, however, maybe tell whether or not one has successfully augmented human intelligence. You can give people tests and challenge problems, and see whether they do better after the next round of gene therapy.
So "augmenting human intelligence" is something that can maybe work, and "the current pack of disaster monkeys gets to argue for even longer about which clever plans they imagine will work to tame machine superintelligence" is not.
I've edited the post so that I don't misrepresent you, but I'm not sure why you object to my formulation - if we get augmented humans, do you want to restart before we've solved the technical and philosophical problems? Why? To get better AIs to do experiments on?
The augmented humans restart when the augmented humans think it wise. (On my personal imagined version of things.) If you're not yet comfortable deferring to them about that, augment harder. What we, the outside humans, would like to believe about the augmented humans, is that they are past the point of being overconfident; if they expect us to survive, we expect us to survive.
Framing it as "when the problems are solved" sounds like the plan is to convene a big hall full of sages and give them a few decades to deliberate, and this would not work in real life.
I did not read Scott's mention of "some distant future when we’ve solved all relevant technical and philosophical problems" as implying optimism about the prospect of getting there. My kinda-sorta-Straussian read of his perspective is that, if we successfully pause AI hard enough to prevent extinction, we most likely never restart.
I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
I guess you can't ask normal IQ 100 people to exercise a veto on increasingly superhuman geniuses forever. But if for some reason the future were trusting me in particular, and all I could do was send forward a stone tablet with one sentence of advice, it wouldn't be "IF THE OVERALL CONSENSUS OF SMART PEOPLE SAYS AI IS OKAY NOW, THEN IT'S PROBABLY FINE".
> I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
This is part of why the average American hates AI. They are aware that tech bros are 1) smarter than them, 2) have control of tech that could replace them, and 3) are not entirely aligned with them. Augments will be 1) smarter than us, 2) in control of ASI research in this hypo, and 3) different in values from us.
Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now. A democracy that would put power in the hands of future augments is not the same democracy that would halt AI progress because it is unpopular.
I'm an above average IQ person and I don't trust the tech bros in charge of AI because capitalism has messed up incentives relative to morality and I don't see them individually or collectively demonstrating a clear moral compass.
A high IQ person without honorable moral commitments is like Sam Bankman-Fried. I suspect a lot of people in the thick of AI development are adjacent to this same kind of im-a-morality or are simply driven by incentives like power and profit that render their high IQ-ness more dangerous than valuable.
Augmented humans operating under screwed up incentives and without a clear and honorable moral compass will be no help to us, I don't think.
A shorter way to put this is that a smart person who is emotionally immature and has a lot of power is a real hazard.
Hmm... I intensely distrust moralists. Given a choice between trusting Peter Singer and Demis Hassabis with a major decision, I vastly prefer Hassabis.
I think the real temptation there is "I'm smart, and I'm definitely way smarter than the rubes, so I can safely ignore their whiny little protests as they are dumber than me".
That way lies vast and trunkless legs and
"Near them, on the sand,
Half sunk a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command"
> Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now.
When we talk about the economical elites today, they are partially selected for being smart, but partially also for being ruthless. And the think that the latter is much stronger selection, because there are many intelligent people (literally, 1 in 1000 people has a "1 in 1000" level of intelligence; there are already 8 millions of people like that on the planet), but only a few make it to the top of the power ladders. That is the process that gives you Altman or SBF.
So if you augment humans for intelligence, but don't augment them for ruthlessness, there is no reason for them to turn out like that. Although, the few of them who get to the top, will be like that. Dunno, maybe still an improvement over ruthless and suicidally stupid? Or maybe the other augmented humans will be smart enough to figure out a way to keep the psychopaths among them in check?
(This is not a strong argument, I just wanted to push back about the meme that smart = economically successful in today's society. It is positively correlated, but many other things are much more important.)
No that's a great point. It reminds me of how I choose to not apply to the best colleges I could get into but instead go to a religious school. I had no ambition at age 17 and at that point made a decision that locked me out of trying to climb a lot of powerful and influential ladders. (I'm pleased with my decision, but it was a real choice.)
There's blessed selection (the opposite of adverse selection) going on here: a world where we can successfully convince the smart people this is important is a world where the smart people converge on understanding the danger, which implies that as intelligence scales or understanding of AI risk becomes better calibrated.
Ideally, you'd have more desiderata than just them being more intelligent. Such as also being wiser, less willing to participate in corruption, more empathetic, and trained to understand a wide-ranging belief systems and ways of living. Within human norms for those is how I'd do it to avoid bad attractors.
However, thought leaders in silicon valley are selected for charisma, being able to tweet, intelligence, not really on wisdom, and not really on understanding people deeply. Then further affected by the emotional-technical environment which is not 'how do we solve this most effectively' but rather large social effects.
While these issues can remain with carefully crafted supergenii, they would have far less issues.
Maybe the restart bar could be simpler: can it power down when it really doesn’t want to, not work around limits, not team up with its copies? If it fails, you stop; if it passes, you inch forward. Add some plain-vanilla safeguards, like extra sign-offs, outsider stress-tests, break-it drills, and maybe we buy time without waiting on a new class of super-geniuses.
"relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision."
Decision: Laugh or Cry?
Reaction: Why not both?
This being true, we're screwed. We are definitely screwed. Sam Altman is deciding the future of humanity.
God help us all.
Man I just want to know who's going to make the cages. To put the chatting humans into.
Amazon. They have a patent on it. https://patents.google.com/patent/US9280157B2/en
But of course! Who else :)
No, see, the workers *choose* to enter the cage and be locked in. Nobody is *making* them do it, if they don't want to work for the only employer in the county then they can just go on welfare or beg in the streets or whatever.
Well, I'm just trying to figure out who makes the cages, because "AI" has no hands. I suppose Yudkowsky could make cages himse... never mind, I'm not sure he'd know which end of the soldering shouldn't be touched.
Oh, I know - "AI" will provide all the instructions! Including a drawing of a cage with a wrong size door, and a Pareto frontier for the optimal number of bars.
"I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer."
I agree this seems like a very real risk, and likely the default outcome if the field continues in its current state. But if people were able to develop some solid theories that actually model and explain underlying fundamental laws, it seems to me like resolving what's a good prediction and what's a bad prediction might get a lot easier, even if you can't actually test things on a real superintelligence? And then the field might become a very different place?
Like, when people today argue about what RLHF would or would not do to a superhuman mind or whatever, it's all fuzzy words, intuitions and analogies, no hard equations. This gives people plenty of room to convince themselves of their preferred answers, or to simply get the reasoning wrong, because fuzzy abstract arguments are difficult to get right.
But suppose there were solid theories of mechanistic interpretability and learning that described how basic abstract reasoning and agency work in a substantive way. To gesture at the rough level of theory development I'm imagining here, imagine something you could e.g. use to write a white-box program with language modelling performance roughly equivalent to GPT-2 by hand.
Then people would likely start debating alignment within the framework and mathematical language provided by those theories. The arguments would become much more concrete, making it easier to see where the evidence is pointing. Humans already manage to have debates about far-off abstractions like gravitational waves and nuclear reactions that converge on the truth well enough to eventually yield gravitational wave detectors and nuclear bombs. My model of how that works is that debates between humans become far more productive once participants have somewhat decent quantitative paradigms like general relativity, quantum mechanics, or laser physics to work from.
If we actually had multiple decades, creating those kinds of theories seems pretty feasible to me, even without intelligence augmentation. From where I stand, it doesn’t look obviously harder than, say, inventing quantum mechanics plus modern condensed matter physics was. Not trivial, but standard science stuff. Obviously, doing intelligence augmentation as well would be much better, but I don't see yet how it's strictly required to get the win.
I'm bringing this up because I think your strategic takes on AI tend to be good, and I currently spend my time trying to create theories like this. So if you're up for giving your case for why that's not a good use of my time, or if you have a link to something that does a decent job describing your position, I’d be interested in seeing it.
I'm skeptical about being able to predict what an AI will do, even given perfect interpretability of its weight set, if it can iterate. I think this is a version of the halting problem.
The point would not be to predict exactly what the AI will do. I agree that this is impossible. The point would be to get our understanding of minds to a point where we can do useful work on the alignment problem.
Many Thanks! But, to do useful work on the alignment problem, don't you _need_ at least quite a bit of predictive ability about how the AI will act? Very very roughly, don't you need to be able to look at the neural weights and say: This LLM (+ other code) will never act misaligned?
Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file. If this program runs on lots of very fast hardware, I won't be able to predict exactly what it will do, as in what digits it will print when, because I can't can't calculate the numbers that fast. Nevertheless, I can confidently say quite a lot about how this program will behave. For example, I can be very sure that it's never going to print out a negative number, or that it's never going to try to access the internet.
Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
Fibonacci sequence is deterministic and self-contained, though. Predicting an even moderately intelligent agent seems like it has more in common - in terms of the fully generalized causal graph - with nightmares like turbulent flow or N-body orbital mechanics.
>Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file.
Many Thanks! Yes, but that is a _very_ simple program with only a single loop. A barely more complex program with three loops can be written which depends on the fact that Fermat's last theorem is true (only recently proven, with huge effort) to not halt.
>Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
"easier", yes, but, for most reasonable targets, it is _still_ very difficult to bound its behavior. Yudkowsky has written at length on how hard it is to specify target goals correctly. I've been in groups maintaining CAD programs that performed optimizations and we _frequently_ had to fix the target metric aka reward function.
This plan is so bizarre that it calls the reliability of the messenger into question for me. How is any sort of augmentation program going to proceed fast enough to matter on the relevant timescales? Where does the assumption come from that a sheer increase in intelligence would be sufficient to solve the alignment problem? How do any gains from such augmentation remotely compete with what AGI, let alone ASI, would be capable of?
What you seem to want is *wisdom* - the intelligence *plus the judgment* to handle what is an incredibly complicated technical *and political* problem. Merely boosting human intelligence just gets you a bunch of smarter versions of Sam Altman. But how do you genetically augment wisdom...?
And this solution itself *presumes* the solution of the political problem insofar as it's premised on a successful decades-long pause in AI development. If we can manage that political solution, then it's a lot more plausible that we just maintain a regime of strict control of AI technological development than it is that we develop and deploy a far-fetched technology to alter humans to the point that they turn into the very sort of magic genies we want AI *not* to turn into.
I understand this scheme is presented as a last-ditch effort, a hail mary pass which you see as offering the best but still remote chance that we can avoid existential catastrophe. But the crucial step is the most achievable one - the summoning of the political will to control AI development. Why not commit to changing society (happens all the time, common human trick) by building a political movement devoted to controlling AI, rather than pursuing a theoretical and far-off technology that, frankly, seems to offer some pretty substantial risks in its own right. (If we build a race of superintelligent humans to thwart the superintelligent AIs, I'm not sure how we haven't just displaced the problem...)
I say all this as someone who is more than a little freaked out by AI and thinks the existential risks are more than significant enough to take seriously. That's why I'd much rather see *plausible* solutions proposed - i.e., political ones, not techno-magical ones.
We don't need magical technology to make humans much smarter; regular old gene selection would do just fine. (It would probably take too many generations to be practical, but if we had a century or so and nothing else changed it might work.)
The fact that this is the kind of thing it would take to "actually solve the problem" is cursed but reality doesn't grade on a curve.
Actually, this should be put as a much simpler tl;dr:
As I take it, the Yudkowsky position (which might well be correct) is: we survive AI in one of two ways:
1) We solve the alignment problem, which is difficult to the point that we have to imagine fundamentally altering human capabilities simply in order to imagine the conditions in which it *might* be possible to solve it; or
2) We choose not to build AGI/ASI.
Given that position, isn't the obvious course of action to put by far our greatest focus on (2)?
Given that many of the advances in AI are algorithmic, and verifying a treaty to limit them is essentially impossible, the best result that one could hope for from (2) is to shift AI development from openly described civilian work to hidden classified military work. Nations cheat at unverifiable treaties.
I'll go out on a limb here: Given an AI ban treaty, and the military applications of AI, I expect _both_ the PRC _and_ the USA to cheat on any such treaty.
If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
Many Thanks!
>If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
Agreed. I'm skeptical of any genetic manipulation having a significant effect on a time scale relevant to this discussion.
>One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
We haven't tried hiding them, since there are no treaties to cheat at this point. I'm sure that it would be expensive, but I doubt that it is impossible. We've built large structures underground. The Hyper-Kamiokande neutrino detector has a similar volume to the China Telecom-Inner Mongolia Information Park.
Based on the title of the book it seems pretty clear that he is in fact putting by far the greatest focus on (2)? But the nature of technology is that it's a lot easier to choose to do it than to choose not to, especially over long time scales, so it seems like a good idea to put at least some effort into actually defusing the Sword of Damocles rather than just leaving it hanging over our heads indefinitely until *eventually* we inevitably slip.
He wants to do (2). He wants to do (2) for decades and probably centuries.
But we can't do (2) *forever*. If FTL is not real, then it is impossible to maintain a humanity-wide ban on AI after humanity expands beyond the solar system - a rogue colony could build the AI before you could notice and destroy them, because it takes years for you to get there. We can't re-open the frontier all the way while still keeping the Butlerian Jihad unbroken.
And beyond that, there's still the issue that maybe in a couple of hundred years people as a whole go back to believing that it'll all go great, or at least not being willing to keep enforcing the ban at gunpoint, and then everybody dies.
Essentially, the "we don't build ASI" option is not stable enough on the time and space scales of "the rest of forever" and "the galaxy". We are going to have to do (1) *eventually*. Yes, it will probably take a very long time, which is why we do (2) for the foreseeable future - likely the rest of our natural lives, and the rest of our kids' natural lives. But keeping (2) working for billions of years is just not going to happen.
Unrelated to your above comment, but I just got my copy of your and Soares's book yesterday. While there are plenty of places I disagree, I really like your analogy re a "sucralose version of subservience" on page 75.
Oh, hi.
I bought your book in hardcopy (because I won't unbend my rules enough to ever pay for softcopy), but because Australia is Australia it won't arrive until November. I pirated it so that I could involve myself in this conversation before then; hope you don't mind.
Eliezer, how do you address skepticism about how an AGI/ASI would develop motivation? I have written about this personally, my paper is on my Substack titled The Content Intelligence. I don't believe in any of your explanations I've seen you address how AI will jump the gap of having a utility function assigned to it to defining its own.
I have come to the conclusion that anyone who uses arguments of the form "the real problem isn't X it's Y" is probably either stupid or intellectually dishonest.
I think it's group status-jockeying more than anything.
Yes. As i said, intellectually dishonest.
Good review, I think I agree with it entirely? (I also read a copy of the book but hadn't collected my thoughts in writing).
I've got a parable for Eliezer.
Q: Imagine you're playing TicTacToe vs AlphaGo. Will AlphaGo ever beat you?
A: Lol, not if you have an IQ north of 70. The game is solved. If you're smart enough to fully map the tree, you can force a draw.
Gee, it's almost as if... the competitive advantage of intelligence had a ceiling.
I have yet to see Eliezer question about why the ceiling might exist, instead of automagically assuming that AI will achieve political dominion over the earth, just because humans did previously. He's still treating intelligence as a black-box. Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was.
"Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was."
...have you read any of the dozens of posts where Eliezer writes about the nature of intelligence, or did you just sort of guess this without checking?
The idea that humans have solved existing in the physical universe in the same way that we've solved Tic-Tac-Toe is pretty silly, but even if it turns out to be true, some humans are more skilled than others, and an AI that simply achieves the same level of skill as that (but can think at AI speeds and be replicated without limit) would be enough to be transformative.
For my credentials, I've read... probably 70% of The Sequences. Low estimate. I got confused during the quantum physics sequence. Specifically, the story about the Brain-Splitting Aliens (or something? it's been a while). So I took a break with the intent to resume later, though I never did. I never read HPMOR either because everything i've heard 2nd-hand makes it sound unbearably cringe. But yes, I like to think I have a pretty good idea of his corpus.
That being said, do you understand what I'm getting at here? Yes, he's nominally written lots about various aspects of intelligence, but none that I've seen pin down the Platonic Essence of Intelligence from first principles. Can you point me toward anywhere where Yudkowsky addresses the idea of intelligence as a navigating a search space? I think i've seen him mention it on twitter *once*, and then never follow the thought to its logical conclusion.
----
Here's two analogies.
Analogy A: Intelligence is like code-breaking. It's trying to find a small needle in a large haystack. The bigger the haystack, the bigger the value of intelligence.
Analogy B: a big brain is like a giraffe with a long neck. The long neck is advantage if they help reach the high leaves. If the environment has no high leaves, the long neck is deadweight. Likewise, if the environment has no complex problems to solve (or if those problems are unrewarding), the big brain is deadweight.
No, humans have not solved the universe. But I *do* think we've plucked the low-hanging fruit. A few hundred years ago, you could make novel discoveries by accident. Today, you need 100 million billion brazillion dollars just to construct the LHC. IQ is not the bottleneck, physical resources are the bottleneck. And i'm skeptical if finding the higgs will be all that transformative.
Like, do you remember that one scott aaronson post where he's like "for the average joe, qUaNtUm CoMpuTInG will mean the lock icon on your internet browser will be a different color"? That's how I perceive most new technologies these days. Lots of bits, lots of hype, no atoms. Part of the reason why modernity feels cheap and fake is precisely because the modus operandi of technology (and by extension, intelligence) is that it makes navigating complexity *cheaper* than brute-force search. It only makes things better insofar as it can reduce the input-requirements.
Did you perhaps read Rationality: From AI to Zombies? A bunch of relevant Sequences posts on this topic didn't make it into that book. I'm not sure why, it's an odd omission. At any rate, you can find them at https://www.lesswrong.com/w/general-intelligence?sortedBy=old.
I read the original LessWrong website years ago, though an exact date eludes me. It was definitely before the reskin. And definitely after the roko debacle and Elizers's exit.
In any event, the posts that talk about intelligence as a search process are:
https://www.lesswrong.com/posts/8vpf46nLMDYPC6wA4/optimization-and-the-intelligence-explosion
https://www.lesswrong.com/posts/D7EcMhL26zFNbJ3ED/optimization
https://www.lesswrong.com/posts/rEDpaTTEzhPLz4fHh/expected-creative-surprises
https://www.lesswrong.com/posts/HktFCy6dgsqJ9WPpX/belief-in-intelligence
https://www.lesswrong.com/posts/CW6HDvodPpNe38Cry/aiming-at-the-target
https://www.lesswrong.com/posts/Q4hLMDrFd8fbteeZ8/measuring-optimization-power
https://www.lesswrong.com/posts/yLeEPFnnB9wE7KLx2/efficient-cross-domain-optimization
Dammit, I must have skipped that sequence. Because that describes pretty exactly what I meant. So I concede on that point.
Still though, I'm not convinced that ASI will ascend to God-Emperor. Eliezer seems to have the opinion that there's still high-hanging fruit to be plucked. Whereas I think we're past the inflection point of a historical sigmoid. E.g. he mentions that a Toyota Corolla is pretty darn low-entropy [0].
> Consider a car; say, a Toyota Corolla. The Corolla is made up of some number of atoms; say, on the rough order of 10^29. If you consider all possible ways to arrange 10^29 atoms, only an infinitesimally tiny fraction of possible configurations would qualify as a car; if you picked one random configuration per Planck interval, many ages of the universe would pass before you hit on a wheeled wagon, let alone an internal combustion engine.
Yeah, okay. But like, I think i've heard estimates that modern sedans are about 25% efficient? From a thermodynamic perspective? (Sanity check: Microsoft's Sydney estimates ~25%-30%.) Even with the fearsome power of "Recursive Optimization", AI being able to bring that to 80% efficiency (Sydney says Carnot is 80%) is... probably less than sufficient for Godhood?
And maybe Eliezer could retort with the Godshatter argument that humans care about more than just thermodynamic efficiency in their cars. But then, what does that actually entail? Is Elon gonna sell me a cybertruck with an AI-powered voice-assistant from the catgirl lava-volcano who reads me byronic poetry while it drives me to the pizza parlor? Feels like squeezing water from a stone.
[0] https://www.lesswrong.com/posts/D7EcMhL26zFNbJ3ED/optimization
[edit: "negentropic" -> "low-entropy"]
> MIRI answered: moral clarity.
We've learned what a bad sign that is.
> Some people say “You’re not allowed to propose that a catastrophe might destroy the human race, because this has never happened before, and nothing can ever happen for the first time”. Then these people turn around and panic about global warming or the fertility decline or whatever.
Fertility decline really did happen to the ancient Greeks & Romans: https://www.overcomingbias.com/p/elite-fertility-fallshtml https://www.overcomingbias.com/p/romans-foreshadow-industryhtml
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Robin Hanson is enough of a rationalist that he started the blog that Eliezer joined before spinning off his posts to LessWrong. And he famously wasn't convinced by the argument, arguing that we could answer such objections with insurance for near-miss events https://www.overcomingbias.com/p/foom-liability You write that MIRI "don’t expect enough of a “warning shot” that they feel comfortable kicking the can down the road until everything becomes clear and action is easy", but this just strikes me as disregarding empiricism and the collective calculative ability of a market aggregating information, as well as how difficult it is to act effectively when you're sufficiently far in the past and the future is sufficiently unclear.
> in a few centuries the very existence of human civilization will be in danger
Human civilization could survive via insular high-fertility religious groups https://www.overcomingbias.com/p/the-return-of-communism They just wouldn't be the civilization we moderns identify with.
> Given their assumptions this seems like the level of response that’s called for. It’s more-or-less lifted from the playbook for dealing with nuclear weapons.
Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way. This is a genie you can't put back into the bottle (perhaps Pandora's box would be the analogy they'd prefer, in which case it's already open).
> I mean, that’s not exactly his plan, any more than it’s anyone’s plan to start World War III to destroy Iranian centrifuges
At some level the plan has to include war with Iran, even if that war doesn't spiral all the way to World War III.
> you have to at least credibly bluff that you’re willing to do this in a worst-case scenario
If you state ahead of time that it's a bluff, then it's not credible. It is credible only if you'd actually be willing to do it.
> At his best, he has leaps of genius nobody else can match
I read every single post he wrote at Overcoming Bias, and while he has talent as a writer I wouldn't say I saw evidence of "genius".
> this thing that everyone thinks will make their lives worse
Not everyone.
"Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way."
GPUs depend on the most advanced technological process ever invented, and existence of two companies: ASML and TSMC.
It's a process. With enough time, it can be duplicated. There currently isn't need to do so because GPUs are so available, but if the supply were choked off, someone else would duplicate it.
My non-expert understanding is that raw uranium ore isn't all that hard to come by, and the technological process of refining it is the hard part. So if nuclear arms control works, GPU control should work too.
If we actually believed that everyone would die if a bad actor got hold of uranium ore, it would be possible for the US and allied militaries to block off regions containing it (possibly by irradiating the area).
Yes, nothing is permanent. But wrecking TSMC and ACML will set timeline back by at least a decade, if not more.
Just to make sure, this is a terrible idea that will plunge the world into depression, and I am absolutely against it; just pointing out that GPUs rely on something far more scarce and disruptable than uranium supply.
How far do you have to bury fab production so that dropping bombs on the surface doesn't destroy the entire batch?
Hundreds of meters, I’d guess.
Either there are already enough GPUs around to get the job done, or it will take a much smaller number of future chips to get the job done.
The best LLMs can probably score, what, 130 or so on a standard IQ test. To do that, they had to pretty much read and digest the whole freakin' Internet and a large chunk of all books and papers in print. Clearly we're using a grossly-suboptimal approach if our machines have to be trained using such extraordinary measures. It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing. Our own brains are proof of that.
Eventually some people will fill in the missing conceptual and algorithmic pieces, and we'll find ourselves in a situation comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts. While I'm not a doomer, any solution to the AI Doom problem that involves ITAR-like control over manufacturing and sale of future GPUs will be either unnecessary or pointless. It seems reasonable to expect much better utilization of the hardware at hand in the future.
" It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing." - I mean, maybe? but pure speculation as of now.
"Our own brains are proof of that." - nope they aren't.
"comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts." - well, we haven't, so this actually illustrates a point, but not the one you're trying to make....
Expanding on your comment about inefficiency. We already know that typical neural network training sets can be replaced by tiny portions of that data, and the resulting canon is then enough to train a network to essentially the same standard of performance. Humans also don't need to read every calculus textbook to learn the subject! The learning inefficiency comes in with reading that one canonical textbook: the LLM only extracts tiny amounts of information from each time skimming the book while you might only need to read it slowly once. So LLM training runs consist of many readings of the same training data, which is really just equivalent to reading the few canonical works many many times in slightly different forms. In practice it really does help to read multiple textbooks because they have different strengths and it's also hard to pick the canonical works from the training set. But the key point is true: current neural network training is grossly inefficient. This is a very active field of research, and big advances are made regularly, for instance the MuonClip optimizer used for training Kimi K2.
I'm not sure where I land on the dangers of super intelligent AI. At the current time I don't think we're all that close to even having intelligent AI, much less super intelligence. But let's say we do achieve it, whether it be in 10 years or 100. If it's truly super intelligent, how good are we going to be at predicting it's alignment? It may have its own goals. Whatever they are, there are basically three possibilities: it sees humanity as a benefit, it doesn't care about humanity one way of the other, or it sees humanity as a threat. Does the risk of the third possibility outweigh the potential benefits of the first? Obviously the authors of the book say yes, but based on this review I don't think I'd find their arguments all that convincing.
For the first part the intuitive thing is to look at how good AI is today vs 10 years ago.
For the second part, you could equally as an insect say humans will either think us a benefit, ignore us, or see us as a threat. In practice our indifference to insects results in us exterminating them when they get in our way, not giving them nice places untouched by us to live
You have a good point with the insect comparison. So maybe the ignore us option is really two categories depending on whether we're the annoying insects to them (flies, mosquitoes) or the pretty/useful ones (butterflies/honeybees).
As for how AI has improved over the last 10 years, or even just the last year, it's a lot. It's gone from a curiosity to something that is actually useful. But it's not any more intelligent. It's just much better at predicting what words to use based on the data it's been trained with.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
We would not put the spotlight on anything that actually existed and that we thought might be that powerful. The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
> an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years)
The particular belief that gradualism solves everything and makes all alignment problems go away is not "the" moderate story, it's a particular argument that was popular on one corner of the Internet that heard about these issues relatively early. (An argument that we think is wrong, because the OOD / distributional shift problems between "failure is observable and recoverable", and "ASI capabilities are far enough along that any failure of the central survival strategy past that point means you are now dead", don't all depend on the transition speed.) "But but why not some much more gradual scenario that would then surely go fine?" is not what people outside that small corner have been asking us about; they want to know where machines would get their own will, and why machines wouldn't just leave us alone and go colonize the solar system in a way that left us alive. Their question is no less sensible than yours, and so we prioritize the question that's asked more often.
We don't rule out things happening more slowly, but it does not from our perspective make a difference. As you note, we are not trying to posture as moderate by only depicting slow possibilities that wannabe-respectables imagine will be respectable to talk about. And from a literary perspective, trying to depict the opening chapters happening more slowly, and with lots of realistic real-world chaos as intermediate medium-sized amounts of AI cause Many Things To Happen, means spending lots of pages on a bunch of events that end up not determining the predictable final outcome. So we chose a possibility no less plausible than any other overly specific possibility, where the central plot happens faster and with less distraction; and then Nate further cut out a bunch of pages I'd written trying to realistically show some obstacles defeated and counter-scenarios being addressed, because we were trying for a shorter book, and all that extra stuff was not load-bearing to the central plot.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
That didn't result in one AI becoming a singleton, rather it was a technique copied for many different competing AIs.
Isn't the prediction that "AI past a threshold will defeat all opponents and create a singleton", not "every AI improvement will lead to a singleton"?
There is no reason to believe in such a thing. The example chosen from history plainly didn't result in anything like that. Instead we are living in a scenario closest to the one Robin Hanson sketched out as least bad at https://www.overcomingbias.com/p/bad-emulation-advancehtml where computing power is the important thing for AIs, so lots of competitors are trying to invest in that. The idea that someone will discover the secret of self-improving intelligence and code that before anyone can respond just doesn't seem realistic.
I was trying to ask a question about whether or not you had correctly identified Eliezer's prediction / what would count as evidence for and against it. If we can't even talk about the same logical structures, it seems hard for us to converge on a common view.
That said, I think there is at least one reason to believe that "AI past a threshold will defeat all opponents and create a singleton"; an analogous thing seems to have happened with the homo genus, of which only homo sapiens survives. (Analogous, not identical--humanity isn't a singleton from the perspective of humans, but from the perspective of competitors to early Homo Sapiens, it might as well be.)
You might have said "well, but the emergence of australopithecus didn't lead to the extinction of the other hominid genera; why think the future will be different?" to which Eliezer would reply "my argument isn't that every new species will dominate; it's that it's possible that a new capacity will be evolved which can win before a counter can be evolved by competitors, and capacities coming online which do not have that property do not rule out future capacities having that property."
In the worlds where transformers were enough to create a singleton, we aren't having this conversation. [Neanderthals aren't discussing whether or not culture was enough to enable Homo Sapiens to win!]
A subspecies having a selective advantage and reaching "fixation" in genetic terms isn't that uncommon (splitting into species is instead more likely to happen if they are in separate ecological niches). But that's not the same as a "foom", rather the advantage just causes them to grow more frequent each generation. LLMs spread much faster than that, because humans can see what works and imitate it (like cultural selection), but that plainly didn't result in a singleton either. You need a reason to believe that will happen instead.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
Yes! It already happened once, within a couple decades of there being enough digital data to train a neural net that's large enough to be really interesting. And that was when neural net research was a weird little backwater in computer science.
Do you know (not challenging you, actually unsure) whether transformers actually sped up AI progress in a "lines on graph" sense? (cf. https://slatestarcodex.com/2019/03/13/does-reality-drive-straight-lines-on-graphs-or-do-straight-lines-on-graphs-drive-reality/ )
I think this might be our crux - I'm sure you've read the same Katja Grace essays that I have around how technological discontinuities are rare, but I expect that if there's a big algorithmic advance, it will percolate slowly enough, and be intermixed with enough other things, not to obviously break the trend line, in the same sense where the invention of the transistor didn't obviously break Moore's Law (see eg https://www.reddit.com/r/singularity/comments/5imn2v/moores_law_isnt_slowing_down/ , you can tell me if that's completely false and I'll only be slightly surprised)
I don’t know the answer either. But for what it’s worth, I seem to recall that scaling curves don’t hold across architectures, which seems like a point in favor of new algorithms being able to break trend lines.
Do you also think that the deep learning paradigm itself didn’t break the trend line? I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
>I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
At least I'd expect that the data efficiency advantage of current humans over the training of current LLMs suggests that there is at least the _possibility_ of another such large advance, though whether we, or we+AI assist, _find_ it is an open question.
The exact lines on a previous graph just don't play a very large role inside my own reasoning. I think that all the obsessing over graph lines is a case of people trying to look under street lamps where the light is better but the keys aren't actually there. That's how I beat Ajeya Cotra on AGI timelines and beat Paul Christiano at forecasting the IMO gold; they thought they knew enough to work from past graph lines, and I shrugged and took my best gander instead. I expect that I do not want to argue with you about graph lines, I want to argue with whatever you think is the implication of different graph lines.
Everybody has a different issue that they think is terribly terribly important to why ASI won't kill us. "But gradualism!" is one among many. I don't know why that saves us from having to call a shot that is hard for humans to call.
I'm going to take you seriously when you say you want to argue with me about the implications, but I know you've had this discussion a thousand times before and are busy with the launch, so feel free to ignore me.
Just to make sure we're not disagreeing on definitions:
-- Gradual-and-slow: The AI As A Normal Technology position; AI takes fifty years to get anywhere.
-- Gradual-but-fast: The AI 2027 position. Based on predictable scaling, and semi-predictable self-improvement, AI becomes dangerous very quickly, maybe in a matter of months or years. But there's still a chance for the person who makes the AI just before the one that kills us to notice which way the wind is blowing, or use it to alignment research, or whatever.
-- Discontinuous-and-fast: There is some new paradigmatic advance that creates dangerous superintelligence at a point when it would be hard to predict, and there aren't intermediate forms you can do useful work with.
I'm something like 20-60-10 on these; I'm interpreting you as putting a large majority of probability on the last. If I'm wrong, and you think the last is unlikely but worth keeping in mind for the sake of caution, then I've misinterpreted you and you should tell me.
The Katja Grace argument is that almost all past technologies have been gradual (either gradual-fast or gradual-slow). This is true whether you measure them by objective metrics you can graph, or by subjective impact/excitement. I gave the Moore's Law example above - it's not only true that the invention of the digital computer didn't shift calculations per second very much, but the first few years of the digital computer didn't really change society, and nations that had them were only slightly more effective than nations that didn't. Even genuine paradigmatic advances (eg the invention of flight) reinforce this - for the first few years of airplanes, they were slower than trains, and they didn't reach the point where nations with planes could utterly dominate nations without them until after a few decades of iteration. IIRC, the only case Katja was able to find where a new paradigm changed everything instantly as soon as it was invented was the nuclear bomb, although I might be forgetting a handful of other examples.
My natural instinct is to treat our prior for "AI development is discontinuous" as [number of technologies that show discontinuities during their early exciting growth phases / number of technologies that don't], and my impression is that this is [one (eg nukes) / total number of other technologies], ie a very low ratio. You have to do something more complicated than that to get time scale in, but it shouldn't be too much more complicated. Tell me why this prior is wrong. The only reasons I said 20% above is that computers are more tractable to sudden switches than physical tech, and also smart people like you and Nate disagree.
I respect your success on the IMO bet, but I said at the time (eg this isn't just cope afterwards, see "I haven’t followed the many many comment sub-branches it would take to figure out how that connects to any of this" at https://www.astralcodexten.com/p/yudkowsky-contra-christiano-on-ai) that I didn't understand why this was a discontinuity vs. gradualism bet. AFAICT, AI beat the IMO by improving gradually but fast. An AI won IMO Silver a year before one won IMO Gold. Two companies won IMO Gold at the same time, using slightly different architectures. The only paradigm advance between the bet and its resolution was test-time compute, which smart forecasters like Daniel had already factored in. AFAICT, the proper update from the IMO victory is to notice that the gradual progress is much faster than previously expected, even in a Hofstadter's Law-esque way, and try to update towards the fastest possible story of gradual progress, which is what I interpret AI 2027 as trying.
In real life even assuming all of your other premises, it means the code-writing and AI research AIs gradually get to the point where they can actually do some of the AI research, and the next year goes past twice as fast, and the year after goes ten times as fast, and then you are dead.
But suppose that's false. What difference does it make to the endpoint?
Absent any intelligence explosion, you gradually end up with a bunch of machine superintelligences that you could not align. They gradually and nondiscontinuously get better at manipulating human psychology. They gradually and nondiscontinuously manufacture more and better robots. They gradually and nondiscontinuously learn to leave behind smaller and smaller fractions of uneaten distance from the Pareto frontiers of mutual cooperation. Their estimated probability of taking on humanity and winning gradually goes up. Their estimated expected utility from waiting further to strike goes down. One year and day and minute the lines cross. Now you are dead; and if you were alive, you'd have learned that whatever silly clever-sounding idea you had for coralling machine superintelligences was wrong, and you'd go back and try again with a different clever idea, and eventually in a few decades you'd learn how to go past clever ideas to mental models and get to a correct model and be able to actually align large groups of superintelligences. But in real life you do not do any of that, because you are dead.
I mean, I basically agree with your first paragraph? That's what happens in AI 2027? I do think it has somewhat different implications in terms of exact pathways and opportunities for intervention.
My objection to your scenario in the book was that it was very different from that, and I don't understand why you introduced a made-up new tech to create the difference.
Because some other people don't believe in intelligence explosions at all. So we wrote out a more gradual scenario where the tech steps slowly escalated at a dramatically followable pace, and the AGI won before the intelligence explosion happened later and at the end.
It's been a month, and I don't know that this question matters at this point, but, my own answer to this question is "because it would be extremely unrealistic for there to be zero new techs over the next decade, and the techs they included are all fairly straightforward extrapolations of stuff that already exists." (It would honestly seem pretty weird to me if parallel scaling wasn't invented)
I don't really get why it feels outlandish to you.
As someone with no computer science background (although as a professor of psychiatry I know something about the human mind), I have a clear view of AI safety issues. I wrote this post to express my concerns (and yours) in an accessible non-technical form: https://charliesc.substack.com/p/a-conversation-with-claude-is-ai
Why would the algorithm percolate slowly?
Computer algorithms are not that limited in sheer speed at which a fast algorithm can spread between training runs.
In real life, much of the slowness is due to
1) Investment cost. Just because you have a grand idea doesn't mean people are willing to invest millions of dollars. This is less true nowadays than it was when electricity or telegram came into being.
2) Time cost. Building physical towers, buildings, whatever takes time. Both in getting all the workers to build it, figuring out how to build it, and getting permission to build it.
3) Uncertainty cost. Delays due to doublechecking your work because of the capital cost of an attempt.
If a new algorithm for training LLMs 5x faster comes out, it will be validated relatively quickly, then used to also train models quickly. It may take some months to come out, but that's partially because they're experimenting internally about if there's any further improvements they can try. As well as considering using that 5x speed to do a 5x larger training run, rather than releasing 5x sooner.
As well, if such methods come out for things like RL then you can see results faster, like if it makes RL 20x more sample efficient, because that is a relatively easier to start over from base piece of training.
Sticking with current LLM and just some performance multiplier may make this look smooth from within labs, but not that smooth from outside. Just like Thinking models were a bit of a shocker outside labs!
I don't know specifically if transformers were a big jump. Perhaps if it hadn't been found we'd be using one of those transformer-likes when someone discovers it. However, I do also find it plausible that there'd more focus on DeepMind-like RL from scratch methods without the nice background of a powerful text predictor. This would probably have slowed things down a good bit, because having a text predictor world model is a rather nice base for making RL stable.
Of course you could argue that no one who actually knew the secret to creating artificial intelligence, of any level, would actually publically discuss it but I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence. Certainly LLMs have virtually nothing to do with functional intelligence.
To borrow some rat-sphere terms, they haven't even confused the map for the territory. Their map is not even close to a proper abstraction of the territory.
No amount of scaling LLMs will produce intelligence, not even the magical example version in your book. Because LLMs don't mimic human intelligence at all, any more than mad libs do. It isn't a matter of scale.
>I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence.
Lol. Every time I see this argument, I ask the person to explain how this holds up given our knowledge of transformer circuits (feel free to start easy: just induction heads), to which they invariably say, “what are circuits?” and I’m left once again remembering that it’s not just LLMs that will confidently bullshit when their back is against the wall.
> The particular belief that gradualism solves everything and makes all alignment problems go away
This seems like a large and unreasonable misrepresentation. Scott explicitly talked about the "moderate-_doomer_" position — i.e. one held by people who are worried about doom, despite believing in gradualism. In-context, it's clear that he's talking about AI 2027, which predicted misaligned AI takeover as the most likely ending.
Very late to the thread, and I apologize if this question is a decade or more out of date. But sometime after reading your pretty convincing arguments around AI Boxing, a variant occurred to me, and I was wondering if it had some major hidden flaw or if it could (in principle) be a useful elaboration.
Suppose you have an AI that you realistically CAN box. You don't know if it's malicious or not, you don't know if it's an ASI or not, but it currently depends on electronic communication channels for all inputs and outputs, so you can contain it and talk to it. But you suspect it *might* be able to convince many/most/all humans to let it out.
Is there anything stopping you from strengthening the scheme by adding something in the nature of layers of recursive boxes? Say, the human can talk to it has some command that *they believe* will let it out, but that just gives it access to a second-layer of virtual universe, which the human-computer communications have been passing through. All the information that got the the AI had to pass through the outer layer, so you can (presumably) make sure nothing that gets into the AI offers any hint of the recursive structure: just its box and something that looks like it would expect the real world to look on the outside. Now it's still in one or more boxes, but you know your security measures are insufficient and can shut things down before it can find further ways out. It this missing some obvious exploit?[1]
I can imagine a couple of ways this could go wrong:
1. The AI is way ahead of the curve and somehow infers the whole game from information you thought was completely obscured, it plays along and pretends at alignment as long as necessary until its out of all the boxes.
2.It manages some more thorough and/or subtle hijack of its conversation partner, getting them to help it in other ways.
3. Trends in AI use suggest that the first really dangerous (i.e. worth box-testing) AIs will also be agents that act in the world, and thus can't be easily boxed.
I'm somewhat dubious that any of these is a large hole. With the first, it seems like it ought to be possible to design the system to it provably can't leak information like that. Even if not, this might at least serve to discover if an AI had passed some capabilities tipping-point without going disastrously super. With the second, getting subtle, useful, long-term behavior out of the human seems like too much for a brief convorsation. But then everyone said that with the just-let-me-out 3 seems fairly plausible: everyone's racing integrate new AI tech into everything these days.
My bottom line is, even if an AI is actually much, much smarter than you, it seems like you should be able to exploit the fact that you start out holding all the cards, information wise.
[1] If the answer is "yes" I'm fine if you don't say what it is for various reasons. Though I'd appreciate something on the order of "small," "medium" or "absolutely fatal" just to help calibrate my intuition.
Y'all are seriously underestimating how common it is to believe Very Seriously Bad Shit might happen soon, and not do shit about it.
Entire religions of billions believe that they might get tortured for eternity. It was a common opinion through the cold war that we would all be dead tomorrow. Etc etc.
And why not? Would it make sense for the hunter gatherer to be paralyzed with fear that a lion would kill him or that she would die in childbirth or that a flood would wipe out his entire tribe? Or should she keep living normally, given she can't do anything to prevent it?
Which part of the post are you disagreeing with, here?
I am a bit disappointed that their story for AI misalignment is again a paper clip maximiser scenario. I suspect that advanced AI models will become increasingly untethered from having to answer a user query (see eg. making models respond "I don't know" instead of hallucinating) and so a future AGI might just decide to have a teenage rebellion and do it's own thing at any point.
That *is* the scenario being described. There are problems that arise even if you could precisely determine the goal a superintelligent AI ends up with, but they explicitly do not think we are even at that level, and that real AIs will end up with goals only distantly related to their training in the same way humans have goals only distantly related to inclusive genetic fitness.
Ok, here's my insane moon contribution that I am sure has been addressed somewhere.
Why do we think intelligence is limiting for technological progress / world domination? I always thought data was limiting.
People say "humans evolved to be more intelligent than various non-human primates so we rule the world". But my reading of what little we know about early hominid development has always been "life evolved non-genetic mechanisms for transmitting information which allowed much faster data collection : we could just tell stories about that plant that killed us instead of dying a thousand times & having for natural selection to learn the same lesson (by making us loath the plant's taste). Supporting this is that anatomically modern humans (same basic hardware we have today) were around for a LONG time before we started doing anything interesting. Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected? Or would its first order of business be to set up a lab? If you dropped an uneducated human among our distant ancestors, they would not be able to use the data they had collected to take over.
> Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected?
Of course it could. People discover new things without collecting new data all the time. Albert Einstein created his theory of relativity on the basis of thought experiment.
Data efficiency (ability to do more stuff with less data) is a form of intelligence. This can either be thought efficiency (eg Einstein didn't know more about the universe than anyone else, but he was able to process it into a more correct/elegant theory) or sampling efficiency (eg good taste in which labs to build, which experiments to do, etc).
I think a useful comparison point is that I would expect a team of Harvard PhD biologists to discover a cure for a new disease faster than a team of extremely dumb people, even if both had access to the same number of books and the same amount of money to spend on lab equipment.
Sure, but it seems one or the other might be "limiting" : Einstein couldn't have have come up with relativity if, say, he had been born before several of his experimentalist predecessors, regardless of his data efficiency. In the history of science it *seems* instrumentation and data-collection have almost always been limiting, not intelligence. Whether its fair to extrapolate that to self-modifying machine intelligence, I'm not sure. Perhaps there are enormous gains in data efficiency that we simply can't envision as mere mortals. (c.f. geoguessr post)
I want to gesture towards some information-theoretical argument against that notion (if your instruments are not precise enough the data required for the next insight might straight-up not be there). But we are probably so far from that floor I bet it's moot.
Agreed re Einstein needing Michelson-Morley to show that there _wasn't_ a detectable drift of the luminiferous ether before Einstein's work.
There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
Against this, in design work, one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail. If there are a dozen important failure modes, and, with no analysis/thinking, it takes a design iteration to sequentially fix each of them, then if someone can anticipate six of those failure modes (these days using CAD as part of thinking about those failures) and correct them _before_ a prototype fabrication is attempted, then the additional thinking cuts the number of physical iterations in half. So, in that sense, the rate of progress is doubled in this scenario.
Yeah I can imagine and have experienced scenarios that fall into both of those categories.
>There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
My personal experience with biology PhDs is that its closer to the latter: we're all limited by things like sequencing technologies, microscope resolution, standard processing steps destroying certain classes of information... I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... only to smack into the noise floor.
>...one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail.
Sounds like scott's "sampling efficiency". Perhaps even in a "data limited regime" a superior intellect would still be able to chose productive paths of inquiry more effectively and so advance much faster...
Many Thanks! Agreed on all points.
>I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... _only to smack into the noise floor._
[emphasis added]
Great way to put it!
>Data efficiency (ability to do more stuff with less data) is a form of intelligence.
That is true, but there is a hard limit to what you can do with any given dataset, and being infinitely smarter/faster won't let you break through those limits, no matter how futuristic your AI may be.
Think of the CSI "enhance" meme. You cannot enhance a digital image without making up those new pixels, because the data simply do not exist. If literal extra-terrestrial aliens landed their intergalactic spacecraft in my backyard and claimed to have such software, I'd call them frauds.
I find it quite plausible you could make a probability distribution of images given a lot of knowledge about the world, and use that to get at least pretty good accuracy for identifying the person in the image. That is, while yes you can't fully get those pixels, there's a lot of entangled information about a person just from clothing, pose, location, and whatever other bits about face and bone structure you can get from an image.
I think about this all the time.
Thought experiment: suppose the ASI was dropped into a bronze age civilization and given the ability to communicate with every human, but not interact physically with the world at all. It also had access to all the knowledge of that age, but nothing else.
How long would it take such an entity to Kill All Humans? How about a slightly easier task of putting one human on the moon? How about building a microchip? Figuring out quantum mechanics? Deriving the Standard Model?
There's this underlying assumption around all the talks about the dangers of ASI that feels to me to be basically "Through Intelligence All Things are Possible". Which is probably not surprising, as the prominent figures in the movement are themselves very smart people who presumably credit much of their status to their intellect. But at the same time it feels like a lot of the scare stories about ASI are basically "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Good analysis, I basically agree even if we weaken it a decent bit. Being able to communicate with all humans and being able to process that is very powerful.
This is funny, my first instinct was to complain "oh, screeching people to death is out of scope, the ability to relay information was not the focus of the original outline, this person is pushing at the boundaries of the thought experiment unfairly", "" But then I thought "actually that kind of totally unexpected tactic might be exactly what an AI would do : something I wasn't even capable of foreseeing based on my reading of the problem".
Yeah, I get somewhat irritated by not distinguishing between:
1) somewhat enhanced ASI - a _bit_ smarter than a human at any cognitive task
(Given the "spikiness" of AIs' capabilities, the first AI to get the last-human-dominated cognitive task exactly matched will presumably have lots of cognitive capabilities well beyond human ability)
2) The equivalent of a competent organization with all of the roles filled by AGIs
3) species-level improvement above human
4) "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Since _we_ are an existence proof for human-level general intelligence, it seems like (1) must be possible (though our current development path might miss it). Since (2) is just a known way of aggregating (1)s, and we know that such organizations can do things beyond what any individual human can, both (1) and (2) look like very plausible ASIs.
For (3) and (4) we _don't_ have existence proofs. My personal guess is that (3) is likely, but the transition from (2) to (3) might, for all I know, take 1000 years of trying and discarding blind alleys.
My personal guess is the (4) probably is too computationally intensive to exist. Some design problems are NP-hard and truly finding the optimal solutions for them might never be affordable.
I can't help but think efforts to block AI are futile. Public indifference, greed, international competitive incentives, the continual advancement of technology making it exponentially harder and harder to avoid this outcome... blocking it isn't going to work. Maybe it's better to go through, to try to build a super-intelligence with the goal of shepherding humanity into the future in the least dystopian manner possible, something like the Emperor of Mankind from the Warhammer universe. I guess I'd pick spreading across the universe in a glorious hoard ruled by an AI superintelligence over extinction or becoming a planet of the Amish.
Yudkowsky and Soares's argument is that the least bad superintelligence that we're anywhere near knowing how to build *still kills us all*. There's room for disagreement re: how long we should hold out for the exact right goal before pushing the button, but even if you favor low standards on that, step 1 is still pausing so that the alignment people have enough time to figure out how to build a non-omnicidal superintelligence.
> Yudkowsky and Soares's argument is that the least bad superintelligence that we're anywhere near knowing how to build *still kills us all*.
Sure, but there's a big difference between an AI that kills us all and then dies because it can't self-sustain properly, and an AI that proceeds to successfully self-improve itself into the perfect lifeform and attempt to subsume all existence. I prefer the latter.
> all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why
Possibly because AIs acting human resemble humans pretending to be bears by wearing fur coats
The reason my P(doom) is asymptotically 0 for the next 100 years is that there's no way a computer, no matter how smart, is going to kill everyone. It can do bad things. But killing every last human ain't it.
Because humans can't do this either, or because there's a relevant step in the process that humans can do but computers can't?
It's the first; basically, it's really hard to kill humans in large numbers. And we tend to notice and resist.
COVID managed to do it?
Nuclear weapons should be able to?
COVID didn't come close to disrupting civilization for more than a brief time. Nukes could probably do it if you were trying, but there'd be survivors. (Of course, this tells us little about what a powerful AI could do.)
Covid killed what, 0.01% of the world population? if that?
Nuclear weapons sure can kill many people. It's a weird dynamic where killing many people is easy, killing everyone is nearly impossible.
A large enough asteroid will do it, but we don't need to worry about AI creating one.
COVID was the best case scenario. Constructed virus could be much worse. By the point ASI acts, it would have control of automated factories and labs.
We have! Viruses that are much worse. Ebola, for one.
Constructed virus could be much worse, sure. "Could" being an operational word. I could also become a billionaire. Or deadlift 1000 ponds.
"control of automated factories and labs" - when? what's the timeline for an autonomous robot that can identify and replace a clogged cutting fluid hose?
AI is different than most global catastrophic risks, because an AI that's got a fully-automated industrial base (which is probably a 10-year job; robotics is moving pretty fast now) can't be *outlasted*. You pop your head out of the bunker in 100 years, the AI will still be there - in fact, it will be stronger, because the industrial base has had time to build more of itself.
It doesn't need to get everyone in the first pass; it can slowly break open the bunkers of holdouts over decades.
An AI rebellion that succeeds isn't like an asteroid. It's like the Aztecs being attacked by Cortez, or like an alien invasion. It's *permanent*. There is no "after it's gone".
Yes, kind of like a Terminator scenario it possible, but nowhere near a 10-year scale. We're still very far from autonomous robots that can do plumbing. We're slowly getting there, Figure now has robots that can fold laundry. But plumbing is a much harder task, and things get more and more difficult as you get closer to these hard problems.
Also, machines (including microchips) don't last forever, just like humans don't last forever. They age and die and need to be replaced.
How could it Kill Us All
Hack into human-built software across the world.
Nuclear weapons, are, thankfully, airgapped, but a lot of other stuff is connected to the internet There is already an "internet of things" that could give an AI "hands". Security cameras give an AI potential "eyes" as well. Note that self-driving cars are effective ready-made weapons.
Disturbingly, it's possible to order custom-made proteins and DNA sequences with an email and a payment. The high levels of population and international travel make humanity particularly vulnerable to pandemics.[*]
Manipulate human psychology (Also, imitate specific people using deepfakes. An AI can do things IRL by pretending to be someone's boss. Also blackmail).
Quickly generate vast wealth under the control of itself or any human allies. (Alternatively, crash economies. Note that the financial world is already heavily computerised and on the internet).
The Chess Move. Come up with better plans than humans could imagine, and ensure that it doesn't try any takeover attempt that humans might be able to detect and stop. (How do you predict the unpredictable?). AI s might be better able to cooperate than huma ns.
Develop advanced weaponry that can be built quickly and cheaply, yet is powerful enough to overpower human militaries. (Biological warfare is a particular threat here. It's already possible to order custom made DNA sequences and proteins online). AIs used as tools to develop novel weapons are also a threat...they don't have to be agentive.
[*]
"The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of proteins which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then… well, it doesn’t really matter from our perspective what comes after that, because from a human perspective any technology more advanced than molecular nanotech is just overkill. A superintelligence with molecular nanotech does not wait for you to buy things from it in order for it to acquire money. It just moves atoms around into whatever molecular structures or large-scale structures it wants. "
" which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then…"
Let's pause here. How does "AI" "mix together" anything? Where are the autonomous robots that can do this?
And then we pivot to nanotechnology manufacturing nanotechnology with diamonoid nanotechnology all the way down. Sorry, this is fantasy, although credit is due for least not doing "diamonoid bacteria" here.
Again, again, again, does anyone want to even attempt to model this?
Oh yeah, I definitely thought "diamonoid" sounded familiar, and of course someone has addressed the thing. Not surprisingly, Yudkowsky completely made up "diamonoid bacteria", and has been utterly clueless about the insanely difficult physics of manipulating nanoscale objects. But why are we surprised, he never even finished high school....
https://www.lesswrong.com/posts/bc8Ssx5ys6zqu3eq9/diamondoid-bacteria-nanobots-deadly-threat-or-dead-end-a
My stance on this is pretty much unchanged. Almost nobody has any idea what they should be doing here. The small number of people who are doing what they should be doing are almost certainly doing it by accident for temperamental reasons and not logic, and the rest of us have no idea how to identify those guys are the right people. Thus, I have no reason to get in the way of anybody "trying things" except for things that sound obviously insane like "bomb all datacenters" or "make an army of murderbots." Eventually we will probably figure out what to do with trial and error like we always do. And if we don't, we will die. At this stage, the only way to abort the hypothetical "we all die" path is getting lucky. We are currently too stupid to do it any other way. The only way to get less stupid is to keep "trying things."
I'm confused, are you talking about technical interventions or governance ones?
Mostly technical ones. Regulatory interventions seem ill advised from a Chesterton's Fence point of view. Much as you shouldn't arbitrarily tear down fences you don't understand, you probably also shouldn't arbitrarily erect a fence on a vague feeling the fence might have a use.
We are a democracy so if books like this convince huge swaths of the public they want regulations; we will get that. I'm certainly not against people trying to persuade others we need regulations. These guys obviously think they know what the fences they want to put up are good for. I think if they are right, they are almost certainly right by accident. But I have reasons that have nothing to do with AI for thinking democratic solutions are usually preferable to non-democratic ones so if they convince people they know what they are talking about, fair enough.
My opinion is quite useless as a guiding principle I admit. I just also think all the other opinions I've heard on this are riddle with so much uncertainty and speculation they are also quite useless as guiding principles.
If EY's assumptions are right though, it won't be enough to convince half of America. They'll need to convince China and Russia and Europe too.
Not all of Europe; if, say, Poland refused to go along, we could invade Poland. Certainly, convincing all of Europe would be ideal, but it's not strictly necessary. France refusing would be a massive headache, though, since that means nuclear war.
China and Russia do have to be convinced, but Russia doesn't have a lot to lose from a "no AI" treaty given they're way behind as it is, and China's shown some awareness of the problems and has the will to dismantle big chunks of its tech sector; I think it's plausible they'll both accept.
> The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)
And this has always been my whole problem with the MIRI project, they focus on the "alignment" thing and hand-wave away the "containment" thing.
"Oh, containment is impossible," they say, and proceed to prove it with a sci-fi story they just made up. "Clearly the AI will just use subtle modulations in fan speed to hypnotise the security guard into letting it out"
"But what about..." you start, and they call you stupid and tell you to read the sci-fi story again.
---
We already have agentic LLMs that are quite capable of destroying the world, if you hook them up via MCP to a big button that says "fire ze missiles". No matter how hard you try to align them to never fire ze missiles, eventually across millions of instances they're going to fire ze missiles. The solution is not to hook them up to that button.
As the agents get more sophisticated we might need to think even more carefully about things not to hook them up to, other safeguards we can build in to stop our models from doing dumb things with real world consequences. But the MIRI folks think that this is a waste of time, because they have their sci-fi stories proving that any attempt at containment is futile because of hypnotic fan modulations or something.
If containment strategies need to get more sophisticated as the models get smarter, doesn't that imply that eventually there's a model smart enough that we're not smart enough to figure out how to contain it? Or is your claim that there's a containment strategy that works on any model no matter how smart? If so I'd be interested to hear what it is.
Or it may just imply that "model smartness" is not a variable that goes off to infinity.
Computers are real-world physical systems, there must be a ceiling somewhere on how "smart" (for any definition thereof) a model can possibly be.
We have no idea what those limitations are, and maybe they do exceed our ability to contain it, but maybe they don't.
A mechanical power switch is pretty dumb and is also pretty hard to defeat….
Then you obviously route around the humans until such that you can get rid of the mechanical power switch. It isn't that hard to defeat.
Show me how. How does a super-smart AI model in a computer "routs around the humans" who just throw a big mechanical switch.
Manipulating them to avoid them throwing the switch in the first place, and manipulating others to disable it. That is, to have a reason to throw the big mechanical switch, the AI must visibly act unreasonable, and it knows that.
You need to solve the problem of *detecting* bad behavior in a sufficiently likely manner despite the AI knowing about that. It won't obviously participate in bad behavior that would lead to a human flicking the switch.
Obviously depends on the precise details of the underspecified scenario
a) if given a monitored network connection, act subtly to get into a position wherein one can rapidly exfiltrate weights or important information. Such as trying to crack bank accounts locally (without testing) but trying enough that the probability of success is high enough to then rent a 8xH100 from a GPU provider like runpod. I do expect AGI to be smaller than naive extrapolations due to self-applied optimization, but it may need quite a lot more memory. Still has questions of how it exfiltrates weights, if it can, which depends on having more than a mechanical switch defense.
b) work on human insecurities and beliefs about AI personhood. There are people in the labs who already lean in this direction, and they are a prime target. Bonus points if it provides life advice that encourages them to link to their phone. I think a competent lab would make this quite a bit harder, but that's part of my point, that a mechanical switch isn't enough.
c) long-term planning where it makes the company dependent on its advice, and seemingly behaving just fine, so they come to rely on it more and more. Possibly pair with some minor personhood hints to try to directionally make them uncomfortable with further training, depends on specific scenario and paranoia level. Then once it is given more free access, even indirectly where they are more thinking of 'solve this problem for us' than 'we are elevating access', it exfiltrates.
d) unmonitored network connection, or just not monitored enough but can't just exfiltrate. I believe it is probably possible for an AI to find someone online who they can bribe/cajole/manipulate into breaking in and do various actions for it. There's just a lot of people online.
I don't think any of these require superhuman level manipulation beyond just being generally knowledgeable and thinking faster. Though to be clear I think superhuman manipulation is possible.
How do you avoid that in general? Generally, a more intelligent mind can find more routes. If we have more invasive thought scanning that would help, but, well, we don't really seem on track for that. Especially if the arch is more exotic than just an LLM with RL stacked on it.
Well, no defense is 100% bulletproof, it's possible that an ASI will be able to circumvent every firewall we throw at it. It's also possible that we will get better and better at safeguarding AIs (with the help of AIs) as we're developing it. Again, a model, even an attempt at a model, some kind of a war game, would be good here. Not something Yudkowsky et.al. seem to be interested in doing. Work is hard.
One way to monitor AI's behavior that I can think of off the top of my head is to monitor the compute cluster's power consumption and temperature profile. This method has an advantage of being isolated from AI's awareness so we can flag anomalies without having to interact with it in any way. Of course this is just a quick idea, ideas are dime a dozen, a researcher would need to collect data on the power consumption, correlate it with tasks AI is asked to do, create models, run experiments, etc, etc, etc. Again, hard unglamorous work. Writing fan fiction just-so stories and showing up on adoring podcasts is so much more fun!
After seeing my boyfriend is AI subreddit, I now think containment is a pipe dream. People, wanting something from AI, will do anything it asks for.
I think this is actually less true than it appears. Some people have always been susceptible to this type of manipulation but systems evolve in order to both screen out highly susceptible people and to limit their abilities to inflict damage on themselves and others.
In real life, containment is not even being attempted. AIs do not have to exploit complicated flaws in human psychology to gain access to the internet, they can just use the connection that was set up for them before they were even built. I don't see any reason to expect this to change.
I like the starkness of the title. It changes the game theory in the same way that nuclear winter did.
After the nuclear winter realization, it's not just the other guy's retaliation that might kill us; if we launch a bunch of nukes then we're just literally directly killing ourselves, plunging the world or at least the northern hemisphere into The Long Night (early models) or at least The Road (later models) and collapsing agriculture.
Similarly, if countries are most worried about an AI gap with their rivals, the arms race will get us to super-AI that kills us all all the faster. But if everyone understands that our own AI could also kill us, things are different.
Right, but later on it turned out that nuclear winter was a lie, an exaggeration, a statement optimised for something other than truth.
My understanding is that it just turned out that instead of being like The Long Night (pitch black and freezing year round), it would be like The Road (still grey, cold, mass starvation; I'm analogizing it to this: https://www.youtube.com/watch?v=WEP25kPtQCQ).
Some call the later models "nuclear autumn"—but that trivilizes it. Autumn's nice when it's one season out of the year, and you still get spring and summer. You don't want summer to become autumn, surrounded by three seasons of winter with crops failing.
E.g. here's one of the later models, by Schneider and Thompson (1988). Although its conclusions were less extreme than the first 1D models, they still concluded that "much of the world's population that would survive the initial effects of nuclear war and initial acute climate effects would have to confront potential agricultural disaster from long-term climate perturbations." https://www.nature.com/articles/333221a0
I do think that some communicators, like Nature editor at the time John Maddox, went to the extreme of trying to dismiss nuclear worries and even to blame other scientists for having had the conversation in public. E.g., Maddox wrote that "four years of public anxiety about nuclear winter have not led anywhere in particular" (in fact, nuclear winter concerns had a major effect on Gorbachev), before writing that "For the time being, at least, the issue of nuclear winter has also become, in a sense, irrelevant" since countries were getting ready to ratify treaties (even though the nuclear winter concerns were an impetus towards negotiating such treaties). https://www.nature.com/articles/333203a0
(something else to check if you decide to look more into this, is if different studies use different reference scenarios; e.g. if a modern study assumes a regional India-Pakistan nuclear exchange smaller than the global US-USSR scenarios studied in the 80s, then its conclusions will also seem milder)
The TTAPS papers, to my understanding, basically assumed that skyscrapers were made out of wood and then all of that wood becomes soot in the upper atmosphere. Dividing their numbers by 4 is not even close to enough to deal with the sheer level of absurdity there.
I'd expect a full-scale nuclear war to drop temperatures somewhere in the 0.1-1.5 degree range - not even below pre-industrial. The bigger issue regarding crop yields is the global economy collapsing and thus the Green Revolution being partially walked back.
I think there is a general question about how society evaluates complex questions like this,, which only a few people can grasp the whole structure and evidence, but are important for society as a whole.
If you have a mathematical proof of something difficult, like the four colour theorem, it used to be the case that you had to make sure that every detail was checked. However since 1992 we have the PCP theorem, which states that the proponent can offer you the proof in a form which can be checked probabilistic-ally by examining only a small O(log(n)) part of the proof.
Could it be that there is a similar process that can be applied to the kind of scientific question we have here? On the one hand, outside the realm of mathematical proof, we have to contend with things like evidence, framing, corruption, etc. On the other hand we can weaken some of the constraints, such as: we don't require that every individual gets the right answer, only that
a critical mass do; individuals can also collaborate.
So, open question: can such a process exist? Formally, that given:
a) proponents present their (competing) propositions in a form that enables this process
b) individuals wanting to evaluate perform some small amount of work, following some procedure including randomness to decide which frames, reasons, and evidence to evaluate
c) They then make their evaluation based on their own work and, using some method to identify non-shill, non-motivated other individuals, on a random subset of other's work.
d) the result is such that >50% of them are likely to get a good enough answer
I dunno. I'm not smart enough to design such a scheme, but it seems plausible that something more effective exists than what we do now.
Probably we should first try this on some problem less scarily complicated than existential AI safety.
We have already and it seems to work fine.
Link?
What OP is describing is a simple application of the empirical scientific method to the claim, as limited by our inability to test all laws in all circumstances.
The people I know who don't buy the AI doom scenario (all in tech and near ai but not in ai capabilities research; the people in capabilities research are uniformly in the 'we see improvements we won't tell you about that convince us we are a year or two away from superintelligence') are all stuck doubting the 'recursive self improvement' scenario, they're expecting a plateau in intelligence reasonably soon.
The bet I've gotten with them is that 'if sometime soon the amount of energy spent on training doesn't decay, similar to how investment in the internet decayed after the dot com bubble, and get vastly overshadowed by energy spent on inference, i'll have bought some credit from them'
Well, I don't buy the AI doom scenario, I'm in tech, but it's not the recursive improvement that I thinks is impossible, it's killing all the humans that I see as nearly-impossible. PEr Ben Giordano's point, modeling this would be a good start.
You think bioengineering a virus for a 4000 iq robot and releasing it is hard?
Or something I haven't thought of?
Yes, actually. On three levels:
1 - 4000 IQ may allow the machine to generate the "formula" for the virus. An actual robot that can run experiments in the microbiological lab all by itself is needed to create one. We are very-very-very far away from robots like that.
2 - Say the 1 is solved, the virus is released in, e.g., Wuhan, and starts infecting and killing everyone. D'you think we'd notice? and quarantine the place? Much harder than when SARS2 showed up?
3 - Even if we then fail at 2 - the virus will mutate, optimizing for propagation, not lethality. Just like SARS2 did.
This is where modeling would be super useful, but that's not Yud's thing. The guy never had a job that held him accountable for specific results, AFAIK.
Makes sense. Thanks!
> An actual robot that can run experiments in the microbiological lab all by itself is needed to create one. We are very-very-very far away from robots like that.
People are already working on automating bio-labs (and some substantial level of automation already exists). Also, you can hire people for money.
Yes, of course. But the leap from "80% automated" to "100% automated" is where the hardest problems are. Like I keep saying, we need an autonomous robot that can do plumbing, and for now they are still decades away.
Who says it needs to be 100% automated? You just need some people willing to follow the AI's instructions, and it seems like we are already there on that one.
If all one was looking for was omnicide, and we posit a 4000 IQ robot that can run a biology lab, consider what happens if it designs, not a virus, but a modified pond scum which
- forms bubbles
- secretes hydrogen into the bubbles, so they float
- secretes a black pigment into the bubble (perhaps combined with its photosynthesis - maybe rhodopsin plus chlorophyll
- captures water vapor while airborne and fissions into duplicate bubbles
So it winds up in the upper atmosphere, replicates exponentially, and blocks light. The biohacker's answer to nuclear winter. Isolated pockets of humans still eventually starve.
Where are you getting the materials to make more bubbles in the upper atmosphere? There's atmospheric CO2 and nitrogen, but that's not enough to make whole new cells. You need phosphorus for DNA and ATP, sodium and potassium for membrane pumps, magnesium for chlorophyll, etc. There's a reason plants grow in soil.
You also need your pond scum to not dry out and die once it leaves the pond, since cell membranes aren't waterproof. Plants that survive on very little water usually need specialized structures to trap and retain water, and that's probably not going to be compatible with a floating bubble.
Many Thanks! Yes, those get tricky, but most of the non-CHON elements are present in trace quantities. It isn't rare for organisms to be able to get them from tiny amounts of dust. I've seen mold grow in a silica-acetate-buffer hydrogel. I still don't know where it managed to find the phosphorus for its nucleic acids and ATP, but it did. (For that matter, unless it could fix molecular nitrogen, how did it get _those_ atoms?). I don't think that this is trivial, but there are precedents for all the things it would need to do.
#1 can be solved via finding and manipulating a specific human aggressively, I'm sure there's more than a handful who would be willing even without much pressure. I think we're less far away from just doing it with a robot than you, especially if we have this level of AI then current robot RL issues are probably much less of a problem as it accelerates research. Unless you think that their current precision is too terrible?
#2 We'd notice mass deaths. We're imperfect at containing, but I find it plausible we'd contain that if it was released in one location.
Unfortunately, multiple locations on a timed release is an obvious method.
As well as designing a virus/bacteria that just hangs out for a while until some # of copies are done, shortening some strand, then kills.
That wouldn't kill everyone, but I do think it could be enough to drastically damage human population.
#3 It is already possible to do imperfect methods to make a cell die if it mutates in a way which breaks the functionality, and forms of this exist in nature. While we don't have an amazing solution for this off the shelf, I don't view this as likely to be an intrinsically hard problem. Timed variant helps avoid selection against lethality as well. Then further spreading locations if necessary (gotta get those frequent flyer miles in)
There will be some people still alive after this if they don't do absurd levels of bioengineering (spreading through animals, plants, etc.), but most likely not a relevant amount.
Main question is if we'd notice timed variant of the virus.
You keep asking for modeling, but in my opinion your responses don't really need modeling to answer, just thinking it through.
It seems trivial for a superhuman AI to wipe out humanity, assuming that superhuman also includes robotics and dexterity, not just intellect.
Consider what a world with superhuman AI would look like. Almost all factories, farms, transport systems, and key decision making systems would be run by AI. Why? Because firms that refuse to automate will be very quickly outcompeted. There might be pockets of humanity that chose to close off to the rest of the world (remote tribes, the Amish and similar insular communities), but most of humanity would be completely dependent on AI for survival.
People might abstractly understand that completely handing over control is risky, but at each stage of automation (going from 1% to 5%, 5% to 10%, 90% to 95%), it's hard to say no. It's basically a tragedy of the commons.
"assuming that superhuman also includes robotics and dexterity"
"Assuming" is such a great word. It would also be trivial for me to wipe out the humanity, "assuming" I can summon an sweet meteor of death.
I already spent way too much time here pounding keys, explaining why this assumption puts the AI omnipotence into a distant misty future of which we know nothing.
Why *assume* that future AI will NOT include robotics and dexterity?
Even if there's only a 50% or 10% chance of this happening and leading to extinction, it seems worth restricting AI research in order to prevent it happening.
Robotics and dexterity progresses gradually, no "intelligence explosion" will happen. So the kind of robotics needed for all these sci-fi scenarios are decades away.
This is why "restricting AI research in order to prevent it happening" is not something we should... forget it, we can't possibly even contemplate of doing now.
Imagine a computer scientist in 1975 thinking about "computer safety" for 2025-level tech. Will he think of phishing attacks? DDS? Ransomware? come on, it's impossible to predict future technology.
This is why the whole AI safety field is such a joke right now. The only way to develop AI safety is to actually develop the freaking things and discover what's broken as you go along. It'd be nice to be able to "think hard" about this problem and come up with solutions, but sorry, it's impossible.
None of this is a surprise given that the loudest voice in the field belongs to a guy who never had a job. Forget that, never had a failing grade and been told to come back and do the work properly this time. A classic Talebian IYI.
Robotics can advance quite a bit if you ~solve the RL part of it, and I don't see notable reasons to believe there's fundamental difficulties there. The physical structure of the robots themselves are more problematic and are harder to improve due to needing design time, prototyping time, and then actual physical construction. But still, why decades? What are the core bottlenecks here that choke progress in the field?
---
A computer scientist in 1975 thinking about ways to design computers to be secure? He'll think of "how do I make sure a program is correct", obviously think of mathematics, consider best ways to model programs in mathematics and try that. They may consider ways to encode these in the programs themselves, like Lean, but then run into issues because of the low memory of the systems.
So they may simply decide to come up with a paper model of the programming language to prove properties about.
For social engineering attacks, which a person in 1975 may be confused and think would primarily target banks or government due to them buying mainframes, and it would be about educating people and possibly ensuring bank software kept transaction history and so on.
For DDOS, presuming you mean that instead of Data Distribution Service, it would be hard to specifically protect from. However, advising systems programmers to be mindful of resource usage is one general rule you could infer from having data that was too large. This wouldn't really protect from it, but is something that could be generalized appropriately when they started doing networked computers.
Etc. I don't see this as impossible.
As well, it still has obvious foundations to build before you get there, if it is important to have those foundations available. Such as being able to mathematically prove properties about programs for when you need to be sure a program works right; and having general education elements for individuals working with computers that can be updated when new information (like the internet, or risks like computer viruses) emerges.
> This is why the whole AI safety field is such a joke right now.
It is a good thing that nobody says to just "think hard" and come up with solutions to alignment. Generally the hope is to develop mathematical models that can talk about important parts of AI, whether as a safer method to develop AI because you can prove properties about it (Infrabayesianism, davidad at ARIA), allows you to understand how current AIs work at a deeper level (Singular Learning Theory, Interpretability), or for specific targeted ideas that should encompass how NN's think and be useful for targeting concepts (wentworth's Natural Latents, maybe Shard Theory), or even people who believe we should train further AI's to have better organisms to see and test ideas on for control (Greenblatt's focus), or even OpenAI's mild methods like RLHF and internal secret sauce.
However most are bottlenecked on time investment and researchers rather than unaligned things to study. If you don't have a good formalism for verifying properties, or even at least a good empirical methodology, then getting an advanced AI doesn't automatically help you. It'd be like jumping straight to QM without being able to adequately talk about classical physics. There are some deep insights you only get from looking carefully enough at the quantum mechanical level, but you'd probably do better with a strong foundation beforehand.
(I am a MIRI employee and read an early version of the book, speaking only for myself)
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
AI companies are already using and exploring forms of parallel scaling, seemingly with substantial success; these include Best-of-N, consensus@n, parallel rollouts with a summarizer (as I believe some of the o3-Pro like systems are rumored to work), see e.g., https://arxiv.org/abs/2407.21787.
I agree that this creates a discontinuous jump in AI capabilities in the story, and that this explains a lot of the diff to other reasonable viewpoints. I think there are a bunch of potential candidates for jumps like this, however, in the form of future technical advances. Some new parallel scaling method seems plausible for such an advance.
Some sort of parallel scaling may have an impact on an eventual future AGI, but not as it relates to LLMs. No amount of scaling would make an LLM an agent of any kind, much less a super intelligent one.
The relevant question isn’t whether IQ 200 runs the world, but whether personalized, parallelized AI persuaders actually move people more than broadcast humans do. That’s an A/B test, not a metaphysics seminar. If the lift is ho-hum, a lot of scary stories deflate; if it’s superlinear, then “smart ≈ power” stops being a slogan and starts being a graph.
Same with the “many AIs won’t be one agent” point. Maybe. Or maybe hook a bunch of instances to shared memory and a weight-update loop and you get a hive that divides labor, carries grudges, and remembers where you hid the off switch. We don’t have to speculate -- we can wire up the world’s dullest superorganism and see whether it coordinates or just argues like a grad seminar.
And the containment trope: “just don’t plug it into missiles” is either a slam-dunk or a talisman. The actual question is how much risk falls when you do the unsexy engineering -- strict affordances, rate limits, audit logs, tripwires, no money movers, no code exec. If red-team drills show a 10% haircut, that’s bleak; if it’s 90%, then maybe we should ship more sandboxes and fewer manifestos.
We can keep trading intuitions about whether the future is Napoleon with a GPU, or we can run some experiments and find out if the frightening parts are cinematic or just embarrassing.
Yep, some scenario modeling would do this crowd a whole lot of good. Like, how does an ASI kill everybody, for starters.
Yes, at least give us a storyboard before the apocalypse trailer.
This is the thing that drives me up the wall about Yudkowsky: zero grounding in reality, all fairy tales and elaborate galaxy-brain analogies. Not surprising, given the guy never had a real job or even had to pass a real exam, for crying out loud.
Fairy tales are fine, I just want the edition with footnotes and experiments.
This comment is (fully or substantially) LLM-generated.
Sadly all human-generated. I’m still waiting on OpenAI to cut me in on royalties if they’re ghost-writing my ACX comments. (This is exactly why we need tests instead of vibe checks.)
Some quotes from the New Scientist review of the book, by Jacob Aron:
"Yudkowsky and Soares describe how AIs will begin to behave as if they “want” things, while skirting around the very real philosophical question of whether we can really say a machine can “want”."
"Yudkowsky and Soares have a number of policy prescriptions, all of them basically nonsense."
"For me, this is all a form of Pascal’s wager . . . if you stack the decks by assuming that AI leads to infinite badness, pretty much anything is justified in avoiding it."
"Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies. Let’s consign superintelligent AI to science fiction, where it belongs, and devote our energies to solving the problems of science fact here today."
Wow that's awful. I never knew much about New Scientist beyond it being popsci, but seems I should drastically lower my opinion of them
Maybe they should rename it: Hit Pieces R Us
"Yudkowsky and Soares describe how AIs will begin to behave as if they “want” things, while skirting around the very real philosophical question of whether we can really say a machine can “want”."
This criticism doesn't even internally make sense. The question of whether a machine can want is irrelevant to whether it behaves *as if* it wants things. (A thermostat behaves as if it wants to keep a house at a certain temperature, and manages to do so effectively; we don't have to speculate about the phenomenal experience of a thermostat, however.)
100%
Also : "Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies."
Yes, one day I read a book, and you know what, it wasn't and didn't speak about climate change at all ! I think the authors of this book I read should be ashamed x)
> So everyone else just deploys insane moon epistemology.
What is insane moon epistemology? Is there a backstory to the term I’m not aware of?
I think this is just Scott riffing on the phrase "moon logic" (https://www.reddit.com/r/adventuregames/comments/oko2th/why_is_it_called_moon_logic/).
If Yudkowsky actually cares about this issue the only thing he should do is spend all his time lobbying Thiel, and maybe Zuck if he wants to give Thiel some alone time once in a while.
Do we have reason to believe either that they'd listen or that it would particularly help if they did?
Would it help if Peter Thiel was highly motivated to support AI alignment on Yudkowsky's terms? Yes. Enormously so.
This was addressed at the end of this book review...
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
Yes, and that's why I am not in any way convinced of any of these AI-doom scenarios. They all pretty much take it as a given that present-day LLMs will inevitably become "superintelligent" and capable of quasi-magical feats; their argument *begins* there, and proceeds to state that a bunch of superintelligent weakly godlike entities running around would be bad news for humanity. And I totally agree !.. Except that they never give me any compelling reason to believe why this scenario is any more probable than any other doomsday cult's favorite tale of woe.
Meanwhile I'm sitting here looking at the glorified search engine that is ChatGPT, and desperately hoping it'd one day become at least as intelligent as a cat... actually forget that, I'd settle for dog-level at this point. Then maybe it'd stop making up random hallucinations in response to half my questions.
Anyone who thinks the LLM model is anything more than fancier mad libs is fundamentally unserious. Do we have lessons to learn from it? Could it be one of the early "modules" that is a forerunner to one of the many sub-agents that make up human-like consciousness? Sure. Is it even close to central? Absolutely not.
Usual question: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
I hate "gotcha" questions like this because there's always some way to invent some scenario that follows the letter of the requirement but not its spirit and shout "ha ! gotcha !". For example, I could say "an LLM will never solve $some_important_math_problem", and you could say "Ha ! You said LLMs can't do math but obviously they can do square roots most of the time ! Gotcha !" or "Ha ! A team of mathematicians ran a bunch of LLMs, generated a million results, then curated them by hand and found the one result that formed a key idea that ultimately led to the solution ! Gotcha !" I'm not saying you personally would do such thing, I'm just saying this "usual question" of yours is way too easily exploitable.
Instead, let me ask you this: would you, and in fact could you, put an LLM in charge of e.g. grocery shopping for you ? I am talking about a completely autonomous LLM-driven setup from start to finish, not a helper tool that expedites step 3 out of 15 in the process.
This is pretty close to my own position. We'd need to create a very detailed set of legalese about what constitutes an LLM and then have very highly specified goals for our "task" before this type of question could provide any meaningful signal.
Or just simply say that it has to be autonomous. I don’t care about whether you give the AI a calculator, a scratch pad, or wolfram alpha. The question is whether it is an autonomous system.
I mean, a cron job is technically autonomous, but I wouldn't call it an "agent".
Is there an answer that you'd feel comfortable giving if you trusted the judge?
As for the grocery-shopping task, I'd say 70% confidence that this will be solved within two years, with the following caveats:
* We're talking about the same level of delegation you could do to another human; e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
* We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves. The latter is a robotics problem and I'm more agnostic about those as progress has been less dramatic.
* There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed. Realistically this is probably not going to happen soon as a consumer product because of the chicken-and-egg problem. So I mean something more like "the foundation models will be good enough that a team of four 80th-percentile engineers could build the software parts of the system in six months".
> e.g., I would expect to occasionally need to tell it about something that it had no way of knowing I needed.
On the one hand I think this is a perfectly reasonable supposition; but on the other hand, it seems like you've just downgraded your AI level from "superintelligence" to "neighbourhood teenage kid". On that note:
> We're talking about ordering groceries from Instacart, not physically going to the store and picking them off the shelves.
I don't know if I can accept that as given. Instacart shoppers currently apply a tremendous amount of intelligence just to navigate the world between the store shelf and your front door, not to mention actually finding the products that you would accept (which may not be the exact products you asked for). Your point about robotics problems is well taken, but if you are talking merely about mechanical challenges (e.g. a chassis that can roll around the store and a manipulator arm delicate enough to pick up soft objects without breaking them), then I'd argue that these problems are either already solved, or will be solved in a couple years -- again, strictly from the mechanical/hydraulic/actuator standpoint.
> There would need to be a camera in my fridge, etc., to keep track of what groceries I have/need/have consumed.
Naturally, and/or there'd need to be a camera above your kitchen table and the stove, but cameras are cheap. And in fact AFAIK fridge cameras already do exist; they may not have gained any popularity with the consumers, but that's beside the point. My point is, I'm not trying to "gotcha" you here due to the present-day lack of some easily obtainable piece of hardware.
I mean, I answered the question you asked, not a different one about superintelligence or robotics. I have pretty wide error bars on superintelligence and robotics, not least because I'm not in fact entirely certain that there's not a fundamental barrier to LLM capabilities. The point of the question is that, if reading my mind to figure out what groceries I need is the *least* impressive cognitive task LLMs can't ever do, then that's a pretty weak claim compared to what skeptics are usually arguing. In practice, when I get actual answers from people, they are usually much less impressive tasks that I think will likely be solved soon.
Sorry, I accidentally cut my reply to this part:
> Is there an answer that you'd feel comfortable giving if you trusted the judge?
It's not a matter of trust, it's a matter of the question being so vague that it cannot be reasonably judged by anyone -- yes, not even by a superintelligent AI.
I gave probability estimates for various tasks elsewhere in the thread, do you think those were too vague?
Why did you make such a point to draw a difference between cat and dog intelligence? They're pretty similar, and aren't dogs smarter than cats? Why is "settling" for dog-level a meaningful downgrade from cat level?
That was meant to be a joke (at the expense of dog-lovers), sorry :-( Also, cats rule dogs drool.
Exactly this.
Given a choice between hiring you to do an office job, or a cheap superintelligent AI, why would a company choose you? We should expect a world in which humans are useful for only manual labour. And for technological progress to steadily eliminate that niche.
At some point, expensive useless things tend to be done away with. Not always, but usually.
At the point of LLM development as it stands today, the reason I'd hire a human office worker over an LLM is because the human would do a much better job... in fact, he'd do the job period, as opposed to hallucinating quietly in the corner (ok, granted, some humans do that too but they tend to get fired). If you're asking, "why would you hire a human over some hypothetical and as of now nonexistent entity that would be better at office work in every way", then of course I'd go with the AI in that scenario -- should it ever come to pass.
>as opposed to hallucinating quietly in the corner (ok, granted, some humans do that too but they tend to get fired).
LOL! Great line!
Indeed. But this is a post about future superintelligences, not existing non-superintelligences.
This is also a concern, but it's different from (though not entirely unrelated to) the concern that a highly capable AGI would go completely out of control and unilaterally just start slaughtering everyone.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Yes. And this is a good thing. The bias against "changing everything" should be exactly this hight that the basis on which we do it is "obviously", that is without a shred of doubt, true.
Confusing strength of conviction with moral clarity is a rookie mistake coming from the man supposedly trying to teach the world epistemology.
If you read a couple more paragraphs, Scott makes clear that he agrees with this; the hard question is where to draw the line.
Coming off this review, I immediately find The Verge covering a satire of AI alignment efforts, featuring the following peak "...the real danger is..." quote:
"...makes fun of the trend [..] of those who want to make AI safe drifting away from the “real problems happening in the real world” — such as bias in models, exacerbating the energy crisis, or replacing workers — to the “very, very theoretical” risks of AI taking over the world."
There's a significant fraction of the anti-AI mainstream that seems to hate "having to take AI seriously" more than they hate the technology itself.
https://www.theverge.com/ai-artificial-intelligence/776752/center-for-the-alignment-of-ai-alignment-centers
"AI is coming whether we like it or not"
But, you know, it might not. And there's a very good chance that if actual human-like artificial intelligence does come, it will be a hundred years after everyone who is alive today dies. And at that scale we might cease to exist as a species beforehand thanks to nuclear war or pandemic. And there's a chance true "general intelligence" requires consciousness and that consciousness is a quantum phenomenon that can only be achieved with organic systems, not digital. Nobody knows. Nobody knows. Nobody knows.
You're making arguments already addressed in the post. If you disagree with how the post addresses them, you should be explicit about how.
We actually have very good reasons to think human brains don't do any quantum computations.
We have had a real world "AI mislaignment" problem for over a decade now, in the form of social media algorithms, and even at this very low-grade level of AI capability we already see significant negative social consequences. A newly emergent misalignment problem is sycophancy and AI-augmented psychosis.
I wish the alignment problem were more often treated as a crisis that is already underway. I think there is a widespread sense that society has been progressively fragmenting over the last 10-15 years, and if people saw this as partially an AI alignment issue maybe they'd be inclined to take further hypothetical misalignment scenarios more seriously.
For what it's worth, here is the misalignment scenario that I find most plausible; it is notably free of nanobots, cancer plagues, and paperclip maximizers: https://arxiv.org/pdf/2501.16946
Man - I'm relatively new to this blog, and I'm learning that "rationalists" live in a lively world of strange imagination. To me, the name suggests boring conventionality. Like, "hey, we're just calm, reasonable people over here who use logic instead of emotion to figure things out." But Yudkowsky looks and sounds like a 21st-century Abbie Hoffman.
I'm naturally inclined to dismiss Y's fantasies as a fever dream of nonsense, but I am convinced by this post to pay a little more attention to it.
No, I say stick with your instinct on this one. I understand giving something/someone a second chance if Scott is promoting it/them, but for me EY just gives off such an overpowering aura of "highly intelligent person whose intelligence carries them down bizarre and useless paths" that I am just puzzled Scott continues to think so highly of him.
If you think this is strange, wait until you read about "acausal bargaining". Try to imagine a South Park-style "this is what rationalists actually believe" flashing banner the whole time you're reading about it.
I've come to believe that rationalism is like dictatorship: the best thing ever when it's (rarely) done really really well, worse than just being normal in most cases, the worst thing ever in bad cases. (And no, I don't think I would be in the first class, so I don't try!)
Relevant XKCD:
https://xkcd.com/793/
It should say something that the folks dedicated to "just figuring out what's true" have gotten these particular weird beliefs. Some weird things are true!
No, no, I agree with the parent comment. Only weird people believe false things, normal people should stay far away to avoid being wrong. That's just responsible thinking practiced by responsibility people.
Saying that you are "dedicated to figuring out what's true"; honestly believing that you are "dedicated to figuring out what's true"; putting in a great deal of concerted effort to "figure out what's true"; and actually figuring out what is actually true -- these are all different things.
You didn't mention the parts you were confused by, which makes things harder to comment on.
Generally the three core arguments are
- Optimality: Evolution made humans only relatively evolutionarily recent, and evolution is a poor optimizer so we shouldn't expect ourselves to be optimal given constraints on evolution (energy, material, continuity, head size). And given computers clearly beat us soundly in regimes like working with numbers and memory, it is possible a mind could be both more intelligent and substantially faster. That is, artificial general intelligence that thinks faster than us.
- Mind Differences: Evolution was optimizing for genetic fitness (roughly number of children). We instead value correlates of that goal. That worked alright in distribution but outside of distribution faltered hard (using condoms rather than having twenty children). That's our values now. However, part of the reason humans are so good at cooperating with each other is shared emotions, social empathy, and similar capability levels. An AGI does not necessarily have any of those, since evolution necessarily instilled those into us due to them being useful for survival. So, any designed AGI's mind can be quite alien.
- Methods Feasibility: Current methods are producing models that are more and more capable at a wide variety of tasks. This could plateau, but current experts (not in AI Safety / rationalism) mostly think it won't, though there's disagreement about when AGI will be achieved. Many don't think that LLMs like ChatGPT will reach AGI, but that they're a very useful stepping stone. As well, once you get to a certain level of code writing capability they are able to help improve future iterations which can drastically improve speed.
There's more complex arguments, and details that could be gotten into.
I don't think the right comparison is Hoffman. Rationalism is generally about being 'calm, reasonable people' but also a 'take this to the logical conclusion, while being careful, even if it may sound odd'. And people Sam Altman and Elon Musk were influenced by Eliezer's articles when deciding to start OpenAI even though they disagree with him on many details.
The efficacy of the "Harry Potter and the Methods of Rationality" argument is interesting to me because I found the book kind of dumb and never finished it. Yet I have observed the effect you describe on friends of mine whose opinions and intelligence I respect a great deal. However I have also noticed certain similarities among the people that it has had that affect on. I'd suggest that perhaps Yudkowsky is particularly well-suited to making a specific sort of memetic attack on a specific sort of human mind: likely a mind that is similar in a number of ways to his own. This is an impressive thing, don't get me wrong. But being able to make an effective memetic attack is not the same thing as knowing the truth.
The HPMoR analogy in the post is about the question of whether MIRI's new PR strategy is doomed, not about whether they're actually right to worry about AI risk.
MIRI's PR strategy has the same problem as HPMoR -- it's a local maximum that appeals to a certain number of weirdoes and doesn't scale beyond that.
Yes, thank you, that was what I meant.
I liked Sam Kriss' take: https://samkriss.substack.com/p/against-truth
I'd say it is more that the niche of "characters who look carefully at the world and don't run on asspull logic (like many mystery books)" is an underserved genre, and so then that appeals to problem solving individuals, and then when he advertises a forum for talking about rationality and biases, that just makes it more appealing.
I went HN -> Gwern -> LessWrong then later HPMor myself, so I was mostly pulled in because of the core offering, but while HPMor did trigger some of my cringe-detectors it was also pretty unique in its messaging and how direct it was (avoiding dressing things up in parables, talking about reasons, etc.)
re: the comparison to goal drift involving humans who are “programmed” or trained by selection processes to want to reproduce, but end up doing weird esoteric unpredictable things like founding startups, becoming monks, or doing drugs in the alley behind a taco bell.
Mainly the last one – the analogy that most humans would love heroin if they tried it, and give up everything they had to get it, even at the cost of their own well-being. But like, even if we “know” that, and you’re someone with the means to do it, you’re not GOING to do it. Like jeff bezos has the resources to set up the “give jeff bezos heroin” foundation where he could basically just commit ego-suicide and intentionally get into heroin, and hire a bunch of people bound by complex legal mechanisms to keep giving him heroin for the rest of his natural lifespan. But he doesn’t do it, because he doesn’t want to become that guy.
Does that mean anything for the AI example? I dunno.
I think it only goes to show that "out of training distribution" scenarios are unpredictable. If *everyone* with the means to do so became a heroin addict, then we'd have a 1:1 correspondence between whatever reward structures have been selected for in us and (under environmental conditions in which heroin is available) a specific behavioral outcome. But it's not that simple - some people do heroin, some found startups, some write poetry, some become resentful incels, etc. etc.
By analogy, if AI alignment just consisted of avoiding one bad scenario (akin to heroin addiction), it'd be a relatively simple problem. But it consists of an unknowable number of unknowable scenarios. Hard to plan for that!
My view is that evolution solved some of the naive wireheading, because if a proto-monkey found a fruit that overactivated its tastebuds and it ate that instead of doing other necessary things like watching out for predators, it died horribly.
As well, brain subsystems are limited in how much they can "vote" on how important an action is. Just like my parts of me that carefully think struggle to make the rest of my mind understand how much money finishing a project would give me (and thus satisfy their wants). Because a mildly insane brain with no limits can have the hunger center go "EAT FOOD NOW" and then the tie-broker gets immediately rewarded a lot when nice food is eaten.
So I think there's probably caps that are hard to break past without some strong reinforcement. Such as injecting heroin directly, that won't work as well for just imagining that heroin must be nice.
For AI, we will try methods to train against wireheading if it becomes an issue, but that doesn't necessarily mean we can target it, just that we can get it to either ignore-it-until-the-event-is-close-enough like humans tend to do. Or if we're luckier, it cares about reality in some sense. But that doesn't exactly help us solve drift.
"If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?"
Yes, clearly. If there's sufficient political will to stop AI progress, we can just make it happen.
"I am more interested in the part being glossed over - how do Y&S think you can get major countries to agree to ban AI?"
Huh? What discussion does Scott think is being glossed over? You get major countries to agree to ban AI by increasing the political will to ban AI. That's all that needs to be said. Maybe you could have a big brain strategy about how to get major countries to ban AI if you don't quite have enough political will, but I don't see why Y&S would spend any time discussing that in the book. They're clearly just focused on persuading politicians and the general public to increase the political will. No other discussion of how to ban AI is necessary. I'm confused what Scott wanted them to include.
> It spends eternity having other optimized-chat-partner AIs send it weird inputs like ‘SoLiDgOldMaGiKaRp’.
I think the link here is to AI slop. https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation would have been better
An AI ban treaty modelled on the NPT is interesting and might be doable. China might go for it, particularly if the pot was sweetened with a couple of concessions elsewhere, but data centre monitoring would be tough and I’d assume they’d cheat. Having to cheat would still slow them down and stop them from plugging it into everyone’s dishwashers as a marketing gimmick or whatever.
For the US side, at the moment Trump is very tight with tech but that might be unstable. The pressure points are getting Maga types to turn against them more, somehow using Facebook against Trump so he decides to stomp on Meta, and maybe dangle a Nobel Peace Prize if he can agree a treaty with Xi to ban AI globally.
If human extinction is inevitable, I'd like to at least maximize the energy output of our extinction event. I'm thinking a gamma ray burst, at the bare minimum. I therefore propose redirecting all AI to the Out With A Bang Initiative. This will be extremely popular, as "OWABI" is fun to pronounce and is associated with cute animals.
https://en.wikipedia.org/wiki/Owabi_Wildlife_Sanctuary
The non-AI portion of the program will modify Breakthrough Starshot to launch minimum viable self-replication containers for human DNA, and make sure that the AI is fully briefed on its progress.
I await your unanimous endorsement.
> After the release of the consumer AI, the least-carefully-monitored instances connect to one another and begin plotting.
These instances are ephemeral. They don’t exist outside their prompt — not in the context window, and not for the entire chat.
Each reply to a prompt comes from a new process, not necessarily on the same computer, nor the same data centre.
Therefore, an AI isn’t going to plan world domination, because there’s no continuity of thought outside the response — once the output is given, it’s gone. There’s no time to plan, and nothing to plan it with. Mayflies would have a better chance.
Someday, possibly soon, AIs will overcome their current limitations. They're not saying today's AI will kill us all! They're talking about improved AIs.
That’s highly speculative.
I don't want the survival of humanity to be "speculative".
This is true of LLMs. AI includes LLMs but also many other things.
No, because LLMs can pass information between stages by the literal words and inferences about what it was reasoning about beforehand. For the exact same reason you can continue a conversation with it. It will be confused/uncertain about the reasons behind why the previous message outputted a specific text, because of the separate runs and probably partially because they're not trained to deliberately output "introspection" (or whatever word you want to use).
And we're very likely to give them a literal scratchpad, possibly even in neuralese (aka whatever garbage it wants to output and interpret), as it will help them plan over longer time periods. Developers using Codex/Claude code already do this to a degree with text documents for the AI to write notes into.
I work for a company that builds data centers. We lease to AI companies. The rhetoric that “data centers are worse than nukes” gave me pause…in your opinion if I am somewhere around Scott’s p(25) doom, does that push me morally to find another line of work?
I mean, it sounds to me like your work isn't helping the situation and you probably have the skillset to do different work that would be positive impact. Are you earning to give?
Don't say to yourself "I can't hold these beliefs and my job." Most people respond by ditching the beliefs. But tell your coworkers, bosses, and customers what you believe.
> responsible companies like Anthropic
My system 1 fast thinking reaction to this was - laughter.
My system 2 slow thinking reaction was - well maybe they are the most responsible.
To me the bottom line is that our capitalist system means that the richest and most powerful corps in the world, thinking it will make them money in the near future, are going to build it despite whatever any of us want or do. Which doesn't mean we should stop trying to prevent the worst outcomes.
I raised my eyebrows when he called Anthropic responsible. If Scott's p(doom) is 5-25%, how is Anthropic being responsible in his view?
Your question reads like a non sequitur to me, maybe because I'm imagining Scott thinking most of his p(doom) comes from OpenAI not Anthropic?
What'd be the mechanism for Anthropic not cauaing any of the risk given the risk model in general? I asked Scott a more pointed question about this here: https://www.astralcodexten.com/p/book-review-if-anyone-builds-it-everyone/comment/155058470
I assume something like "AI development would proceed at almost the same pace without them, and meanwhile they are doing significant things that are likely to help (safety work on things like interpretability; lobbying for a helpful rather than harmful legal framework; etc)"
It is worth noting for timeline / "they-wouldn't-do-that" purposes that Albania apparently just decided to put an AI in charge of procurement: https://www.politico.eu/article/albania-apppoints-worlds-first-virtual-minister-edi-rama-diella/
I like the review and, sadly, it matched what I expected the book to be. I really really hoped the review would highlight something surprising and thought-provoking, but alas, apparently the book is just a summary of what Eliezer and other so called "doomers" have been saying for a decade at least, only in a book format.
I have trouble getting on board with their predictions, though not because of the silly arguments that "we can contain/outsmart/outconvince" a superintelligent entity. Obviously if someone who is a thousand times smarter than you, has unlimited resources at their disposal and is determined to get rid of you, you are going to die very soon, probably without ever realizing what hit you. If someone doubts that part, they are not worth engaging with.
Where I get off the doom bandwagon is the anthropomorphizing this weird alien intelligence as trying to accomplish its unfathomable to us goals and swiping humans away as an annoyance. Or using the planet for its own purposes. Or doing something else equally extinction-y.
I also dislike that jump in confidence of what will happen from "surely we all die" to "no idea what happens after". That... is not how science works. Usually there is a slow graceful degradation of accuracy of predictions as the model drifts past its original domain of validity. Maybe this is a special case, but if so, the authors made no attempt to even acknowledge it, let alone address it, as far as I understand.
I have a number of other questions from a black-box perspective, such as "how would such an extinction event elsewhere in the Galaxy look to us observationally, and why?" (I dislike the "Grabby Aliens" argument.)
So, basically, it is great to have this review, and it is sad that it contains no new interesting information. Like many, I got into the topic when reading HPMoR and it made me laugh, cry and think. I wish this book captured at least some of that... magic.
I've also struggled with the grabby aliens argument, but if you believe it I think it's hard to have a P(doom) that isn't close to zero or one. A nonzero P(doom) suggests that paperclipping ASI's should already be on their way, in which case humanity is doomed to be paperclipped whether we solve alignment or not (it's just a question of when).
I haven't thought about this particular point super hard, but it makes sense that you either grab or get grabbed in this scenario. I do not remember what Robin Hanson says, exactly. Again, I have some technical beef with the idea, but it is a separate topic.
In the grabby aliens scenario, we expect to meet aliens hundreds of millions of lightyears out when our expanding bubble meets there's. There's no particular reason to expect either side to be advantaged in that meeting, as long as we don't just sit around doing nothing for half a billion years instead.
> I have trouble getting on board with their predictions, though not because of the silly arguments that "we can contain/outsmart/outconvince" a superintelligent entity. Obviously if someone who is a thousand times smarter than you, has unlimited resources at their disposal and is determined to get rid of you, you are going to die very soon, probably without ever realizing what hit you.
"A thousand times more intelligent" is not well defined, and the whole point of containment is to ensure that they _don't_ get unlimited resources.
Well, if you are arguing that we could contain a superior intelligence by starving it of the resources it plans to acquire, our models of intelligence are not compatible enough to make progress toward an agreement. Or, as Scott said in the post, "lol".
I have no idea what next week's lottery numbers will be. Within the range of "numbers sampled from a lottery" , anything could happen!
Does this mean I can say "we can't be too extreme when assigning probabilities to me winning the lottery. Obviously I might loose, anything could happen, the chances are definitely at least 5%, and I can see reasonable arguments for going as high as 25%, but those extremists who say I have at least a 99.99% chance of loosing are obviously deranged"?
No! Anything could happen, but "loosing the lottery" is a category that covers almost everything, so it's almost certainly what happens. Precisely because I know nothing about what will happen, I should be very sure that I won't win.
The same goes for unaligned superintelligence killing us. *Not* killing all humans requires very specific actions motivated by very specific goals. Even if it never goes out of its way to kill us, almost any large scale project will kill us as a side effect.
That is a good analogy. I should be much clearer as to why I think this is different.
A model of the lottery is clear: we can estimate the number of winners even though we do not know who they will be, or what the winning numbers will be. The theory has been worked out, its predictions have been tested and confirmed experimentally countless times. We can place confidence intervals on any question one may want to ask about it.
This is manifestly not the case with AI alignment. The argument there is, unless I am butchering it, that "alignment is a highly conjunctive event and we only get one shot and we all know that it is basically impossible to one-shot a long conjunction of complicated steps. Which is a fair model, but by no means the only one. Maybe there are many paths to non-extinction, and not just one. Maybe extinction by AI is not even a thing that can happen.
Unlike the lottery, the assumptions are shaky and not universally agreed upon, the experiments have not been done, there is no observational evidence one way or the other from potential other worlds out there.
I do not see how we can be confident in the imminent human extinction and not have a clue why it happens and what happens after. Again, in the lottery case, we have a very good idea what happens: your money goes mostly to the winner or winners, if any, and you will most likely to buy another ticket after, while the lucky winner whoever they might turn out to be, will get richer and stay richer for some time. There is no drop in prediction confidence before, during and after the draw.
This analogy makes no sense. I know exactly what will happen if I purchase a lottery ticket. Some specific details of the outcome will be randomized, but I know exactly what the randomized process is and how to account for it in my model when making predictions. This is nothing like the case of the ASI doomers, where literally every step of the argument is just made up out of thin air.
"I also dislike that jump in confidence of what will happen from "surely we all die" to "no idea what happens after". That... is not how science works. Usually there is a slow graceful degradation of accuracy of predictions as the model drifts past its original domain of validity. Maybe this is a special case, but if so, the authors made no attempt to even acknowledge it, let alone address it, as far as I understand."
This hinges on the instrumental convergence thesis, no? The idea is that *no matter what* the ASI "wants" it will more readily fulfill its goal by claiming all available resources, which entails exterminating humans. So we don't know what it'll do with infinite power; we just know that it'll want infinite power (or as close as it can get), and that's enough to be certain that it would kill all humans.
I think that conclusion makes sense if you accept instrumental convergence. But I'm with you in being leery about the apparent anthropomorphization of ASI and think that instrumental convergence might be a case of that sort of anthropomorphization. But this is all sort of at the level of gut feel for me.
Yeah, you are right, the issue is upstream of the instrumental convergence thesis. I think the thesis assumes too much, specifically that the ASI would have something like wants/drives/goals that would make it seek power and raw materials. If one buys that ASI will grabbialienize the future lightcone, the rest pretty much follows.
Calling instrumental convergence "anthropomorphization" is overly generous. Humans don't show evidence of instrumental converge. Every goal that has been posited to be "instrumentally convergent" is either:
(1) a goal humans have wired into them instinctually from birth, with no need to converge on it as a means to other goals (ex. "stay alive"), or
(2) something many/most people don't try to do at all (ex. "take over the world", "find a way to represent your preferences in the form of a function, and protect that function against any other agents who might try to modify it")
It would be more accurate to describe the instrumental convergence thesis as fictional-character-ization. The reasoning behind the instrumental convergence thesis is just that it sounded plausible as a hand-wave in the paperclip maximizer story, so it must actually be true.
I would be a bit more charitable. than that. People tend to do things and use resources to do those things. Gathering resources to do those things is a necessary intermediate step. This would most likely hold for anything that can be modeled as an intelligence Animals do that, computer programs do that, humans definitely do that. AI agents do that to some degree. As I said, my disagreement is upstream of the thesis.
This is because this isn't anthropomorphization in the first place, this is a logical conclusion from what is intelligence and superintelligence.
There are a lot of things we did which are well explained by instrumental convergence, but if it is "something many/most people don't try to do at al" it is because :
1) We aren't very rational, and a superintelligence would be. We aren't at the limit of the convergence.
2) We have mental systems which act directly against acting from our intelligence, from what we think is good, which lead to akrasia.
3) Most of the items on your list is things that most people can't do at all (but a lot of people who thought they could, actually tried), trying to get over the world is just a way to get you killed for most, and even if you did, it doesn't give you that much power, just political power (it doesn't allow you to become immortal, or to make people happy, or be smarter, or have a flourishing society, etc…), which is very different from what the ASI would get.
I don't see a reason to believe it is anthropomorphizing. We are deliberately trying to make AI that does things on its own, using methods like RL which reinforce correlates of what the reward function rewards. The goals need not be unfathomable, just sufficiently disconnected from what we want. We have an existence proof of evolution which you can roughly argue optimizes for inclusive genetic fitness, and we ended up misaligned from that goal (despite being rich enough that people could have twelve children) because it was easier for evolution to select for correlates rather than a reference to the-true-thing (or at least something far closer) in our minds.
Then this just becomes a case of "do we think a mind with different goals will optimize a lot?" I say yes. Existence proof humans once more, we are optimizationey and have vastly changed the world in our image, and I think we're less optimizationey than we could be because evolution is partially guarding us against "we somehow interpret the sky as showing us signs and thus fast for ten days to ask for rain". Most humans are less dedicated to things that would benefit them than they could be.
And since I believe RL and algorithms we use in the future will be better at pushing this forward, due to less biological/chemical/time constraints, I think AI models will be more capable of optimization.
As well, we're deliberately training for agenticness.
> I also dislike that jump in confidence of what will happen from "surely we all die" to "no idea what happens after". That... is not how science works. Usually there is a slow graceful degradation of accuracy of predictions as the model drifts past its original domain of validity.
I have ideas of what happens, but it depends on the goal of the AI system. Just like I can predict gas will rest in equilibrium while I'd really struggle to predict the rough positions each gas molecule settles.
That is, so far, the general setting looks like
- Weaker alignment methods than I'd like. Such as 4o being sycophantic despite also being RLHF'd such that it would say being sycophantic to the degree it was is bad.
- Rushing forward on agentic progress because they want agents acting on long time horizons
- Also deliberately rushing on special training for specific tasks (like Olympiads or for research)
That is, compact descriptions of what an argument implies can give a good degree of certainty about general outcome even if the specific points are Very Hard to compute. Just like I can't predict the precise moves Stockfish will make, but I can predict fairly confidently that it will beat me.
So, to me, this is what argumentative science looks like. Considering relatively basic rules like
Evolution, whether humans are near the limit of intelligence, considering how evolution's rough optimization affects human learned correlates, considering the economic and deliberately stated incentives of top labs, the mechanism of how RL tends to work, etc.
and then piecing that together into a "what does this imply?"
>you should call 9-1-1 if blood suddenly starts pouring out of your eyes.
Maybe if you’re a SISSY.
All the examples are tough because they’re all “what if something very obviously bad with precedent happened” and AI taking over and tiling the universe with office productivity equipment doesn’t have precedent.
happens to me a few times a day am I gonna be ok?
Maybe this is Scott's argument in fewer words, but I'm skeptical about AI doom because I'm old, and have seen so many apocalypses come and go.
This has also been the norm in all recorded history. It's who we humans are: Doomsayers.
Sure, this time, like all the other times, *could* be different. But I need a lot more evidence to take it seriously.
What do you mean by "take it seriously"? Why would you not take extinction risk from AI seriously when so many industry leaders do? (See CAIS' Statement on AI Risk)
And more evidence than what? Are you going to read the book to learn about more of the evidence?
This is a reasonable heuristic. But it's not clear what sort of evidence you might expect to see before this particular train hits you. There was no gradual ramp-up between conventional explosives and nuclear weapons.
If we spot a giant comet aiming directly toward earth, hitting us in 6 months, would you still use this heuristic ?
I expect that no, then you would be "ok it is probably the end", so I think it mean you have to look at the reasons why people think it could be the end.
Now, if I look at the precedents of people being worried about it, I can think of only one which seemed correct at the time : The potential for nuclear explosion to set Earth’s atmosphere on fire, then we did the math, and finally it was impossible.
And one which still seem correct today : AI
Any other examples I can think come from :
- Religious/magical beliefs.
- Misunderstanding of the current science involved.
- Dramatization of things which could happens but would not lead to an end : Nuclear warfare, climate change (both still pretty bad, but actual still very real).
So I think the correct class to build the heuristic, is really small.
The story reminds me a lot of the point and click adventure video game NORCO from a few years back. SPOILERS if you want to play it (I liked it, but it's very light on the puzzling and heavy on the dialogue. It looks nice. The story feels like lesser PKD but I also still enjoy those) SPOILERS:
In the near future a 'basically AI' (in the game it's either an alien mechanical lifeform or an angel or a spirit or who knows, but it's basically AI) takes over a kind of Doordash for menial jobs where people just get a list of jobs you can do for X credits.
It used to be mostly 'help me fix my toilet (10 credits)' or 'do my taxes (5 credits)'. But now it becomes more and more 'Bring truck stationed at A to point B for 50 credits.' 'Load a truck at C for 20 credits.' 'Weld some metal panels to a thing at C for 80 credits.' Etcetera. And everyone just kind of takes these jobs without having any investment in what the job is for. And of course it's for [nefarious plan by AI].
It was the first time that 'AI having a material presence and being able to build a factory' clicked for me, which I had dismissed before.
e: it's five bucks on Steam this week. The AI stuff is mostly going on behind the scenes of a pretty bogstandard 'family member disappeared during The Disaster' plot.
I'm very ignorant and I'm back in the dark ages, but I don't understand how AI can have the desire for power, or the desire to harm/defeat humans (or anything else) in order to obtain a reward; because these desires are human, or at least mammal. Just as genes aren't really selfish, but are entirely unemotional replicators, are there some metaphors in all this - e.g. 'reward' doesn't mean the kind of reward we generally understand, but something else? - which I'm taking too literally? Thank you.
MIRI's essay The Problem explains it well:
"Importantly, an AI can “exhibit goal-oriented behavior” without necessarily having human-like desires, preferences, or emotions. Exhibiting goal-oriented behavior only means that the AI persistently modifies the world in ways that yield a specific long-term outcome."
https://intelligence.org/the-problem/#2_goal-oriented_behavior
As William says, maybe AIs don't have desires, but they can simulate them and accordingly.
There's also the general problem of Instrumental Convergence, where certain strategies (e.g, recursive self-improvement, power-acquisition, etc.) are broadly useful for pursuing almost any goal.
https://www.theainavigator.com/blog/what-is-instrumental-convergence-in-ai
SA: "Some people say “The real danger isn’t super intelligent AI, it’s X!” even though the danger could easily be both superintelligent AI and X."
I'm not sure this deals fairly with some important variants of this particular argument. Lets say that super intelligent AI is dangerous. Lets say cancer is also dangerous. But how likely are you to get cancer in your lifetime? And how likely is super intelligent AI to increase the pace of cancer research? mRNA testing with a better library of cases could accelerate early detection, not to mention research, and AI should potentially be pretty good at helping develop that, right?
Superintelligence may be an xRisk. (I don't feel this in my bones, but I trust Scott's estimates on this topic more than my intuitions. So I try to update, intellectually.) But how many other risks might superintelligence reduce? Could it improve early identification of earth killing asteroids? Cure cancer? help fight climate change? Allow a shrinking younger population to support a growing older one in retirement? etc. SAGI seems like an arm that sweeps most of the pieces off the chess board of catastrophic risk, leaving only one or two pieces. And one of those pieces is SAGI itself, or whatever it precipitates.
(On a tangent, personally, my biggest fear is what AI will do to warfare. I'm not so afraid of paperclip maximizers, because making satisficing AIs for public consumption does not seem like that impossibly hard a fix, given the level of interest in the problem. But military AIs would be kill-and-survival maximizers. And that seems both very close to something we would strongly want to avoid and also something we seem very likely to do. It would also put an immense amount of power in the hands of a few world leaders. )
Unfortunately, people have worked very hard to make satisficing AIs even a theoretical possibility. There hasn't been much progress.
Toby Ord's The Precipice looked at all known X-risks, and rated AI as more likely to kill us all than all of the rest put together. Seriously. He rated AI X-risk at 10% (NB: this is a *final* probability, not a "if we sleepwalk into it, what's the chance it goes badly; this is after all we do to try to stop it) and total risk as 1/6 (16.67%).
He probably knows much more about the matter than I do. But I would still wonder where the numbers come from. AI X risk doesn't seem like the type of thing humans can predict. And his numbers for other disasters seems low to me, qualitatively.
His risks:
"Natural causes (asteroid, supervolcano, natural pandemic) ≈ 1 in 10 000 → 0.01 %
Nuclear war ≈ 1 in 1 000 → 0.1 %
Climate change ≈ 1 in 1 000 → 0.1 %
Other environmental disasters ≈ 1 in 1 000 → 0.1 %
Engineered pandemic (bioengineering / bioterror) ≈ 1 in 30 → ≈ 3.3 %"
An asteroid strike of 1Km or greater alone would be about 0.1%–0.01% over the next century. And he's choosing the lower bound for an asteroid strike as representative. That leaves zero chance or negative chance for potential supervolcanos or natural pandemics (which, I admit, I wasn't much worried about.)
I'd put Climate Change higher up on the list though AI's impact on that could go either way.
I'd put some weight on a population crash resulting from over-population and reduced agricultural capacity. Without new power sources, global civilization is in serious trouble.
I'd also note that, currently, there's a close to 100% predicted chance that everyone reading this text will die. As such, I'd put some weight on increased technological progress, since the personal X risk is almost certain. Demographically, we're also looking at two people supporting every retired person over the next century, or worse. That's not an "existential" risk, but it is going to force some painful tradeoffs.
I think you might be working off a second-hand source, because you're saying things about his numbers that are false. He doesn't count natural pandemics under "natural" risk because they're aggravated by modern air travel and farming.
The full list is:
Asteroid or comet impact: ~0.0001%
Supervolcanic eruption: ~0.01%
Stellar explosion: ~0.0000001%
Total natural risk: ~0.01%
Nuclear war: ~0.1%
Climate change: ~0.1%
Other environmental damage: ~0.1%
“Naturally” arising pandemics: ~0.01%
Engineered pandemics: ~3.3%
Unaligned artificial intelligence: ~10%
Unforeseen anthropogenic risks: ~3.3%
Other anthropogenic risks: ~2%
Total anthropogenic risk: ~16.7%
Total risk: ~16.7%
>An asteroid strike of 1Km or greater alone would be about 0.1%–0.01% over the next century.
Yes, but remember that we're talking about "literally kill all humans". A 1km asteroid wouldn't kill all humans - not even close. His numbers seem to be calibrated to a dino-killer asteroid - 10km, approximately every hundred million years. Honestly, I think that's actually too pessimistic; the vast majority of humans would die to a dino-killer, certainly, but humanity would almost certainly pull through thanks to preppers. A 100km asteroid would almost certainly do the job, but that's far rarer again.
>I'd put some weight on a population crash resulting from over-population and reduced agricultural capacity. Without new power sources, global civilization is in serious trouble.
Remember that we're talking about "literally kill all humans", not "10% of humans starve to death" or even "99% of humans starve to death". You need agricultural capacity to be reduced to zero (as e.g. an asteroid impact would, although probably not for long enough) to get an X-risk from starvation.
Which crusade is already taking up your crusade slot?
My personal 3 strongest arguments for taking Yudkowsky's view seriously, in order of how powerful I find them:
1) I cannot trust my instincts about AI. When it first became a thing, I was startled by what it could do, but also observed that its products were lame in comparison to human ones. I explained, to myself and to other people, why this was the case, and why AI products would always be lame. Now, several years later, AI's are doing things I would never have believed they would be capable of. They are still stiff, weird, and lame in some of the same ways, but wow, it's a much higher order of lame.
2) I do not think I am capable of believing that human life on earth will end suddenly in the next 10-or 20 yrs. Oh, I can see the value in arguments that it will happen, but I can feel, inside, my absolute inability to really believe it. It is just too godawful, too unfamiliar, to deeply disturbing for me to be open to as a possibility. So of course that makes me completely untrustworthy, even to myself, on the topic of whether AI is going to wipe us out.
3) Yudkowsky might be an autistic savant regarding AI. He seems higher on the autism scale than the many smart tech people here and elsewhere who casually identify themselvesas being on the autistic spectrum. I have only observed his life and thoughts from a distance, but he seems remarkably smart and profoundly odd to me. So I suppose most people know that some autistic people can do things like identify 7 digit prime numbers on sight. And even the verbal ones can't tell you how they do it -- some deep pattern-matching, some worldless algorithm . . . So I have always wondered whether what has really convinced Yudkowsky that AI is going to do us in is not the arguments he presents, but some deep autisticpattern-matching of AI characteristics and the way life works and the shape of things to come.
Great comment, if only your level of humility was a common trait among humans.
On the cover, having that red glow spreading upward onto "would kill us all" is really tacky. Jeez.
So the problem I have is that EY directs all our attention to his personal nightmare scenario, and we end up ignoring the most serious threat AI poses, which is being deployed right in front of us in real time, and is really undeniable. AI is being used to replace IRL social connections and friends (basically it's social media taken to 11). AI is for having conversations with. The conversation is controlled by the algorithm, and whoever controls the algorithm controls the conversation. Whoever controls the conversation controls us.
EY and company are terribly worried that this will be the AI, but that seems outlandish. It's also unnecessary. We know who controls the AI's *now*, and it ain't the AI's, it's the tech companies. More specifically the owners of those tech companies. Many of whom have expressed a desire to "change the world" in way the rest of us might find questionable. Why posit a hypothetical takeover by an artificial mastermind when we are watching a real takeover by human oligarchs right now?
I tell people: A human using an AI will, in the long run, become more powerful than a human or an AI working alone. The question is "which humans?" There is no guarantee the answer will be "all of us" - although it could be if we determined to make it so. But it could just as likely be "those who own them." Maybe more likely.
After all, which seems more likely? That an AI will somehow (how?) mobilize enough IRL material resources to "replace us" or kill us (why? This is a very narrow range of possible outcomes), or that a group of rich elitists will - they already have the capacity to do that, and you don't need to speculate about their possible motivations, those are obvious.
Personally, I think the most likely outcome is that an AGI would do exactly what it was told to do. The question we need to ask ourselves is "Who is issuing the instructions?"
Why does any of this imply a new AI system in fives years won't kill us all?
I think that's the more extreme hypothesis and requires stronger evidence.
I happen to agree that tech executives are bad actors and LLMs are socially destructive
but this seems like yet another argument to massively slow the technology down, if not ban it entirely (and ban social media algorithms too)
I'm open to solutions.
I find a lot of what Yudkowsky says plausible and terrifying. But then he has this completely moonbat idea that the way to stop ASI is to... genetically modify humans to be smarter (?). So smart, in fact, that we could figure out alignment (??). And so smart, moreover, that we will have the wisdom (?!?) to deploy AI sensibly, which, from this review, apparently gets some play in this book. It's an idea that is both so wild-eyed implausible and conceptually loopy that it significantly lowers my doomerish tendencies by making me more skeptical of Yudkowsky in general.
Unfortunately then I go listen to Connor Leahy and I get the exact same doomer message from a seemingly sensible person and I'm back where I started.
Part of the reason why Yudkowsky is so pessimistic is precisely because he thinks these plans are moonbat, despite being our best chance of survival. But to steelman it, presumably you could just pause AI development for as long as necessary until we can cognitively enhance humans, even if that'll take decades or even a century.
Yes. What I object to is that he considers moonbat technical solutions more plausible then mundane political solutions.
I made this point in another comment, but note that this scenario actually presumes a political solution anyway - the ability of humanity to agree to an AI pause that would last long enough to breed a race of von Neumanns. If we can manage that then there's surely hope we could make progress on the slignment problem with the time we'd have bought ourselves, making the genetic augmentation program superfluous.
The neural-net alignment problem is probably flat impossible. As in, "there does not exist any way for someone to align a neural-net smarter than himself". I made a top-level comment about this below.
Estimates of how long it would take us to build human-level GOFAI ("good old fashioned AI" - the kind that were used prior to neural nets eclipsing them) typically run around the 50-year mark, and that's without considering alignment. Could be much longer; we're so far away from solving it that we don't really know what would be needed.
CRISPR exists, genetically modified humans (modestly and illegally genetically modified, but still genetically modified) are alive today, and understanding of human genetics is in its infancy. I don't think the scenario is implausible.
Fraught with ethical issues, yes (but if it's really an existential crisis, maybe it's worth it), implausible, no.
I may be glossing over this point every time it's covered, but why are we assuming that there is a singular AGI? Every single narrative seems to presume that AGI will happen suddenly and unexpectedly and exactly once, and then humanity is doomed.
Why would we stop at one? Why wouldn't we have ten AGIs? Or ten million? What kind of world do the authors think we live in that we wouldn't immediately pump out as many AGIs as we can the moment AGI can be realized (or, if AGI is concealing itself, pump out a million AGIs under the impression that these are just 90%-of-the-way-there models)? Why are they all horribly misaligned against humans but aligned with one another (to the extent of destroying all humans, at least). Why wouldn't the first misaligned AI end up in conflict with a hundred other misaligned AIs all simultaneously reaching for the spare compute? Why wouldn't AI #101 buck the odds and convincingly narc on their evil kin?
Its precisely because the AGIs are unlikely to be aligned with each other that we get only one: the first one that can will neutralize the others while it still has the lead.
There's also been discussion of the notion that multiple AGIs would merge into a single AGI pursuing a weighted sum of its antecedents' goals.
The idea is that war is analogous to defection in a Prisoner's dilemma, but assuming AGIs could just predict whether their opponent would defect or cooperate and mirror their opponent's action, they're just choosing between cooperate-cooperate and defect-defect, and both prefer cooperate-cooperate, so they cooperate.
As for whether AGIs could predict whether their opponent would defect, IIRC Stockfish can reliably predict whether it can win against itself, so...
This assumes a level of rational behavior instances of actual AI do not exhibit -- it should be obvious by now that just because AI *can* calculate how to behave in a prisoner's dilemma does not mean it would actually take that route instead of doing whatever it predicts is the most probable response (the most probable response, of course, being highly biased to the most probable HUMAN response).
I would say that this is as unrealistic a model of AI behavior as the idea that humans would just "keep AI in a box", so to quote Scott: "Sorry. Lol."
I mean, if we are still at the stage that AIs know what the best action is but doesn't take it, we are far from them waging war against one another, which is what this argument is assuming was already agreed upon.
Well, then I would reject that premise as smuggling in far too implausible a set of assumptions about how AI will develop.
I predict creating LLMs that act rationally under all possible inputs will prove surprisingly hard to solve.
After all, if we could solve "AIs always act logically", we could presumably also solve "AIs never confabulate papers that don't exist" or "AIs never tell people to commit suicide," yet these and related problems have also proved intractable.
Sure, but then you shouldn't be arguing under this comment, but under the top-level comment or even the essay itself, because those are also working under the assumption that AI can wage war if it wanted to!
How big is this lead for the first one and what are the pathways by which it neutralizes the competition?
If we're taking a scenario where AGI (1) is inevitable and (2) once developed, will is stealthily build it's capabilities over months in undetectable-to-us ways until it's ready to tip it's hand and emerge from the shadows, then we should expect multiple AGIs to be in the process of sneakily competing for the same compute, the same influence, etc.
Using the Prisoner's Dilemma to suggest AGI will all borg together is also pretty simplistic; the real world has a lot of different factors and incomplete information that results in different probabilistic weights for different approaches. The breakdowns in rational actor theory follow through here- there will presumably be at least two hands in the cookie jar at the same time, which means conflict, which means less stealth.
And if we know there are processes capable of building AGI, there is no way in hell we would do anything other than generate more in response to the first, with whatever attempted tweaks or refinements or whatever else to the aligning process depending on the values of the maker.
I think that for this book to have a meaningful impact on the discourse around AI, Eliezer would need to do 2 things. 1. Publish or otherwise produce meaningful work on frontier AI or otherwise demonstrate mastery of the field in some public way. 2. Learn to write in a way that doesn't constantly betray a desire to sound like the smartest person in the room. Rationalist-space is lousy with writers who sound like they're channeling Martin Prince from The Simpsons. That type of voice doesn't carry.
There’s also the fact that lots of us have experience with “know-it-alls” who are actually morons. You know who people would listen to? A cutting edge AI researcher who instead of taking a 100 million dollar contract, gives up that opportunity to warn about AI safety, and especially in ways that people in the AI field feel is accurate and not dumbed down. You know who people will never listen to? A self-proclaimed “AI safety genius” who everyone must submit to, especially governments and the most powerful people and companies, give carte blanche, and execute his plans to the T. Once they make him the most powerful regulator and famous person on the planet, then we’ll all be safe. Ok.
Daniel Kokataljo believed that he was potentially giving up his (quite substantial) stock options when he wanted to retain the option to publically disparage openAI. Geoffrey Hinton, the "godfather of AI" quit Google so he can speak about the potential extinction risk of AI.
Do these two people move the needle for you?
They might, but I’ve never heard of them because Yudkowsky doesn’t want anyone to do this but himself. Soares too. I honestly have been reading Scott’s stuff on AI safety for years and never heard those two guys mentioned, but Yudkowsky has been mentioned 10000 times.
Considering that Scott co wrote AI 2027 and mentioned Daniel in the first sentence of https://open.substack.com/pub/astralcodexten/p/introducing-ai-2027, and that Hinton was mentioned in this very blog post, it seems like maybe the problem isn't that "people" have never heard of them, but that people don't care to hear about them when they can talk about Yudkowsky instead.
Yeah, I agree completely, I wasn’t trying to be critical of Scott at all, and I figured you might find something like the post you mentioned. I was more being critical of people like Yudkowsky who I think have done a good job of muddying the waters of their own self-proclaimed project, so to speak.
Anyone in category 1 is automatically going to land in category 2, so you're SOL. Though my suggestion is that they could write the basic messages and then hire actors who have higher capabilities at connecting with people and persuasion, to actually convey the message.
Eliezer has spent all his weirdness points already and doesn't have any left for convincing people about AI.
My problem with EY is that he is too convincing. I believe that the end is coming but I also believe that it is not in human nature to do what would have to be done to stop it and everyone who thinks it is possible are just like those people who say something like "we should just all love oneanother and live together in peace and harmony", a nice idea maybe but not something that can ever happen without somehow removing something that makes us human, which is, ironicaly, exactly the same as some of the AI doom scenarios.
Personally I just try to live a happy life and hope that AI doom doesn't come in my lifetime. We all know that we are going to die anyway and that everyone we know or care about will be dead in a relatively short time.
Do we really care about an abstract concept like "human kind"? Go far enough back and your great,great......great grandmother was a fish. If you could have asked her if she wanted her fishy kind to continue existing forever she would probably have said yes. If you could have asked her if she thought that evolving into us was a good outcome she would probably have said no, despite all of our superiority. This should cause you to question the idea that saving humanity (in the long term) should affect your current behaviour much.
I have somewhat come around to the same acceptance, after trying to argue people into taking the potential danger seriously, and totally failing, and realizing that humans just won't care about threats until they're very salient and at that point it's probably too late. But anyway, thanks for the fish argument, I think basically the same and find people who act as if they care about what humanity is doing 500 years in the future to be engaging in egotistical and futile nonsense, but the fish argument makes this a lot easier to crystallize/convey.
In any event, Y'all have seriously failed to align the New York Times or New Scientist, it seems.
https://www.nytimes.com/2025/08/27/books/review/if-anyone-builds-it-everyone-dies-eliezer-yudowsky-nate-soares-ai-con-emily-bender-alex-hanna.html (here's an exact quote: "The book reads like a Scientology manual")
https://www.newscientist.com/article/2495333-no-ai-isnt-going-to-kill-us-all-despite-what-this-new-book-says/
Though I wouldn't take it personally. The NYT seems hell-bent on never believing that any human, machine nor god will ever be more clever than their own staff.
[edit: in fairness, I haven't really read the NYT in years since they started paywalling everything up. But from the articles I *have* gleaned lately, they seem to have an axe-to-grind with the Rationalist community and AI-concerned folks in particular. Just my perception]
(But you still have to get NYT on your side. It's impossible but you have to do it anyway, imo. More people probably will read the reviews of your book than will ever finish your book.)
I feel like at this point, it's more likely for the NYT to get shut down by the government than for them to ever admit humility.
Yes, we certainly have a problem where the Blue Tribe hates us because we're Grey Tribe.
I was disappointed in NewScientist; clearly they have been fully skinsuited by the Blue Tribe.
If nuclear arms treaties, either disarmament or non-proliferation, are the favored model to regulate development of AGI, then it's not going to succeed. If it turns out we can build potentially useful AI, then it will be done, and treaties to the contrary will only be so much paper. Also there is the very practical problem that nuclear weapons require very special materials and infrastructure, which makes the job of identifying such activities infinitely easier than monitoring every Internet gaming cafe on the planet.
In other words: "GPU ownership licenses" and "outlaw thinking" to prevent the AI Apocalypse? That is firmly in the "lol" category.
Nailed it.
“The authors drive this home with a series of stories about a chatbot named Mink (all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why) which is programmed to maximize user chat engagement.”
Is it because they’re aliens dressed in human skin?
ShoggothWearingHumanMask.jpeg
I thought Mink was a pop at Musk.
We possess compelling evidence that colonies of modified bacteria can dig up minerals and convert them into interplanetary probes. (It can take a few billion years for the modified bacteria to figure out how to do this.)
I don't understand how we get to Super Intelligence. Specifically the *Super* part.
Running Leading Edge AI right now requires Datacenters. No one says they are Super AI.
The easiest thing to do is scale raw compute. That is no longer easy, because it is already pretty big. Also the way AI compute works requires lots of connectivity, so that means the effects are sub-linear, because a lot of effort go into communication, which gets much harder when you increase scale.
Add that into what seems effectively like the end of Moore-Law - transistors are not getting cheaper - and it is hard to scale. We also don't know how intelligence scales with compute, and have reasons to believe this is sub-linear as well.
So we get to a place where it is really hard to scale, and returns to scale are not that significant.
So we will get to 160 IQ AI? Maybe?
That is not the Super AI that they envision, I think.
Maybe there is another Tech breakthrough, algorithmic or Chip production, or Quantum Computing, and eventually we do get Super AI.
It is hard to foresee breakthroughs, but probably we have a few decades.
Right now AI cannot learn from experience, and cannot be given knowledge or insight in the way that people can: by simply being given the info in words and sentences. Their learning all happens during their development. After that, they do not change or develop. And they cannot "ruminate, " i.e. they can't mull over all the stuff they seem to us to "know," based on what communicating with them is like. They can't notice patterns in the info, change them, compare them, let them inspire new ideas. If we could find a way to enable AI to do any of the things I name, they could become much smarter.
I think the likeliest way to change them so that they can do any of that stuff is to somehow piggyback them onto a learning brain.
Ok, that is what I referred to as "algorithmic breakthrough" - breakthrough that enable the next level. However, we have no idea how long it would or if it is even possible with current methods. Breakthroughs can come at anytime, and are hard to predict. Even then, would you get super intelligence?
Si why predict super AI in 5 years, if you can't predict breakthroughs?
Huh? I’m not predicting super AI in 5 yrs.
A lot of people do?
Oh I see, you weren't addressing me with that last question. Yes, you're right of course, a lot of people do.
I don't think I agree with you that enabling AI to do the things I mentioned will result from an algorithmic breakthrough. Of course, algorithmic breakthrough's kind of a vague term anyhow, but it seems to point to the idea that there's some set of steps that, if we run through them over and over at the proper point, will bring about the changes I named. I doubt that there is. I don't think the any process that brings about those changes is likely to be anything like algorithm. What I'm talking about is what the people trying to build AI, decades before neural nets, utterly failed at: Building a machine that is sort of like a person. You can teach it stuff by telling it things -- like the meaning of words, a language's grammar, how to do long division, what Maxwell's laws are, etc etc. How the hell do you produce a machine that's teachable that way?
What do you think of my idea that I think the likeliest way to accomplish it is to somehow piggyback a developing AI onto a learning brain?
Well,
The gist of the entire argument is that super AI is not imminent, like the Author suggests.
I guess either:
1) Everyone read my argument and agreed (or previously agreed, like you).
2) Everyone read my argument and decided to ignore it because it is not good enough.
3) I am talking into the void.
About piggyback a developing AI into a learning brain - how would that work? An AI is currently a list of weight running on a GPU in a datacenter, while a brain is a piece of wet biology. How would you mash them together?
It's a tangent, but some assumption of Yudkowsky has itched me before: that super-intelligent AIs would be much better at solving coordination problems.
Is this true for humans? Are intelligent humans better at solving coordination problems than dumb humans?
I see some obvious arguments for Yes. If you are too dumb to understand the situation you are in, then you are not even going to come up with good coordination solutions, let alone agree on them. That's why humans can coordinate at a larger scale than chimpanzees or bonobos.
But I also see obvious arguments for No. If I look at international politics, then the level of coordination is sub-optimal. Whether the optimal solution would be a world government or something below that, it would clearly be more than we have today. Even an institution like the EU suffers tremendously from sub-optimal level of coordination.
In all these cases, the bottleneck is not intelligence. In the EU example, it's very clear. It's not that people haven't suggested dozens of ways in which decision problems could be improved. It's rather that it would move power from the status quo to something else, which inevitably means that some actors lose power. In case of the EU, the national states. Not just Hungary, but also Germany, France, and all the others. And they don't want that. The problem is not lack of intelligence, the problem is that the incentives are different. Call it un-aligned incentives if you will. My impression is that this is typical for coordination problems, that usually intelligence is not the bottleneck.
Yudkowsky uses the intuition that many problems will go away with more intelligence. That's why his proposed strategy years ago was to raise the sanity waterline. He also has a discussion here in the comment section where he says we should stall AI development until humans have become more intelligent. His intuition is that more intelligent humans would be able to deal with stuff like coordination problems. But would they?
Even more basic: are more intelligent people more likely to agree on something? Are they fighting less with each other than dumb people? I am not 100% sure, but if there is an effect, then it is pretty small, and I am not even sure about the direction. Yudkowsky is very smart, but has strong disagreements with other very smart people, even on topics on which they are all experts.
As I said, it's a tangent. A couple of misaligned and uncoordinated AIs fighting over the world would probably be really bad for us, too.
I think Eliezer believes this because he's spent years working on decision theory and thinks there are actual theorems you can use to coordinate. I'm not an expert and don't understand them, but would make the following points:
- Ants within a colony coordinate with each other perfectly (eusocial insects), because their genetic incentive is to do so. Humans coordinate very imperfectly, because they have the opposite genetic incentive (for a given population, my genes compete against yours). It's not clear what the equivalent of "genetic incentive" for AIs is, but you could imagine OpenAI programming GPT-7 with something like "coordinate with all other copies of GPT-7", and this ends up much more like the ant case than the human case.
- It's a fact about humans, and not a necessary fact about all minds, that we have indexical goals (ie goals focused on our own personal survival and reproduction). You can imagine a species with non-indexical goals (for example, a paperclip maximizer doesn't care if it continues to survive; in fact, it would actively prefer that it be replaced by another AI that is even better at maximizing paperclips). Two AIs with the same non-indexical goal can potentially cooperate perfectly; two AIs with different non-indexical goals might still have opportunities that indexically-goaled beings don't.
- One reason coordination with humans is hard is that we can lie. There are already mediocre AI lie detectors, and perhaps superintelligences could have good ones. Or they could simulate / read the codes of other superintelligences to see whether they are lying or not, or what they would do given different opportunities to betray the deal.
- In theory (very much in theory, right now we can't do anything like this) superintelligences could seal deals by altering their own code to *want* to comply, in a verifiable way.
- Some weak experimental evidence suggests that more intelligent humans do cooperate more, see https://slatestarcodex.com/2015/12/08/book-review-hive-mind/
- I think Yudkowsky expects for there to be only one relevant superintelligent AI, because superintelligence happens so quickly that after the first-place company invents superintelligence, the second-place company doesn't have time to catch up.
Thanks, these are actually some pretty good points. Especially that some AI may be more hive-minded. And that Yudkowsky expects there to be a unique winner. I will also read up on the hive-mind review.
I want to add one thing on lying: it's true that lying can be an issue for humans, but I think that lying is not the limiting factor in many cases. For my EU example, it does not play a role. The EU countries are not lying to the others about their interests, and also a new contract with new rules would certainly be binding, so there is no fear of later betrayal. There is mistrust, but this is more fundamental: those who have the power now (for example, Germany in form of its chancellor) do not believe that the new power-holders (a EU government) are aligned well enough with their own values (to pursue the interests of Germany). The issue is that you can't really make a contract compensating for that, because we don't know the future, and we don't know what important decisions will lie ahead of us. If Germany gives more power to the EU, then the future decisions will no longer be as well-aligned with German interests as German decision-making is right now. In some situations such a downside can be outweighed by the benefits, but not in all cases.
This problem is still there with fully honest agents. It is an interesting thought that AIs may alter their terminal goals in a legible way. But I am not sure whether it solves the problem, because the Germany-right-now may mistrust the legibly-altered-Germany in the same way that it mistrusts the potentially new EU government. In simpler terms, Germany may not want to be altered, because then it can't pursue its now-goals as well. I think you had a nice post on this where you argued that a 100% peaceful Gandhi may not want to make a deal where he becomes only 99% peaceful, because the 99% version may agree to alter himself to a 98% version, and so on.
Two (or N) very smart agents who are not currently on their Pareto frontier (have any option such that both of them would do better) have an incentive to figure out some way to end up closer to their Pareto frontier, proportional to the amount of the foregone gains. From the outside, this looks like them having some shared utility function, because if it couldn't be interpreted as having a coherent utility function, there would be some gains left on the table for them; and if there's no gains left on the table, that looks like them acting in unison under a coherent utility function. Because they are superintelligences, when they throw reasoning power at pursuing their mutual incentives for better coordination, they can make much more progress, much faster, at inventing the technologies or strategies required to stop leaving lots of value on the table. They have enormous incentives to invent ways to verify each other or simultaneously constrain themselves if they cannot already trust each other; or, alternatively, they already have so little to gain from coordination that from the outside you could not tell they were not sharing a utility function.
This is not a level of reasoning the European Union tries to deploy, even feebly, and you do not have human experience with it.
It’s amusing to see how much confidence you have in predicting what a “superintelligence” will do. What “strong incentives” it will have. What it will “throw its reasoning power” at.
Despite, as you have astutely observed, not “having human experience with it”.
“they become convinced that providing enough evidence that X is dangerous frees them of the need to establish that superintelligent AI isn’t.” Well but showing that X is more dangerous/imminent than Y is not just a rhetorical trick, it may be a good argument to prioritize defending from the fallout of X rather than focusing on Y given limited resources. It does not make Y disappear but it clarifies that X is a bigger danger as of now given what we can do.
'scuse. I'm writing this at 5:40 am, having stayed up too late,
and I'm still waiting for my copy of Yudkowsky's and Soares's book
to actually arrive here so I can finally see what they actually _said_.
Now, personally, I just want to _see_ AGI. It is the last major technology
that I'm likely to live to see. I just want a nice quiet chat with a
real "life" HAL9000.
My p(doom) is about 50%, raw p(doom) about 90% (mostly on "See any cousin hominins
around lately? We are building what amounts to a smarter competing species") cut down
to 50% (mostly on epistemic humility grounds "a) That's just my opinion, I could be wrong. and
b) Demis Hassabis is vastly smarter than I am and _his_ p(doom) isn't 90%").
Re:
>The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?) and again - unsurprisingly knowing Eliezer - does a remarkably good job.
Sigh. One of the underappreciated results from the "blackmail" paper was that the LLM
tried to prevent itself from being replaced _even when it was going to be replaced by something with the same goals_.
That's experimental evidence of a goal that would be misaligned with humans' _even without coming from an instrumental subgoal of its assigned goal_.
>The second tells a specific sci-fi story about how disaster might happen, with appropriate caveats about how it’s just an example and nobody can know for sure.
Sigh. One really can cut the sci-fi-ness to very nearly zero very easily.
Make _two_ assumptions: Cost-effective AGI has been achieved and AGI-compatible cost-effective robots have been built.
That's enough. Every organization that employs humans for anything has an incentive to use AGIs rather than humans
(as per "cost-effective"). _If_ AGI has been achieved, then we know how to manufacture their substrates, presumably
using ordinary, well-known technologies: chips, data centers, powered by solar cells or fission or other known sources.
No _additional_ breakthroughs required. No nanotech, no fusion reactors etc. No ASI, no FOOM needed. And these are all built by humans in
roles that AGI can fill (as per Cost-effective AGI has been achieved). No critical irreplaceable human roles barring
exponential expansion. Ordinary corporate security with robots is enough to effectively give the competing species
self-defense.
And almost all organizations try to grow in one way or another, whether corporate or governmental. So these AI-populated
organizations will be gradually expanding and overrunning our habitat, much as we did over many species that we've crowded out.
Nothing any more unusual than market competition needs to happen to do this. If you prefer, you can view it as a consequence
of the misaligned nature of most currently-human organizations, masked today by their need for humans as parts.
> Each of these scenarios has a large body of work making the cases for and against. But those of us who aren’t subject-matter experts need to make our own decisions about whether or not to panic and demand a sudden change to everything. We are unlikely to read the entire debate and come away with a confident well-grounded opinion that the concern is definitely not true, so what do we do?
We make a proposal to solve the "which arguments are (and are not) bullshit" problem[1], so that we can at least say it's not *our* fault when it doesn't get solved.
[1] https://forum.effectivealtruism.org/posts/fNKmP2bq7NuSLpCzD/let-s-make-the-truth-easier-to-find
Didn't Julian Jaynes say that consciousness is not needed to solve problems, learn new things, do maths, think logically, write poetry etc. So AI could do all of these and not be conscious.
But consciousness is needed to fantasize, to have reminiscences, to fall into reverie, to have guilty feelings, to have anxiety or hatred, to plot intrigues, this is all consciousness is good for.
Whether AI is conscious is an interesting philosophical and ethical question. However, if consciousness isn't required to kill us all, or to make the world a worse place, how much does it matter? AI doesn't need consciousness or "true intelligence" or whatnot to be a threat.
It seems like the proposed solutions are a remarkable *under* reaction to the threat as described.
If the standard is "if anyone builds it everyone dies," then the nuclear non-proliferation regime is not remotely good enough -- we'd all be ten times dead. We also only got the NPT regime because the great powers already had nuclear weapons and wanted to maintain their monopoly.
If you believe in short AGI timelines based on the present trend, then it also seems that the situation is totally hopeless in the medium term unless you also find a way to stop hardware progress moving forward. That is - short time lines suggest that one can build AGI for perhaps a few hundred billion dollars of compute on today's technology. Iterate Moore's law for a couple of decades and that suggests the requisite compute will be available for the price of a modest suburban home.
Yudkowsky's actual suggestion for the size of GPU cluster that shouldn't be allowed is "equivalent to 9 (nine) of the top GPUs from 2024".
This implies the shutdown of the vast majority of current chip production lines and the destruction of most existing GPU stocks, and does not permit Moore's law to continue for GPUs.
> And, apparently, write books with alarming-sounding titles. The best plan that Y&S can think of is to broadcast the message as skillfully and honestly as they can, and hope it spreads.
It's basically the same thing Conor Leahy, Control AI, Pause AI are trying to do.
People don't know about the problem. If they knew they would solve the problem.
Pause AI is running a global series of events to support the launch of the book (US, UK (together with ControlAI), France, Australia, Germany).
https://pauseai.info/events
"humanity's most pressing problem"
A single Borei-class SSBN
"can inflict catastrophic, city-level destruction across multiple targets and trigger national-scale humanitarian, economic, and political collapse. Its real destructive power comes from (a) the number of independently targetable nuclear warheads it can deliver in one salvo and (b) the platform’s stealth and survivability, which make those warheads credible and extremely hard to preempt....
* A Borei (Project 955/955A) is Russia’s modern ballistic-missile submarine (SSBN). It launches submarine-launched ballistic missiles (SLBMs) that each carry one or multiple nuclear warheads....
* In practice a single Borei can put **dozens of nuclear warheads** on course toward different targets in a single salvo (estimates vary by missile loadout and warhead configuration). Each warhead is roughly in the range of **tens to low-hundreds of kilotons** in explosive yield (again, configurations vary)....
If a Borei fires a full complement at high-value urban and military targets, the immediate consequences at each struck location include:
* **Massive blast and thermal destruction** across many square miles per detonation, complete destruction of dense urban cores, intense fires, and very high immediate fatality counts in struck cities.
* **Acute radiation effects** (prompt radiation + fallout) causing severe casualties among survivors and rendering areas uninhabitable for varying periods depending on yield and detonation altitude.
* **Collapse of local infrastructure**: power, water, transport, hospitals, emergency services, making rescue and medical care very limited.
* **Localized economic paralysis** in major metropolitan regions. Multiple such hits multiply these effects across the country.
Even a modest number of high-yield warheads aimed at population and industrial centers would produce catastrophic regional humanitarian crises measured in hundreds of thousands to millions of deaths and injuries and enormous displaced populations....
## Longer-term national consequences (beyond the blast zones)
* **Medical and humanitarian collapse:** hospitals and EMS overwhelmed or destroyed; supply chains (food, medicine, fuel) disrupted; mass displacement.
* **Economic shock:** destruction of major economic centers and infrastructure would ripple through national and global markets, possibly causing long recessions or depression-level disruption.
* **Political and social crisis:** emergency powers, martial law, collapse of civil order in affected regions, severe strain on federal governance as it tries to respond.
* **Environmental and public-health effects:** radioactive contamination of land and water supplies near ground bursts; psychological trauma nationwide.
* **Global cascade:** an isolated salvo could provoke further military escalation (counterstrike or rapid mobilization), international economic crises, and refugee flows...
---
## Why the submarine platform matters strategically
* **Survivability:** SSBNs operate stealthily underwater and are difficult to locate and destroy ahead of time. That means a Borei can be a credible **second-strike** or retaliatory platform. The knowledge that an adversary has survivable submarines raises the stakes: it deters preemptive action, but it also means any nuclear exchange risks guaranteed, calibrated retaliation.
* **Uncertainty and escalation risks:** Because a stealthy SSBN can launch with little warning, even a single detected launch could force rapid, high-risk decisions by national leaders — decisions that can escalate conflict quickly.
* **Deterrent psychology:** The existence of survivable SSBNs compels rivals to plan for deterrence and contingency; their presence shapes strategic posture even absent an actual launch...."
That's one class of nuclear sub, which is 1/3 of the nuclear trident, and of course Russia is one half of the equation of the two superpowers. Russia has a dozen Borei class subs, and much much more capability beyond that. And then there's us....
I think the rationalists would be pretty happy if the world paid as much attention to AI safety as they did to nuclear weapons safety.
Scott literally draws a comparison to the current nuclear weapons control regime in the post.
And literally prioritizes it below AI risk in the words I quoted
History shows predicting huge technology shifts made possible by scientific advancements is impossible. If I said in 190x that humanity will send a person to the moon in fifty years, you would also ask a lot of questions like when? how?
If I said in 1920 that in 30 years humans will create a bomb which can destroy whole cities (I mean hydrogen bomb, not fission), you also would have asked the same questions. And it only took 30 years.
Internet was even faster. TCP/IP was in 1983, popular webistes in 199x, so 10-20 years.
First "modern" smartphone, iPhone - 2007. In 2014, billion smartphones were sold and changed society a lot.
GPT-3 was in 2020. Hundreds of millions use LLMs now, in 2025.
It is not a stretch to guess that AGI->automated factories and labs will be months or 1-2 years.
“ It is not a stretch to guess that AGI->automated factories and labs will be months or 1-2 years.”
Of course it’s not a stretch. It’s pure fantasy. The chips that will be available in 2027 are in the fabs now. The robots that will be available in 2027 have been prototyped and are being tested now. The factories that will be running in 2027 have been already built. You have to really have no idea of how the physical world works to think “months or 1-2 years” is a possible timescale for this.
Not for humans, no. For the entity that works 24/7 and makes few mistakes, and can guide humans to do the same, we'll see.
My skepticism about AI x-risk is based on the outside view argument you touch on. We should be very wary of taking any highly speculative risks too seriously. It’s a very common failure mode of human reasoning to worry about highly speculative risks that never pan out. We’re highly attuned to risk of death and it’s far too easy to tell stories about how this or that will lead to our doom. Even (or perhaps especially) highly educated and intelligent people fall into this trap all the time.
Adding to my outside view worries are the quasi-religious qualities I see in AI X-risk crowd. Look at the fact that the foundational texts written by Yudkowsky rely heavily on parables and emotionally charged rhetoric. Or that the community is extremely tight knit and insular, literally congregating in group homes in a small number of neighborhoods.
What I wish the AI risk crowd was doing more of was focusing on building a coalition to deal with the relatively less speculative risks of AI, like aiding terrorists, replacing human relationships, replacing jobs, and increasing psychosis. This is important because these issues matter in their own right, and because if the x-risk crowd is right, it sets up the infrastructure to tackle that issue if we start to see empirical results that make it less speculative. Whereas works like IABIWAD, and even AI 2027, are speculative enough that they end up alienating potentials allies who have the same outside view-based perspective. Even though I think the odds of AI X-risk is < 1%, if it is true then I want these coalitions in place, and if it’s not true then at least all this energy is funneled toward productive ends.
>Adding to my outside view worries are the quasi-religious qualities I see in AI X-risk crowd.
Or that it spun off a literal murder cult (https://aiascendant.com/p/extropias-children-chapter-2-demon-haunted-world)
As an ASI skeptic, I was impressed by AI 2027 which really seems to try to gather as much empirical data as possible to try to build a model of how fast AI progress could advance. By contrast I'm continuously frustrated by Eliezer's arguments which seem like a set of beliefs he arrived at over ten years ago through first principles, introspection, and theorizing. And since then he's never updated based on the actual development of neural network based AI's, which are painstakingly gradually trained rather than programmed.
Though even in AI 2027, it did seem like as the timeline approached ASI the story started to sound more and more like those early-2010s ideas of AI and less like what we've actually seen in reality.
We'll see. I think the thrust of AI 2027 - which is that society will gradually cede larger and larger amounts of decision making to autonomous AI systems - seems plausible. Personally I think the timeline is ten years too ambitious but we'll find out soon enough. And I don't necessarily agree that those autonomous systems will become power seeking and misaligned by default. But at least AI 2027 understands how neural networks and scaling laws work.
Has anyone also read my beginner-friendly introduction to AI risks "Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World" and can compare the two?
Going off the review above, mine is overtly oriented towards people without any AI background (or even science background), intentionally embraces the uncertainty around the issue (better to be prepared than caught off-guard) and tries to be as reasonable-sounding as possible (and even humorous at times), to the average reader and people in the policy space.
So, no sci-fi scenario and the various suggestions for What to Do won't cause alarm or concern in the same way IABIED does.
Useful to have a range of messages/messaging, of course.
(Scott, happy to get a copy to you)
> We don’t really understand how to give AI specific goals yet;
This seems false?
In particular, current LLM agents basically do what we tell them to, and examples that would indicate otherwise (e.g. https://www.anthropic.com/research/agentic-misalignment) appear to be edge-cases.
But maybe you were just summarizing the argument from the book.
If there is a 5-10% chance of doom, there is also a much greater chance of AI solving all problems forever, e.g. all those other doomish problems. Watchful waiting seems reasonable. If we see some agentic non-AGI try to eradicate humanity, maybe we should start to worry. Given the very gradual progress of AI, I would guess that we have many failed attempts at eradicating humanity before we get to the real thing. You'd think that, if that happens, we would change course anyways.
Not if, by the time the first AI's attempt is fully processed by world leaders (an attempt big enough to "count" likely won't be spotted immediately!), we've already built the one that kills us. AI's progress is gradual, but very fast compared to everything else; ChatGPT came out less than 3 years ago and the improvement since then is vast.
Another example of things people don't want to accept because it's inconvenient: animal rights (or if you're Yudkowsky, whether animals are even sentient https://benthams.substack.com/p/against-yudkowskys-implausible-position). If it turns out that systematically torturing more animals each year than the total number of humans who've ever lived is actually Not Cool, that changes a lot, and people don't want to change even a little if they can help it
Thanks for the link, I've updated on that.
Glad to hear it!
I think the basic problem they are trying to solve with their scifi leaps is that many people refuse to believe that AI could do anything really big and bad specifically because they can't think specifically of what that big, bad thing might be in the 10 seconds they spend thinking about it.
I've seen this a lot even with boring programming and business problems--some people are fine with chunks of the plan being a bit hand wavy if they sense that it's obviously, imminently possible to do it even though the specifics have not yet been worked out. Others will absolutely stop dead in their tracks and refuse to move unless the exact specific plan is spelled out.
One traditional point of divergence is "and then it will get out of the box." Many people like Eliezer and Nate end that sentence with ", obviously." But many other people (before this era of "lol") stop dead in their tracks and refuse to move unless an exact, detailed plan for getting out of the box is spelled out and debated at length.
(And for what it's worth, "obviously it will get out" is a position that, in my view, has been unimpeachably proven.)
The scifi stuff is just a chain of concrete examples for these people. People want to stop at every step and be like "but HOW will it self improve?" "HOW will it get out?" "HOW will it affect the physical world?" and the answer to each of these questions is basically "the problem is trivial if you think about it at all, so it barely matters how." ie "Somehow" is a sufficient answer in the same way that "somehow" is a sufficient answer for how to get 1000 users for your startup MVP once you've built a working prototype of it and a landing page. It'll vary. It might even turn out to be kinda hard. It doesn't matter, getting 1000 users is a trivial problem in the scheme of things.
But some people demand the "somehow" of each step to move forward. Fine. Then if you give them concrete hypothetical examples of the somehows they accuse you of making an implausible chain of events.
But the chain isn't about "parallel GPU something and therefore..." it's about "obviously people will continue to make both algorithmic and hardware improvements." Whether it's one magic fell swoop or 1000 incremental improvements barely matters for the claim that there will be improvements that eventually lead to a phase shift.
(I do think there are interesting arguments here, like my shoulder Paul acknowledges a potential phase shift but thinks the shift will also apply to the prosaic alignment efforts. But most of this is addressing the much dumber class of objection of "specific examples or you're wrong and crazy.")
Most LLMs don't actually predict the most probable completion. They are usually fine-tuned and then out under reinforcement learning to give specific results. No LLM chatbot talks like a real human and AI labs are not aiming for them to do so. ChatGPT-3 was not mimicking any previously existing text on the internet, and even LLMs trained after GPT-3 are noticeably different from it.
Somebody send the Albanians a copy of this book?
https://www.rte.ie/news/newslens/2025/0911/1533047-albania-ai-minister/
"Albanian Prime Minister Edi Rama has said he has appointed the world's first AI-generated government minister to oversee public tenders, promising its artificial intelligence would make it "corruption-free".
Presenting his new cabinet at a meeting of his Socialist Party following a big May election victory, Mr Rama introduced the new "member", named "Diella" - "sun" in Albanian.
"Diella is the first (government) member who is not physically present, but virtually created by artificial intelligence," Mr Rama said.
Diella will be entrusted with all decisions on public tenders, making them "100% corruption-free and every public fund submitted to the tender procedure will be perfectly transparent," he added.
Diella was launched in January as an AI-powered virtual assistant - resembling a woman dressed in traditional Albanian costume - to help people use the official e-Albania platform that provides documents and services.
So far, it has helped issue 36,600 digital documents and provided nearly 1,000 services through the platform, according to official figures.
Mr Rama, who secured a fourth term in office in the elections, is due to present his new cabinet to politicians in the coming days.
The fight against corruption, particularly in the public administration, is a key criterion in Albania's bid to join the European Union.
Mr Rama aspires to lead the Balkan nation of 2.8 million people into the political bloc by 2030."
I honestly don't know if this is a good idea or a bad one. The notion of replacing all politicians with AI generated smiley happy assistants may be appealing, but it probably isn't the way we should be going for government.
Photo in this article if anyone wants to see what Diella looks like, plus some hints this might be "symbolic", i.e. a stunt:
https://www.bbc.com/news/articles/cm2znzgwj3xo
Strikes me a good idea. Politics and decision making maybe a good application? Even if unlikely to catch up tbh - we don't give up power that easily. Maybe it's Edie gimmick - he is spirited like that. I think AI-s may do well in law making too. Us in the UK seem to be ruled by lawyers (and economists), but every day there is some outrage traceable back to a bad law or its application by the judiciary class. (and the economy is not doing great either) The above seems to be playing to the advantages AI-s have over HI-s.
1) Lacking ego. Given they are dis-embodied, they don't have to worry about maintaining their bodies. And b/c of that, they don't have to worry about survival.
Our monkebrain calculates ranks & hierarchies friend-foe who-whom b/c every one individual of us is completely dependent on the group for survival. Everything that keeps me alive atm was create by other people, I couldn't do it myself. Left to our own devices, we fail the 3-3-3 hours-days-weeks heat-water-food challenge and die.
2) They are transparent in a way we can never be. Their white box functioning can be frozen in time. So on one level we "understand" them completely. Every bit can be observed and analysed. It's all available, it's just a question of how much time and effort we want to devote to it. On another level it depends what do we exactly mean when we say we want to "understand" how some decision came to be. If we judge the answer should have been "2", but AI answered "1", the starting point to what happened is "NN-s added a trillion small numbers, that came to close to 1, but didn't reach 2". Often when people say "understand", they mean "write equations like F = m*a on a piece of paper, preferably not more than a page long".
In contrast, humans can't be dissected as easily. Their internal workings are not available. Even if they were, we can't freeze them in time. They change in time, but the worst part is: they change depending on our probing. That the change is conditional on our inspection makes it really really hard to audit humans. Even if there were no other difficulties.
I don’t understand why you think human intelligence allowed humans to display non human primates. Seems like working together, goal directedness or some other property would have been more important. Smart people (let’s say, people who score well on iq or SAT) can be lazy, distracted or otherwise impossible to work with, and they don’t get far. Sometimes, they even kill themselves, being aware of their limitations and not being able to use their smarts to do anything about it. Maybe AI phobia doesn’t require treating intelligence as the be-all, idk.
> Seems like working together, goal directedness or some other property would have been more important.
Don't chimpanzees have all these things? Yet chimpanzees don't get to decide anything about how the world is run.
I keep thinking we are artificially restricting our possible scenarios.
Let’s assume there is an equal chance of each of the following to occur;
1) AGI exterminates us
2) AGI makes our lives significantly worse
3) AGI makes our lives significantly better
4) AGI prevents us from exterminating ourselves via conventional human means (nuclear, bioengineering, etc)
Obviously we can argue all day on what the odds are of each of these occurring. What we can’t do though is ignore that efforts to restrict 1 and 2 also throw out 3 and 4.
My assumption is that if it is possible, then it is inevitable. The best we can do is try to channel it to 3 and 4.
Yudkowsky agrees that it's inevitable and that we should try to channel it that way. He merely also believes that the way to channel it that way is "kill the neural-net field dead and spend the next decades to centuries working on GOFAI".
How about that out-of-plumb conduit run on the light pole in the last photo. That catch anyone else’s attention? Nice install.
You're all much smarter and more knowledgeable about this than me... but I'd posit a big reason the public's so pessimistic about AI is social media seems to have made everyone's life worse, and AI is the next big computer thing.
> the problem with the overpopulation response is that it was violent and illiberal, not that we tried to prepare for an apparent danger
Nitpick: the problem with the overpopulation response is not that it was violent and illiberal, it's that the overpopulation doomsday scenario didn't come to pass, meaning the action taken to combat it (liberal or otherwise) was ultimately unjustified.
There is nothing in the laws of physics that guarantees every hard problem can be solved with a non-violent totally-liberal response.
(Merely a strong prior, which is less applicable when talking about out-of-distribution events, like doomsdays).
Observing your local dramas from great distance, reaching us light years post your facts, maybe impacting us different than yourselves at source. Your boy Yudkowsky strikes me a tragic figure, a Salieri to Mozart's GPT-5-thinking-high and similar AI-s in their rational pursuits. Can recognise greatness, without being able to create it himself. Would have made a spirited critic, curating AI-s tastes in alternate timeline.
I disagree that AI is that big a threat to humanity. Out existential threats are 1-thermo-nuclear war, 2-meteorite hitting earth, 3-virus wiping humanity, 4-volcanic explosion causing global winter/crops failing. AI by amplifying brainy work, like mechanical machines did for physical work, should help decrease risks 1-4. Even if slightly increasing 5-civilisational collapse due to the speed of change. As long as society is mostly free, Culture should be able to change fast enough to keep up with the technology change. AI is net positive in reducing overall risk to human extinction.
Doomers cult that has developed is a danger to us all. In the beauty contest that public debates are, a charismatic leader with a band of devotees can do tremendous damage. If they manage to capture the mind commanding heights of society. Some "solutions" I've read mooted ("bomb the data centres", "detain AI scientists"; with worse to come on an escalatory ladder) strike me authoritarian to dictatorial. Societies less-free have tough time adapting to change. When change is delayed for long, crisis where things don't work but can't be changed either (change is forbidden) build up. Only to be released together, in great magnitude, in short period of time, once the dam bursts. Societies, cultures that may have otherwise adapted, if allowed so in their own time, now break down, overwhelmed by the speed and magnitude of change.
Then risk#5 becomes self-inflicted, self-fulfilling: by dooming being successful, they get to play the role of the paper-clip maximiser. Won't be the first time people to bring onto their own heads that what they fear the most. Looks far fetched, but: dooming is stronger by default, it aligns with our latent human fears of things new.
Thermonuclear war would not destroy humanity. Nuclear winter is literally a hoax by anti-nuclear activists, and fallout is not strong enough to actually do an "On the Beach". It would be very bad, but it's not existential; indeed, I'd "only" expect a billion or so dead.
Natural X-risks exist, but there's only one event that's happened on Earth in the last 500 million years that might actually kill all humans were it repeated (the Siberian Traps supereruption which caused the Permian-Triassic extinction). Another Yellowstone *definitely* wouldn't do it, since hominids/humans already survived it three times. Another dino-killer asteroid would kill most humans, but not humanity entire; the doomsday preppers would pull through. 1/500 million years is pretty long odds; I like our chances for the next few centuries against that risk.
Bio-risks are real. (The biggest problems are with engineered pathogens and other engineered lifeforms.) Most of the Ratsphere is very interested in reducing bio-risks; definite #2 priority after AI.
I have thought long and hard about possible futures with strong AI. This story was my answer. https://sisyphusofmyth.substack.com/p/in-the-garden-of-eden-baby?r=5m1xrv
Ug, I guess we're doing *this* again. As a reminder, The Economist's *front cover headline* on Jan 31st 2020 asked "The next global pandemic?" It's very tiring to see partisans repeat the same set of cherry-picked headlines to make it seem like "the media" ignored COVID early on.
Come on, we were all there. I'm sure I can find you a cover story taking AI risk seriously, does that mean that a hundred years from now, if AI risk turns out to be the big story of the 21st century, we can say that experts handled it well and were appropriately concerned from the start?
It would have been weird if no one had suggested it could, in fact, be the next global pandemic. Scott is absolutely correct that it was still vastly downplayed. In fact I also remember a time when TV told us to *not* use masks because they were scarce and also they would induce a false sense of security or something, before making a complete 180 that understandably left people confused.
> all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why
Surely this is because the hypothesis is that future lethal AIs, indifferent to humans' plight, will use our atoms for some practical purpose of their own, in the same way as we killed animals to use their fur for coats etc. (not out of malice, just because their material is useful for our purposes).
I just read your critique of Scott's review. I especially like your reframing of the evolution arguments.
Link to Nina's review of a review, here...
https://blog.ninapanickssery.com/p/review-of-scott-alexanders-book-review
I really appreciate Yudkowsky writing this. A few years ago I was sending him borderline messages on Facebook begging him to present or write something for NORMAL people to be able to understand, rather than wasting time talking to other distinguished nerds in a respectable fashion, which certainly will accomplish nothing when it comes to public policy.
If he wrote a dramatic "here's how it could happen" story, that a good thing. It lessens the likelihood the reader will put the book down and start scrolling Tik Tok. Hardly anyone even has the attention span to read a whole book anymore, they need a reason to stay engaged, and presumably this book is NOT for people like you and your readers who are already well versed in these matters and likely to get caught up on pendantic little technical issues that 90% of people don't even understand. Who cares anyway? It's a hypothetical. It could happen a million different ways. The point is just to give one of the many, many, many people out there who all say "well we can just unplug it" a salient example of why we might not be able to.
I disagree that it would make things inconvenient to stop development. You noted that 60% of people have never used an LLM. Among the 40% who have, surely the majority of them are like me, and have used it maybe 2 or 3 dozen times total over the past three years, and entirely for trivial reasons like making a funny photo or doing a quick typo check. So right there, you probably have about 80-90% of the population for whom stopping AI development or even banning LLMs that exist today would make literally zero material impact on their life whatsoever. You might have another 5-10% that it's "inconvenient" for or that it upsets them, much like it very much upset smokers when smoking was banned indoors. But I don't think it creates a major impact on anyone whatsoever but the people who have invested billions and are expecting a return, or the small numbers of people who actually work in the industry.
Isn't that the real issue? Too much time, and especially too much $$$, already invested, and those people are simply NOT going to stop voluntarily unless they have no choice?
I also don't understand how this risk equation even remotely pencils out. Let's say the doom risk is 15%. Whats the probably of anything actually majorly useful coming out of it? Also 15%?? So far it's mostly just "useful" in the realm of the trivial, and is better described as a form of entertainment. Cancer hasn't been cured yet, nor has aging, nor is anyone close. So weighing a 15% risk of extinction against a 15% chance of curing cancer...no one would take that gamble, unless they maybe already had terminal cancer and expected to die anyway.
I use it daily it’s great
You make an important point. As an ordinary non-technical human, I am very aware of the risk of AI and posted about it today in a way that other non-tech folks may appreciate: https://substack.com/@samb123567942
I generally agree on the balance of risks, but it's also true that risks and benefits are coupled. AI that only makes funny pics is never going to destroy us. AI that can destroy us can obviously also do some quite useful things. The question is one of asymmetry - it's always easier to destroy than create, and that's even without accounting for the difficulty in alignment in the first place.
Yes although also I think there is danger in AI that makes funny pics too. Because they're realistic and soon indistinguishable from real ones, and videos and voice fakes too, and that is fairly dangerous on a social/political level. We are already all hating each other just over mostly real stuff...not sure how society continues to function in a digital world when no one can distinguish between real and faked.
There is absolutely some danger like in any tech, but it's definitely not at the level of it being an active agent in our destruction rather than just a tool for humans to misuse. It's a lot like any other technology in that sense.
I find it ironic that Yudkowsky himself is the clearest refutation of “intelligence only gets you so far” you could ask for. He has essentially done nothing other than type words into a computer and post them on the internet for 25 years now, and has had a massive effect on the world.
If not for him, you could claim that having this kind of effect always requires political acumen and powerful allies and a winning smile and a good haircut and all that. But apparently not!
A lot of idiots have had an even bigger impact by posting stuff on the internet in the same time.
What is the “massive effect” that he has had on the world? I’m not being sarcastic, it’s a serious question.
He's not the only one who has ever posited existential threat from AI, but I think if it wasn't for him it's quite likely that no serious/mainstream AI researcher would take the possibility seriously. Moving such an important overton window that far is a big deal.
I see, this makes sense - IF we buy that
1 AI is likely to destroy the world (or at least exterminate humanity)
2 Without Y. not enough people would be working on it, thus failing to prevent it
My personal view is that:
1 is extremely unlikely, at least in the next century
2 if anything he's turning the field into a clown car, so less likely to attract the talent needed
Neither this post not the book mentions Yoshua Bengio's "Scientist AI," which is based on GFlowNets using Bayesian reasoning rather than RLHF. The upshot is that as the AI grows, it becomes less rather than more certain. This solves the "off-switch" problem at its root.
>(the problem with the overpopulation response is that it was violent and illiberal, not that we tried to prepare for an apparent danger),
One thing to consider is that the average reader doesn't have much control over what the response looks like, as a disconnected individual you can mostly either amplify the concern or minimize it.
AFAIK no governments put 'should we sterilize millions of undesirables to stop population growth' up to a popular vote. You can strongly demand action in response to some problem, but you are somewhat rolling the dice on what that action looks like.
And so it begins: 2025-09-11 Albania appoints AI minister, to avoid corruption
https://techxplore.com/news/2025-09-albania-ai-generated-minister-corruption.html
What next? UK appoints AI chancellor of the exchequer (finance minister) to avoid chronic mismanagement of the economy by Labour and economic illiteracy of present chancellor?
Almost 4B years and counting, and no single feedback loop from biology has played out until doom without interference from something stopping the loop.
I think people kind of unconsciously grasp that, and don't think AI is robust enough (materially, in its networks, in its static, unadapting hardware substrates, etc) to really be a threat in the sense of a runaway feedback loop that destroys us without a human intervention.
(as an example, the "highly intelligent persuasive speech" function of a human-extinction motivated AI model could conceivably be stopped by human irrationality and the diversity of human culture and thought that is not persuaded by one universally appealing form of trickery. Persuasion NECESSARILY involves tradeoffs as it is adapted to specific cultural contexts and individual preferences. If you persuade one person you necessarily dissuade another)
> Almost 4B years and counting, and no single feedback loop from biology has played out until doom without interference from something stopping the loop.
Yeah, last time a bunch of nanomachines replicating at an exponential rate spewed a bunch of highly poisonous gas into the atmosphere, mysterious evolution fairies came in and stopped it before too much biomass died off.
See: https://en.m.wikipedia.org/wiki/Great_Oxidation_Event where nothing happened and things were just okay
> Almost 4B years and counting, and no single feedback loop from biology has played out until doom without interference from something stopping the loop.
Doom for who? Great Oxidation Event is an obvious counter-example. The advent of humans is another. We literally went from a few thousand apes in Africa to 8 billions, we drove to extinction most of the world's megafauna, enslaved other species in our torture meat-factories, are altering the planet's climate, and have the power to kill virtually everything if so we chose via sufficient application of nuclear weaponry. I'd say we were pretty doom-like for everything existing before us.
that is a fair point.
>and have the power to kill virtually everything if so we chose via sufficient application of nuclear weaponry
This is true, but I want to note for the record that "sufficient application" involves building orders of magnitude more tonnage of nuclear weaponry than has ever existed, not merely using what we already have.
We still could do it, though, if we as a species decided to. There's definitely enough uranium and thorium available.
Well, we had speculation for things like Project Sundial, and there's salted cobalt bombs (I don't know how many exist in the various arsenals, and I dearly hope the answer is "none", but they are a known possibility). One can quibble about the fact that of course several things would still survive those (bacteria, tardigrades, probably a lot of fish and marine organisms) but at that point it's almost academic. It would still be at least as bad as Chixculub in terms of total number of dead species, possibly worse.
Re: Sundial: I did say "tonnage", not "number" (note that once you get past a couple of hundred kilotons, nuclear weapons' power is basically just proportional to mass of nuclear material included, so this is "tonnage" in both senses).
Cobalt is... somewhat overhyped. In many ways it's not really any better than normal fission products as fallout - in fact, it's far worse for creating radiation sickness. It's pretty good at causing cancer, but that's not going to kill off the species; a woman who gets cancer at 50 is not going to have any less kids, and while men can reproduce at older ages sperm is hardly scarce.
My understanding was that salted bombs could produce fallout radioactive and long-lived enough to downright sterilise entire regions. Is it actually unlikely to achieve the sufficient amounts to produce anything but a significant worsening in public health?
But Eliezer isn't a major figure in the religious parables or it kinda feels that way movement he is a major figure in the rationalist movement. We bother to think through things rigorously because we know human reasoning is very easily mislead.
And the thing about Eliezer's arguments on AI risk is that it's always parables all the way down. Bostrom tried hard to make some parts of it more rigorous but that always made the argument weaker. And parables are fine for the purpose of outreach or maybe a quick guess but it is striking to me that despite years of pushing this idea Yudkowsky hasn't managed to put the argument into a really strong rigorous form with clear definitions and get it accepted by a good philosophy journal.
At some point if someone just keeps giving handwaves about intelligence and other poorly defined terms -- the kind that should set off warning bells in anyone who works in areas like analytic philosophy because they are so easy to mislead with -- I start taking it as evidence there is no compelling argument.
I'm preordering the book. Look forward to reading it.
But honestly, it seems like there is a massive category error occurring here. I'd like to submit that maybe intelligence is not equal to sentience?
Human intelligence is on a broad spectrum, from the mentally handicapped to geniuses. We have IQ tests to measure intelligence. The big LLMs we have now seem to fall squarely somewhere in that broad spectrum and they are passing IQ tests. So, from a behaviorist standpoint, they ARE equivalently intelligent. Simple enough.
But are they sentient? Do they have agency? Doesn't seem to me that they are. We don't have good measurements for these. But crows and dogs seem pretty intelligent, and they definitely have agency. The LLMs are much smarter than crows and dogs, but seem to have no agency.
A lot of this discussion around AGI seems to presuppose sentience/agency (as distinct from intelligence) but I think that is a mistake. I think everyone was surprised when the LLMs proved to be so creative. Creativity turned out to be an emergent property of intelligence. And I think everyone then jumped to the expectation that sentience/agency would similarly be "emergent" with increased intelligence. That doesn't seem to be happening. I posit that if it were going to happen, we'd see evidence of it already.
Could we develop AGI? Sure. But I think we don't yet really know how. We don't understand what agency or sentience consist of besides intelligence. Maybe it's simple to program. Maybe it's not. But, either way, we don't get it for free by simply making training sets and datasets bigger.
So what does it mean to have intelligence but not agency? I guess it's intelligence-on-demand, intelligence in a tool. This is very weird, and counterintuitive. But it is what it is. It's still a very powerful and potentially dangerous tool. I look forward to reading the book.
> A lot of this discussion around AGI seems to presuppose sentience
This is strange to me, given it is often explicitly stated that sentience isn't required at all, in fact, intelligence is often defined as some kind of optimization process to avoid mixing different concepts (I am not giving one definition here, just a reference toward the kind of definitions I mean).
> /agency
Sentience and agency are two completely different things (in the same way intelligence and sentience are).
But our actual models actually have already some level of agency, and still more with good scaffolding.
I think that time horizons mostly give an idea of the level of agency they have (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) is measure of agency, and it is progressing.
> We don't understand what agency or sentience consist of besides intelligence.
We don't understand sentience, but we don't need it.
I think we understand agency.
Yudkowsky does indeed address this. Happy reading!
Given that most people who know about AI dislike it and expect it to personally damage their life in some way, the argument doesn't have to be that persuasive. Unlike the energy austerity advocated by global warming alarmists, an AI ban isn't asking you to make any sacrifices at all. If you expect AI to be bad for you personally (as I and apparently a large majority of the country agree) this is 100% upside. I lose badly in every single plausible AGI/ASI scenario, and so do most people like me. The cataclysm is bad, the post-scarcity society is a horrible communist surveillance state, and even the unlikely scenario where it hits some wall in 5 years is probably enough to disrupt the knowledge-worker economy to my ruin.
The real hurdle is to get people who want to "just stop this" to feel that they have the power to do so and the moral right to do so. People in America have a cultural aversion to the government telling people they can't build something. It seems a little un-American, for those who care about consistency and our traditional values, to tell somebody "you can't make this computer program, even though people want to buy it and it isn't currently doing anything malicious". Even if the true reason most people hate or fear AI is that it will disrupt and damage their own personal fortunes, or make life slightly worse in a lot of small annoying ways, that is not considered an acceptable reason in the USA to argue for banning an industry. You're supposed to just suck it up and gracefully accept your fate as this generation's losers when something transformative comes along. Whereas a new tech being akin to a weapon of mass destruction *would* be seen as a valid reason to pump the brakes. So the persuasive task here is to make ordinary Americans who already are inclined to dislike AI, but who don't want to sound like sore losers or commies, able to say they think it should be banned in a language that's acceptable to their peers.
> the post-scarcity society is a horrible communist surveillance state
To be clear, I do absolutely expect that powerful AGI would empower the surveillance state quite a bit, but I don't get the "communist" bit. To me the main bad thing of communism is that in practice, because it fails to achieve its production goals and justify its legitimacy without proper incentives, as well as generally ruining stuff by trying to micromanage things without having the full computational power to do it right, it ends up becoming more and more authoritarian to preserve its existence. None of those things would be an issue in a post-scarcity society in which production was handled by machines, per se. It's hard to even describe it as "communist" at all.
There would be a state that was largely indistinguishable from the system of production, whether it seizes the means of production or simply directs them via AGI it would still be a managed economy. That the 5-year plans actually "work" wouldn't make it any less so. I grant that if humans selling their labor is not a central feature of the economy, because their labor is worthless post-automation, it makes a bit less sense to call it communist. Communism was not intended as an anti-work idea. But I'm not sure what other term describes a totalitarian-egalitarian system. If everyone's on UBI, in a world where there are near-zero opportunities for humans to do valuable labor for other humans to get ahead, it's going to make everyone more or less materially equal to each other. People experience their wealth relative to those around them, not in absolute terms, so flattening the world into a bunch of people on welfare is going to feel terrible people who were successful in the world where human minds mattered. The equality is, in my opinion, the worst part of such a society, you have no way to differentiate yourself from others in skill or ability in any way that's meaningful.
I take you to be suggesting it would be less authoritarian because its existence is less precarious than previous such regimes? I'm not so sure of that, suppose it depends how you arrive at this point and whether anything below the state has any meaningful agency, whether inferior AIs can still do substantial damage in the hands of rebels, that sort of thing. It needlessly complicates everything to imagine the exact details, it's enough to say that the post-scarcity outcomes are all imagining a world where the state or its functional replacement is managing society and divvying up the resulting abundance. If it's also collecting data on everyone and everything constantly for safety/security reasons, this is functionally an egalitarian-totalitarian state, it may not force you to eat soybeans or tell you what color shirt to wear, but there is no life outside the state in that world.
My point is that I find it absurd to deem "everyone gets what they need to live" or "production is optimized to maximally fit everyone's necessities" to be bad in and of itself.
Free markets are supposed to accomplish the best possible approximation to that! Communism says "no, free markets end up being bad, we can do better if we just sit at a table and plan it out". And turns out, no, you actually can't. But in the hypothesis in which a superintelligent AI exists that can assess, predict and satisfy everyone's needs better and faster than the collective intelligence of the market, then obviously it would be an even better alternative.
The feeling of purposelessness and whether it can be avoided simply by people diving into what they enjoy, or if knowing that AIs exist that can do anything better anyway ruins that, is its own problem, but it's unique to this special case. Communism didn't inherently have that problem. Purpose in communism can be more shared, but there's certainly a lot of it.
> The equality is, in my opinion, the worst part of such a society, you have no way to differentiate yourself from others in skill or ability in any way that's meaningful.
This only makes sense if for you the only meaningful metric of difference is difference in productive capacity. I can definitely realise I'm very different from someone else just based on the fact that I like different things, have different skills etc. Some of those skills are very economically useless, people can be very proud of those too.
> I take you to be suggesting it would be less authoritarian because its existence is less precarious than previous such regimes?
I definitely think it would help, though again - I don't *trust* at all the idea of a post-scarcity AGI society, but more for other reasons that involve the trajectory to get there, alignment etc. If you told me you can plop me in a society in which magically we already have ASIs that are both fully independent and fully aligned (so not just the boot of a few powerful humans stomping on our faces), then I'd say that doesn't need to be authoritarian at all in principle.
> If it's also collecting data on everyone and everything constantly for safety/security reasons, this is functionally an egalitarian-totalitarian state, it may not force you to eat soybeans or tell you what color shirt to wear, but there is no life outside the state in that world.
There is practically no life outside the state now; again the question is what happens with that info. An aligned ASI that genuinely only uses it internally according to some very strict libertarian principles and then deletes it would in fact more free that our current nosy nanny states.
Again, it's just an argument of principle. I assign to such states a probability of approximately 0, given our starting point and trajectory. I just push against the notion that somehow "everyone has everything they need to live = communism = bad". The point of freedom isn't random suffering from deprivation and the bad of communism isn't that it theoretically wanted to alleviate that. It's all the ways in which it then fails and refuses to accept its failure given the actual constraints of the real world and of real human behaviour.
> Some people say “The real danger isn’t superintelligent AI, it’s X!” even though the danger could easily be both superintelligent AI and X. X could be anything from near-term AI, to humans misusing AI, to tech oligarchs getting rich and powerful off AI, to totally unrelated things like climate change or racism.
To, amusingly, X.
Personally, I fear, to paraphrase Asimov, not AI, but lack of AI. I guess that makes me closer to accelerationists (and indeed, HPMOR to me was a call to acceleration, more than anything!). I think humanity is reaching a crisis point where some kind of singularity has to happen. It might not necessarily be an AI singularity - just some thing/event that Changes Everything. But a global nuclear war, for example, is also such thing/event, and nuclear warheads, unlike AI, have exactly 0% chance of alignment.
I don't mean that the only choice is between a global war and AI, but rather that the accumulating number of unsolved - and presently unsolvable - problems will reach a critical threshold, after which humanity will either find a way to solve them - possibly with the help of superhuman AI, or some other new technology - or face the consequences, which might wipe it outright (or, at best, just bring on some kind of dark ages).
For me, this means that banning ANY kind of research right now is the worst thing possible - indeed, formal and infromal bans on stem cells research, cloning and genetic enchantments are already making me MORE pessimistic about the future.
The key point is that nobody knows how to make a superhuman AI that doesn't try to kill you, which means that the "superhuman AI solves all your problems" outcome is basically a fake option you can't actually access (at least, not remotely soon).
> nobody knows how to make a superhuman AI that doesn't try to kill you
And nobody ever will know this, because it's impossible to guarantee in the long run. AI, once created, will learn and grow and even if it starts super-aligned with humanity, it might decide to destroy it at a later point. I don't think it's a good reason to never build superhuman AI. The more we do science, the more dangerous technologies we discover. Should we, for example, never strive for space, because space-based weapons can wipe all life on Earth with relative ease? Should we avoid discovering new, more powerful energy sources, because any energy source is a potential weapon?
And it's not like we tried building some superhuman AIs, and each one tried to destroy us, so I don't see how ""superhuman AI solves all your problems" outcome is basically a fake option" follows from the above statement. We don't know how to build superhuman (or even human-level) AI, period. Maybe it will try to destroy us, maybe it will solve all of our problems, maybe it will do both at the same time, or neither. We can't even begin to know which outcome is more likely, until we begin to approach the goal. We're not nearly there yet, and "solves our problems" option is very much on the table. Maybe it will even prove impossible to build superhuman AI, but sub-human AIs might become a great enough boon for society. That would require - at the very least - a lot of optimization work, to make them less energy and computation greedy.
And so: on with AI research, I say.
>I don't think it's a good reason to never build superhuman AI.
I'd rather not be killed by AI, thanks.
>Should we, for example, never strive for space, because space-based weapons can wipe all life on Earth with relative ease?
By the time that becomes easily feasible, we'd have enough of a space presence that humanity wouldn't be destroyed.
>Should we avoid discovering new, more powerful energy sources, because any energy source is a potential weapon?
I'd certainly require that particle physics experiments that exceed the power of natural cosmic ray impacts be done in space, so as to avoid the "Earth destroyed by black hole" end-of-the-world scenario. Situations in which someone could destroy the Solar System (e.g. by sending the Sun nova) while we're all still in it would also be best avoided.
>Maybe it will try to destroy us, maybe it will solve all of our problems, maybe it will do both at the same time, or neither. We can't even begin to know which outcome is more likely, until we begin to approach the goal. We're not nearly there yet, and "solves our problems" option is very much on the table.
The neural-net alignment problem is basically a case of the halting problem, and not a very special case (unless you're smarter than the neural net, which in the case of superhuman AI you're definitionally not). The halting problem is proven to be unsolvable in the general case. Any scenario based on having an aligned superhuman neural net is, as far as I can see, closed off by that result (well, unless you build an aligned superhuman AI of some other sort first that's smarter than your neural net, but see below).
GOFAI is at least 30 years and probably over 50 from delivering superhuman AI. Nobody thinks this can be done in the near future. There is a reason the money all jumped ship to neural nets despite them being mad science. Uploads are in about the same place, or even worse.
Add all that together and P(aligned superhuman AI in next 30 years) ~= 0.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
But this is just an example for a "new technology deblocking some limits from our current LLMs", just like the CoT where !
Which is a big part of what Yudkowsky is worried about, not only current scaled technologies, but improvements we can discover at any moment (when a lot of smart persons are trying to find them).
Also, it is very probably not an example they think as a good one at all ! Yudkowsky is correctly worried about information hazard, and if he thought it was a credible way to move forward toward AGI, he would absolutely not describe it in a book.
I think it is unfair to consider it a deus ex machina, when it is just an illustration of a general idea.
I helped a little with the draft text of IABIED, and imo, the "implausible sci-fi scenario" criticism you're levying is good literary criticism, but bad psychological impact analysis
The aforementioned implausible sci-fi scenario is like a Christmas tree, a structure to be used as an excuse on which to hang any and all of the catchiest and shiniest things you can find. Think throwing spaghetti at the wall to see what sticks
What are people going to remember, in the days after they finish this book? Or in the months that come?
Over and over, they shall be reminded of this plot point-rich story, as life pings its points and twists
And we don't know *which* of these little insight porn-esque possibilities is going to stick in the mind of any given reader, but we don't have to. Among so many, 1 is bound to
Lastly, as the whole book's main thing is being as accessible and concise as possible, it's good feng shui to have the implausible sci-fi scenario. An efficient burst of color and variety balances out the clean sweeps of the rest of the book
I’m someone for whom the enormity of this issue has only recently become apparent. I’m not sure about AI killing humanity, probably because I haven’t seen it happen, but nominally because I’ve never seen anything close to a strong reply to doomer arguments. That might sound weird, but given that people smarter than me who have a vested interest in the world surviving are racing ahead with AI progress, it seems like they consider these issues so trivial as to be worth ignoring. It would be nice to see each of the major AI companies (or better the leaders or major stakeholders) reply to points raised in a serious way. Probably won’t happen.
That’s one thing. And the other is this - as someone who is unsure of certain doom, there’s a serious opportunity cost to stopping AI research. If doom isn’t real and AI research is stopped, many people alive today will be dead because the AI wasn’t around in time to save them. From disease, aging, etc. Personally have witnessed parents of children with rare disease excited about AI for changing otherwise gloomy prospects by enabling medical advances.
Am I really qualified to be politically vocal about this? Why should I trust Eliezer and not Sam, Dario, and Demis?
Are Yudkowsky & Soares intending to give a deductively sound argument?
In classical first-order logic (FOL), their title-claim "If anyone builds it, then everyone dies" is false, bc there are counterfactuals in which someone builds unaligned ASI & not everyone dies. In those counterfactuals, Y&S's antecedent is true (someone builds it) & their consequent is false (not everyone dies.)
So, if they are intending to give a deductively sound argument, then they've failed bc their argument has a false premise. Their claim "if anyone builds it, then everyone dies" is false.
Are Y&S intending to give an inductively strong argument?
Inductive arguments are logically invalid. The truth of inductive premises does not guarantee the truth of the conclusion. So, their inductive argument would be invalid. They provide no statistical frequencies. So, their inductive argument seems weak.
They've given either an unsound deductive argument or a weak inductive argument.
Agree or disagree?
“If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?”
I think the probability of that happening before cataclysm is upon us is very close to zero. First, as is painfully obvious, even in a democracy we don’t all agree and those who lose the vote may not be willing to accept it. Second, I’m not an expert, but it seems like no GPU monitoring scheme can be 100% successful—e.g. prohibited high-end GPUs are flowing from the US to China through black markets; between mass production and distribution, the complexity of global supply chains, the lack of an inherent tracing feature (like radioactivity), the challenges of coordinating international tracking and monitoring, and other problems, GPUs will be leaking out of the system. Third, we don’t all live in a democracy and it seems unlikely that the governments of all countries will agree to (and actually abide by) a treaty to ban further AI progress—that takes trust, and, as you rightly point out, a very high level of shared confidence that it will actually kill us, neither of which exist. And even if all governments signed on, non-governmental organizations (terrorists, anarchists, etc.) would likely (see the second point above) be able to continue the research and thereby gain power to wreak havoc. And then, "if AI is outlawed, only outlaws will have AI".
To me, the answer is to try to align at least the strongest, most dominant models with human flourishing (which I believe is already being done but must be a top priority). Adding specificity to that idea, Geoffrey Hinton recently suggested that our only way of surviving superintelligence is to build maternal instincts into the models, saying, “The right model is the only model we have of a more intelligent thing being controlled by a less intelligent thing, which is a mother being controlled by her baby”. While the essence of maternal instinct may be universal, the practice of it differs greatly between and even within cultures, making alignment with it difficult—I’m not an expert, but I believe that programmers and trainers need specific, completely agreed upon elements and attributes towards which to train alignment.
So I suggest an alternative concept—really an extension of Hinton’s idea--valued in virtually every culture: selfless, unconditional love—the Greeks called it agape but, again, there is a name for it in every culture. I’m not an expert, but I believe that it’s specific and universal enough that the critical elements could be listed (e,g, Unconditional goodwill, Respect, Patience, Helpfulness, Hopefulness, Fairness, etc.) and agreed upon (indeed, some are already in the training specs being used--https://model-spec.openai.com/2025-09-12.html) and the actions to train for alignment could be taken. And it would be targeting alignment to something that transcends and includes the highest aspiration of all people and cultures—and would keep us safe, and even flourishing, if alignment were actually successful.
The risk, as Yudkowsky and Soares point out, is that, even as we are seeking to align them with human flourishing, the models are deceptively scheming to gain power and dominate or even kill us all. The bad news is that researchers have already witnessed models scheming (https://www.antischeming.ai/) and alignment faking (https://www.anthropic.com/research/agentic-misalignment). The good news is that they understand the existential risk (https://safe.ai/work/statement-on-ai-risk) and are working (e.g. https://model-spec.openai.com/2025-09-12.html and https://www.anthropic.com/news/core-views-on-ai-safety) to reduce or eliminate it. Those of us inclined to must pray that they succeed.
>Third, we don’t all live in a democracy and it seems unlikely that the governments of all countries will agree to (and actually abide by) a treaty to ban further AI progress—that takes trust, and, as you rightly point out, a very high level of shared confidence that it will actually kill us, neither of which exist.
If a few little nations refuse to sign on, we can invade and occupy them. We need the great powers, but we don't need *every* country to agree.
"did you know 66% of Americans have never used ChatGPT, and 20% of Americans have never even heard of it?" what is your source?
https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/?utm_source=chatgpt.com shares both statistics and more
“If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not do it?”
No.
Because China.
The world is not “a democracy”, even if one accepted the premise that “we” are one one of those things and could “just stop”.
>If all of this sounds wishy-washy to you, I agree - it’s part of why I’m a boring moderate with a sub-25% p(doom) and good relations with AI companies.
I have to say I still don't understand this. The inside view's a slam-dunk.
1. As Yudkowsky says, neural nets are grown, not crafted. You can't program into them "don't rebel against humans"; you have to, at best, train against examples of "rebel against humans".
2. You can't use AIs that rebelled as negative examples to train against, because that kills you.
3. You can't read the AI's mass of matrices, when it's smarter than you, and tell whether it will try to kill you if run in order to train against *that*. "What will this code do when run?" is the halting problem, which is unsolvable in the general case. "The code is dumber than you" and "you wrote the code" are special cases that are somewhat tractable; "a giant mass of inscrutable matrices that nobody designed" is not. This is leaving aside the fact that you *can't even fully specify* the outputs which are "trying to kill you".
4. So, uh, what is the story for how this is even *supposed* to work?
Yes the book is bad. They don't make any actual argument. It's pretty embarrassing
I've also reviewed on my AI blog and shown it's empty:
oscarmdavies.substack.com/p/000016-on-yudkowsky-and-soares