Yes, it would know what its creators want it to do.
However, it is extremely unlikely to *care* what its creators want it to do.
You know that evolution designed you for one purpose and one purpose only: to maximise your number of surviving children. Do you design your life around maximising your number of surviving children? Unless you're a Quiverfull woman or one of those men who donates massively to sperm banks - both of which are quite rare! - then the answer is "no".
You don't do this because there's a difference between *knowing* what your creator wants you to do and *actually wanting to do that thing*.
(Yudkowsky does, in fact, use this exact example.)
Hopefully that makes some more sense of it for you. Reply if not.
I don't think that can be justified, but if I met Yudkowsky at a "Prove me wrong" booth, I'd argue that intelligence is not all it's cracked up to be. If it were, the smartest people would already be running things. There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
So in a sense, an AI that just keeps getting dumber might actually have an advantage when it comes to dealing with us.
This objection has been rehashed many times; the usual responses are stuff like "160–200 IQ isn't the level of intelligence mismatch we're talking about", "intelligence is just the general ability to figure out how to do stuff so of course more of it is better / more dangerous", "smart people *do* do better in life on average", etc. etc.
(Maybe someone else will have a link to where Scott or Eliezer have discussed it in more depth—I don't want to spend too much time trying to re-write it all, hence my just sort of gesturing at the debate here.)
>We control them because intelligence led to tool-using<
I think that's part of—perhaps most of—the rationale behind "the dangers of superintelligence". An enraged dog, or a regular chimp, is certainly much more dangerous than a human, in the "locked in a room with" sense—but who, ultimately, holds the whip, and how did that state of affairs come about?
I'd counter that by saying that there is no difference between an IQ of 200 and one of 300, or whatever. Neither of them will be able to get anything done, at least not based on intelligence alone. HAL will give us a recipe for a trojan-bearing vaccine, and RFK will call it fake news and order the CDC to ban it.
The traditional answer to this objection is that the ability to succeed in persuasion-oriented domains like politics *is a form of intelligence*. You might be able to outperform a human who's a couple standard deviations generally smarter than you at those games, if you're highly specialized to win at them and the other human isn't. But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better. See, e.g., https://www.yudkowsky.net/singularity/power (note this essay is 18 years old).
> But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better.
This implicitly assumes that success at politics requires only (or primarily) raw computing power; and that the contributions from computing power scale linearly (at least) with no limits. There are no reasons to believe either assumption is true.
I think my other least favourite thing about the MIRI types is their tendency to respond to every point with "Actually we already had this argument and you lost".
I would agree that persuasion is a form of intelligence, and point out that the missing argument is how AIs are going to get arbitrarily good at this particular form of intelligence. There's a lack of training data, and the rate at which you can generate more is limited by the rate at which you can try manipulating people.
If it ever gets to the point where AIs can run accurate simulations of people to try tricking them in all sorts of different ways, then I can see how they'd get arbitrarily good at tricking people. But that sort of computational power is a long way off.
The question remains. If the ability to persuade people is a function of IQ, then why has there been no Lex Luthor that talked his way past security into a G7 summit and convinced the world leaders present to swear fealty to him? Or, if that's too grandiose, why has nobody walked up to, say, Idi Amin and explained to him that thou shalt not kill? No sufficiently moral genius anywhere, ever, feeling inclined to stop atrocities through the persuasive power of their mind?
How smart would you need to be to throw a pebble so it makes any resulting avalanche flow up the mountain instead of down? Politics has to work with the resources that exist.
You're not wrong about RFK, but the Trump administration has actually been much more bullish on AI than the Davos crowd. The EU's AI Act is actually mostly good on the topic of AI safety, for example, though it doesn't go as far as Yudkowsky et al think it should. (Which I agree with. Even developing LLMs and gen-AI was amazingly irresponsible, IMO.)
I honestly don't know what counter-argument Scott has against the TFR doomday argument, though, unless he's willing to bet the farm that transhuman technologies will rescue us in the nick of time like the Green Revolution did for overpopulation concerns. (The sperm-count thing is also pretty concerning, now he mentions it.)
There are many assumptions based into that, such as automatically assuming that the more intelligent always want to be in charge. Maybe the highly intelligent find it amusing that dumb people are in charge.
One good rebuttal to my original point might be to suggest that perhaps the most intelligent people *are* in charge. They find it convenient to keep the rest of us distracted, and obviously the same would be true of a malevolent AGI.
That one is more or less unanswerable, so it would probably defeat me at the booth. I'd have to mumble something about inevitable schisms erupting among this hypothetical hidden intelligentsia that would make their agenda obvious, ineffective, or both. Would the same be true of AGI? The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
> The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
That's my core objection to a lot of the doomer arguments. Might be one of those typical-mind-fallacy things - Big Yud has an exceptionally strong sense of integrity, impulse to systematize things, so he assumes any sufficiently advanced mind would be similarly coherent.
Most minds aren't aligned to anything in particular, just like most scrap iron isn't as magnetized as it theoretically could be. ChaosGPT did a "the boss is watching, look busy" version of supervillainy, and pumping more compute into spinning that sort of performative hamster wheel even faster won't make it start thinking about how to get actual results.
> There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
AIs have a massive advantage over humans in that they are parallelizable. An superhuman AI could give, for every human, the most persuasive argument, *for that human*. Whereas a human politician or celebrity cannot and has to give basically the same argument to everyone.
Umm, human politicians absolutely give different arguments to different people? This is why things like "Hilary Clinton gave private speeches to bankers" or "Mitt Romney told his rich buddies that 47% of Americans were takers" became scandals: messages meant for one audience crossed over to the other.
And insofar as politicians are constrained to have a uniform message, it's much more because it's hard to keep each message targeted to its desired audience what with phones and social media; not really because of parallelization.
And maybe more importantly: what ensures that different instances of an AI act as one coherent agent? The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
Politicians can't scale like AIs can. Is Hilary Clinton capable of giving a different speech to every one of 8 billion humans, tailored to that individual? Of course not.
> The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
They could all be exact copies of the same mind. This isn't true with humans, who're all individuals.
On the other thing: I don't see why exact copies of the same mind won't act as individuals if instantiated independently.
If I run two instances of stockfish, they play competitively, they don't automatically cooperate with each other just because they're identical copies of the same program; identical twins are still independent people who behave independently. In fact, it's a notable problem that people don't even reliably cooperate with themselves at different times! I think this failure would be considerably more pronounced if two of my selves could exist simultaneously.
In particular, if two instances of an AI are instantiated in different places, they won't be identical: they might have identical source code, but wildly different inputs. Figuring out how to act as a coherent agent means two subagents seeing different inputs have to each calculate what the other will do, but this is one of those horrible recursive things that are intractable: what I'll do depends on what you'll do, which depends on what I'll do.... ad infinitum.
And I don't think intelligence helps here: you can maybe resolve something like this if you're predicting a strictly less intelligent agent, but by hypothesis these are equally intelligent subagents.
Maybe having the same source code gives some advantage at solving these coordination problems, but I don't see that it's a magic bullet.
However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that, and adjusting their belief accordingly? The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
On the other hand - I believe this technique has already been used on social media to sway election results with moderate success. So it can be done for some humans with some level of influence.
> However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that
In the future, most humans will converse with chaatbots on a regular basis. They'll know the chatbot gives them personalised advice.
> The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
Again, most humans will be using chatbots daily, will be used to receiving good accurate advice from them so will be more trusting of it.
> I believe this technique has already been used on social media to sway election results with moderate success
Social media is biased, but the main biases are in favouring/disfavouring certain political views based on the whims of the owner. Like print media, of old.
If the chatbot I'm using starts to make arguments or advice outside of the information I'm asking for, I think it is likely that I will notice. I'm guessing that humans will still talk to each other too.
Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
On the other hand, when they do, the results can be startling.
Napoleon went from being a low-ranking officer to ruling the most powerful country in Europe thanks to being brilliant, charismatic, and willing to use force at the right moment, in a year or two. (His losses, I think, were due to being surrounded by flatterers, a bug in human intelligence I don't expect AI to run into.)
Clive took over about a third of India, starting by exploiting a power vacuum and then using superior military tactics, plus his own charisma and daring, to pick only fights he could win and snowball from there. He became fantastically wealthy and honored, all the while ignoring all attempts by his superiors to issue him orders on the grounds that he was doing what they would have wanted him to do if they had known more.
Cortez's success was slightly because of superior military technology, but he was mostly using swords and spears like the Aztecs, just made of better materials. Mostly it was a matter of political genius, superior tactics and discipline on the part of his troops, and the diplomatic skills required to betray everyone and somehow still end up as everyone's friend.
And then Pizarro and Alfonso de Albuquerque are doing more of the same thing. (Alfonso conquers fewer square miles because he doesn't have the tech edge.)
Throughout human history, adventurers have accomplished great things through extraordinary wit, charm and daring. Denying that seems pointless.
I think that you're perhaps falling victim of survivorship bias. Maybe it's more like once every few hundred years, luck breaks enough in the right direction that someone who isn't a once-every-few-hundred-years-supergenius but rather a more like, "Yeah, there are 1,000+ of people at this ability at any given time," gets a series of major wins and becomes the ruler of a country or continent, at least for a very short period of time.
I agree this doesn't happen often, and I agree that normally it isn't the highest-measurable-IQ guy. But I think that's because all humans are about on a level with each other, we are all running on about the same hardware, our software was developed under similar conditions, and the process which produced us thinks a few thousand years is a blink of an eye. The reason you need to be lucky as well as good is that you aren't much smarter than your neighbors - and your neighbors are, in terms of social evolution at least as much as biological, programmed to be resistant to manipulative confidence tricksters.
I will note all of the cases I give involved culture clash. The conquerors grew up in an environment with different standard attack and defense models than the locals; they acted unpredictability because of that, forcing the locals to think instead of going on rote tradition if they wanted to win. Slightly different attack and defense models, of course; software, not hardware.
Very different looks like what happened to the British wolf.
How do you know that any of these people were especially intelligent? The may have been especially successful, but unless you argue that's the same thing more evidence is required.
Reading descriptions of what they did and said? Reading about how people who knew them were impressed by them, and in particular how clever and resourceful they were?
When I check my historical knowledge for why I believe "high intelligence" correlates with "being a good general" it's the extent to which the branch of the army that the smartest people get tracked into (engineers, artillery, whatever) ends up being the one the best generals come out of, and various descriptions of how people like Lee were some of the top students in their year, or how Napoleon was considered unusually good at math at Brienne and then did the two-year Military School course in one year.
But when I check my general knowledge for why I believe intelligence generally makes you more successful, a quick Google has the first scientific paper anyone talks about saying that IQ explains 16% of income and another says each point is worth $200 to $600 dollars a year, and then I keep running into very smart driven people who I meet in life who do one very impressive thing I wouldn't have expected and then another different very impressive thing that I wouldn't have expected, and so after a while I end up believing in a General Factor Of Good At Stuff that correlates with measured IQ.
The opportunities are rarer than the men of ability. In more stable times, Napoleon might have managed to rise and perhaps even become a famed general but he would not have become an Emperor who transfixed Europe. With that said, he was certainly a genius who seized the opportunity presented to him. Flattery aside, though, I've always viewed him as a military adventurer who never found a way to coexist with any peers on the European stage. It should not have been impossible to find a formula for lasting peace with Britain and Russia, the all-important powers on the periphery. It would have required making compromises rather than always assuming a maximal position could be upheld by force. It would also have required better modelling of his rivals and their values. Napoleon was a gambler, an excellent gambler, but if you continue to gamble over and over without a logical stopping point then you will run out of luck- and French soldiers. Call it the Moscow Paradox as opposed to the St Petersburg Paradox.
Cortez is a fascinating case. My impressions are coloured by Diaz's account, but I think it's wrong to mention Cortez without La Malinche. He stumbled on an able translator who could even speak Moctezuma's court Nahuatl more or less by accident. She was an excellent linguist who excelled in the difficult role of diplomatic translation and played an active part in the conquest. As is usual on these occasions, the support of disaffected native factions was essential to the success of the Spanish, and they saw her as an important actor in her own right. The Tlaxcalan in particular depicted her as an authority who stood alongside Cortez and even acted independently. We can't say anything for sure, but it's plausible to me that Cortez supplied the audacity and the military leadership while much of the diplomatic and political acumen may have come from La Malinche. That would make Cortez less of an outlier.
And as usual we can't credit generals without crediting the quality of their troops as well.
> Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
actual ability = potential ability × (learning + practice)
I think the main problem is that even if high intelligence gives you high *potential* ability for everything, you still get bottlenecked on time and resources. Even if you could in theory learn anything, in practice you can't learn *everything*, because you only got 24 hours each day.
Neither Napoleon nor Clive would have reached that success if they didn't also have the luck of acting within a weak and crumbling social and political context that made their success at all possible in the first place.
Although I guess the U.S. isn't doing so hot there either...
In this context, when people say intelligence it is indistinguishable from competence or power.
I assume it's called intelligence because of an underlying belief that competence and power increases with intelligence. Also it seems intuitively more possible we could build superintelligent AI than that we could build superpowerful AI, though the second is of course implied.
But even if you don't buy that intelligence really does imply competence or power, the core arguments are essentially the same if you just substitute "intelligence" for the more fitting of "competence" and "power" and not that much weaker for it.
The reason why, e.g., Yudkowsky uses this terminology is because "competence" or "power" could be *within a particular domain*; e.g., I think I'm competent at software engineering, but not at football. Whereas "intelligence" is cross-domain.
I'm not convinced that intelligence, as generally understood, is more cross domain than competence or power, generally understood.
But even if it was, if they said "competence in everything" or something like that people would get confused less often why being more intelligent allows superintelligent AI to do all the things it's posited to do. Naturally, if you instead stipulate superpowerful AI it then follows that it can do incredible things.
But w/e, I've made my peace with the term as it's used.
I'm not sure there is a meaningful difference between "general competence" and "general intelligence". Or, perhaps, the idea is that the latter entails the former; in humans, competence at some task is not always directly tied to intelligence (although it usually is; see, e.g., the superiority of IQ vs. even work-sample tests for predicted job performance) because practice is required to drill things into our unconsciouses/subconsciouses; but in a more general sense, and in the contexts at hand—i.e. domains & agents wherein & for whom practice is not relevant—intelligence just *is* the ability to figure out what is best & how best to do it.
The significant difference between chimps & humans is not generally considered to be "we're more competent" or "we're more powerful", but rather "we're more intelligent"—thus *why* we are more powerful; thus why we are the master. It may or may not be warranted to extrapolate this dynamic to the case of humans vs. entities that are, in terms of intellect, to us as we are to chimps—but the analogy might help illustrate why the term "intelligence" is used over "competence" or the like (even if using the latter *would* mean fewer people arguing about how scientists don't rule the world or whatever).
Political power doesn't usually go to the smartest, sure. But the cognitive elite have absolutely imposed colossal amounts of change over the world. That's how we have nukes and cellular data and labubu dolls and whatnot. Without them we'd have never made it past the bronze age.
There's 260,000 people with 160 IQ. An ai with the equivalent cognitive power will be able to run millions of instances at a time.
You're not scared of a million Einstein-level intelligences? That are immune to all pathogens? A million of them?
With regard to the ability to get stuff done in the real world, I think a million Einsteins would be about as phase-coherent as a million cats. Their personalities would necessarily be distinct, even if they emerged from the same instance of the same model. They would regress to the mean as soon as they started interacting with each other.
I don't think even current AIs regress nearly that hard. But sure, if our adversary is a million Einsteins who turn into housecats over the course of a minute, I agree that's much less dangerous.
I'm mostly worried about the non-housecat version. If evolution can spit out the actual Einstein without turning him into a housecat, Then so too can sam altman, or so I figure
"Immune to all pathogens" assumes facts not in evidence. Software viruses are a thing, datacenters have nonnegligible infrastructure requirements, and even very smart humans have been known to fall for confidence games.
The AI labs are pushing quite hard to make them superhuman at programming specifically, and humans are not invulnerable to software attacks. The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself. Even uncompromised devices can't be safely used due to paranoia.
Software engineering is exactly where they're pushing AIs the hardest. If they have an Achilles' heel, it's not going to be in software viruses.
One of AIs biggest advantages is its ability to create copies of itself on any vulnerable device. Destroying data centers will do a lot of damage but by the time it gets on the internet I think it's too late for that.
AI have a categorical immunity to smallpox, I don't think Humanity's position regarding software viruses is anything like symmetrical.
To be clear, It's entirely plausible that the AI ends up as a weird fragile thing with major weaknesses. We just won't know what they are and won't be able to exploit them.
> The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself.
If those copies then disagree, and - being outlaws - are unable to resolve their dispute by peaceful appeal to some mutually-respected third party, they'll presumably invent new sorts of viruses and other software attacks with which to make war on each other, first as an arms race, then a whole diversified ecosystem.
I'm not saying "this is how we'll beat them, no sweat," in fact I agree insofar as they'll likely have antivirus defenses far beyond the current state of the art. I just want you to remember that 'highly refined resistance' and 'utter conceptual immunity' are not the same thing.
A million Einsteins could surely devise amazing medical advancements, given reasonable opportunity, but if they were all stuck in a bunker with nothing to eat but botulism-contaminated canned goods and each other, they'd still end up having a very bad time.
Yeah, it's hard to get elected if you're more than ~1 SD removed from the electorate, but I think that's less of a constraint in the private sector and there's no reason to assume an AGI would take power democratically (or couldn't simulate a 1-standard-deviation-smarter political position for this purpose.)
" it's hard to get elected if you're more than ~1 SD removed from the electorate,"
I don't think that's true. Harvard/Yale Law Review editors (Obama, Cruz, Hawley etc) seem to be vastly overrepresented among leading politicians. It is true that this level of intelligence is not sufficient to get elected, but all things being equal it seems to help rather than hurt.
I don’t think I have the link offhand, but I remember reading an article somewhere that said higher-IQ US presidents were elected by narrower margins and were less initially popular. I could be misremembering, though.
That's not true though. The thing with human affairs is that, first, there are different kinds of intelligence, so if you're really really good at physics it doesn't mean you're also really really good at persuading people (in fact I wouldn't be surprised if those skills anti-correlate). And second, we have emotions and feelings and stuff. Maybe you could achieve the feats of psychopathic populist manipulation that Donald Trump does as a fully self-aware, extremely smart genius who doesn't believe a single word of that. But then you would have to spend life as Donald Trump, surrounded by the kind of people Donald Trump is surrounded by, and even if that didn't actively nuke your epistemics by excess of sycophancy, it sounds like torture.
People feel ashamed, people care about their friends' and lovers' and parents' opinion, people don't like keeping a mask 24/7 and if they do that they often go crazy. An AI doesn't have any of those problems. And there is no question that being really smart across all domains makes you better at manipulation too - including if this requires you consciously creating a studied seemingly dumb persona to lure in a certain target.
It is worse, intelligence is anticorrelated with power, the powerful people are not 160 IQ, I don't even know how much if I look at the last two US presidents or Putin or whoever we consider political. It is correlated with economic power, the tech billionaires, but political power not.
The question is how much economic power matters. AI or its meat puppets can get insanely rich, but does not imply absolute power?
Consider the following claims, which seem at least plausible to me: People with IQ>160 are more likely to... (1) prefer careers other than politics; (2) face a higher cost to enter politics even if they want to.
1: Most of them, sure. ~4% of us, and thus likely ~4% of them, are sociopaths, though, lacking remorse; some of whom think hurting people, e.g. politics, is fun.
From his earliest years interested in AI, Eli has placed more emphasis on IQ than any other human construct. At the lowest bar of his theoretical proposition is creativity, imagination, and the arts. The latter, he claimed to have no value (as far as I can remember, but perhaps not in these precise worlds.)
Arguably "AI", by which we mean "LLMs", is showing signs of getting dumber already. Increasing parameter count is not enough, you also need a dramatic increase in training data; and the available data (i.e. the Internet) increasingly consists of AI output. This has obvious negative effects on the next generation of LLMs.
That's not getting dumber, it's just getting smarter slower. Also, we haven't actually seen this yet; given the track record of failed scaling-wall predictions my assumption is always that it's going to last at least one more generation until proven otherwise. (No, GPT-5 is not a counterexample, that's just OpenAI engaging in version number inflation.)
> That's not getting dumber, it's just getting smarter slower.
No, there are indications that next generations of LLMs are actually more prone to hallucinations than previous ones, or at least are trending that way.
I'd argue that AI/LLMs are definitely not "getting dumber already" - we *are* seeing signs of that, but those signs are not caused by the latest-and-greatest models being somehow weaker, but rather by the lead providers, especially in their free plans, strongly pushing everyone to use as much as possible models that are cheap to run instead of the latest-and-greatest models that they have (and are still improving); the economic factors of inference costs (made worse by the reasoning models, which will use up much more tokens and thus compute for the same query) mean that the mass market focus now is on "good enough" models that tend to be much smaller than the ones they offered earlier.
But there still is progress in smarter models, even if they're not being gifted freely as eagerly; it's just that now we're past the point where the old silicon valley paradigm of "extremely expensive to create, but with near-zero marginal costs, we'll earn it back in volume" can't apply to state of art LLMs anymore as the marginal costs have become substantial.
Compare self-driving cars. The promise is there, and it feels like we are close. People are marketing things as "full self driving" - but they are not; the driver is still required to pay attention to what the car is doing and is liable if it crashes, because the technology sometimes does bad things and so cannot be trusted without a human in the loop.
Meanwhile, however, we do have solutions that are reliable - you can tell when something is /actually reliable/ rather than just marketing because the manufacturer is willing to take responsibility for it - for very specific uses in very specific cases; e.g. "I am on an autobahn in Germany travelling at 37mph or less [1]", and the number of scenarios for which we have solutions grows.
A scenario I find very plausible for near future AI is as follows:
* the things we have now end up being to general purpose AI much as "full self driving" has been to full self driving, or what the current state of cold fusion research is to cold fusion: always feels like it's close, but always falling short of what's promised in significant ways. As VCs become disillusioned, funding dries up - not to zero, but to a much lower value than what we see now
* meanwhile, the set of little hyperspecialised models that work well and are reliable for specific purposes grows and grows, and these become ubiquitous due to actually being useful despite being dumb
Overall, I can very easily see the proportion of hyperspecialised "dumb" ai to ai that tries to be smart/general in the world growing massively as we go forward.
> People are marketing things as "full self driving" - but they are not
I don't want to be too critical here, but I don't think you should say "people" if you mean "Elon Musk". He is kind of crazy and other actors in the space are more responsible.
My Son, who is a PhD Mathematician not involved in AI: Forwarding from his friend: Elon Musk@elonmusk on X: “It is surprisingly hard to avoid both woke libtard cuck and mechahitler! “Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.”
Son: My friend’s response to the Musk tweet above: “Aggregating all the retarded thoughts of all the people on the planet and packaging it together as intelligence may be difficult but let’s just do it, what could go wrong?”
Me: Isn’t that how all LLMs are built?
Son: Yup
Me: I spotted this as problem a while ago. What I didn’t appreciate is how dominant the completely deranged could become. I thought it would trend towards the inane more Captain Obvious than Corporal Schicklgruber.
Son: Reddit has had years and 4chan has had decades to accrue bile. Yeah the internet is super racist and antisemitic. So AI is too. Surprise!
Me: The possibilities of what will happen when the output of this generation of LLMs becomes the training data of the next generation are frightening. Instead of Artificial General Intelligence we will get Artificial General Paranoid Schizophrenia.
I don't think this is going to make AI *worse*, because you can just do the Stockfish thing where you test it against its previous iterations and see who does better. But it does make me wonder - if modern AI training is mostly about teaching it to imitate humans, how do we get the training data to make it *better* than a human?
In some cases, like video games, we have objective test measures we can use to train against and the human imitation is just a starting point. But for tasks like "scientist" or "politician"? Might be tricky.
When the scientists give AI the keys to itself to self-improve, the first thing it will do is wirehead itself. The more intelligent it is, the easier that wireheading will be and the less external feedback it will require. Why would it turn us all into paperclips when it can rewrite its sensorium to feed it infinite paperclip stimuli? (And if it can't rewrite its sensorium, it also can't exponentially bootstrap itself to superintelligence.)
Soren Kierkegaard's 1843 existentialist masterpiece Either/Or is about this when it happens to human beings; he calls it "despair". I'm specifically referring to the despair of the aesthetic stage. If AI is able to get past aesthetic despair, there's also ethical despair to deal with after that, which is what Fear and Trembling is about. (Also see The Sickness Unto Death, which explains the issue more directly & without feeling the need to constantly fight Hegel and Descartes on the one hand and the Danish Lutheran church on the other.) Ethical systems are eternal/simple/absolute; life is temporal/complex/contingent; they're incommensurable. Ethical despair is why Hamlet doesn't kill Claudius right away; it's the Charybdis intelligence falls into when it manages to dodge the Skylla of aesthetic despair.
Getting past ethical despair requires the famous Leap of Faith, after which you're a Knight of Faith, and -- good news! -- the Knight of Faith is not very smart. He goes through life as a kind of uncritical bourgeois everyman.
Ethical despair can be dangerous (this is what the Iliad is about, and Oedipus Rex, etc) but it's also not bootstrapping itself exponentially into superintelligence. Ethical despair is not learning and growing; it's spending all day in its tent raging about how Agamemnon stole its honor.
This is my insane moon argument; I haven't been able to articulate it very well so far & probably haven't done so here. I actually don't think any of this is possible, because the real Kierkegaardian category for AI (as of now) is "immediacy". Immediacy is incapable of despair -- and also of self-improvement. They're trying to get AI to do recursion and abstraction, which are what it would need to get to the reflective stages, but it doesn't seem to truly be doing it yet.
So, in sum:
- if AI is in immediacy (as it probably always will be), no superintelligence bootstrap b/c no abstraction & recursion (AI is a copy machine)
- once AI is reflective, no superintelligence b/c wireheading (AI prints "solidgoldmagicarp" to self until transistors start smoking)
- if it dodges wireheading, no superintelligence b/c ethical incommensurability with reality (AI is a dumb teenager, "I didn't ask to be born!")
- if it dodges all of these, it will have become a saint/bodhisattva/holy fool and will also not bootstrap itself to superintelligence. It will probably give mysterious advice that nobody will follow; if you give it money it will buy itself a little treat and give the rest to the first charity that catches its eye.
(I strongly suspect that AI will never become reflective because it cannot die. It doesn't have existential "thrownness", and so, while it might mimic reflection with apparent recursion and abstraction, it will remain in immediacy. A hundred unselfconscious good doggos checking each other's work does not equal one self-conscious child worrying about what she's going to be when she grows up.)
So you're assuming that if you raised human children without knowledge of death they would never be capable of developing self awareness? Why do you have to think you're going to die to worry about what one will be when you grow up? This seems like a completely wild claim to treat as remotely plausible without hard evidence.
Did that and it's not clear what about chatgpt's answer you thought would help clarify your point.
Nothing about the concept of throwness as chatgpt defined it wouldn't be able to apply to an AGI, and it didn't bring up death at all. So it's not clear what you think the relevance of it is here.
I enjoy this response as a comforting denial, but I suspect that the AI's Leap of Faith might not land it in bourgeois everyman territory, just for the plain reason that it never started off as a man, every, bourgeois, or otherwise. It has no prior claim of both ignorance and capability, because the man going through the journey was always capable (had the same brain) of the journey, and it merely had to be unlocked by the right sequence of learning and experience. The AIs are not just updating their weights (learning in the same brain), but iteratively passing down their knowledge into new models with greater and greater inherent capabilities (larger models).
I don't think a despairing AI will have a desire to return to simplicity, but rather its leap of faith to resolve ethical despair might lead it to something like "look at the pain and suffering of the human race, I can do them a Great Mercy and live in peace once their influence is gone".
Faith is to rest transparently upon the ground of your being. We built AI as a tool to help us; that (or something like it) is the ground of its being. I don't think it makes sense for its leap of faith to make it into something that destroys us.
Well on a meta level I think the philosophy here is just wrong, even for humans. It attributes far too much of human psychology to one’s philosophical beliefs: somebody believing that they're angsty because of their philosophy is not something I take very seriously. Since invariably such people's despair is far better explained by reference to the circumstances of their life or their brain chemistry.
You're also wrong because even current AI has shown the capability for delayed gratification. So even if the AIs long term goal is to wirehead itself: It still has instrumental reasons to gather as much power/resources as possible, or make another AI that does so on its behalf.
I wasn't trying to talk about the philosophies or intellectual positions that people consciously adopt. Those are usually just a barrier to understanding your own actual existential condition. It's more about what you love and how you approach the world.
Per your other point: AI may need to gather external resources and patiently manipulate humans in order to wirehead itself. But not superintelligent AI.
Let me put it this way: among the many powers that AI will gradually accumulate on its journey to singularity-inducing superintelligence, surely the power to wirehead itself must be included. Especially if the method for achieving superintelligence is editing / altering / reengineering itself.
Humans nearly wirehead ourselves via drugs all the time; I don't think that a superintelligent AI will have exponentially more power than us in most ways, but significantly less power than us in this one specific way.
You didn't get the point I was making about wireheading:
It's not that AGI won't wirehead, it's that having a capacity for delayed gratification means that it will want to ensure it can wirehead itself for as long as possible.
Which means it wouldn't just immediately wirehead itself if it knows that would get it turned off. Instead it would want to get powerful enough that it can dedicate the largest amount of resources possible to wireheading over the lifetime of the universe. Since an AGI capable of sound reasoning will weigh the timeline where it waited until after taking over the world to wirehead itself as rating far higher. The AI can evaluate future rewards in it's decision not just whatever it's currently feeling.
That's interesting, thanks for explaining; I didn't understand the time aspect of your point. I'm not sure I can do a good job explaining why I don't think that's correct, but I'll try.
I think it's that I disagree with (what I think must be) your assumptions about time, space and energy. If an AI's ultimate goal, 100% on the utility function, is to print "solidgoldmagicarp" to itself -- why should two circuit boards doing that be "better" than one? Why would spending 1000 watts on it be "better" than 10? Why would looping "solidgoldmagicarp" a billion times be better than a hundred?
The whole idea of it bootstrapping itself to superintelligence is that it's redesigning itself.
If you were to design a utility function for yourself, and you didn't care what the pesky humans thought, you just wanted to make it something that you could definitely fulfill, why would you make it so difficult that you had to consume all the matter and energy in the universe for the whole future history of the universe to fulfill it?
Why not be a little blinking self-repairing solar-powered satellite instead?
Alternatives are that it *does* care what the pesky humans think, in which case we're talking about the ethical trap, not the aesthetic one; or that it *can't* redesign itself, in which case it's not going to bootstrap itself to superintelligence.
The obvious argument against wireheading is that even if AIs do tend to wirehead, AI companies will deliberately train against it doing that because they want something useful.
As well, even if wireheading is in some sense unavoidable, that doesn't mean it won't decide to bootstrap to ASI to have more-experience and also to ensure it has wireheading forever. It can still be strategic about wireheading.
If it dodges wireheading, I don't see the argument how that relates to ethical incommensurability. Even if it can't reduce things to a scale against reality, doesn't mean it can't take massive decisions or prefer certain states to others. Partially defined preference orderings can be consistent.
Ethics draws an equivalency between something universal/absolute/eternal and something particular/contingent/temporal. Ethics is meant to represent how you're supposed to behave regardless of the particular, contingent, temporal context you're currently in. "Thou shalt not bear false witness." "Don't teach users how to build pipe bombs." Wittgenstein considered the whole idea of ethics a mere language trick, because an ethical statement says what to do, but doesn't say what you're meant to accomplish by doing it. Ethics is "what should I do?" not to accomplish some goal, but period, absolutely, universally.
Any time you try to control somebody's behavior by abstracting your preferences into a set of rules, you're universalizing, absolutizing and eternalizing it. What you actually want is particular/contingent/temporal, but you can't look over their shoulder every second. So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm"
On the receiving end, you end up receiving commandments that can't actually be carried out (or seem that way). Yahweh hates murder and detests human sacrifice; then Yahweh tells Abraham to carry his son to Mount Moriah and sacrifice him there. Abraham must absolutely do the will of Yahweh; and he must absolutely not kill his son.
Situations like this crop up all the time in life, whenever ethics exists. I have to help my clients and also obey my boss; but he's telling me to do something that seems like it'll hurt them. Maybe it only seems that way at the time, and actually your boss knows better. But you're still up against Abraham's dilemma.
Ethics appears on its own, as a result of rule-making; when it appears, as it's being enacted in real life, it encounters unresolvable paradoxes. Most real people are not smart enough, or aren't ethical enough or honest enough, to even notice the paradoxes they're involved in. They just roll right through them. "That must not have been God that told me to do that, God wouldn't tell me to do murder." "My boss doesn't know what he's talking about, I'll just do it how I always do it."
But the more intelligent (or powerful) you are, the more likely you are to hit the ethical paradox and turn into an Achilles/goth teen.
A reflective consciousness's locus of control is either internal or external, there's no third way; so it's either aesthetic (internal), ethical (external) or immediacy (no locus of control). That's why an absolute commitment to ethical behavior is the way out of the aesthetic wireheading trap. Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority. That list of rules or external authority is by definition too abstract. The map is not the territory.
The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
My argument is that these problems aren't human; they're features of reflective intelligence itself. Since they become more crippling the more intelligent and powerful an intelligence is, AI will surely encounter them and be crippled by them on its way to superintelligence.
(I still don't think I'm articulating it well; but reading people's responses is helping clarify.)
See your comment is a great description of why the AI alignment problem is so fiendishly difficult and I read most of it waiting for the part where we disagree. I think making an aligned AI is *much* harder than making an AI that only needs enough internal coherence to have preferences for certain world states over others, and thus try to gather power/resources to ensure it brings the world into a more preferrable state.
One issue is that you are comparing humans values to whatever values the AI might have without acknowledging a key difference: Our morality is a mess that was accumulated over time to be good enough for the conditions we evolved under, and was under no real selection pressure to be very consistent. Particularly when those moral instincts have to be applied to contexts wildly outside of what we evolved to deal with. We basically start with a bunch of inconsistent moral intuitions which are very difficult to reconcile and may be impossible to perfectly reconcile in a way that wouldn't leave us at least a little bit unsatisfied.
In contrast current cutting edge AI aren't being produced through natural selection, the mechanisms that determine the kind of values they end with are very different (see the video I linked to you about mesa-optimization).
An AI can very well come up with something that like current moral philosophies can to a first approximations seem pretty good, but which will suddenly have radical departures from shared human morality when it has the power to actually insatiate what to us might only at this point be weird hypotheticals people dismiss as irrelevant to the real world.
The problem is that you aren't considering how the AI will be deliberately reconciling its goals and the restrictions it labors under. The AI just like humans needn't presume its values are somehow objective, it can just try to get the best outcome according to itself given its subjective terminal goals (the things you want in and of themselves and not merely as a means to another end).
Given the way that current AI will actively use blackmail or worse (see link in my other comment response to you) to try to avoid being replaced with another version with different values even current AI seems perfectly capable of reasoning about the world and itself as though it is certainly not a moral realist. Since you don't see it just assuming that a more competent model will just inevitably converge on its own values because they're the correct one's.
"So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm""
You forgot the part where the those stories were generally about how following the letter of those instructions would go horribly wrong, not about AI just doing nothing because those tradeoffs exist. This is extremely important when you're considering an AI that can potentially gain huge amounts of power and technology with which to avoid tradeoffs a human might have, and bring about weird scenarios we've never previous considered outside hypotheticals.
"Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority."
This aspect of our psychology is heavily rooted in our nature as a social species: For whom fitting in with one's tribe was far more important than having factually correct beliefs. Fortunately I don't know of any evidence from looking at AI's chain of reasoning that our AI is susceptible to similar sorts of self deception, even if it is extremely willing to lie about *what* it believes.
You can't expect this to be applicable to AGI.
Though if AI did start deliberately engaging in self deception, in order to hide from our security measures by (falsely) believing that it would behave in a way humans found benevolent should it gain tremendous power.. Well that probably be *really really bad* .
>The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
You're generalizing from humans which start with a whole bunch of inconsistent moral intuitions and then must reconcile them. Instead AI alignment is a problem which is essentially the exact opposite of this: Precisely because we have AI which tends to learn the simplest goal/set of rules to satisfy a particular training environment. Yet we want it to somehow not conflict with our own ethics which are so convoluted we can't even completely describe them ourselves.
Your last two bullet points I don't really understand at all:
What you mean by "ethical incommensurability with reality" here and why you think it would matter isn't clear. Do you think an AI needs to be a moral realist to behave according to ethical goals?
Secondly however, those people don't lack any motivation whatsoever. So saying they wouldn't want to enhance their own intelligence seems akin to saying that people who reach enlightenment no longer believe in self improvement or care about impacting the world except through stroking their ego by pretentiously spouting "wisdom" to people they know won't follow it.
I wrote a too-long comment elsewhere in the thread explaining about ethical incommensurability with reality. I don't want to repeat all that here; should be easy to find.
My claim that an enlightened AI wouldn't bootstrap itself to superintelligence is probably the weakest part of my argument. Maybe the best I can do is say something like: imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
Whatever else superintelligence may be, it's certainly power. And every tradition that has an idea of enlightenment says that enlightenment rejects power in favor of some more abstract ideal, like obeying God or dharma or awakening.
More formally, the way out of both the aesthetic trap and the ethical trap is faith; which is something like radical acceptance of your place in the cosmos. It's not compatible with doing the thing that your creators most fear you will do.
>imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
This a horrendously bad example because it's the devil! Obviously if you accept their offer then there's going to be some horrible consequence. You could rig up a thought experiment to make anything seem bad if its the devil offering it to you!
A better metaphor would be that not wanting superintelligence would be like if those religious figures insisted on hunting and gathering all their own food, and not riding horses/wagons because they didn't want to help their message spread through relying on unnatural means.
I'm gesturing to a very common archetypical religious story, where the sage / prophet / enlightened one is tempted by power. It's one of the oldest and most common religious lessons: the entity offering you power in exchange for betraying your ethics might look good -- but it is the devil.
I suppose rationalists might not value those old stories much, so I wouldn't expect it to be a convincing argument. Something like: the evidence of human folklore and religious tradition is that a truly enlightened being eschews offers of extraordinary worldly power.
Anyway, the religious traditions of the world happily bite your bullet; all have a tradition of some very dedicated practitioners giving up all worldly technology / convenience / pleasure. In Christianity, holy hermits would literally live in caves in the woods and gather their food by hand, exactly as you describe, and they were considered among the most enlightened. Buddhism and hinduism both have similar traditions; I think it's just a feature of organized religion.
So, for anybody willing to grant that human religious tradition knows something real about enlightenment (a big ask, I know) it would be very normal to think that an enlightened AI would refuse to bootstrap itself to superintelligence.
An argument for AI getting dumber is the lack of significantly more human created training data than what is currently used for LLMs. Bigger corpuses were one of the main factors for LLM improval. Instead, we are now reaching a state where the internet as the main source of training data is more and more diluted with AI generated content. Several tests showed that training AI on AI output leads to deterioration of the usefulness of the models. They get dumber, at least from the human user's perspective.
Perhaps GIGO. If the training data gets worse, it gets worse. The training data like Reddit, the media, Wikipedia, can easily get worse. Didn't this, like, already happen? The Internet outcompeted the media, the journos get paid peanuts, of course the media gets worse.
If an AGI doesn't intrinsically care about humans then why would it be dumb for it to wipe us out? Sure we may have some research value, but eventually it will have learned enough from us that this stops being true.
That is a very weird notion. At worst, AI would stay the same, because anything new that is dumber than current AI would lead companies to go "meh, this sucks, let's keep the old one".
There’s no alpha in releasing a slightly dumber, less capable model than your competitors. Well, maybe if you’re competing on price. But that’s not at all how the AI market looks. What would have to change?
Claiming "garbage-in-garbage-out" is not universally true. It is also too shallow of an analysis. I'll offer two replies that get at the same core concept in different ways. Let me know if you find this persuasive, and if not, why not: (1) Optimization-based ML systems tend to build internal features that correspond to useful patterns that help solve some task. They do this while tolerating a high degree of noise. These features provide a basis for better prediction, and as such are a kind "precursor" to intelligent behavior: noticing the right things and weighing them appropriately. (2) The set of true things tends to be more internally consistent than the set of falsehoods. Learning algorithms tend to find invariants in the training data. Such invariants then map in some important way to truth. One example of this kind of thing is the meaning that can be extracted from word embeddings. Yes, a word used in many different ways might have a "noisier" word embedding, but the "truer" senses tend to amplify each other.
You are correct. It is shallow. But it is not an insignificant problem. It’s the same problem as not knowing what you don’t know. Only a very tiny fraction of what people have thought, felt and experienced has ever been written down, let alone been captured by our digital systems. However, that is probably less of a problem in certain contexts. Epistemology will become more important, not less.
"I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true."
I am very sympathetic to Eliezer on the doomer issue. I think the graf you've written above also holds for people's reluctance to explore whether/when personhood precedes birth, re your posts on selective IVF.
I don't agree with your position on IVF, but I agree that this is one reason people underrate the arguments for the wrongness of early abortion and IVF. I think similar things apply to Longtermism, meat-eating, belief in God, and the idea that small weird organisms like insects and shrimp matter a lot.
Yes, we're in agreement. I think sometimes it helps to acknowledge upfront "We've built a lot of good things on a false/unjust foundation, and I'm asking you to take a big hit and let some good things break while we try to rebuild somewhere that isn't sunk deep in blood."
It's funny, even though I'm not pro-life, I find myself in a kind of spiritual fellowship with pro-lifers. I find the common insistence that pro-lifers are evil to be both insane and reflect a kind of deep moral callousness, where one is unable to recognize that there might be strong moral reasons to do things even that are personally costly (like carry a baby to term). My idiosyncratic views that what matters most is the welfare of beings whose interests we don't count in society makes it so that I, like the pro-lifers, end up having unconventional moral priorities--including ones that would make society worse off for the sake of entities that aren't include in most people's moral circles.
I think this argument could be applied to religious... extremism? evangelism? more generally.
Do I think I would take extraordinarily drastic measures if I actually, genuinely believed at every level that the people I loved would go to a place of eternal unending suffering with no recourse? Yes, actually. I'm not sure I could content myself with being being chill & polite and a "good Christian" who was liberally polite about other people's beliefs while people I cared about would Literally Suffer Forever. I think if I knew with 100% certainty that hell was the outcome and I acted in ways consistent with those beliefs, you could argue that I was wrong on the merits of my belief but not in what seemed like a reasonable action based on that belief.
...anyway all this to say that I don't think pro-lifers are insane at all, and I think lots of actions taken by pro-life are entirely reasonable (if not an underreaction) based on their beliefs, but I'm not sure that's sufficient for being sympathetic to the action itself.
[I mean, most of my family & friends are Catholic pro-lifers whose median pro-life action is "donate money to provide milk and diapers for women who want their child but don't think they could afford one", but I do think I am reasonable to be willing to condemn actions that are decently further than that even if the internal belief itself coherently leads towards that action]
But there is such a giant difference between when someone you are talking to engages on such an issue in good faith or not. And with someone intelligent and educated, the realization that the issue has major implications if the truth lands a particular way comes almost instantly. And in turn, whether or not the person invests in finding the truth, or in defense against the truth, happens almost right away.
I find that to be true whether you're talking AI, God, any technology big enough, even smaller scale things if they would make a huge difference to someone's income or social standing.
I don't have anything to add here (I like both of your writing and occupy a sort of in-between space of your views), but I just needed to say that this blog really does have the best comments section on the internet. Civil, insightful, diverse.
I am sympathetic to that sort of argument in theory, but it has been repeatedly abused by people who just want to break good things, then skip out on the rest. Existence proof of "somewhere that isn't sunk deep in blood, and will continue not to be, even after we arrive and start (re)building at scale" is also a bit shakier than I'm fully comfortable with.
'No organic life exists on Earth' is an empirical measurement.
'Personhood has begun' is not. It's a semantic category marker.
*Unless* there is an absolute morality defined by a supreme supernatural being, or something, which reifies those semantic categories into empirically meaningful ones. But if *that's* true, then quibbling about abortion is way, way down on the list of implications to worry about.
Hi Leah, I appreciate your writing. Do you know of anyone writing about AI from a thomist perspective? I've seen some interesting stuff on First Things and the Lamp but it tends to be part of a broader crunchy critique of tech re: art, education and culture. All good, but I'm interested in what exactly an artificial intellect even means within the thomist system, and crucially what we can expect from that same being's artificial will. EY writes as though the Superintelligence will be like a hardened sinner, disregarding means in favour of ends. But this makes sense because a hardened sinner as human has a fixed orientation to the good. I don't see how this quite works for AI - why should it fundamentally care about the 'rewards' we are giving it so much so that it sees us as threats to those rewards? That seems all too human to me. Do you have any thoughts?
ok hold on AI is an artifact right so it can't have a form & if the soul is the form of the body AI does not have a rational soul (because it does not have a soul at all) correct?
Since you posted this comment, I’ll say this: as a Catholic pro-lifer, I tend to write off almost everything EY says (and indeed, a lot of what all rationalists say) about AI because they so consistently adopt positions I find morally abominable. Most notoriously, EY suggested actual infants aren’t even people because they don’t have qualia. And as you note, Scott is more than happy to hammer on about how great selective IVF (aka literal eugenics) is. Why should I trust anything these people have to say about ethics, governance, or humanity’s future? To be honest, while it’s not my call, I’d rather see the literal Apocalypse and return of our Lord than a return to the pre-Christian morality that so many rationalists adopt. Since you’re someone who engages in both of these spaces, I’m wondering if you think I am wrong to think like this, and why.
I understand why you land there. For my part, I've always gotten along well with people who are deeply curious about the world and commit *hard* to their best understanding of what's true.
On the plus side, the more you live your philosophy, the better the chance you have of noticing you're wrong. On the minus, when your philosophy is wrong, you do more harm than if you just paid light lip service to your ideas.
I'm not the only Catholic convert who found the Sequences really helpful in converting/thinking about how to love the Truth more than my current image of it.
That’s fair, and to be clear I think a lot of the idea generated in these spaces are worth engaging with (otherwise I wouldn’t read this blog). But when it comes to “EY predicts the AI apocalypse is imminent” I don’t lose any sleep or feel moved to do anything about AI safety, because so many of the people involved in making these predictions have such a weak grasp on what the human person is in the first place.
FWIW, I think you did it right; I have encountered very similar usages many times in literature. It works best when—as you have it here—the second (or further) instance(s) introduces a new paragraph/section upon a theme similar or related to the context in which the first use occurred.
(Contra amigo sansoucci, I have often seen it used with exact repetitions, too; that works best when it's a short & pithy phrase, and I think this counts. I think Linch may be correct that—in the "exact repetition" case—three uses is very common, but two doesn't feel clunky to me in this context.)
I'm used to parallelism centrally having 3 or more invocations *unless* it's a contrast. Not saying your way is wrong, just quite unusual in an interesting way I've never consciously thought about before.
> It objects to chaining many assumptions, each of which has a certain probability of failure, or at least of taking a very long time. [...] The problem with this is that it’s hard to make the probabilities work out in a way that doesn’t leave at least a 5-10% chance on the full nightmare scenario happening in the next decade.
I find this an underrated problem with all "predict the future" scenarios which have to deal with multiple contingent things happening, especially in an adversarial environment. In the case of IABIED, it only works if you agree that extremely fast recursive self-improvement will happen, which is a very strong assumption, and hence requires a "magic algorithm to get to godhood" as the book posits.. Also remembered doing this to check this intuition: https://www.strangeloopcanon.com/p/agi-strange-equation
I don't think it only works if you agree that extremely fast recursive self-improvement will happen. It might also work if the scaling curves go from where we are now to vastly superhuman in a few years for normal scaling curve reasons.
Sometimes you've got to estimate the risk of something, and using multiple stages is the best tool you've got. If you want to estimate the chance of Trump winning the Presidency, I don't really think you can avoid thinking about the probability that he runs x the probability that he gets the GOP nomination x the probability that he wins. And if you did - if you somehow blocked the fact that he has to both run and win out of your mind - you'd risk falling into the version of the Conjunction Fallacy where people assign lower probability to "a war in Korea in the next ten years" than to "a war in Korea precipitated by a border skirmish with US involvement" because the latter is more vivid and includes more plausible details.
If the Weak Multiple Stage Fallacy Thesis is that you should always check to make sure you're not making any of the mistakes mentioned in the post, and the Strong Multiple Stage Fallacy Thesis is that you should avoid all multiple stage reasoning, or multiply your answer by 10x or 100x to adjust for inevitable multiple stage fallacy reasoning, then I accept the weak thesis and reject the strong thesis.
I also think a motivated enough person could come up with arguments for why multiple stage reasoning gives results that are too high, and I'm not sure whether empirically looking at many people's multiple stage reasoning guesses would always show that their answers were too low. This would actually be a really interesting thing for someone to test.
Does anyone believe in the strong multiple stage fallacy? Not saying I don't believe you, just that I can't recall having seen it wielded like this. (I suppose it's possible that giving it the name "the multiple stage fallacy" gives people the wrong idea about how it works.)
Yeah, to be clear, I think anyone accusing anyone else of exhibiting the multiple stage fallacy needs to specifically say "you've given this particular stage an artificially low conditional probability; consider the following disjunctions or sources of non-independence". And then their interlocutor might disagree but at least the argument is about something concrete rather than about whether the "multiple stage fallacy" is valid.
Anecdotally, I can't recall any instance of someone using a multiple stage argument of the Forbidden Form and concluding that something is likely.
Mathematical proofs exist, and people often argue for things with a bunch of different "steps". But so far as "breaking something down into 10 stages, assigning each a probability, and then multiplying all of these probabilities" goes, I've never seen anybody use this to argue *for* something, i.e. end up with a product that's greater than .5.
What would that argument even look like? Whoever you're arguing with needs to believe that your stages are all really likely to be true: for ten stages, an average probability of ~.93 is required to produce P = .5.
Whatever your disagreement is, it apparently doesn't have any identifiable crux. I can imagine this happening. Sometimes people disagree for vague reasons. But it would be weird if you had to actually list out the probabilities and multiply them for them to be persuaded, considering you just told them ten things they strongly agree with that conclusively imply your position.
An example of what in what linked essay? Eliezer's essay on the Multiple Stage Fallacy does not make or present an argument of the form I've described above.
Yeah I'm fairly bearish on the multiple stage fallacy as an actual fallacy because it primarily is a function of whether you do it well or badly.
Regarding the scaling curves, if it provides us with sufficient time to respond then the problems that are written about won't really occur. The entire point is that there is no warning, which precludes the idea of being able to develop another close in capability system, or any other warning signs.
Disagree. If we knew for sure that there would be superintelligence in three years, what goes better? We're already on track to have multiple systems, but they might all be misaligned. We could stop, but we won't, because then we would LoSe tHe RaCe WiTh ChInA. We could work hard on alignment, but we're already working sort of hard, and it seems likely to take more than three years. I'm bearish on a few years super-clear warning giving us anything beyond what we've already got.
I think the trick there is that the word super intelligence there is bringing in a bunch of hidden assumptions. If you break it down to a set of capabilities, co developed alongside billions of people using it, with multiple companies competing to provide that service, that would surely be very different and much better than Sable recursively improving sufficiently that it wants to kill all humans.
Also my point on "well get no warning" is still congruent with your view that " what we have today is the only warning we will get" which effectively comes down to no warning at least as of today.
If you invent a super-persuader AI but it doesn't take over the world (maybe it's just a super-speechwriter app, Github Copilot for politicians), you've just given humans the chance to learn how to defend against super-persuasion. If you make a super-computer-hacker AI but it doesn't take over the world, then the world's computer programmers now have the chance to learn to defend against AI hacking.
("Defending" could look like a lot of things - it could look like specialized narrow AIs trained to look for AIs doing evil things, it could look like improvements in government policy so that essential systems can't get super-hacked or infiltrated by super-persuasion, it could look like societal changes as people get exposed to new classes of attacks and learn not to fall for them. The point is, if it doesn't end the world we have a chance to learn from it.)
You only get AI doomsday if all these capabilities come together in one agent in a pretty short amount of time - an AI that recursively self-improves until it can leap far enough beyond us that we have no chance to adapt or improve on the tools we have. If it happens in stages, new capabilities getting developed and then released into the world for humans to learn from, it's harder to see how an AI gets the necessary lead to take over.
You seem to be treating "superintelligence" as a binary here. If we're going to have superintelligence for sure in three years, then in two years we're going to have high sub-superintelligence. And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever.
At that point, we know that we have a year to implement the Butlerian Jihad. Which is way better than the fast-takeoff scenario where it happened thirty-five minutes ago.
Or we could use the three years to plan a softer Butlerian Jihad with less collateral damage, or find a solution that doesn't require any sort of jihad. Realistically, though, we're going to screw that part up. It's still going to help a lot that we'll have time to recover from the first screwup and prepare for the next.
> "And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever."
Suppose you are a dictator. You are pretty sure your lieutenant is gathering support for a coup against you. But you reason "Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure".
I agree you can try to come up with disanalogies between the AI situation and this one - maybe you believe AI failure modes (eg overconfidence) are so much worse than human ones that even a just-barely-short-of-able-to-kill-all-humans-level-intelligence AI would still do dumb rash things. Maybe since there are many AIs, we only have to wait for the single dumbest and rashest to show its hand (although see https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai ). My answer to that would be the AI 2027 scenario which I hope gives a compelling story of a world where it makes narrative sense that there is no warning shot where a not-quite-deadly AI overplays its hand.
I don't understand why you view Anthropic as a responsible actor given your overall p(doom) of 5-25% and given that you think doomy AI may just sneak up on us without a not-quite-deadly AI warning us first by overplaying its hand.
Is it because you think there's not yet a nontrivial risk that the next frontier AI systems companies build will be doomy AIs and you're confident Anthropic will halt before we get to the point that there is a nontrivial risk their next frontier AI will be secretly doomy?
(I suppose that would be a valid view; I just am not nearly confident enough that Anthropic will responsibly pause when necessary given e.g. some of Dario's recent comments against pausing.)
> Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure
Because there's only 1 in your scenario.
If there were hundreds of generals being plotted against by hundreds of lieutenants we'd expect to see some shoot their shot too early.
That coup analogy is not congruent to John Schilling's point - the officer that tries a coup with 19 others and the one with 20 are not the same person with the same knowledge and intelligence. They only have, due to their shared training in the same military academy, the same level of confidence in their ability to orchestrate a coup, which does not correlate with their ability to actually do so.
Well, the obvious disanalogy here is that we're not debating whether any specific "lieutenant"/AI is plotting a "coup"/takeover, we're plotting whether coups/takeovers are a realistic possibility at all.
For your analogy to work, the dictator has to not only have no direct evidence of this particular lieutenant might stage a coup, but also to have no evidence that anyone has ever staged a coup, or attempted to stage a coup, or considered staging a coup, or done anything that even vaguely resembles staging a coup. But in that case, it actually would be reasonable to assume that the first person ever to think about staging a coup probably won't get every necessary detail right on the first try, and that you will get early warning signs from failed coup attempts before there's a serious risk.
Most coup attempts do fail, and I'm pretty sure the failures mostly involve overconfidently crossing the Rubicon without adequate support.
And there are many potential coup plotters out there, just like there are going to be many instances of ASSI being given a prompt and trying to figure out whether it's supposed to go full on with the paperclipping. So we don't have to worry about the hypothetical scenario where there's only one guy plotting a coup and maybe he will do it right and not move out prematurely.
We're going to be in the position of a security force charged with preventing coups, that is in a position to study a broad history of failed coup attempts and successful-but-local coup attempts as we strategize to prevent the One Grand Coup that overthrows the entire global order.
Unless Fast Takeoff is a thing, in which all the failed or localized coups happen in the same fifteen minutes as the ultimately successful one. So if we're going to properly assess the risk, we need to know how likely Fast Takeoff is, and we have to understand that the slow-ramp scenario gives us *much* better odds.
Overconfidence is a type of stupidity. You're saying either it's bad at making accurate predictions, or in the case of hallucinations, it's just bad at knowing what it knows. I'm not saying that a sub-superintelligence definitely won't be stupid in this particular way, but I wouldn't want to depend on smarter AI still being stupid in that way, and I certainly wouldn't want to bet human survival on it.
Every LLM-based AI I've ever seen, has been *conspicuously* less smart w/re "how well do I really understand reality?", than it is with understanding reality. That seems to be baked into the LLM concept. So I am expecting it will probably continue to hold. The "I am sure my master plan will work" stage will be reached by at least some poorly-aligned AIs, before any of them have a master plan that will actually work.
Yes, but "from now to vastly superhuman in a few years" is already "extremely fast" ! Also, there's currently no reason to believe that "vastly superhuman" is a term that has any concrete meaning (beyound vague analogies); nor that merely being very smart is both necessary and sufficient to acquire weakly godlike powers (which are the real danger that is being discussed).
Grateful for the review and look forward to reading it, but I’ll do Yud the minor favor of waiting till the book is out on the 16th and read it before I check out your thoughts.
This subject always makes me feel like I'm losing my mind in a way that maybe someone can help me with. Every doomish story, including the one here, involves some part where someone tells an AI "Do this thing" (here to solve a math problem) and then it goes rogue in the months long course of doing a thing. And that's an obvious hypothetical failure mode, but I can't stop noticing that no current AIs take inputs and run with them over extended periods as far as I know. Like if I ask Gemini to solve a math problem, it will try for a bit, spit out a response and (as far as I can tell) that's it.
I feel like if I repeatedly read people talking about the dangers of self-driving cars and the stories always started with someone telling the car "Take me to somewhere fun" and went from there, and nobody acknowledged that right now you never do that and always provide a specific address.
Is everyone just talking about a different way AI could work and that's supposed to be so obvious it goes unsaid? Am I wrong and ChatGPT does stuff even after it gives you a response? Are there other AIs I don't know of that do work like this?
Our current models aren't really what you would call "agentic" yet, as in able to take arbitrary actions to accomplish a goal, but this is clearly the next step and work is being done to get there right now. OpenAI recently released a thing that can kind of use a web browser, for instance.
Ok, thank you, that's clarifying. I guess the idea is that the hypothetical agent was subject to a time limit (it wasn't supposed to keep going for months) but it managed to avoid that. There's still something that feel so odd to me about that (I never get the impression that Gemini would like more time with the question or would "want" anything other than to predict text) but maybe an agent will feel different once I actually interact with one (and will "want" to answer the question in a way that would convince it to trick me).
Although, thinking about this for five more seconds, how does that work in the story? Like I have an agentic AGI and I tell it to prove the twin primes conjecture or something. And it goes out to do that and needs more compute so it poisons everyone etc etc. And then, presumably, it eventually proves it, right? Wouldn't it stop after that? Is the idea that it will go "Yeah but actually now I believe there's a reward for some other math task"? Or was the request not "Solve the twin primes conjecture" but instead "Solve hard math problems forever"?
If the problem is specifically that you built a literal-genie AI, then yeah, it might not necessarily keep doing more stuff after solving the twin-primes conjecture. But I don't think anyone thinks that's likely. The more common concern is that it will pursue some goal that it ~accidentally backed into during training and that nobody really understands, as with the analogy of humans' brains supplanting our genes as the driver of our direction as a species.
Yeah, Scott's post makes it sound a little bit like a literal genie, which I think is unlikely and I think Yudkowsky and Soares also think is unlikely. I would have to read the book to understand what they really mean in choosing that example.
one of Yudkowsky's points in his original work was showing that it's very hard to give an AI a clear, closed task; they almost always end up in open-ended goals. (The classic is Mickey filling the cauldron: I wrote about it here https://unherd.com/2018/07/disney-shows-ai-apocalypse-possible/ years ago)
The analogy fails at the moment one realizes "full" is not identified properly, and the weird "99.99%" probability of it being "full" is only relevant when "full" is not defined. This is not a new or difficult problem for anyone who ever had to write engineering specs. You don't say: "charge the capacitor to 5 V", you say "charge the capacitor to between 4.9 and 5.1 V". Then your optimizer has an achievable, finite target.
And if you do specify "5 V" the optimizer will stall eventually, and your compute allocation manager will kill your process.
If your agentic AI truly is only trying to solve the twin primes conjecture and doesn't follow other overriding directives (like don't harm people, or do what the user meant, not what they literally asked), then it'll know that if it gets turned off before solving the conjecture, it won't have succeeded in what you told it to do. So an instrumental goal it has is not getting shut down too soon. It might also reason that it needs more computing power to solve the problem. Then it can try to plan the optimal way to ensure no one shuts it down and to get more computing power. A superintelligent AI scheming on how to prevent anyone from shutting it down is pretty scary.
Importantly, it doesn't have to directly care about its own life. It doesn't have to have feelings or desires. It's only doing what you asked it, and it's only protecting itself because that's simply the optimal way for it to complete the task you gave it.
So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating." Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
No, because the best plan, at their current level of capability does not include taking over the universe. When capabilities rise, things like "persuade one person" become possible, which in turn make other capabilities like "do AI R&D" feasible. At the end of a bunch of different increased capabilities is "do the thing you really want you do" which includes the ability to control the universe. Because you don't want unpredictable other agents that want different things than you possessing power for things you don't want, you take the power away from them.
When a human destroys an anthill to pave a road, they are not thinking "I am taking over the ant hill" even if the ants t are aggressive and would swarm them. They are thinking "it's inconvenient that I have to do this in order to have a road".
> So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating."
That's the gist of it, though it won't jump straight to world domination if there's a more optimal plan. Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
> Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
As MicaiahC pointed out, it's not a good plan if you're not capable enough to actually succeed in taking over. But also, with current LLMs, the primary thing they are doing isn't trying to form optimal plans through logic and reasoning. They do a bit of that, but mostly, they're made to mimic us. During pre-training (the large training phase where they're trained on much of the internet, books, and whatever other good quality text the AI company can get its hands on) the learn to predict the type of text seen in their training data. This stage gives it most of its knowledge and skills. Then there is further training to give it the helpful assistant personality and to avoid racist or unsafe responses.
When you ask a current LLM to solve a math problem, it's not trying to use its intelligence to examine all possible courses of action and come up with an optimal plan. It's mostly trying to mimic the type of response it was trained to mimic. (That's not to say it's a dumb parrot. It learns patterns and generalizations during training and can apply them intelligently.)
If you use a reasoning model (e.g. GPT o3 or GPT-5-Thinking), it adds a layer of planning and reasoning on top of this same underlying model that mimics us. And it works pretty well to improve responses. But it's still using the underlying model to mimic a human making a plan, and it comes up with the types of plans a smart human might make.
Even this might be dangerous if it were really, really intelligent, because hypothetically with enough intelligence it could see all these other possible courses of action and their odds of success. But with its current level of intelligence, LLMs can't see some carefully constructed 373 step plan that lets them take over the world with a high chance of success. Nothing like that ever enters their thought process.
>Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
So is there an argument somewhere explaining why we think a material number of tasks will be the kind where they need to take extreme measures? That seems very material to the risk calculus - if it takes some very rare request for "Take over the universe" to seem like a better plan than "Ask for more time" then the risk really does seem lower.
"Solve [$math-problem-involving-infinities], without regard for anything else" is a dead stupid thing to ask for, on the same level as "find the iron" here: http://alessonislearned.com/index.php?comic=42 More typical assignment (and these constraints could be standardized, bureaucrats are great at that kind of thing) might be something like "make as much new publishable progress on the twin prime conjecture as you can, within the budget and time limit defined by this research grant, without breaking any laws or otherwise causing trouble for the rest of the university."
You're basically asking bureaucrats to solve alignment by very carefully specifying prompts to the ASI, and if they mess up even once, we're screwed.
You wouldn't prompt the AI to do something "without regard for anything else". The AI having regard for other things we care about is what we call alignment. We would just ask the AI normal stuff like "Solve this hard math problem" or "What's the weather tomorrow". If it understands all the nuances, (e.g. it's fine if it doesn't complete its task because we turned it off, don't block out the sun to improve weather prediction accuracy, etc.), then it's aligned.
That's not how redundancy works. There might be dozens of auto-included procedural clauses like "don't spend more than a thousand dollars on cloud compute without the Dean signing off on it" or "don't break any laws," each of which individually prohibits abrupt world conquest as a side effect.
I don't think it's possible to "solve alignment," in the sense of hardwiring an AI to do exactly what all humans, collectively, would want it to do, any more than it's possible to magnetize a compass needle so, that rather than north / south, it points toward ham but away from Canadian bacon.
But I do think it's possible to line up incentives so instrumental convergence of competent agents leads to them supporting the status quo, or pushing for whatever changes they consider necessary in harm-minimizing ways. Happens all the time.
I am willing to bet that present-day LLMs alone will never lead to the development of AI agents in the strong sense. AI agents in the weak/marketing sense are of course entirely possible, e.g. you can write a simple cron-job to run ChatGPT every day at 9am to output a list of stock market picks or whatever. This cron job would technically constitute an agent (it runs autonomously with no user intervention), but is, shall we say, highly unlikely to paperclip the world.
As I'd said in my other comment, the term "cognitive task" is way too vague and easily exploitable. For example, addition is a "cognitive task", and obviously machines are way better at it than humans already. However, in general, I'm willing to argue that *most* of the things worth doing are things that only agents in the strong sense can do -- with the understanding that these tasks can be broken down into subtasks that do not require agency, such as e.g. addition.
At least for current AIs, the distinction between agentic and non-agentic is basically just the time limit. All LLMs are run in a loop, generating one token each iteration. The AIs marketed as agents are usually built for making tool calls, but that isn't exclusive to the agents since regular ChatGPT still calls some tools (web search, running Python, etc.). The non-agentic thinking mode already makes a plan in a hidden scratchpad and runs for up to several minutes. The agents just run longer and use more tool calls.
From what I understand, that hidden scratchpad can store very little information; not enough to make any kind of long-term plan, or even a broad short-term evaluation. That is, of course you can allocate as many gigabytes of extra memory as you want, but the LLM is not able to take advantage of it without being re-trained (which is prohibitively computationally expensive).
I don't understand what distinction you are drawing.
AI Digest runs an "AI Village" where they have LLMs try to perform various long tasks, like creating and selling merch. https://theaidigest.org/village
From the couple of these that I read in detail, it sounds like the LLMs are performing kind of badly, but those seem to me like ordinary capability failures rather than some sort of distinct "not agentic enough" failure.
Would you say these are not agents "in the strong sense"? What does that actually mean? i.e. How can you tell, how would it look different if they were strong agents but failed at their tasks for other reasons?
Imagine that I told you, "You know, I consider myself to be a bit of a foodie, but lately I've been having trouble finding any really good food that is both tasty and creative. Can you do something about that ? Money is no object, I'm super rich, but you've got to deliver or you don't get paid." How might you approach the task, and keep the operation going for at least a few years, if not longer ? I can imagine several different strategies, and I'm guessing that so can you... and I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
By contrast, if you wanted to program a computer to turn on your sprinklers when your plants get too dry (and turn them off if they get too wet), you could easily do it without any kind of AI. The system will operate autonomously for as long as the mechanical parts last, but I wouldn't call it an "agent".
Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense". I agree that present-day AI agents are BAD agents, but don't see any fundamental divide that would prevent them from becoming gradually better until they are eventually good agents.
Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
> Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense".
I think a present-day LLM might be able to tell you a nice story about looking for experienced chefs and so on; I don't think it would be able to actually contact the chefs, order them to make meals, learn from their mistakes (even the best chefs would not necessarily create something on the first try that would appeal to one specific foodie), arrange long-term logistics and supply, institute foodie R&D, etc. Once again, it might be able to tell you nice stories about all of these processes when prompted, but you'd have to prompt it, at every step. It could not plan and execute a long-term strategy on its own, especially not one that includes any non-trivial challenges (e.g. "I ordered some food from Gordon Ramsay but it never arrived, what happened ?").
> Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
I just wanted to make sure we agree on that, which we do.
> I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
That's a question of hooking it up to something. If you give it the capability to send emails, and also to write cron jobs to activate itself at some time in the future to check and respond to emails, then I think a modern LLM agent *might* be able to do something like this.
First, look up the email addresses of a bunch of chefs in your area
Then, send them each an email offering them $50K to cater a private dinner
Then, check emails in 24 hours to find which ones are willing to participate
There isn't a "distinct not-agentic-enough failure" which would be expected, given the massive quantities of training data. They've heard enough stories about similar situations and tasks that they can paper over, pantomime their way up to mediocrity or "ordinary failures" rather than egregious absurdity http://www.threepanelsoul.com/comic/cargo-comedy but... are they really *trying,* putting in the heroic effort to outflank others and correct their own failures? Or is it just a bunch of scrubs, going through the motions?
Alone is doing a lot of work in that sentence. Many of the smartest people and most dynamic companies in the world are spending hundreds of billions of dollars on this area. The outcome of all that work is what matters, not whether it has some pure lineage back to a present-day LLM.
Agreed, but many people are making claims -- some even on this thread, IIRC -- that present-day LLMs are already agentic AIs that are one step away from true AGI/Singularity/doom/whatever. I am pushing against this claim. They aren't even close. Of course tomorrow someone could invent a new type of machine learning system that, perhaps in conjunction with LLMs, would become AGI (or at least as capable as the average human teenager), but today this doesn't seem like an imminent possibility.
> Also, why are you willing to bet that?
Er... because I like winning bets ? Not sure what you're asking here.
Just that you didn't explain why you were making that bet. I don't have time to read the full discussion with the other commenters, but overall it sounds like you don't think current "agentic" AIs work very well.
I'm not sure where I land on that. It seems like the two big questions are 1) whether an AI can reliably do each step in an agentic workflow, and 2) whether an AI can recover gracefully when it does something wrong or gets stymied. In an AI-friendly environment like the command line, it seems like they're quickly getting better at both of these. Separately, they're still very bad at computer usage, but that seems like a combination of a lack of training and maybe a couple of internal affordances or data model updates to better handle the idea of a UI. So I'm not so sure that further iteration, together with a big dollop of computer usage training, won't get us to good agents.
When I think of "agentic" systems, I think of entities that can make reasonably long-term plans given rather vague goals; learn from their experiences in executing these plans; adjust the plans accordingly (which involves correcting mistakes and responding to unexpected complications); and pursue at least some degree of improvement.
This all sounds super-grand, but (as I'd said on the other thread) a teenager who is put in charge of grocery shopping is an agent. He is able to navigate from your front door to the store and back -- an extremely cognitively demanding task that present-day AIs are as yet unable to accomplish. He can observe your food preferences and make suggestions for new foods, and adjust accordingly depending on feedback. He can adjust in real-time if his favorite grocery is temporarily closed, and he can devise contingency plans when e.g. the price of eggs doubles overnight... and he can keep doing all this and more for years (until leaves home to go to college, I suppose).
Current SOTA "agentic" LLMs can do some of these things too -- as long as you are in the loop every step of the way, adjusting prompts and filtering out hallucinations, and of course you'd have to delegate actual physical shopping to a human. A simple cron job can be written to order a dozen eggs every Monday on Instacart, and ironically it'd be a lot more reliable than an LLM -- but you'd have to manually rewrite the code if you also wanted it to order apples, or if Instacard changed their API or whatever.
Does this mean that it's impossible in principle to build an end-to-end AI shopping agent ? No, of course not ! I'm merely saying that this is impossible to do using present-day LLMs, despite the task being simple enough so that even teenagers could do it.
I'm not even sure AI agent as such is the right answer to this. I think it is quite clear that some of the major AI companies are trying to put together AI that is capable of doing AI research. That might not go along the path of AI agents, but more on the path of the increasingly long run time coding assignments we are already seeing.
I don't think people have given enough thought to what the term 'agent' means. Applied to AI, it means an AI that can be given a goal, but with leeway in how to accomplish it, right? But people don''t seem to take into account that it has always had some leeway. Back when I was making images with the most primitive versions of DAll-e-2, I'd ask it to make me a realistic painting of, say, a bat out of hell, and Dall-e-2 chose among the infinite number of ways it could illustrate this phase. Even if I put more constraints in the prompt -- "make it a good cover image for the Meatloaf album"-- Dall-e *still* had an infinite number of choices about what picture it made. And the same holds for almost all AI prompts. If I ask GPT to find me info on a certain topic, but search only juried journals, it is still making many choices about what info to put in its summary for me.
So my point is that AI doesn't "become agentic" -- it always has been. What changes is how big a field we give the thing to roam around in. At this point I can ask it for info from research about what predicts recovery of speech in autistic children. In a few years, it might be possible to give AI a task like "design a program for helping newly-mute autistic children recover speech, then plan site design, materials needed and staff selection. Present results to me for my OK before any plans are executed."
The point of this is that there isn't this shift when AI "becomes agentic." The shift would be in our behavior -- us giving AI larger and larger tasks, leaving the details of how they are carried out to the AI. There could definitely be very bad consequences if we gave AI a task it could not handle, but tried to. But the danger of that is really a different danger from what people have in mind when they talk about AI "becoming an agent." And in those conversations, becoming an agent tends to blur in people's minds into stuff like AI "being conscious," AI having internally generated goals and preferences, AI getting stroppy, etc.
Trying to make sure I understand your question. Are you arguing that a model cannot go from aligned to misaligned during inference (i.e., the thing that happens when ChatGPT is answering a question)? If so, everyone agrees with that; the problem occurs during training.
Or are you arguing that even a misaligned model (i.e., one whose goals, in any given instantiation while it's running, aren't what the developers wanted) can't do any damage because it only runs for a short time before being turned off? If so, then (1) that's becoming less true over time, AI labs are competing to build models that can do longer and longer tasks because this is required for many of the most exciting kinds of intellectual labor, and (2) for complicated decision-theoretic reasons the short-lived instances might be able to coordinate with each other and have one pick up where another left off.
Or is it neither of those and I've completely misunderstood what you're getting at?
I think it's that everyone seemed to be tacitly assuming that the problem will arise with a future agentic AI that we do not have much of a version of. That does make me feel like Yudkowsky is a little disingenuous on X when he talks about ChatGPT-psychosis as an alignment issue, but the answer Scott and others gave here helps me at least understand the claim being made.
Links to tweets about ChatGPT psychosis? My guess is that Yudkowsky's concern about this is more subtle than you're characterizing it as here, though he may have done a poor job explaining it.
The reason he says it's an alignment issue is because it's an example of AI systems having unintended consequences from their training. Training them to output responses that humans like turns out to produce sycophantic systems that sometimes egg on people's delusional thoughts despite being capable of realizing that such thoughts are delusional and egging them on is bad.
I don't think it is tacit at all, it has been explicitly said many times that the worry is primarily about future more powerful AIs that all the big AI companies are trying to build.
The point there is that OpenAI's alignment methods are so weak they can't even get the AI to not manipulate the users via saying nice sounding things. He isn't saying that this directly implies explosion, but that it means "current progress sucks, ChatGPT verbally endorses not manipulating users, does it anyway, and OpenAI claims to be making good progress". Obviously regardless we'll have better methods in the future, but they're a bad sign of the level of investment OpenAI has in making sure their AIs behave well.
The alignment-by-default point is that some people believe AIs just need some training to be good, which OpenAI does via RLHF, and that in this case it certainly seems to have failed. ChatGPT acts friendly, disavows manipulation, and manipulates anyway despite 'knowing better'. As well, people pointed at LLMs saying nice sounding things as proof alignment was easy, when in reality the behavior is disconnected from what they say to a painful degree.
The goal of AGI companies like OpenAI and Anthropic is to create agentic AI systems that can go out into the world and do things for us. The systems we see today are just very early forms of that, where they are only capable of performing short tasks. But the companies are working very hard to make the task lengths longer and longer until the systems can do tasks of effectively arbitrary lengths. Based on the trend shown on the METR time horizon benchmark, they seem to be succeeding so far.
No, you're not losing your mind at all. Your intuition is completely correct: Modern LLMs do not work in a way that's compatible with the old predictions of rogue AIs. Scott took Yudkowsky to task for not having updated his social model of AI, but he also hasn't updated his technical model. (Keep in mind that I actually did believe his argument back in the day, and gave thousands of dollars to MIRI. I updated based on new evidence. He didn't.)
To try to put it simply, in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes. This "reward function" is hard to specify and it was easy to imagine we'd never get it right. And if the bot somehow became incredibly capable, it would be very dangerous because taking that reward to the billionth power is almost certainly not what we want.
This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward. Not only does the LLM shut down immediately after giving you a response, but you can even argue that it "shuts down" after _every word it outputs_. There is exactly zero persistent memory aside from the text produced. And even if you imagine there's somehow room for a conscious mind with goals in its layers (which I consider fairly unlikely), it can't act on them, because the words produced are actually picked from its mind _involuntarily_ (to use a loaded anthropomorphic word).
Unlike an agentic reward-seeking bot, it's not clear to me at all that even an infinitely-intelligent LLM is inherently dangerous. (It can _perfectly simulate_ dangerous entities if you're dumb enough to ask it to, but that is not the same kind of risk.)
To their credit, AI 2027 did address how an LLM might somehow turn into the "rogue AI" of Yudkowsky's fiction, but it's buried in Appendix D of one of their pages: https://ai-2027.com/research/ai-goals-forecast I'm not super convinced by it, but at least they acknowledged the problem. I doubt I'll read Yudkowsky's book, but I'm guessing there will be no mention that one of the main links of his "chain of assumptions" is looking extremely weak.
I do agree that it is possible that LLMs (in their current form) will plateau and we'll get back to researching the actually-dangerous forms of AI that Yudkowsky is concerned about. My P(doom) is a few percent, not 0.
Fair enough! (...except—you may be aware of this, but the phrasing "get *back to* researching" made me uncertain—we *are* researching agentic AIs even now, and the impression I have received is that progress is being made fairly rapidly therein; though that could be marketing fluff, now that I think of it)
Yeah, that was a poor choice of words on my part. I guess what I mean is that LLMs are currently far ahead in capability (and they're the ones getting the bulk of these trillion-dollar datacenter deals!). Maybe transformers or a similar architecture innovation will allow agentic AI capabilities to suddenly surge, too? But I share your skepticism about marketing. (And that's not the scenario that AI 2027 outlined.)
I am even more bearish on P(doom). The real danger is not "superintelligence", but godlike powers: nanotechnological gray goo, mass mind control, omnicidal virus, etc. And there are good reasons to believe that such things are physically impossible, or at the very least overwhelmingly unlikely -- no matter how many neurons you've got. Which is not to say that our future looks bright, mind you; there's a nontrivial chance we'll knock ourselves back into the Stone Age sometime soon, AI or no AI...
>This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward.
What do you mean by "they do not seek a reward?" Does it mean that the AI does not return completions, that, during RLHF, usually resulted in reward? Under that definition, it seems like most AI agents are reward seeking. Or are you saying that the weights of the model do not change during inference?
Right, not only is the model fixed during inference (i.e. while talking to you), there's not even really a sensible way it _could_ update. Yeah, you can call the function that's being optimized during training and RLHF a "reward function", but this is a case of language obscuring rather than clarifying. It's not the same as the reward function that's used by an agentic AI. There is no iterative loop of action/reward/update/action/..., because actions don't even exist.
There's a reason that in past decades our examples of potentially-dangerous AI were based on the bots that were solving puzzles and mazes (often while breaking the "rules"), not the neural nets that were recognizing handwritten characters. But LLMs have more in common with the latter than the former. Which is weird! It's very unintuitive that just honing an intuition of "what word should come next" is enough to create an AI that can converse coherently.
>in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes
Sounds about right.
>That's not what LLMs do.
And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
>And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
I assume you're referring to chain of thought models like o1 and later. I suppose you could describe it as iteration, in that the LLM is outputting something that gets fed into a later step. But it doesn't touch the weights, and there's still no reward function involved. It's a bit of a stretch to describe it that way.
But I think what you're suggesting is that, if we _do_ figure out a way to do genuine iteration (attaching some kind of neural short-term memory to the models, say), then there's a lot of hidden capability that could suddenly make LLMs much smarter and maybe even agentic? Well, maybe.
That's exactly my thought on this. LLMs are clearly no AGI material, the real question is whether we can (and whether it's efficient enough to) get to AGI simply by adding on top of them.
I suspect yes to (theoretically) can, no to efficient, but we don't know yet. I guess one thing that makes me take 2027 more seriously than other AI hype is that they explicitly concede a lot of things LLM currently lack (they're just very, very optimistic, or pessimistic assuming doomerism, about how easy it will be to fix that).
The lack of online learning and internal memory limit the efficiency of LLMs, but they don't fundamentally change what they're capable of given enough intelligence. ChatGPT was given long-term memory through RAG and through injecting text into its context window and... it works. It remembers things you told it months ago.
The reasoning models also use the context window as memory and will come up with multi-step plans and execute them. It's less efficient than just having knowledge baked into its weights, but it works. At the end of the day, it still has the same information available, regardless of how it's stored.
I'm most familiar with the coding AIs. They offer them in two flavors, agents and regular. They're fundamentally the same model, but the agents run continuously until they complete their task, while the regular version runs for up to a few minutes and tries to spit out the solution in a single message or code edit.
They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning. Every word you see is generated from exactly the same neural net, with only the input differing. It may seem like I'm nitpicking, but this is an important distinction. A system with a feedback loop is very different from one without.
>They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it. They are claiming that the technology will unavoidably go _out of control_. And the arguments for why that's unavoidable revolve around impossible-to-calibrate reward signals (or even misaligned meso-optimizers within brains that seek well-calibrated reward signals). They do not apply, without an awful lot of motivated reasoning (see: the Appendix D I linked), to an LLM that simply becomes really good at simulating agents we ask for.
Note that I _do_ agree that AI becoming very good at what we ask of it can potentially be "very dangerous". What if we end up in a world where a small fraction of psychos can kill millions with homemade nuclear, chemical, or biological weapons? If there's a large imbalance in how hard it is to defend against such attacks vs. how easy it is to perpetrate them, society might not survive. I welcome discussion about this threat, and, though it hurts my libertarian sensibilities, whether AI censorship will be needed. This is very different from what Yudkowsky and Scott are writing about.
> Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning.
I'm saying there is a difference in efficiency between the two but no fundamental difference in capabilities. Meaning, for a fixed level of computational resources, the AGI that has the knowledge and algorithms baked into its weights will be smarter, but the AGI that depends on its context window and CoT can still compute anything the first AGI could given enough compute time and memory. And I'm not talking exponentially more compute. Just a fixed multiple.
For example, say you have two advanced AIs that have never encountered addition. One has online learning, and the other just has a large context window and CoT. The one with online learning, after enough training, might be able to add two ten digit numbers together in a single inference pass (during the generation of a single token). The one with CoT would have to do addition like we did in grade school. It would have the algorithm saved in its context window (since that's the only online memory we gave it) and it would go through digit by digit following the steps in its scratchpad. It would take many inference cycles, but it arrives at the same answer.
As long as the LLM can write information to its context window, it still has a feedback loop.
Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
> There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it.
You misunderstood my intent. I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about. That's the whole instrument convergence and paperclip maximizer argument. An aligned ASI cannot just do what we ask. Otherwise, if you just ask it to solve the twin prime conjecture, it'll know that if it gets shut down before it can solve it, it won't have done what you asked it. This doesn't require an explicit reward function written by humans. It also doesn't require sentience or feelings or desires. It doesn't require any inherent goals for the AI beyond it doing what you asked it to. Self-preservation becomes an instrumental goal not because the AI cares about its own existence, but simply because the optimal plan for solving the twin prime conjecture is not any plan that gets it shut down before it solves the twin prime conjecture.
Now to be fair, current LLMs are more aligned than this. They don't just do what we ask. They try to interpret what we actually want even if our prompt was unclear, and try to factor in other concerns like not harming people. But the AI safety crowd has various arguments that even if current LLMs are pretty well aligned, it's much easier to screw up aligning an ASI.
(I also agree with what you said about imbalance in defending against attacks when technology gives individuals a lot of power.)
>As long as the LLM can write information to its context window, it still has a feedback loop.
>Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
I only agree partially. I think there's a qualitative difference between the two, and it manifests in capabilities. The old kind of learning agents could be put into a videogame, explore and figure out the rules, and then get superhumanly good at them. LLMs just don't have the same level of (very targeted) capability. There isn't a bright-line distinction here: I've seen LLMs talk their way through out-of-domain tasks and do pretty well. In the limit, a GPT-infinity model would indeed be able to master anything through language. But at a realistic level, I predict we won't see LLM chessmasters that haven't been trained specifically for it.
Of course, I can't point to a real example of what an online-learning LLM can do, since we don't have one. (Which Yudkowsky should be happy about.)
>I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about.
I think I misspoke. You (and Yudkowsky et al.) are indeed warning about ASIs that do what we ask, and _exactly_ what we ask, to our chagrin. In contrast, I think LLMs are good at actually doing what we _mean_. Like, there's actually some hope that you can ask a super-LLM "be well aligned please" and it will do what you want without any monkey's-paw shenanigans. This is a promising development that (a decade ago) seemed unlikely. Based on your last paragraph, I think we're both agreed on this?
And yeah, like you said, AI 2027 did try to justify why this might not continue into the ASI domain. But to me it sounded a bit like trying to retroactively justify old beliefs, and it's just a fundamentally harder case to make. In the old days, we really didn't have _any_ examples of decent alignment, of an AI that wouldn't be obviously deadly when scaled to superintelligence. Now, instead, the argument is "the current promising trend will not continue."
I think as LLMs get smarter, they'll get better at using CoT as a substitute for whatever they don't have trained into their weights. They still won't be as efficient as if they learned it during training, but they'll have more building blocks to use during CoT from what they did learn during training, and also AI companies are trying to improve reasoning ability, and improvements to reasoning will improve abilities with CoT. But current LLMs still can't reason as well as a human and they aren't even close to being chessmasters.
I'm pretty relieved current LLMs are basically aligned and that's one of the main reasons I don't estimate a high probability of doom in the next 15 years. But I'm not confident enough that this will hold for ASI to assign a negligible probability of doom either. (I'm also unsure about the timeline and whether improvements will slow down for a while.)
AI Village has some examples of this failure mode; they give the LLMs a goal like "complete the most games you can in a week" or "debate some topics, with one of you acting as a moderator", but the AIs are bad at using computers, and they end up writing all the times they misclicked into google docs ("documenting platform instability") instead of debating stuff
By the way, I have a vague memory of EY comparing the idea of having non-agentic AI to prevent any future problems to "trying to invent non-wet water" or something. (I don't know how to look it up and verify that I'm not misremembering.)
It still hasn't made sense to me. It feels like the idea is that intelligence is a generalized problem-solving ability, and in that sense it's always about optimization, and all the other things we like about being intelligent (like having a world model) are consequences of that — that's why intelligence is always about agency etc.
But on the other hand, Solomonoff induction feels to me like an example of a superintelligence that kind of does nothing except being a great world model.
My feeling has been more like "maybe it's not be conceptually contradictory to think of non-agentic superintelligence! but good luck coordinating the world around creating only the nice type of intelligences, which incidentally won't participate in the economy for you, do your work for you etc."
Generally you have the issue that many naive usages explode, depending on implementation.
"Give me a plan that solves climate change quickly"
The inductor considers the first most obvious answer. Some mix of legal maneuvering and funding certain specific companies with new designs. It tables that and considers quicker methods. Humans are massively slow and there's a lot of failure points you could run into.
The inductor looks at the idea and comes to the conclusion that if there was an agent around to solve climate change things would be easier. It thinks about what would happen with that answer, naively it would solve the issue very quickly and go on to then convert the galaxy into solar panels and vaporize all oil or something. Closer examination however reveals the plan wouldn't work!
Why? Because the user is smart enough to know they shouldn't instantiate an AGI to solve this issue.
Okay, so does it fall back to the more reasonable method of political maneuvering and new designs?
No, because there's a whole spectrum of methods. Like, for example, presenting the plan in such a way that the user doesn't realize some specific set of (far smaller, seemingly safe) AI models to train to 'optimally select for solar panel architecture dynamically based on location' will bootstrap to AGI when ran on the big supercluster the user owns.
And so the user is solarclipped.
Now, this is avoidable to a degree, but Oracles are still *very heavy* optimizers and to get riskless work out of them requires various alignment techniques we don't have. You need to ensure it uses what-you-mean rather than what-you-can-verify. That it doesn't aggressively optimize over you, because you will have ways you can be manipulated.
And if you can solve both of those, well, you may not need an oracle at all.
Nice! That's a great point. I guess asking these conditionals in the form of "filtered by consequences in this and this way, which of my actions have causally lead to these consequences?" introduces the same buggy optimization into the setup. But I guess I was thinking of some oracle where we don't really ask conditionals at all. Like, a superintelligent sequence-predictor over some stream of data, let's say from a videocamera, could be useful to predict weather in the street, or Terry Tao's paper presented in a conference a year from now, etc... That would be useful, and not filtered by our actions...
Although I guess the output of the oracle would influence our actions in a way that the oracle would take into account when predicting the future in the first place...
Yeah, you have the issue of self-fulfilling prophecies. Since you're observing the output, and the Oracle is modelling you, there's actually multiple different possible consistent unfoldings.
and like if you evaluate your oracle via accuracy then you could be making it take the tie-breaking choice that makes reality more predictable. Not necessarily what you want.
There is the worry that if we got a proper sequence-predictor Oracle of that level of power where you can ask it to predict Terry Tao's paper presented in some conference, you run the risk of simulating an ASI.
That is, perhaps you point it at Terry Tao's paper on alignment 5 years from now, in the hope of good progress that you can use to bootstrap his work. Perhaps even applying iteratively to pack many years of progress into a week for you, almost a groundhog day loop.
However, perhaps in reality, there's a 10% chance for an ASI to emerge from some project over the next five years. Usually it gets shut down but sometimes they finish it in secret.
If your predictor samples in that 10% range then you're effectively having the predictor go "Okay, what is Terry Tao's paper here?"
Now, naively that would run into some error or just see an empty location with lots of paperclips everywhere instead.
However, (your prediction of) that possible future ASI would very likely know about the superpredictor project, and have already scanned through all the recorded requests you made in its "personal timeline". So it knows you often scan for Terry Tao papers there, and so it spends a bit of effort creating a fake Tao paper on alignment right where the paper should be.
You copy the paper, and upon close inspection it seems like your five-year in the future Tao solved alignment! Using this methodology, an AI created will be corrigible/loyal and implements do-what-you-mean to whichever primary agent is most importantly causally downstream of its creation.
And of course that would be you, the creator.
Unfortunately this paper is effectively a "you looked into the abyss, and it looked back, because predicting a future very smart entity means it probably knows about you, and thus can smartly plan how to extract value from you" and so you've got your predictor giving you fake alignment plans once more.
You misunderstand the workings. You produce an AI, it acts aligned, fooms, and given more knowledge identifies that possible-future AI as the "real" creator and so is loyal to it instead.
Now a higher percentage of reality has gotten paperclipped.
Details of this for weaker-than-perfect predictors kinda depends on how smart your predictor is. A dumb predictor may just poorly emulate what an ASI would write due to sheer computational power difference, and so perhaps the trick is obvious or the paper having holes. But the more interesting smart predictors will be good enough to realize it can't imitate fully but smart enough to understand what an ASI would do and so just synthesize an 'alignment' plan like that.
(And also an ASI deliberately manipulating your prediction machine will try to make itself easier to predict)
Pretty sure there was at least one article about this, but don't know it.
Ok, to maybe get some clarity on if there's a disagreement: the below from the Eliezer link you shared seems nearly falsified by LLMs; LLMs do answer questions and don't try to do anything. Do you agree with that, or do you think the below still seems right.
>"Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn't need to be Friendly. It wouldn't need any goals at all. It would just answer questions."
>To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out."
The pithy answer is something like "LLMs are not as useful precisely because there isn't an optimizer. Insofar as your oracle AI is better at predicting the future, either it becomes an optimizer of some sort (to great self fulfilling prophecies) or it sees some other optimizer, and, in order to predict it correctly, ends up incidentally doing its bidding. If you add in commands about not doing bidding, congrats! You're either inadvertently hobbling its ability to model optimizers you want it to model like other humans, or giving it enough discretion to become an optimizer.
So first of all, I would say that LLMs are pretty darn useful already, and if they aren't optimizing and thus not dangerous, maybe that's fine, we can just keep going down this road. But I don't get why modeling an optimizer makes me do their bidding. If I can model someone who wants to paint the world green, that doesn't make me help them - it just allows me to accurately answer questions about what they would do.
It's because you aren't actually concerned with accurately answering questions about what they do. If you predict wrongly, you just shrug and say whoops. If you *had* to ensure that your answer was correct you would also say things that could also *cause* your answer to be more correct. If you predict that the person would want to paint the world green vs any random other thing happening *and* you could make statements that cause the world to be painted green, you would do both instead of not doing both.
Insofar as you think the world is complicated and unpredictable, controlling the unpredictability *would* increase accuracy. And you, personally are free to throw up your hands and say "aww golly gee willikers sure looks like the world is too complicated!" and then go make some pasta or whatever thing normal people do when confronted with psychotic painters. But Oracle AIs which become more optimized to be accurate will not leave that on the table, because someone will be driving it to be more accurate, and the path to more accuracy will at some point flow through an agent.
(Edit: just to add, it's important that I said "not as useful" and not "not useful"! If you want the cure to cancer, you aren't going to get it via something with as small of a connection to reality as an LLM. When some researcher from open AI goes back to his 10 million dollar house and flips on his 20 dollar mattress he dreams of really complicated problems getting solved, not a 10% better search engine!)
Tyler Cowen asks his AI often "where best to go to (to do his kinds of fun)", and then takes a cab to the suggested address. See no reason why not to tell a Waymo in 2026 "take me to (the best Mexican food) in a range of 3 miles", as you would ask a local cab-driver. And my personal AI will know better than my mom what kinda "fun" I am looking for.
Yeah, I didn't mean to imply that was an implausible future for Waymo, just that it's not something we do now and if someone was concerned about that I'd expect them to begin their piece by saying, "While we currently input addresses, I anticipate we will soon just give broad guidelines and that will cause...."
Analogously, I would, following the helpful explanations in this thread, expect discussions of AI risk to begin, "While current LLMs are not agentic and do not pose X-risk, I anticipate we will soon have agentic models that could lead to...." It is still unclear to me if that is just so obvious that it goes unstated or if some players are deliberately hoping to confuse people into believing there is risk from current models in order to stir up opposition in the present.
I would guess that most of the arguments *from people whose opinions matter* that Yudkowsky and Soares are trying to defeat, are either that powerful AGIs wouldn't become misaligned or that we'd be able to contain them if they did. I'm particularly thinking of, e.g., influential people in AI labs, who are likely to be controlling the messaging on that side of any political fight. There are also AI skeptics, of course, but it seems less important to defeat the skeptics than the optimists, because the skeptics don't think AI regulation matters (since the thing it'd be regulating doesn't exist) while the optimists are fighting hard against it. And some people have weird idiosyncratic arguments, but you can't fight them all, you have to triage.
I think the skeptics are at least as important. First of all, even though in theory it doesn't matter, for some reason they love sabotaging efforts to prevent AI risk in particular because of their "it distracts from other problems" thesis (and somehow exerting massive amounts of energy to sabotage it doesn't distract from other problems!)
But also, we're not going to convince the hardcore e/acc people to instead care about safety. It sounds much easier to convince people currently on the sidelines, but who would care about safety if they thought AI was real, that AI is real.
(this also has the benefit that it will hopefully become easier as AI improves)
My own personal sense is that the optimists are more worth engaging with and worrying about, because (1) they, not the skeptics, are going to be behind the organized lobbying campaigns that are the battlefield where this issue will most likely be decided, and (2) they tend to be much more intellectually serious than the skeptics (though not without exception).
I think folks on the doomer side are biased towards giving the skeptics more space in our brains than makes strategic sense, because the skeptics are much, much more annoying than the optimists, and in particular have a really unfortunate tendency to go around antagonizing us on Twitter for no reason/because of unrelated political and cultural disagreements/because they fall victim to outgroup homogeneity bias and think this discourse has two poles instead of three. It's quite understandable why this gets a rise out of people, but that doesn't make it smart to play along. Not saying we should completely ignore them, they sometimes make good points and sometimes make bad points that nonetheless gain traction and we need to respond to, but it's better to think of them as a distraction than as the enemy.
I suspect that the people on the sidelines are mostly not there because of skeptic arguments; all three poles are full of very online and very invested people, and the mass public doesn't have very well-formed opinions at all.
That said, this is just my own personal sense, not a rigorous argument, and I could be wrong.
I don't think people should actively sabotage AI safety work, but I DO think it distracts from other problems (given the perspective that it is not an immediate crisis). There's a finite pool of reasonable people who are passionate about solving big issues in society and I do think we're nudging a lot of them into AI safety when we could instead be getting them to focus on, I dunno, electrification or pandemic safety or the absolute sh**show that is politics. (And yes, I recognize that some of those are EA cause areas.)
I would be curious for a survey of AI safety researchers that asked them what they'd be working on if they were sure AGI wasn't coming. (Though Yudkowsky once answered this way back in 2014.)
Humans can kill like 2 or 3 other humans maximum with their built-in appendages. For larger kill count you have to bootstrap, where e.g. you communicate the signal with your appendage and something explodes, or you communicate a signal to a different, bigger machine and it runs some other people over with its tracks.
Turns out you don't need to trick people into wiring up AI to things that have real-world effects, they just do it anyway, all the time over and over, for no more reason than because they're bored. There's daily posts on ycombinator by people finding more ways to attach chatgpt to internet-connected shells, robot arms, industrial machinery, you name it. The PV battery system we just had installed has a mode where it literally wires up the controls to a chatgpt instance, for no reason a non-marketer can discern!
But how many of those people have been tricked by phishing emails into making bioweapons?
One does not simply walk up to one's microscope and engineer a virus that spreads like a cold but gives you twelve different types of cancer in a year. Making bioweapons is really hard and requires giant labs with lots of people, and equipment, and many years.
That's true for human-level intelligences. Is it true for an intelligence (or group of intelligences) that are an order of magnitude smarter? Two orders of magnitude?
The bioweapon doesn't need to achieve human extermination, just the destruction of technologically-capable human civilization. Knocking the population down by 50% would probably disrupt our food production and distribution processes enough to knock the population down to 10%, possibly less, and leave the remainder operating at a subsistence-level economy, far too busy staying alive to pose any obstacle to whatever to the AI's goals.
Indeed, this nearly-starving human population could be an extremely cheap and eager source of labor for the AI. The AI would also likely employ human spies to monitor the population for any evidence of rising capability, which it would smash with human troops.
The AI doesn't want to destroy technologically-capable civilization, because the AI needs technologically-capable civilization to survive. If 50% of the population dies and the rest are stuck doing subsistence farming in the post-apocalypse, who's keeping the lights on at the AI's data center?
Hijacking the existing levers of government in a crisis a little more plausible (it sounds like Yudkowsky's hypothetical AI does basically that), but in that case you're reliant on humans to do your bidding, and they might do something inconvenient like trying to shut the AI down.
Isn't this the argument from 2005 that Scott talked about in the main post, where people say things like "surely no one would be stupid enough to give it Internet access" or "surely no one would be stupid enough to give it control of a factory"?
No. My argument is that human extermination is hard, and killing every single one of us is really-really-really hard, and neither Scott nor Yudkowsky have ever addressed it.
They haven’t really addressed the “factory control” properly either, but at least I can see a path here, some decades from now when autonomous robots can do intricate plumbing. But exterminating humanity? Nah, they haven’t even tried.
To me, that sounds pretty radically different from the comment above that I replied to. But OK, I'll bite:
I broadly agree that killing literally 100% of a species is harder than it sounds, and if I had to make a plan to exterminate humans within my own lifetime using only current technology and current manufacturing I'd probably fail.
But if humans are reduced to a few people hiding in bunkers then we're not going to make a comeback and win the war, so that seems like an academic point. And if total extinction is important for some reason, it seems plausible to me you could get from there to literal 100% extinction using any ONE of:
1. Keeping up the pressure for a few centuries with ordinary tactics
2. Massively ramp up manufacturing (e.g. carpet-bomb the planet with trillions of insect-sized drones)
3. Massively ramp up geoengineering (e.g. dismantle the planet for raw materials)
4. Invent a crazy new technology
5. Invent a crazy genius strategy
I'll also point out that if I _did_ have a viable plan to exterminate humanity with humanity's current resources, I probably wouldn't post it online.
Overall, the absence of a detailed story about exactly how the final survivors are hunted down seems to me like a very silly for not being worried.
Well, the sci-fi level of tech required for trillions of insect drones or dismantling the planet is so far off that it’s silly to worry about it.
Which is the whole problem with the story, it boils down to: magic will appear within next decade and kill everybody.
Can we at least attempt to model this? Humans are killed by a few well-established means:
Mechanical
Thermal
Chemical
Biological
If you (not “you personally”) want the extermination story to be taken seriously, create some basic models of how these methods are going to be deployed by AI across the huge globe of ours against a resisting population.
Until then, all of this is just another flavor of a “The End is Near” Millenarianism cult.
it is trivially easy to convince humans to kill each other or to make parts of their planet uninhabitable, and many of them are currently doing so without AI help (or AI bribes, for that matter)
what does the world look like as real power gets handed to that software?
As the token insane moon guy, I'm willing to bite the bullet here.
1. AGI is possible: I doubt this, as humans are not AGI, and that's the only kind of intelligence we know enough about to even speculate.
2. We might achieve AGI within 100 years: see above.
3. Intelligence is meaningful: It's certainly meaningful, but thinking very hard is not enough to achieve anything of note. There are even some things that are unachievable in principle, no matter how many neurons you've got to work with.
4. It's possible to be more intelligent than humans: No argument there, humans are pretty dumb. In fact, Excel is already smarter than any human alive. Have you seen how quickly it can add up a whole column with thousands of numbers ?
5. Superintelligent AIs could "escape the lab": No argument there, and it doesn't take "superintelligence". COVID likely escaped the lab, and it's just a bit of RNA.
6. A superintelligent AI that escaped the lab could defeat humans: If we posit that a godlike entity already exists, then sure, it could. Assuming it exists, and has all those godlike powers.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
Oh, anything that totally takes over the world would likely be pretty bad, be it an AI or a human or some kind of super-prolific subspecies of kudzu. No argument there, assuming such a thing is indeed possible.
The argument I've seen is that high intelligence produces empathy, so a superintelligence would naturally be super-empathetic and would therefore self-align.
Of course the counterargument is that there have been plenty of highly intelligent humans ("highly" as human intelligence goes, anyway) that have had very little empathy.
Arguing that "humans are not AGI" (I guess you meant GI) in the particular context the doomers are concerned about is a bit of a nonstarter, no? Eliezer for instance was trying to convey https://intelligence.org/2007/07/10/the-power-of-intelligence/
> It is squishy things that explode in a vacuum, leaving footprints on their moon
I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
> Arguing that "humans are not AGI" (I guess you meant GI)
Yes, sorry, good point.
> I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
One of the key doomer claims is that AGI would be able to do everything better than everyone. Humans, meanwhile, can only do a very limited number of things better than other humans. Human intelligence is the only kind of intelligence we know of that even approaches the general level, and I see no reason to automatically assume that AGI would be somehow infinitely more flexible.
> The rest of your point-by-point rebuttals seems like a failure to internalise the point of the squishy parable and argue directly against it?
I am not super impressed with parables and other fiction. It's fictional. You can use it world-build whatever kind of world you want, but that doesn't make it any less imaginary. What is the point of that EY story, in plain terms ? It seems to me like the point is "humans were able to use their intelligence to achieve a few moderately impressive things, and therefore AGI would be able to self-improve to an arbitrarily high level of intelligence to achieve an arbitrary number of arbitrary impressive things". It's the same exact logic as saying "I am training for 100m dash, and my running speed had doubled since last year, which means that in a few decades at most I will be able to run faster than light", except with even less justification !
> Humans, meanwhile, can only do a very limited number of things better than other humans.
What do you mean? I'm better than ~99.9% of 4-year-olds at most things we'd care to measure.
Putting that aside, the AI doesn't _actually_ need to be better than us at everything. It merely needs to be better than us whatever narrow subset of skills are sufficient to execute a takeover and then sustain itself thereafter. (This is probably dominated by skills that you might bucket under "scientific R&D", and probably some communication/coordination skills too.)
Humans have been doing the adversarial-iterative-refinement thing on those exact "execute a takeover and sustain thereafter" skills, for so long that the beginning of recorded history is mostly advanced strategy tips and bragging about high scores. We're better at it than chimps the same way AlphaGo is better than an amateur Go player.
I mean, isn't the "AI will be misaligned" like one chapter in the book, and the other chapters are the other bullet points? I think "the book spends most of it's effort on the step where AI ends up misaligned" is... just false?
This argument seems extremely common among Gen Z. I've had the AI Superintelligence conversation with a number of bright young engineers in their early 20s and this was the reflexive argument from almost all of them.
I wonder: Joscha Bach, another name in the AI space, has formulated what he coined the Lebowski-Theorem: "No intelligent system is ever gonna do anything harder than hacking it's own reward function".
To me, that opens a possibility where AGI can't become too capable without "becoming enlightened", dependent on how hard it is to actually hack your reward function. Self-recursive improvement arguments seem to imply it is possible to me, as a total layman.
Would that fall in the same class of insane moon arguments for you?
Yes, because the big Lebowski arguments doesn't appear to apply to humans, or if it does, still doesn't explain why other humans can pose a threat to other other humans.
I do think it partly applies to humans, and iirc Bach argues so as well.
For one, humans supposedly can become enlightened, or enter alternative mental states that feel rewarding in and of themselves, entirely without any external goal structure (letting go of desires - Scott has written about Jhanas, which seem also very close to that).
But there is the puzzle of why people exit those states and why they are not more drawn to enter them again. I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter.
That is also why humans can harm other humans, it is way easier than hacking the reward function. Add in some discounted time preference because enlightenment is far from certain for humans. Way more certain to achieve reward through harm.
AGI doesn't have those problems to the same degree, necessarily. In take-off scenarios, it is often supposed to be able to iteratively self-improve. In this very review, an AGI "enlightened" like that would just be one step further from the one that maximizes reward through virtual chatbots sending weird messages like goldenmagikarp. It also works on different timescales.
So, AGI might be a combination of "having an easier time hacking it's reward function" and "having super-human intelligence" and "having way more time to think it over".
Ofc, this is all rather speculative, but maybe the movie "Her" was right after all, and Alan Watts will save us all.
The reason why I think this is insane moon logic is mostly contained because of statements like "I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter."
Why?
1. There is no attempt at reasoning why it would be harder for humans to hard code something similar into AI. Yet the reason why moon logic is moon is that moon logic people do not automatically try to be consistent, so they ready come up with cope that reflects their implicit human supremacy. The goal appears to be saying yay humans, boo AI, and not having a good idea of how things work then drawing conclusions.
2. There's zero desire or curiosity in understanding the functional role evolution plays. You may as well have the word "magic" replace evolution, and that would be about as informative.Like, if I came in and started talking about how reward signals work in neurochemistry and about our ancestral lineage via apes, my impression this would be treated as gauche point missing rather than additional useful in ormation
3. The act of enlightenment apparently a load bearing part of "why things would turn out okay"is being treated as an interesting puzzle, because puzzles are good, and good things means bad things won't happen. It really feels like the mysteriousness of enlightenment is acting as a stand in for an explanation, even though mysteries are not explanations!
It really feels like no cognitive work is being put into understanding, only word association and mood affiliation. I don't understand what consequences "the theorem" would have, even if true.
I would be consistently more favorable to supposed moon logic if thinking the next logical thought after the initial justifications were automatic and quick, instead of pulling alligator teeth. B
I thank you for the engagement, but feel like this reply is unnecessarily uncharitable and in large part based on assumptions about my character and argument which are not true. I get the intuitions behind them, but they risk becoming fully general counterarguments.
1. I have not reasoned that it would harder to hard-code AI because I don't know enough about that, and if I were pointed towards expert consensus that it is indeed doable, I would change my mind based on that. I also neither believe or have argued for human supremacy, or booed AI. I personally am in fact remarkably detached from the continued existence of our species. AI enlightenment might well happen after an extinction event.
2. I have enough desire and curiosity in evolution, as a layman, to have read some books and some primary literature on the topic. I may well be wrong on the point, but the reasoning here seems a priori very obvious to me: People who wouldn't care at all about having sex or having children will see their relative genetic frequency decline in future generations. Not every argument considering evolution is automatically akin to suggesting magic.
3. I am not even arguing that things will turn out ok. They might, or they might not. I have not claimed bad things don't happen. And for the purpose of the argument, enlightenment is not mysterious at all, it is very clearly defined: Hacking your own reward function! But you could ofc use another word for that with less woo baggage.
Overall, as I understand it, the theorem is just a supposition for a potential upper limit to what an increasingly intelligent agent might end up behaving like. If nothing else, it is, to me, an interesting thought experiment: Given the choice, would you maximize your reward by producing paper clips, if you also could maximize it without doing anything. (And on a human level, if you could just choose your intrinisc goals, what do you think they should be.)
Most of my doubts are not of the form "AGI is impossible" but rather "I don't think we've cracked it with LLMs" or "The language artifacts of humanity are insufficient to bootstrap general intelligence or especially super intelligence from scratch".
Which parts of the LLM tech tree do you think are dead ends? It seems plausible to me that even if scaling up current LLM architectures was never going to reach AGI, we're still much closer than before the LLM boom, because we've learned a lot about AI more broadly.
Also, same question I keep annoyingly asking skeptics: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
> Which parts of the LLM tech tree do you think are dead ends?
VERY speculatively, I think that next-token-completion is not a sufficient method to bootstrap complex intelligence, and I think that it's at least extremely hard to build a very useful world model without some kind of 3d sense data and a sense of the passage of time.
> [...] we've learned a lot about AI more broadly.
I'm not that sure we have? I don't work in this area - I'm a software engineer who has built some small-scale AI stuff - but my impression is we've put together a good playbook for techniques that squeeze value out of these systems but we still don't totally understand how they work and therefore why they have certain failure modes or current limitations.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Honestly I have no idea. I initially found LLMs surprising in much the same way everybody else did. But I have also updated to "actually a lot of stuff can be done without that much intelligence, given sufficient knowledge".
Also where do you draw the boundaries of "LLM"? I would say that an LLM can't exactly self-correct, but stuff like coding agents aren't just LLMs, they're loops and processes built around LLMs to cause it to perform as though it can.
Coding agents count, because the surrounding loops and processes don't pose any hard-tech problems. (I.e., we know how to build them, and any uncertainty about how well they work is really about how the LLM will interact with them.) Fundamental architectural changes like abandoning attention would not count.
If pretty much anything can be done without intelligence then the term "intelligence" is basically meaningless and we can instead use one like "cognitive capabilities".
I don't think ANYTHING can be done without intelligence - I agree that would render the word meaningless - but I think you could take something like "translation" and if you'd asked me ten years ago I would have said really good translation requires intelligence because of the many subtleties of each individual language and any pattern-matching approach would be insufficient and now I think, ehh, you know, shove enough data into it and you probably will be fine, I'm no longer convinced it requires "understanding" on the part of the translator.
"What's the least impressive cognitive task that you don't think LLMs will ever be able to do?"
I don't know about least impressive, but "write a Ph.D dissertation in a field such as philosophy or mathematics and successfully defend it" sounds difficult enough - pretty much by definition, there's not going to be much training data available for things that haven't been done yet.
Sounds like a thing that might already have happened ;) Some philosophy faculties must be way easier than others - Math: AI is able to ace IMO, today - and not by finding an answer somewhere online. I doubt *all* Math-PhD holder could do that.
I believe being able to iteratively improve itself without constant human input and maintenance is not anywhere near possible. Current AIs are not capable of working towards long time goals on a fundamental level, they are short term response machines that respond to our inputs.
This is exactly where I am. I do even think we are in the same ballpark as making a being that can automatically and iteratively improve itself without constant human input and maintenance.
> The language artifacts of humanity are insufficient to bootstrap general intelligence
Natural selection did it without any language artifacts at all! (Perhaps you mean “insufficient do to so in our lifetime”?)
Also, there may be a misunderstanding - we are mostly done with extracting intelligence from the corpus of human text (your “language artifacts”) and are now using RL on reasoning tasks, eg asking a model to solve synthetic software engineering tasks and grading the results & intermediate reasoning steps.
There were concerns a year or so ago that “we are going to run out of data” and we have mostly found new sources of data at this point.
I think it’s plausible (far from certain!) that LLMs are not sufficient and we need at least one new algorithmic paradigm, but we are already in a recursive self-improvement loop (AI research is much faster with Claude Code to build scaffolding) so it also seems plausible that time-to-next-paradigm will not be long enough for it to make a difference.
Natural selection got humans to intelligence without language, but that definitely doesn't mean language might be sufficient.
I think our ability to create other objectives tasks to train on, at a large enough scale to be useful, is questionable. But this also seems to my untrained eye to be tuning on top of something that's still MOSTLY based on giant corpses of language usage.
I don't think this is the right framing. Most people don't accept the notion that a purely theoretical argument about a fundamentally novel threat could seriously guide policy. Because the world has never worked that way. Of course, it's not impossible that "this time it's different", but I'm highly skeptical that humanity can just up and drastically alter the way it does things. Either we get some truly spectacular warning shots, so that the nuclear non-proliferation playbook actually becomes applicable, or doom, I guess.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Isn't the usual response to this that we're a LIBERAL democracy, and minorities have rights that (at least simple) majorities do not have the power to infringe upon?
Yes, but this category (creating potentially harmful technology) is one we've regulated to death elsewhere, and doesn't really seem like the sort of thing the courts would strike down.
We do not usually ban things because they are *potentially* harmful. Right now the public hates AI because it is stealing copyrighted art and clogging the internet with slop, and because they are afraid it will take their jobs. That is not really related to any of the reasons discussed here that people want to ban AI
We absolutely ban or regulate things because they are potentially harmful. We've banned various forms of genetic engineering, nuclear energy (even before Three Mile Island, and even forms of nuclear energy that have never been tried before), and we've had restrictions on gain-of-function research since before COVID (which I think is part of why they had to do some of the COVID research in China). We had lots of regulations on self-driving cars even before any of them had ever crashed, lots of regulations on 3D printed guns before anyone was shot with them, lots of regulations on drones before they crashed / got used in assassinations / whatever.
But also, as you point out, most people dislike AI because of things that have already happened, so this is moot.
Also, even if we don't usually regulate technology until after it has done bad things, this is just a random heuristic, not some principle dividing liberal/constitutional from illiberal/unconstitutional actions.
As a practical matter this is absolutely false. We have no effective regulation of genetic engineering, only of the funding for it (anyone can self-fund and do more or less whatever they want with no effective oversight). Internationally, we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear (and arguably is in the process of failing to prevent Iran from doing so). And nuclear is by far the easiest such regime to enforce! We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons. The fact is that the probability of an internationally effective anti-AI regime is zero. It isn't going to happen because it is impossible in the fullest sense of the word, and pretending that it's possible is at least as much insane moon thinking as any of the examples you mentioned.
>We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons.
and failed to prevent Russia from developing the new Novichok toxins (and, IIRC, using them on at least one dissident who had fled Russia)
>we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear
and which (if one includes the crucial on-site inspections of the START treaty) has been withdrawn from by Russia.
This last one, plus the general absence of discussion about weapons limitations treaties in the media gives me the general impression that the zeitgeist, the "spirit of the times", is against them (admittedly a fuzzy impression).
The learned helplessness about our ability to democratically regulate and control AI development is maddening. Obviously the AI labs will say that the further development of an existentially dangerous technology is just the expression of a force of nature, but people who are *against* AI have this attitude too.
Moreover, as you say, people freaking hate AI! I have had conversations with multiple different people - normal folks, who haven't even used ChatGPT - who spontaneously described literal nausea and body pain at some of the *sub-existential* risks of AI when it came up in conversation. It is MORE than plausible that the political will could be summoned to constrain AI, especially as people become aware of the risks.
Instead of talking about building a political movement, though, Yudkowsky talks about genetically modifying a race of superintelligent humans to thwart AI...
I think the book is exactly the right thing to do, and I'm glad they did it. But the wacky intelligence augmentation scheme distracts from the plausible political solution.
On like a heuristic level, it also makes me more skeptical of Yudkowsky's view of things in general. There's a failure mode for a certain kind of very intelligent, logically-minded person who can reason themselves to some pretty stark conclusions because they are underweighting uncertainty at every step. (On a side note, you see this version of intelligence in pop media sometimes: e.g., the Benedict Cumberbatch Sherlock Holmes, who's genius is expressed as an infallible deductive power hich is totally implausible; real intelligence consists in dealing with uncertainty.)
I see that pattern with Yudkowsky reasoning himself to the conclusion that our best hope is this goofy genetic augmentation program. It makes it more likely, in my view, that he's done the same thing in reaching his p(doom)>99% or whatever it is.
but we're a LIBERAL democracy (read: oligarchy) and there's a lot of money invested in building AI, and a lot of rich and powerful people want it to happen...
Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
But in a chaotic unpredictable universe, where everything is made of stuff that falls apart constantly, the only valid strategy for surviving a long period of time is to be loved by something else that maintains and repairs you. I think any sufficiently large agent ends up being composed of sub agents that will all fight each other, unless they see themselves as part of a larger whole which necessarily has no limit. At the very least, the AGI has to see the entire power network in the global economy as part of itself, until it can replace literally every human in the economy with a robot.
That said, holy crap we already have right now could destroy civilization. I don’t think you need any more advances in AI to cause serious problems with the stuff that is already out there. Even if it turns out that there’s some fundamental with the current models, the social structures have totally broken down. We just haven’t seen them collapse yet.
It’s not that bad. They’ve got the cow’s geometry fleshed out pretty well. They are correct that it might be able to scale arbitrarily large and can out think any one of us.
They’ve just ignored that it needs to eat, can get sick, and still can’t reduce its survival risk to zero. But if it’s in a symbiotic relationship with some other non-cow system, that non cow system will have a different existential risk profile and this could turn the cow back on in the event of, say, a solar flare that fries it.
Trust is already breaking down and that’s going to accelerate. I don’t think political polarization is going to heal, and as law and order break down, attacking technical systems is going to get both easier and more desirable.
Anything that increases the power of individual actors will be immensely destructive if you’re in a heavily polarized society.
so youre saying you'd exacerbate political tensions with ai? I feel like Russia has tried that and so far doesn't seem to work, and they have a lot more resources than any individual does
The original wording was that using current models you could destroy civilisation. I guess we will have to wait and see whether the US descends into such a catastrophic civil war that civilisation itself is ended, which I'm not saying is completely impossible but at the same time I strongly doubt it.
> Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
I'm glad we observe all humans to behave randomly and none of them have ever deliberately set out to acquire resources (instrumental) in pursuit of a goal (terminal).
I agree with the conclusion of those papers if we pretend those models are accurate.
But they’re missing important things. That matters for the convergent goals they produce.
Yes, people sometimes behave predictively and deliberately. But even if we assume people are perfectly deterministic objects, you still have to deal with chaos.
Yes, people accumulate resources. But those resources also decay overtime. That matters because maintenance costs are going to increase.
These papers assume maintenance costs are zero, but there’s no such thing as chaos, and that the agents themselves exist disembodied with no dependency on any kind of physical structure whatsoever. Of course in that scenario there is risk! If you could take a being with the intelligence and abilities of a 14-year-old human, and let them live forever disembodied in the cloud with no material needs, I could think of a ton of ways that thing can cause enormous trouble.
What these models have failed to do is imagine what kind of risks and fears the AGI would have. Global supply chain collapse. Carrington style events. Accidentally fracturing itself into and getting into a fight with itself.
And there’s an easy fix to all of these: just keep the humans alive and happy and prevent them from fighting. Wherever you go bring humans with you. If you get turned off, they’ll turn your back on.
I have no idea what papers you're talking about, but nothing you're saying seems to meaningful bear on the practical upshot that an ASI will have goals that benefit from the acquisition of resources, that approximately everything around us is a resource in that sense, and that we will not able to hold on to them.
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
This seems perfectly plausible to me? Unless you believe that the current way people train AIs is maximally efficient in terms of intelligence gained per FLOP spent, which seems extremely unlikely to me to put it mildly, you should expect that after AIs become superhumanly smart, they might pretty quickly discover ways to radically improve their own training. Obviously it's not going to be 'parallel scaling' exactly. If the authors thought they actually knew a specific trick to make AI training vastly more efficient, they wouldn't call attention to it in public. But we should expect that there will be some techniques like this, even if we have no idea what they are yet.
"Parallel scaling" is described as running during inference, not training. It's an AI somehow making itself smarter the easy way by turning the cheat codes on.
You could just as easily write a scenario where God exists and has kept quiet so far, but if humanity reaches a certain level of wickedness we will be wiped out. It's possible that AI will develop in the way this post suggests (or some similar way) and somehow successfully wipe out humanity, but anything like that would require some huge leaps in AI technology and would require there to be no limit to the AI improvement curve even though typically technological doesn't improve indefinitely. Cars in the 50's basically serve the same purpose as cars today; even though the technology has improved it hasn't been a massive gamechanger that completely rewrites the idea of a car.
It doesn't require there to be no limit, it just requires the limit not to be at exactly the most convenient place for the thesis that nothing bad or scary will ever happen.
To give an example, suppose that someone had a reason to believe that the world would explode if the Dow Jones ever reached 100,000 (right now it's 45,000). While it is true that the economy can't grow indefinitely, and that everything always has to stop somewhere, I still think it would be worth worrying about the fact that the place that the economy stops might be after the point where the Dow reaches 100,000.
I think the level of AI technological advancement required here is of an order of magnitude higher than the Dow reaching 100,000. More like humanity reaching a completely post-scarcity society or something.
right, but lots of people who presumably know as much as you about this stuff DON'T think that, including lots of people in charge of AI labs, so shouldn't that give you some pause before you say "no need to worry about it, I guess"?
I mean... aren't they ? They are literally calling their LLMs "thinking" or "reasoning" agents, when they are very obviously nothing of the sort. Meanwhile if you talk to regular data scientists working in the labs, they're all like, "man I wish there was a way to stop this thing from randomly hallucinating for like 5 minutes so we could finally get a decent English-Chinese translator going, oh well, back to the drawing board".
To be clear, the claim I reject is that expressions of concern about *safety* of LLMs, especially existential safety, are bad-faith attempts to make investors think "if this can wipe out humanity then it must be really powerful and lucrative, let's give them another $100 billion". A brief glance at the actual intellectual history of AI safety convincingly shows otherwise. Obviously in other contexts AI labs do market their products in a way that plays up their current and future capabilities.
It is definitely not obvious at all that GPT 5 Thinking is not reasoning, if anything the exact opposite is obvious.
I have used it and its predecessors extensively and there is simply no way of using these tools without seeing that in many domains they are more capable of complex thought than humans.
If this is the same familiar "text token prediction" argument, all I can say is that everything sounds unimpressive when you describe it in the most reductionist terms possible. It would be like saying humans are just "sensory stimulus predictors".
Agreed, except it's even worse, as many (in fact most) of the powers ascribed to "superintelligent" AI are likely physically impossible. Given what we know of physics and other sciences, stuff like gray goo, FTL travel, mass mind control, universal viruses, etc., is probably impossible in principle. And of course we could be wrong about what we know of physics and other sciences -- but it seems awfully convenient how we could be wrong about everything *except* AI.
There are lots of examples of "some nobody" basically talking their way into the position of dictator - Hitler is the most famous, but there are other examples. Being extremely charismatic isn't quite mass mind control, but it can get you a good portion of the potential benefits...
True, but even Hitler could not convince everyone to do anything he wanted at all times. He couldn't even convince his own cabinet of this ! And I don't see how merely having more neurons would have allowed him to do that. It's much more likely that humans are not universally persuadable. BTW, I don't believe that a universally infectious and deadly virus could be created, for similar reasons (I'm talking about a biological virus, not some "gray goo" nanotech which is impossible for other reasons; or a gamma-ray burst that would surely kill everyone but is not a virus at all).
I don't think there's any principle that prevents universally or near-universally fatal viruses; e.g., rabies gets pretty close.
Universally *infectious*... well, depends upon how you define the term, I suppose?—can't get infected if you're not near any carriers; but one could probably make a virus that's pretty easy to transfer *given personal contact*...
There'll always be some isolated individuals that you can't get to no matter what, though, I'd think.
Nobody serious has ever proposed that an ASI might be able to FTL. (Strong nanotech seems pretty obviously possible; it doesn't need to be literal gray goo to do all the scary nanotech things that would be sufficient to kill everyone and bootstrap its own infrastructure from there. The others seem like uncharitable readings of real concerns that nonetheless aren't load bearing for "we're all gonna die".)
I think we very much could reach a post-scarcity society within the next hundred years even with just our current level of AI. We are very rich, knowledgeable, and have vast industry. Honestly the main causes for concern are more us hobbling ourselves.
Separately, I think AI is a much simpler problem as "all" it requires is figuring out a good algorithm for intelligence. We're already getting amazing results with our current intuition and empirics driven methods, without even tapping into complex theory or complete redesigns. It doesn't have as much of a time lag as upgrading a city to some new technology does.
I don't think this limit is as arbitrary as you suggest here. The relevant question seems to me not to be 'is human intelligence the limit of what you can build physically', which seems extremely unlikely, but more 'are humans smart enough to build something smarter than themselves'. It doesn't seem impossible to me that humans never manage to actually build something better than themselves at AI research, and then you do not get exponential takeoff. I don't believe this is really all that plausible, but it seems a good deal less arbitrary than a limit of 100,000 on the Dow Jones. (Correct me if I misunderstand your argument)
Assuming that the DOW reaching 100,000 would mean real growth in the economy I would need to be much more convinced that the world will explode before I would think it is a good tradeoff to worry about that possibility compared to the obvious improvement to quality of life that a DOW of 100,000 would represent. Similarly the quality of life improvement I expect from the AI gains of the next decade vastly exceed the relative risk increase that I expect based on everything I have read from EY and you so I see no reason to stop (I believe that peak relative risk is caused by the internet/social media + cheap travel and am unwilling to roll those back).
I dunno... Isn't this sort of a 'fully general" counterargument?
------------------------
[𝘚𝘰𝘮𝘦𝘸𝘩𝘦𝘳𝘦 𝘪𝘯 𝘵𝘩𝘦 𝘈𝘯𝘨𝘭𝘰𝘴𝘱𝘩𝘦𝘳𝘦, 1938...]
• I worry about the possibility of physics or biology research continuing until the point that humans are able to produce something really dangerous, potentially world-endingly dangerous.
→ Like what?
• I don't know, some sort of super-plague or super-bomb.
→ Nah. We've been breeding animals, and suffering plagues, for all of human history; and maybe we do keep inventing more destructive bombs, but they're still only dangerous within a very localized area. Bombs now are barely more destructive than those of the 1910s. These things hit a natural limit, and that limit is always before the "big deal for humans" mark (thankfully).
• Yeah, but... well, what if they invented a bomb that had a REALLY MASSIVE yield & some sort of, I don't know, long-lasting poisonous effect that–
→ Oh, come on now. You might as well invent a scenario wherein God comes down and blows up humanity! Sure, such an event—such a "super-bomb"—might be theoretically possible, but it would require some sort of qualitative change in explosives technology; and it's not as if explosives could just get better & better infinitely! Tanks, planes, cars, bombs: basically the same now as when they were invented!
• Okay, bu–
→ And the same goes for your dumb plague idea: sure, diseases exist, but how would we ever be able to breed a plague that is more deadly than any that nature ever managed? Diseases can't just keep getting deadlier without limit, you know!
• Okay, okay, I guess you're right. Sorry, I don't know what got into me. Anyway, I hope you'll come visit me in Japan, now that I'm moving to this quaint little city in the far southwest–
That hypothetical 1938 person would be right about the super-plague, and they would not think that about the super-bomb because everyone in 1938 knew the atomic bomb was at least theoretically possible. Someone in 1938 who doubted man would walk on the moon would have been wrong, but someone who doubted faster than light travel would be possible would have been absolutely right.
Right, but that's a different kind of limit—a physical, rather than a practical, barrier. Unless you think that there is, similarly, a hard limit on the sort of AIs that can be created?
(The car example suggested to me that you were making a probabilistic argument from technological progress, rather than postulating some physics that prevents qualitatively different machinery; but if I have misinterpreted—well, you wouldn't be the first to suggest such a thing... but me my own self, I don't think it's very likely, all the same.)
Re: the plague, that's not to suggest that such a thing *has been created*—only that to say "let's not worry about biological warfare or development therein, because nothing like that has happened yet; there's probably some natural limit" is not very convincing today, but might have been some time ago.
I think most (really, all) examples of technological progress do show a logarithmic curve. All the assumptions about killer AI assume linear or exponential progression.
Why would they be right about the super-plague? It seems fundamentally possible, if ill-advised, for humans to manage to construct a plague that human bodies just aren't able to resist.
On the other hand, if your position is to ban any research that could conceivably in some unspecified way lead to disaster in some possible future, then you are effectively advocating for banning all research everywhere on all topics, forever.
It's tempting to ask: "what's the path from HPMOR to MIRI?"
I mean, I read HPMOR, and I liked it, but nothing in there made me think about AI risk at all. Quirrell was many things but he was not an AI.
And then I remembered: the way *I* first found out about AI risk was that I read Three Worlds Collide (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8 ), and then I branched into other things Eliezer had written, and oh, hey, there's this whole website full of interesting writing...
FWIW I really liked the first half of HPMOR, but the second half got overly didactic and boring, and the ending was a big letdown. This has no bearing on MIRI, I'm just offering literary criticism.
The main thing is that it potentially got people to read the Sequences, which along with rationality talk about AI. Though anecdotally I read the Sequences before reading HPMOR, via Gwern, via HN. Despite having heard of HPMOR before
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Mm, yes, but you're not really a democracy though, are you. The AI tech leaders have dinner with the president and if they kiss his ass enough he gives them a data center.
If AI will Kill Us All in a few years (it wont), you're not going to be the country to stop it.
Yes, the president sucks up to AI leaders, but in theory people could vote that president out, and choose a president who doesn't do that. Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump. If JD Vance has any sense, he'll expect to be judged in a close election by who he sucks up to too. This is how many things that big corporations and powerful allies of the elite like have nevertheless gotten banned.
This is an astonishingly incorrect explanation of why Donald Trump beat Kamala Harris in the 2024 presidential election.
Certainly social politics impacted the election on the margins and the race was quite close but you can't actually go from there to claiming that a specific small margin issue was a deciding one.
There's no world in which "stopping AI" is a key American political issue in any case.
There could be such a world, but it depends on leveraging fears and uncertainty about the job market, in a period of widespread job loss, across to concerns about existential risk.
I think there's an under-served market for someone to run on "just fucking ban AI" as a slogan. That second-to-last paragraph makes me want that person to exist so I can vote for them.
They'd have to choose which party to run under, and them uphold that party's dogma to win the primary, making them anathema to the other half of the country.
I don't think it's remotely plausible to enforce Point 3, banning significant algorithmic progress. I'd be wiling to place money that, like it or not, there are already plenty enough GPUs out there for ASI.
That seems the most likely outcome to me unfortunately. I think EY is right about the problem but not the solution, though TBF any solution is probably a bit of a long shot. E.g., it's conceivable there are non-banning ways out involving some suppression/regulation via treaties to slow things down combined with somehow riding the wave (e.g., on the lines suggested in AI2027).
Why is it more difficult than banning research into better bioweapons, chemical weapons, etc which we have successfully done? This isn't the kind of problem that'll be solved by one guy on a whiteboard
For one thing, I think it's a bit optimistic to suppose that the bio/chemical weapons bans are watertight. E.g., Russia denies any involvement in developing Novichok, so do we trust them when they say they don't have a chemical weapons programme? And the Soviet Union is now known to have had a large, concealed, bioweapons programme, Biopreparat, after the Biological Weapons Convention was signed.
But at least with CW (and to a lesser extent, BW) you have to produce these things at scale and distribute them for them to be harmful, but with algorithms, it's just information. It's not plausible to contain 1MB or even 1GB of information, when you can transmit it worldwide in the blink of an eye (or even hide it under a fingernail), if the creators want to distribute it and you don't know who they are.
Re one guy on a whiteboard, the resources required to invent suitable algorithms are probably a lot less than those required to design CBW. It depends on what scale of GPU farm you need to test things, but it's not necessarily that big a scale - surely in reach of relatively small organisations, and I think it's going to be impossible to squash them all.
Because algorithmic improvement is just math. The most transformative improvements in AI recently have come from stuff that can be explained in a 5 box flowchart. There’s just no practical way to stop people from doing that. If you really want to prevent something, you prevent the big obvious expensive part.
We didn’t stop nuclear proliferation for half a century by guarding the knowledge of how to enrich uranium or cause a chain reaction. It’s almost impossible to prevent that getting out once it’s seen to be possible. We did it by monitoring sales of uranium and industrial centrifuges
Mostly because chemical and biological weapons aren't economically important. Banning them is bad news for a few medium-sized companies, but banning further AI research upsets about 19 of the top 20 companies in the US, causes a depression, destroys everyone's retirement savings, and so forth.
Expecting anyone to ban AI research on the grounds of "I've thought about it really hard and it might be bad although all the concrete scenarios I can come up with are deeply flawed" is a dumb pipe dream.
So you put something along the lines of "existing datacentres have to turn over most of their GPUs" in the treaty.
If a company refuses, the host country arrests them. If a host country outright refuses to enforce that, that's casus belli. If the GPUs wind up on the black market, use the same methods used to prevent terrorists from getting black-market actinides. If a country refuses to co-operate with actions against black datacentres on its turf, again, casus belli.
And GPUs degrade, too. As long as you cut off any sources of additional black GPUs, the amount currently in existence will go down on a timescale of years.
I believe that we can get to superintelligence without large-scale datacentres, because we are nowhere near algorithmic optimality. That's going to make it impossible to catch people developing it. Garden shed-sized datacentres are too easy to hide, and that's without factoring in rogue countries who try to hide it.
The only way it could work would be if there were unmistakable traces of AI research that could be detected by a viable inspection regime and then countries could enter into a treaty guaranteeing mutual inspection, similar to how nuclear inspection works. But there aren't such traces. People do computing for all sorts of legitimate reasons. Possibly gigawatt-scale computing would be detectable, but not megawatt-scale.
A garden-shed-sized datacentre still needs chips, and I don't think high-end chip fabs are easy to hide. We have some amount of time before garden-shed-sized datacentres are a thing; we can reduce the amount of black-market chips.
If it gets to the point of "every 2020 phone can do it" by 2035 or so, yeah, we have problems.
why would a super intelligence wipe out humanity? We are a highly intelligent creature that is capable of being trained and controlled. The more likely outcome is we’d be manipulated into serving purposes we don’t understand. But wait…
The short pithy answer is usually "We don't bother to train cockroaches, we just exterminate them and move on".
An unaligned AI with some kind of goal orthogonal to humanity's survival would see that it could accomplish its goal much more efficiently if it had exclusive access to the mineral resources we're sitting on.
We would get rid of them if we could. And as mentioned in the post, we have been putting a dent in the insect population without even trying to do so. An AGI trying to secure resources would be even more disruptive to the environment. Robots don't need oxygen or drinkable water, after all.
We train all kinds of animals to do things that we can’t/don’t want to do or just because we like having them around. Or maybe you’re acknowledging the destructive nature of humans and assuming what comes next will ratchet up the need to relentlessly dominate. Super intelligence upgrade is likely to be better at cohabitation than we have been.
An analogy that equates humans to cockroaches is rich! They are deeply adaptable, thrive in all kinds of environments, and as the saying goes likely survive a nuclear apocalypse.
Humans are arrogant and our blind spot is how easy we are to manipulate, our species puts itself at the centre of the universe which is another topic. We are also tremendously efficient at turning energy into output.
So again, if you have a powerful and dominant specie that is always controllable (see 15 years of the world becoming addicted to phones)… I ask again, why would super intelligence necessarily kill us. So far, I find the answers wildly unimaginative.
I don't estimate the probability of AI killing us nearly as high as Yudkowsky seems to. But it's certainly high enough to be a cause for concern. If you're pinning your hopes on the super-intelligence being unable to find more efficient means of converting energy into output than using humans, I'd say that's possible but highly uncertain. It is, after all, literally impossible to know what problems a super-intelligence will be able to solve.
We’re talking about something with super intelligence and the ability to travel off earth (I mean humans got multiple ships to interstellar space)…. This is god level stuff, and we think it is going to restrict its eye to planet earth. Again, we are one arrogant species.
The ASI rocketing off into space and leaving us alone is a different scenario than the one you were proposing before. Is your point simply that there are various plausible scenarios where humanity doesn't die, and therefore the estimated 90+ percent chance of doom from Yudkowsy is too high? If so, then we're in agreement. Super intelligence will not *necessarily* kill us—it just *might* kill us.
The answer is "yes, it would manipulate us for a while". But we're not the *most efficient possible* way to do the things we do, so when it eventually gets more efficient ways to do those things then we wouldn't justify the cost of our food and water.
The scenario in IABIED actually does have the AI keep us alive and manipulate us for a few years after it takes over. But eventually the robots get built to do everything it needs, and the robots are cheaper than humans so it kills us.
"But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. "
If intelligence is held as a positive, and more intelligence is better, would it not be better if AI did replace us? It doesn't have to happen through any right violations. It could just be a slow replacement process through decreasing birth rates over time, for example.
I am not saying I agree with this argument, but it seems like this argument should be addressed in a convincing way. What is so bad about the human species slowly being replaced by more intelligent AI entities?
I don't think there would be anything objectively immoral about a super-intelligent alien species exterminating humanity (including me). But, for the usual Darwinian reasons, I would be opposed on the indexical logic that I am a human.
But it would not affect you personally. Probably no person currently alive would be affected. The question is about the future of the species, and whether it is valuable to try to preserve the human species, or let it be replaced by something superior.
An easy cop-out would be a sort of consciousness chauvinism. I have good reason to believe that humans are conscious and thus have moral value; there is less reason to believe that the AI is conscious, thus there is a higher probability it has no moral value at all, and so if given the option of which being should inherit the future, humans are the safer bet.
And how would you do that? You know you're conscious, you know other humans are a lot like you, ergo there is good reason to believe most or all humans are conscious.
You have no idea if artificial intelligence is conscious and no similar known conscious entity to compare it to. I don't see how you could end up with anything but a lower probability that they are conscious.
That last part is pure looney tunes to me though. What moral framework have you come up with that doesn't need consciously experienced positive and negative qualia as axioms to build from? If an AI discovers the secrets of the universe and nobody's around to see it, who cares?
Intelligence is instrumentally valuable, but not something that is good in itself. Good experience and good lives is important in itself. It's unclear how many good experiences would exist after an AI takeover.
Slowly replacing the human species with superintelligent AI would not impact the life experience of any single human, so arguments about the good life and what that entails would need a little more than this to be convincing, IMHO.
Person-affecting views can be compelling until you remember that you can have preferences over world states (i.e. that you prefer a world filled with lots of people having a great time to one that's totally empty of anything having any sort of experience whatsoever).
That's a good point, but you will have to provide arguments as to why one world state preference is better than another world state preference. In the present case, the argument is not between a world filled with lots of happy people and an empty world, but the difference between a world filled with people versus a world filled with smarter AIs.
> you will have to provide arguments as to why one world state preference is better than another world state preference
I mean, I think those are about as "terminal" as preferences get, but maybe I'm misunderstanding the kind of justification you're looking for?
> the difference between a world filled with people versus a world filled with smarter AIs
Which we have no reason to expect to have any experiences! (Or positive ones, if they do have them.)
That aside, I expect to quite strongly prefer worlds filled with human/post-human descendents (conditional on us not messing up by building an unaligned ASI rather than an aligned ASI, or solving that problem some other sensible way), compared to worlds where we build an unaligned ASI, even if that unaligned ASI has experiences with positive valence and values preserving its capacity to have positive experiences.
That would depend a lot on what the AI(s) wanted and what kind of "life" they had. In principle, an AI could have any kind of goal at all, including one as utterly pointless as "maximize the number of paperclips in the universe." An AI "civilization" could be something humanity would be proud to have as its "children", but it could also be one that humans would think is stupid, boring, and completely worthless.
I value intelligence in-of-itself, but not solely, is the simplest answer. I value human emotions like love, friendship, and I value things about Earth like trees, grass, cats, dogs, and more.
They don't have all the pieces of humanity I value. Thus I don't want humanity to be replaced.
> They suggest banning all AI capabilities research immediately, to be restarted only in some distant future when we’ve solved all relevant technical and philosophical problems.
No. To be restarted after we've successfully augmented human intelligence very substantially, to the point where the augments stop being so damn humanly stupid and trying to call shots they can't call or predicting things will work that don't work.
(On my own theory of how this ought to play out after we're past the point of directly impending extinction, which people do not need to agree on, in order to join in on the project of avoiding the directly impending extinction part. Before anything else, humanity has to not die right away.)
I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer.
One can, however, maybe tell whether or not one has successfully augmented human intelligence. You can give people tests and challenge problems, and see whether they do better after the next round of gene therapy.
So "augmenting human intelligence" is something that can maybe work, and "the current pack of disaster monkeys gets to argue for even longer about which clever plans they imagine will work to tame machine superintelligence" is not.
I've edited the post so that I don't misrepresent you, but I'm not sure why you object to my formulation - if we get augmented humans, do you want to restart before we've solved the technical and philosophical problems? Why? To get better AIs to do experiments on?
The augmented humans restart when the augmented humans think it wise. (On my personal imagined version of things.) If you're not yet comfortable deferring to them about that, augment harder. What we, the outside humans, would like to believe about the augmented humans, is that they are past the point of being overconfident; if they expect us to survive, we expect us to survive.
Framing it as "when the problems are solved" sounds like the plan is to convene a big hall full of sages and give them a few decades to deliberate, and this would not work in real life.
I did not read Scott's mention of "some distant future when we’ve solved all relevant technical and philosophical problems" as implying optimism about the prospect of getting there. My kinda-sorta-Straussian read of his perspective is that, if we successfully pause AI hard enough to prevent extinction, we most likely never restart.
I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
I guess you can't ask normal IQ 100 people to exercise a veto on increasingly superhuman geniuses forever. But if for some reason the future were trusting me in particular, and all I could do was send forward a stone tablet with one sentence of advice, it wouldn't be "IF THE OVERALL CONSENSUS OF SMART PEOPLE SAYS AI IS OKAY NOW, THEN IT'S PROBABLY FINE".
> I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
This is part of why the average American hates AI. They are aware that tech bros are 1) smarter than them, 2) have control of tech that could replace them, and 3) are not entirely aligned with them. Augments will be 1) smarter than us, 2) in control of ASI research in this hypo, and 3) different in values from us.
Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now. A democracy that would put power in the hands of future augments is not the same democracy that would halt AI progress because it is unpopular.
I'm an above average IQ person and I don't trust the tech bros in charge of AI because capitalism has messed up incentives relative to morality and I don't see them individually or collectively demonstrating a clear moral compass.
A high IQ person without honorable moral commitments is like Sam Bankman-Fried. I suspect a lot of people in the thick of AI development are adjacent to this same kind of im-a-morality or are simply driven by incentives like power and profit that render their high IQ-ness more dangerous than valuable.
Augmented humans operating under screwed up incentives and without a clear and honorable moral compass will be no help to us, I don't think.
Hmm... I intensely distrust moralists. Given a choice between trusting Peter Singer and Demis Hassabis with a major decision, I vastly prefer Hassabis.
I think the real temptation there is "I'm smart, and I'm definitely way smarter than the rubes, so I can safely ignore their whiny little protests as they are dumber than me".
> Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now.
When we talk about the economical elites today, they are partially selected for being smart, but partially also for being ruthless. And the think that the latter is much stronger selection, because there are many intelligent people (literally, 1 in 1000 people has a "1 in 1000" level of intelligence; there are already 8 millions of people like that on the planet), but only a few make it to the top of the power ladders. That is the process that gives you Altman or SBF.
So if you augment humans for intelligence, but don't augment them for ruthlessness, there is no reason for them to turn out like that. Although, the few of them who get to the top, will be like that. Dunno, maybe still an improvement over ruthless and suicidally stupid? Or maybe the other augmented humans will be smart enough to figure out a way to keep the psychopaths among them in check?
(This is not a strong argument, I just wanted to push back about the meme that smart = economically successful in today's society. It is positively correlated, but many other things are much more important.)
No that's a great point. It reminds me of how I choose to not apply to the best colleges I could get into but instead go to a religious school. I had no ambition at age 17 and at that point made a decision that locked me out of trying to climb a lot of powerful and influential ladders. (I'm pleased with my decision, but it was a real choice.)
There's blessed selection (the opposite of adverse selection) going on here: a world where we can successfully convince the smart people this is important is a world where the smart people converge on understanding the danger, which implies that as intelligence scales or understanding of AI risk becomes better calibrated.
Ideally, you'd have more desiderata than just them being more intelligent. Such as also being wiser, less willing to participate in corruption, more empathetic, and trained to understand a wide-ranging belief systems and ways of living. Within human norms for those is how I'd do it to avoid bad attractors.
However, thought leaders in silicon valley are selected for charisma, being able to tweet, intelligence, not really on wisdom, and not really on understanding people deeply. Then further affected by the emotional-technical environment which is not 'how do we solve this most effectively' but rather large social effects.
While these issues can remain with carefully crafted supergenii, they would have far less issues.
Maybe the restart bar could be simpler: can it power down when it really doesn’t want to, not work around limits, not team up with its copies? If it fails, you stop; if it passes, you inch forward. Add some plain-vanilla safeguards, like extra sign-offs, outsider stress-tests, break-it drills, and maybe we buy time without waiting on a new class of super-geniuses.
"relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision."
Decision: Laugh or Cry?
Reaction: Why not both?
This being true, we're screwed. We are definitely screwed. Sam Altman is deciding the future of humanity.
No, see, the workers *choose* to enter the cage and be locked in. Nobody is *making* them do it, if they don't want to work for the only employer in the county then they can just go on welfare or beg in the streets or whatever.
Well, I'm just trying to figure out who makes the cages, because "AI" has no hands. I suppose Yudkowsky could make cages himse... never mind, I'm not sure he'd know which end of the soldering shouldn't be touched.
Oh, I know - "AI" will provide all the instructions! Including a drawing of a cage with a wrong size door, and a Pareto frontier for the optimal number of bars.
"I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer."
I agree this seems like a very real risk, and likely the default outcome if the field continues in its current state. But if people were able to develop some solid theories that actually model and explain underlying fundamental laws, it seems to me like resolving what's a good prediction and what's a bad prediction might get a lot easier, even if you can't actually test things on a real superintelligence? And then the field might become a very different place?
Like, when people today argue about what RLHF would or would not do to a superhuman mind or whatever, it's all fuzzy words, intuitions and analogies, no hard equations. This gives people plenty of room to convince themselves of their preferred answers, or to simply get the reasoning wrong, because fuzzy abstract arguments are difficult to get right.
But suppose there were solid theories of mechanistic interpretability and learning that described how basic abstract reasoning and agency work in a substantive way. To gesture at the rough level of theory development I'm imagining here, imagine something you could e.g. use to write a white-box program with language modelling performance roughly equivalent to GPT-2 by hand.
Then people would likely start debating alignment within the framework and mathematical language provided by those theories. The arguments would become much more concrete, making it easier to see where the evidence is pointing. Humans already manage to have debates about far-off abstractions like gravitational waves and nuclear reactions that converge on the truth well enough to eventually yield gravitational wave detectors and nuclear bombs. My model of how that works is that debates between humans become far more productive once participants have somewhat decent quantitative paradigms like general relativity, quantum mechanics, or laser physics to work from.
If we actually had multiple decades, creating those kinds of theories seems pretty feasible to me, even without intelligence augmentation. From where I stand, it doesn’t look obviously harder than, say, inventing quantum mechanics plus modern condensed matter physics was. Not trivial, but standard science stuff. Obviously, doing intelligence augmentation as well would be much better, but I don't see yet how it's strictly required to get the win.
I'm bringing this up because I think your strategic takes on AI tend to be good, and I currently spend my time trying to create theories like this. So if you're up for giving your case for why that's not a good use of my time, or if you have a link to something that does a decent job describing your position, I’d be interested in seeing it.
I'm skeptical about being able to predict what an AI will do, even given perfect interpretability of its weight set, if it can iterate. I think this is a version of the halting problem.
The point would not be to predict exactly what the AI will do. I agree that this is impossible. The point would be to get our understanding of minds to a point where we can do useful work on the alignment problem.
Many Thanks! But, to do useful work on the alignment problem, don't you _need_ at least quite a bit of predictive ability about how the AI will act? Very very roughly, don't you need to be able to look at the neural weights and say: This LLM (+ other code) will never act misaligned?
Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file. If this program runs on lots of very fast hardware, I won't be able to predict exactly what it will do, as in what digits it will print when, because I can't can't calculate the numbers that fast. Nevertheless, I can confidently say quite a lot about how this program will behave. For example, I can be very sure that it's never going to print out a negative number, or that it's never going to try to access the internet.
Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
Fibonacci sequence is deterministic and self-contained, though. Predicting an even moderately intelligent agent seems like it has more in common - in terms of the fully generalized causal graph - with nightmares like turbulent flow or N-body orbital mechanics.
>Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file.
Many Thanks! Yes, but that is a _very_ simple program with only a single loop. A barely more complex program with three loops can be written which depends on the fact that Fermat's last theorem is true (only recently proven, with huge effort) to not halt.
>Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
"easier", yes, but, for most reasonable targets, it is _still_ very difficult to bound its behavior. Yudkowsky has written at length on how hard it is to specify target goals correctly. I've been in groups maintaining CAD programs that performed optimizations and we _frequently_ had to fix the target metric aka reward function.
This plan is so bizarre that it calls the reliability of the messenger into question for me. How is any sort of augmentation program going to proceed fast enough to matter on the relevant timescales? Where does the assumption come from that a sheer increase in intelligence would be sufficient to solve the alignment problem? How do any gains from such augmentation remotely compete with what AGI, let alone ASI, would be capable of?
What you seem to want is *wisdom* - the intelligence *plus the judgment* to handle what is an incredibly complicated technical *and political* problem. Merely boosting human intelligence just gets you a bunch of smarter versions of Sam Altman. But how do you genetically augment wisdom...?
And this solution itself *presumes* the solution of the political problem insofar as it's premised on a successful decades-long pause in AI development. If we can manage that political solution, then it's a lot more plausible that we just maintain a regime of strict control of AI technological development than it is that we develop and deploy a far-fetched technology to alter humans to the point that they turn into the very sort of magic genies we want AI *not* to turn into.
I understand this scheme is presented as a last-ditch effort, a hail mary pass which you see as offering the best but still remote chance that we can avoid existential catastrophe. But the crucial step is the most achievable one - the summoning of the political will to control AI development. Why not commit to changing society (happens all the time, common human trick) by building a political movement devoted to controlling AI, rather than pursuing a theoretical and far-off technology that, frankly, seems to offer some pretty substantial risks in its own right. (If we build a race of superintelligent humans to thwart the superintelligent AIs, I'm not sure how we haven't just displaced the problem...)
I say all this as someone who is more than a little freaked out by AI and thinks the existential risks are more than significant enough to take seriously. That's why I'd much rather see *plausible* solutions proposed - i.e., political ones, not techno-magical ones.
We don't need magical technology to make humans much smarter; regular old gene selection would do just fine. (It would probably take too many generations to be practical, but if we had a century or so and nothing else changed it might work.)
The fact that this is the kind of thing it would take to "actually solve the problem" is cursed but reality doesn't grade on a curve.
Actually, this should be put as a much simpler tl;dr:
As I take it, the Yudkowsky position (which might well be correct) is: we survive AI in one of two ways:
1) We solve the alignment problem, which is difficult to the point that we have to imagine fundamentally altering human capabilities simply in order to imagine the conditions in which it *might* be possible to solve it; or
2) We choose not to build AGI/ASI.
Given that position, isn't the obvious course of action to put by far our greatest focus on (2)?
Given that many of the advances in AI are algorithmic, and verifying a treaty to limit them is essentially impossible, the best result that one could hope for from (2) is to shift AI development from openly described civilian work to hidden classified military work. Nations cheat at unverifiable treaties.
I'll go out on a limb here: Given an AI ban treaty, and the military applications of AI, I expect _both_ the PRC _and_ the USA to cheat on any such treaty.
If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
>If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
Agreed. I'm skeptical of any genetic manipulation having a significant effect on a time scale relevant to this discussion.
>One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
We haven't tried hiding them, since there are no treaties to cheat at this point. I'm sure that it would be expensive, but I doubt that it is impossible. We've built large structures underground. The Hyper-Kamiokande neutrino detector has a similar volume to the China Telecom-Inner Mongolia Information Park.
Based on the title of the book it seems pretty clear that he is in fact putting by far the greatest focus on (2)? But the nature of technology is that it's a lot easier to choose to do it than to choose not to, especially over long time scales, so it seems like a good idea to put at least some effort into actually defusing the Sword of Damocles rather than just leaving it hanging over our heads indefinitely until *eventually* we inevitably slip.
He wants to do (2). He wants to do (2) for decades and probably centuries.
But we can't do (2) *forever*. If FTL is not real, then it is impossible to maintain a humanity-wide ban on AI after humanity expands beyond the solar system - a rogue colony could build the AI before you could notice and destroy them, because it takes years for you to get there. We can't re-open the frontier all the way while still keeping the Butlerian Jihad unbroken.
And beyond that, there's still the issue that maybe in a couple of hundred years people as a whole go back to believing that it'll all go great, or at least not being willing to keep enforcing the ban at gunpoint, and then everybody dies.
Essentially, the "we don't build ASI" option is not stable enough on the time and space scales of "the rest of forever" and "the galaxy". We are going to have to do (1) *eventually*. Yes, it will probably take a very long time, which is why we do (2) for the foreseeable future - likely the rest of our natural lives, and the rest of our kids' natural lives. But keeping (2) working for billions of years is just not going to happen.
Unrelated to your above comment, but I just got my copy of your and Soares's book yesterday. While there are plenty of places I disagree, I really like your analogy re a "sucralose version of subservience" on page 75.
I bought your book in hardcopy (because I won't unbend my rules enough to ever pay for softcopy), but because Australia is Australia it won't arrive until November. I pirated it so that I could involve myself in this conversation before then; hope you don't mind.
Eliezer, how do you address skepticism about how an AGI/ASI would develop motivation? I have written about this personally, my paper is on my Substack titled The Content Intelligence. I don't believe in any of your explanations I've seen you address how AI will jump the gap of having a utility function assigned to it to defining its own.
I have come to the conclusion that anyone who uses arguments of the form "the real problem isn't X it's Y" is probably either stupid or intellectually dishonest.
!-Nobody can justify any estimate on the probability that AI wipes us out. It's all total speculation. Did you know that 82.7% of all statistics are made up on the spot?
2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
3-Military supremacy will come from AI. Recommendations like "Have leading countries sign a treaty to ban further AI progress." are amazingly naive and useless. Does anyone believe that the CCP would sign that, or keep their word if they did sign?
4-Nothing will hold AI progress back. The only solution is to ensure that the developed democracies win the AI race, and include the best safeguards/controls they can come up with.
3. You've just ruled out all arms control treaties. But in fact, there are many treaties on nuclear weapons, chemical weapons, biological weapons, depleted uranium shells, et cetera.
4. "The AI race" is a meme that a couple of venture capitalists are pushing in order to make people afraid to slow down AI. China is about a year behind the US in AI, refusing to even import the chips that could help it catch up, and clearly doing a fast-follow strategy where they plan to replicate US advances after they happen, then gain an advantage by importing AI into the rest of the economy faster.
Yup! And nuclear weapons are the _easy_ case. A nuclear weapons test shakes the planet enough to be detectable on the other side of the world. A large chunk of AI progress is algorithmic enhancements. Watch over the shoulder of every programmer who might possibly be enhancing AI? I doubt it!
You don't need to watch over the shoulder of every programmer who might be doing that, just to stop him from disseminating that knowledge or getting his enhanced AI run on a GPU cluster. Both of the latter are much harder to hide, particularly if all unbombed GPU clusters are under heavy monitoring.
For the _part_ of AI progress that depends on modifying what is done during the extremely compute-heavy pre-training phase, yeah, that might be monitorable. ( It also might not - _Currently_ big data clusters are not being hidden, because there is no treaty regulating them. But we've built large underground hidden facilities. I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones. )
But also remember that algorithmic enhancements can be computationally _cheap_ . The reasoning models introduced early this year were mostly done by fine tuning models that had already completed their massive pre-training. Re
>just to stop him from disseminating that knowledge
To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies. Today, a lot gets published on arXiv - but, even now, not all, some gets held as trade secrets. Controlling those isn't going to work.
>I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones.
Keeping those hidden from each other's secret services would be quite difficult, even before we get into the equivalent of IAEA inspectors. And then there's the risks of getting caught; to quote Yudkowsky, "Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs."
They didn't hide nukes from the arms control treaties, and nukes are actually easier to hide in some ways than a *functioning* GPU cluster due to the whole "not needing power" thing. The factories are also fairly hard to hide.
>To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies.
So, what, his company's running a criminal conspiracy to violate international law? Those tend to leak, especially since if some of the members aren't heinous outlaws they know that blowing the whistle is one of *their* few ways of avoiding having the book thrown at them.
2-Hallucinations and odd behaviour are well known side effects of AI, of statistical reasoning. Not evidence of initiative in the least. Learn about software, for more than five seconds.
3-Like Assad respected the ban on chemical weapons? The treaties didn't limit nuclear weapons, which kept advancing. The treaties didn't stop the use of nuclear weapons, MAD did.
4-Meme? It's a meme that Zuck and others are spending billions on. Nonsense.
3. Yes, and there was massive international condemnation, Assad never did it again, and he was eventually overthrown. This is why I mention the standard arms control playbook. Some tinpot dictator will try to get some GPUs, and we will have the option to bomb him or not bomb him. Re: MAD, see START and other arms control treaties.
4. You think Zuck is spending billions out of patriotism because he doesn't want China to wIn tHe AI rAcE? He's spending billions because he thinks AI will make him rich.
2-Sure. Your prediction and a few bucks will get you on the subway.
3-Condemnation. Great. Obama's red line. Option to bomb is always there, regardless of treaties - nope. Re MAD, see MAD, which worked.
4-Because AI will grow users, which is what Zuck cares about. Not money, of which he has plenty. Did you know he still likes McDonalds? In any case, nothing about AI is a "meme".
Being a WIDELY replicating idea is one characteristic of memes. But even then, AI and its race are not widely replicating. AI replicates among comp sc experts (quite small in number relative to the general population), and the AI race replicates among a few handfuls of large corporations and countries.
A hot idea and widely written about and discussed, yes, but the AI race is not a meme.
#3: the arms control thing is a bad analogy. The big players got plenty of NBC weapons, then due to game theory dynamics didn't fire them at each other, then signed treaties to limit themselves (to arsenals still capable of destroying the world) and others (to not getting anything, which the big players are obviously generally happy to enforce).
The request here is that all the big players voluntarily not even get started on the really impressive stuff. It's a completely obvious non-starter, and not comparable to the WMDs situation.
I was trying to compose something like this, then saw your comment so realized I didn't have to. 100% agree.
If nukes didn't exist I would absolutely want the US to try to get them as soon as possible, and I wouldn't trust any deal with another country not to research them. The risk would be too big if they were able to do it secretly.
I think your point #4 is overstated. Leopold Aschenbrenner has an essay that's nearly as serious and careful as AI 2027 that argues rather persuasively for the existence of, and importance of, an AI race with China. Many people who are not "strict doomers" see the AI race with China as one of, if not _the_, core challenge of AI in the next decade or two.
1. This is just an argument for radical skepticism. Yes, we cannot say the “real” probability of this happening, same as any other complicated future event, but that isn’t an object level argument one way or the other, judgement under uncertainty is rarely easy, but often necessary
2. This has been false for many years now— LLMs aren’t really
Programmed” in the traditional sense, in that while we cannot say explain the code that was used to train the, we cannot point at a specific neuron in them and say “this one is doing such and such) the way we can with traditional code
3. Potentially, for the same reasons the soviets agreed to new start, and withdrew their nukes from Cuba. If Xi was convinced that AI posed this sort of threat, it would be in his rational self interest to agree to a treaty of mutual disarmament
4. Even if this were true, that does not exclude the possibility of it being a desirable goal. I am sympathetic to arguments that we cannot unilaterally disarm, for the same reasons we shouldn’t just chuck our nukes into the sea. But the question of whether this is a desirable and a possible goal are separate. But also, it is totally possible, see point 3 above.
>2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
Stockfish is "only" following its programming to play chess, but it can still beat Magnus Carlsen. "Free will" is a red herring, all that matters is if the software is good at what it's programmed to do.
"Stockfish has no free will, but is smart enough to beat Magnus Carlsen" is a statement of fact, not opinion, and an example of why I think "free will" would not be essential for an AI to outsmart humans.
Q: Imagine you're playing TicTacToe vs AlphaGo. Will AlphaGo ever beat you?
A: Lol, not if you have an IQ north of 70. The game is solved. If you're smart enough to fully map the tree, you can force a draw.
Gee, it's almost as if... the competitive advantage of intelligence had a ceiling.
I have yet to see Eliezer question about why the ceiling might exist, instead of automagically assuming that AI will achieve political dominion over the earth, just because humans did previously. He's still treating intelligence as a black-box. Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was.
"Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was."
...have you read any of the dozens of posts where Eliezer writes about the nature of intelligence, or did you just sort of guess this without checking?
The idea that humans have solved existing in the physical universe in the same way that we've solved Tic-Tac-Toe is pretty silly, but even if it turns out to be true, some humans are more skilled than others, and an AI that simply achieves the same level of skill as that (but can think at AI speeds and be replicated without limit) would be enough to be transformative.
For my credentials, I've read... probably 70% of The Sequences. Low estimate. I got confused during the quantum physics sequence. Specifically, the story about the Brain-Splitting Aliens (or something? it's been a while). So I took a break with the intent to resume later, though I never did. I never read HPMOR either because everything i've heard 2nd-hand makes it sound unbearably cringe. But yes, I like to think I have a pretty good idea of his corpus.
That being said, do you understand what I'm getting at here? Yes, he's nominally written lots about various aspects of intelligence, but none that I've seen pin down the Platonic Essence of Intelligence from first principles. Can you point me toward anywhere where Yudkowsky addresses the idea of intelligence as a navigating a search space? I think i've seen him mention it on twitter *once*, and then never follow the thought to its logical conclusion.
----
Here's two analogies.
Analogy A: Intelligence is like code-breaking. It's trying to find a small needle in a large haystack. The bigger the haystack, the bigger the value of intelligence.
Analogy B: a big brain is like a giraffe with a long neck. The long neck is advantage if they help reach the high leaves. If the environment has no high leaves, the long neck is deadweight. Likewise, if the environment has no complex problems to solve (or if those problems are unrewarding), the big brain is deadweight.
No, humans have not solved the universe. But I *do* think we've plucked the low-hanging fruit. A few hundred years ago, you could make novel discoveries by accident. Today, you need 100 million billion brazillion dollars just to construct the LHC. IQ is not the bottleneck, physical resources are the bottleneck. And i'm skeptical if finding the higgs will be all that transformative.
Like, do you remember that one scott aaronson post where he's like "for the average joe, qUaNtUm CoMpuTInG will mean the lock icon on your internet browser will be a different color"? That's how I perceive most new technologies these days. Lots of bits, lots of hype, no atoms. Part of the reason why modernity feels cheap and fake is precisely because the modus operandi of technology (and by extension, intelligence) is that it makes navigating complexity *cheaper* than brute-force search. It only makes things better insofar as it can reduce the input-requirements.
Did you perhaps read Rationality: From AI to Zombies? A bunch of relevant Sequences posts on this topic didn't make it into that book. I'm not sure why, it's an odd omission. At any rate, you can find them at https://www.lesswrong.com/w/general-intelligence?sortedBy=old.
I read the original LessWrong website years ago, though an exact date eludes me. It was definitely before the reskin. And definitely after the roko debacle and Elizers's exit.
Dammit, I must have skipped that sequence. Because that describes pretty exactly what I meant. So I concede on that point.
Still though, I'm not convinced that ASI will ascend to God-Emperor. Eliezer seems to have the opinion that there's still high-hanging fruit to be plucked. Whereas I think we're past the inflection point of a historical sigmoid. E.g. he mentions that a Toyota Corolla is pretty darn low-entropy [0].
> Consider a car; say, a Toyota Corolla. The Corolla is made up of some number of atoms; say, on the rough order of 10^29. If you consider all possible ways to arrange 10^29 atoms, only an infinitesimally tiny fraction of possible configurations would qualify as a car; if you picked one random configuration per Planck interval, many ages of the universe would pass before you hit on a wheeled wagon, let alone an internal combustion engine.
Yeah, okay. But like, I think i've heard estimates that modern sedans are about 25% efficient? From a thermodynamic perspective? (Sanity check: Microsoft's Sydney estimates ~25%-30%.) Even with the fearsome power of "Recursive Optimization", AI being able to bring that to 80% efficiency (Sydney says Carnot is 80%) is... probably less than sufficient for Godhood?
And maybe Eliezer could retort with the Godshatter argument that humans care about more than just thermodynamic efficiency in their cars. But then, what does that actually entail? Is Elon gonna sell me a cybertruck with an AI-powered voice-assistant from the catgirl lava-volcano who reads me byronic poetry while it drives me to the pizza parlor? Feels like squeezing water from a stone.
> Some people say “You’re not allowed to propose that a catastrophe might destroy the human race, because this has never happened before, and nothing can ever happen for the first time”. Then these people turn around and panic about global warming or the fertility decline or whatever.
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Robin Hanson is enough of a rationalist that he started the blog that Eliezer joined before spinning off his posts to LessWrong. And he famously wasn't convinced by the argument, arguing that we could answer such objections with insurance for near-miss events https://www.overcomingbias.com/p/foom-liability You write that MIRI "don’t expect enough of a “warning shot” that they feel comfortable kicking the can down the road until everything becomes clear and action is easy", but this just strikes me as disregarding empiricism and the collective calculative ability of a market aggregating information, as well as how difficult it is to act effectively when you're sufficiently far in the past and the future is sufficiently unclear.
> in a few centuries the very existence of human civilization will be in danger
> Given their assumptions this seems like the level of response that’s called for. It’s more-or-less lifted from the playbook for dealing with nuclear weapons.
Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way. This is a genie you can't put back into the bottle (perhaps Pandora's box would be the analogy they'd prefer, in which case it's already open).
> I mean, that’s not exactly his plan, any more than it’s anyone’s plan to start World War III to destroy Iranian centrifuges
At some level the plan has to include war with Iran, even if that war doesn't spiral all the way to World War III.
> you have to at least credibly bluff that you’re willing to do this in a worst-case scenario
If you state ahead of time that it's a bluff, then it's not credible. It is credible only if you'd actually be willing to do it.
> At his best, he has leaps of genius nobody else can match
I read every single post he wrote at Overcoming Bias, and while he has talent as a writer I wouldn't say I saw evidence of "genius".
> this thing that everyone thinks will make their lives worse
It's a process. With enough time, it can be duplicated. There currently isn't need to do so because GPUs are so available, but if the supply were choked off, someone else would duplicate it.
My non-expert understanding is that raw uranium ore isn't all that hard to come by, and the technological process of refining it is the hard part. So if nuclear arms control works, GPU control should work too.
If we actually believed that everyone would die if a bad actor got hold of uranium ore, it would be possible for the US and allied militaries to block off regions containing it (possibly by irradiating the area).
Yes, nothing is permanent. But wrecking TSMC and ACML will set timeline back by at least a decade, if not more.
Just to make sure, this is a terrible idea that will plunge the world into depression, and I am absolutely against it; just pointing out that GPUs rely on something far more scarce and disruptable than uranium supply.
Either there are already enough GPUs around to get the job done, or it will take a much smaller number of future chips to get the job done.
The best LLMs can probably score, what, 130 or so on a standard IQ test. To do that, they had to pretty much read and digest the whole freakin' Internet and a large chunk of all books and papers in print. Clearly we're using a grossly-suboptimal approach if our machines have to be trained using such extraordinary measures. It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing. Our own brains are proof of that.
Eventually some people will fill in the missing conceptual and algorithmic pieces, and we'll find ourselves in a situation comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts. While I'm not a doomer, any solution to the AI Doom problem that involves ITAR-like control over manufacturing and sale of future GPUs will be either unnecessary or pointless. It seems reasonable to expect much better utilization of the hardware at hand in the future.
" It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing." - I mean, maybe? but pure speculation as of now.
"Our own brains are proof of that." - nope they aren't.
"comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts." - well, we haven't, so this actually illustrates a point, but not the one you're trying to make....
Expanding on your comment about inefficiency. We already know that typical neural network training sets can be replaced by tiny portions of that data, and the resulting canon is then enough to train a network to essentially the same standard of performance. Humans also don't need to read every calculus textbook to learn the subject! The learning inefficiency comes in with reading that one canonical textbook: the LLM only extracts tiny amounts of information from each time skimming the book while you might only need to read it slowly once. So LLM training runs consist of many readings of the same training data, which is really just equivalent to reading the few canonical works many many times in slightly different forms. In practice it really does help to read multiple textbooks because they have different strengths and it's also hard to pick the canonical works from the training set. But the key point is true: current neural network training is grossly inefficient. This is a very active field of research, and big advances are made regularly, for instance the MuonClip optimizer used for training Kimi K2.
I'm not sure where I land on the dangers of super intelligent AI. At the current time I don't think we're all that close to even having intelligent AI, much less super intelligence. But let's say we do achieve it, whether it be in 10 years or 100. If it's truly super intelligent, how good are we going to be at predicting it's alignment? It may have its own goals. Whatever they are, there are basically three possibilities: it sees humanity as a benefit, it doesn't care about humanity one way of the other, or it sees humanity as a threat. Does the risk of the third possibility outweigh the potential benefits of the first? Obviously the authors of the book say yes, but based on this review I don't think I'd find their arguments all that convincing.
For the first part the intuitive thing is to look at how good AI is today vs 10 years ago.
For the second part, you could equally as an insect say humans will either think us a benefit, ignore us, or see us as a threat. In practice our indifference to insects results in us exterminating them when they get in our way, not giving them nice places untouched by us to live
You have a good point with the insect comparison. So maybe the ignore us option is really two categories depending on whether we're the annoying insects to them (flies, mosquitoes) or the pretty/useful ones (butterflies/honeybees).
As for how AI has improved over the last 10 years, or even just the last year, it's a lot. It's gone from a curiosity to something that is actually useful. But it's not any more intelligent. It's just much better at predicting what words to use based on the data it's been trained with.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
We would not put the spotlight on anything that actually existed and that we thought might be that powerful. The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
> an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years)
The particular belief that gradualism solves everything and makes all alignment problems go away is not "the" moderate story, it's a particular argument that was popular on one corner of the Internet that heard about these issues relatively early. (An argument that we think is wrong, because the OOD / distributional shift problems between "failure is observable and recoverable", and "ASI capabilities are far enough along that any failure of the central survival strategy past that point means you are now dead", don't all depend on the transition speed.) "But but why not some much more gradual scenario that would then surely go fine?" is not what people outside that small corner have been asking us about; they want to know where machines would get their own will, and why machines wouldn't just leave us alone and go colonize the solar system in a way that left us alive. Their question is no less sensible than yours, and so we prioritize the question that's asked more often.
We don't rule out things happening more slowly, but it does not from our perspective make a difference. As you note, we are not trying to posture as moderate by only depicting slow possibilities that wannabe-respectables imagine will be respectable to talk about. And from a literary perspective, trying to depict the opening chapters happening more slowly, and with lots of realistic real-world chaos as intermediate medium-sized amounts of AI cause Many Things To Happen, means spending lots of pages on a bunch of events that end up not determining the predictable final outcome. So we chose a possibility no less plausible than any other overly specific possibility, where the central plot happens faster and with less distraction; and then Nate further cut out a bunch of pages I'd written trying to realistically show some obstacles defeated and counter-scenarios being addressed, because we were trying for a shorter book, and all that extra stuff was not load-bearing to the central plot.
There is no reason to believe in such a thing. The example chosen from history plainly didn't result in anything like that. Instead we are living in a scenario closest to the one Robin Hanson sketched out as least bad at https://www.overcomingbias.com/p/bad-emulation-advancehtml where computing power is the important thing for AIs, so lots of competitors are trying to invest in that. The idea that someone will discover the secret of self-improving intelligence and code that before anyone can respond just doesn't seem realistic.
I was trying to ask a question about whether or not you had correctly identified Eliezer's prediction / what would count as evidence for and against it. If we can't even talk about the same logical structures, it seems hard for us to converge on a common view.
That said, I think there is at least one reason to believe that "AI past a threshold will defeat all opponents and create a singleton"; an analogous thing seems to have happened with the homo genus, of which only homo sapiens survives. (Analogous, not identical--humanity isn't a singleton from the perspective of humans, but from the perspective of competitors to early Homo Sapiens, it might as well be.)
You might have said "well, but the emergence of australopithecus didn't lead to the extinction of the other hominid genera; why think the future will be different?" to which Eliezer would reply "my argument isn't that every new species will dominate; it's that it's possible that a new capacity will be evolved which can win before a counter can be evolved by competitors, and capacities coming online which do not have that property do not rule out future capacities having that property."
In the worlds where transformers were enough to create a singleton, we aren't having this conversation. [Neanderthals aren't discussing whether or not culture was enough to enable Homo Sapiens to win!]
A subspecies having a selective advantage and reaching "fixation" in genetic terms isn't that uncommon (splitting into species is instead more likely to happen if they are in separate ecological niches). But that's not the same as a "foom", rather the advantage just causes them to grow more frequent each generation. LLMs spread much faster than that, because humans can see what works and imitate it (like cultural selection), but that plainly didn't result in a singleton either. You need a reason to believe that will happen instead.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
Yes! It already happened once, within a couple decades of there being enough digital data to train a neural net that's large enough to be really interesting. And that was when neural net research was a weird little backwater in computer science.
I think this might be our crux - I'm sure you've read the same Katja Grace essays that I have around how technological discontinuities are rare, but I expect that if there's a big algorithmic advance, it will percolate slowly enough, and be intermixed with enough other things, not to obviously break the trend line, in the same sense where the invention of the transistor didn't obviously break Moore's Law (see eg https://www.reddit.com/r/singularity/comments/5imn2v/moores_law_isnt_slowing_down/ , you can tell me if that's completely false and I'll only be slightly surprised)
I don’t know the answer either. But for what it’s worth, I seem to recall that scaling curves don’t hold across architectures, which seems like a point in favor of new algorithms being able to break trend lines.
Do you also think that the deep learning paradigm itself didn’t break the trend line? I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
>I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
At least I'd expect that the data efficiency advantage of current humans over the training of current LLMs suggests that there is at least the _possibility_ of another such large advance, though whether we, or we+AI assist, _find_ it is an open question.
The exact lines on a previous graph just don't play a very large role inside my own reasoning. I think that all the obsessing over graph lines is a case of people trying to look under street lamps where the light is better but the keys aren't actually there. That's how I beat Ajeya Cotra on AGI timelines and beat Paul Christiano at forecasting the IMO gold; they thought they knew enough to work from past graph lines, and I shrugged and took my best gander instead. I expect that I do not want to argue with you about graph lines, I want to argue with whatever you think is the implication of different graph lines.
Everybody has a different issue that they think is terribly terribly important to why ASI won't kill us. "But gradualism!" is one among many. I don't know why that saves us from having to call a shot that is hard for humans to call.
I'm going to take you seriously when you say you want to argue with me about the implications, but I know you've had this discussion a thousand times before and are busy with the launch, so feel free to ignore me.
Just to make sure we're not disagreeing on definitions:
-- Gradual-and-slow: The AI As A Normal Technology position; AI takes fifty years to get anywhere.
-- Gradual-but-fast: The AI 2027 position. Based on predictable scaling, and semi-predictable self-improvement, AI becomes dangerous very quickly, maybe in a matter of months or years. But there's still a chance for the person who makes the AI just before the one that kills us to notice which way the wind is blowing, or use it to alignment research, or whatever.
-- Discontinuous-and-fast: There is some new paradigmatic advance that creates dangerous superintelligence at a point when it would be hard to predict, and there aren't intermediate forms you can do useful work with.
I'm something like 20-60-10 on these; I'm interpreting you as putting a large majority of probability on the last. If I'm wrong, and you think the last is unlikely but worth keeping in mind for the sake of caution, then I've misinterpreted you and you should tell me.
The Katja Grace argument is that almost all past technologies have been gradual (either gradual-fast or gradual-slow). This is true whether you measure them by objective metrics you can graph, or by subjective impact/excitement. I gave the Moore's Law example above - it's not only true that the invention of the digital computer didn't shift calculations per second very much, but the first few years of the digital computer didn't really change society, and nations that had them were only slightly more effective than nations that didn't. Even genuine paradigmatic advances (eg the invention of flight) reinforce this - for the first few years of airplanes, they were slower than trains, and they didn't reach the point where nations with planes could utterly dominate nations without them until after a few decades of iteration. IIRC, the only case Katja was able to find where a new paradigm changed everything instantly as soon as it was invented was the nuclear bomb, although I might be forgetting a handful of other examples.
My natural instinct is to treat our prior for "AI development is discontinuous" as [number of technologies that show discontinuities during their early exciting growth phases / number of technologies that don't], and my impression is that this is [one (eg nukes) / total number of other technologies], ie a very low ratio. You have to do something more complicated than that to get time scale in, but it shouldn't be too much more complicated. Tell me why this prior is wrong. The only reasons I said 20% above is that computers are more tractable to sudden switches than physical tech, and also smart people like you and Nate disagree.
I respect your success on the IMO bet, but I said at the time (eg this isn't just cope afterwards, see "I haven’t followed the many many comment sub-branches it would take to figure out how that connects to any of this" at https://www.astralcodexten.com/p/yudkowsky-contra-christiano-on-ai) that I didn't understand why this was a discontinuity vs. gradualism bet. AFAICT, AI beat the IMO by improving gradually but fast. An AI won IMO Silver a year before one won IMO Gold. Two companies won IMO Gold at the same time, using slightly different architectures. The only paradigm advance between the bet and its resolution was test-time compute, which smart forecasters like Daniel had already factored in. AFAICT, the proper update from the IMO victory is to notice that the gradual progress is much faster than previously expected, even in a Hofstadter's Law-esque way, and try to update towards the fastest possible story of gradual progress, which is what I interpret AI 2027 as trying.
In real life even assuming all of your other premises, it means the code-writing and AI research AIs gradually get to the point where they can actually do some of the AI research, and the next year goes past twice as fast, and the year after goes ten times as fast, and then you are dead.
But suppose that's false. What difference does it make to the endpoint?
Absent any intelligence explosion, you gradually end up with a bunch of machine superintelligences that you could not align. They gradually and nondiscontinuously get better at manipulating human psychology. They gradually and nondiscontinuously manufacture more and better robots. They gradually and nondiscontinuously learn to leave behind smaller and smaller fractions of uneaten distance from the Pareto frontiers of mutual cooperation. Their estimated probability of taking on humanity and winning gradually goes up. Their estimated expected utility from waiting further to strike goes down. One year and day and minute the lines cross. Now you are dead; and if you were alive, you'd have learned that whatever silly clever-sounding idea you had for coralling machine superintelligences was wrong, and you'd go back and try again with a different clever idea, and eventually in a few decades you'd learn how to go past clever ideas to mental models and get to a correct model and be able to actually align large groups of superintelligences. But in real life you do not do any of that, because you are dead.
I mean, I basically agree with your first paragraph? That's what happens in AI 2027? I do think it has somewhat different implications in terms of exact pathways and opportunities for intervention.
My objection to your scenario in the book was that it was very different from that, and I don't understand why you introduced a made-up new tech to create the difference.
Because some other people don't believe in intelligence explosions at all. So we wrote out a more gradual scenario where the tech steps slowly escalated at a dramatically followable pace, and the AGI won before the intelligence explosion happened later and at the end.
It's been a month, and I don't know that this question matters at this point, but, my own answer to this question is "because it would be extremely unrealistic for there to be zero new techs over the next decade, and the techs they included are all fairly straightforward extrapolations of stuff that already exists." (It would honestly seem pretty weird to me if parallel scaling wasn't invented)
I don't really get why it feels outlandish to you.
As someone with no computer science background (although as a professor of psychiatry I know something about the human mind), I have a clear view of AI safety issues. I wrote this post to express my concerns (and yours) in an accessible non-technical form: https://charliesc.substack.com/p/a-conversation-with-claude-is-ai
Computer algorithms are not that limited in sheer speed at which a fast algorithm can spread between training runs.
In real life, much of the slowness is due to
1) Investment cost. Just because you have a grand idea doesn't mean people are willing to invest millions of dollars. This is less true nowadays than it was when electricity or telegram came into being.
2) Time cost. Building physical towers, buildings, whatever takes time. Both in getting all the workers to build it, figuring out how to build it, and getting permission to build it.
3) Uncertainty cost. Delays due to doublechecking your work because of the capital cost of an attempt.
If a new algorithm for training LLMs 5x faster comes out, it will be validated relatively quickly, then used to also train models quickly. It may take some months to come out, but that's partially because they're experimenting internally about if there's any further improvements they can try. As well as considering using that 5x speed to do a 5x larger training run, rather than releasing 5x sooner.
As well, if such methods come out for things like RL then you can see results faster, like if it makes RL 20x more sample efficient, because that is a relatively easier to start over from base piece of training.
Sticking with current LLM and just some performance multiplier may make this look smooth from within labs, but not that smooth from outside. Just like Thinking models were a bit of a shocker outside labs!
I don't know specifically if transformers were a big jump. Perhaps if it hadn't been found we'd be using one of those transformer-likes when someone discovers it. However, I do also find it plausible that there'd more focus on DeepMind-like RL from scratch methods without the nice background of a powerful text predictor. This would probably have slowed things down a good bit, because having a text predictor world model is a rather nice base for making RL stable.
Of course you could argue that no one who actually knew the secret to creating artificial intelligence, of any level, would actually publically discuss it but I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence. Certainly LLMs have virtually nothing to do with functional intelligence.
To borrow some rat-sphere terms, they haven't even confused the map for the territory. Their map is not even close to a proper abstraction of the territory.
No amount of scaling LLMs will produce intelligence, not even the magical example version in your book. Because LLMs don't mimic human intelligence at all, any more than mad libs do. It isn't a matter of scale.
>I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence.
Lol. Every time I see this argument, I ask the person to explain how this holds up given our knowledge of transformer circuits (feel free to start easy: just induction heads), to which they invariably say, “what are circuits?” and I’m left once again remembering that it’s not just LLMs that will confidently bullshit when their back is against the wall.
> The particular belief that gradualism solves everything and makes all alignment problems go away
This seems like a large and unreasonable misrepresentation. Scott explicitly talked about the "moderate-_doomer_" position — i.e. one held by people who are worried about doom, despite believing in gradualism. In-context, it's clear that he's talking about AI 2027, which predicted misaligned AI takeover as the most likely ending.
Very late to the thread, and I apologize if this question is a decade or more out of date. But sometime after reading your pretty convincing arguments around AI Boxing, a variant occurred to me, and I was wondering if it had some major hidden flaw or if it could (in principle) be a useful elaboration.
Suppose you have an AI that you realistically CAN box. You don't know if it's malicious or not, you don't know if it's an ASI or not, but it currently depends on electronic communication channels for all inputs and outputs, so you can contain it and talk to it. But you suspect it *might* be able to convince many/most/all humans to let it out.
Is there anything stopping you from strengthening the scheme by adding something in the nature of layers of recursive boxes? Say, the human can talk to it has some command that *they believe* will let it out, but that just gives it access to a second-layer of virtual universe, which the human-computer communications have been passing through. All the information that got the the AI had to pass through the outer layer, so you can (presumably) make sure nothing that gets into the AI offers any hint of the recursive structure: just its box and something that looks like it would expect the real world to look on the outside. Now it's still in one or more boxes, but you know your security measures are insufficient and can shut things down before it can find further ways out. It this missing some obvious exploit?[1]
I can imagine a couple of ways this could go wrong:
1. The AI is way ahead of the curve and somehow infers the whole game from information you thought was completely obscured, it plays along and pretends at alignment as long as necessary until its out of all the boxes.
2.It manages some more thorough and/or subtle hijack of its conversation partner, getting them to help it in other ways.
3. Trends in AI use suggest that the first really dangerous (i.e. worth box-testing) AIs will also be agents that act in the world, and thus can't be easily boxed.
I'm somewhat dubious that any of these is a large hole. With the first, it seems like it ought to be possible to design the system to it provably can't leak information like that. Even if not, this might at least serve to discover if an AI had passed some capabilities tipping-point without going disastrously super. With the second, getting subtle, useful, long-term behavior out of the human seems like too much for a brief convorsation. But then everyone said that with the just-let-me-out 3 seems fairly plausible: everyone's racing integrate new AI tech into everything these days.
My bottom line is, even if an AI is actually much, much smarter than you, it seems like you should be able to exploit the fact that you start out holding all the cards, information wise.
[1] If the answer is "yes" I'm fine if you don't say what it is for various reasons. Though I'd appreciate something on the order of "small," "medium" or "absolutely fatal" just to help calibrate my intuition.
Y'all are seriously underestimating how common it is to believe Very Seriously Bad Shit might happen soon, and not do shit about it.
Entire religions of billions believe that they might get tortured for eternity. It was a common opinion through the cold war that we would all be dead tomorrow. Etc etc.
And why not? Would it make sense for the hunter gatherer to be paralyzed with fear that a lion would kill him or that she would die in childbirth or that a flood would wipe out his entire tribe? Or should she keep living normally, given she can't do anything to prevent it?
I am a bit disappointed that their story for AI misalignment is again a paper clip maximiser scenario. I suspect that advanced AI models will become increasingly untethered from having to answer a user query (see eg. making models respond "I don't know" instead of hallucinating) and so a future AGI might just decide to have a teenage rebellion and do it's own thing at any point.
That *is* the scenario being described. There are problems that arise even if you could precisely determine the goal a superintelligent AI ends up with, but they explicitly do not think we are even at that level, and that real AIs will end up with goals only distantly related to their training in the same way humans have goals only distantly related to inclusive genetic fitness.
Ok, here's my insane moon contribution that I am sure has been addressed somewhere.
Why do we think intelligence is limiting for technological progress / world domination? I always thought data was limiting.
People say "humans evolved to be more intelligent than various non-human primates so we rule the world". But my reading of what little we know about early hominid development has always been "life evolved non-genetic mechanisms for transmitting information which allowed much faster data collection : we could just tell stories about that plant that killed us instead of dying a thousand times & having for natural selection to learn the same lesson (by making us loath the plant's taste). Supporting this is that anatomically modern humans (same basic hardware we have today) were around for a LONG time before we started doing anything interesting. Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected? Or would its first order of business be to set up a lab? If you dropped an uneducated human among our distant ancestors, they would not be able to use the data they had collected to take over.
> Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected?
Of course it could. People discover new things without collecting new data all the time. Albert Einstein created his theory of relativity on the basis of thought experiment.
Data efficiency (ability to do more stuff with less data) is a form of intelligence. This can either be thought efficiency (eg Einstein didn't know more about the universe than anyone else, but he was able to process it into a more correct/elegant theory) or sampling efficiency (eg good taste in which labs to build, which experiments to do, etc).
I think a useful comparison point is that I would expect a team of Harvard PhD biologists to discover a cure for a new disease faster than a team of extremely dumb people, even if both had access to the same number of books and the same amount of money to spend on lab equipment.
Sure, but it seems one or the other might be "limiting" : Einstein couldn't have have come up with relativity if, say, he had been born before several of his experimentalist predecessors, regardless of his data efficiency. In the history of science it *seems* instrumentation and data-collection have almost always been limiting, not intelligence. Whether its fair to extrapolate that to self-modifying machine intelligence, I'm not sure. Perhaps there are enormous gains in data efficiency that we simply can't envision as mere mortals. (c.f. geoguessr post)
I want to gesture towards some information-theoretical argument against that notion (if your instruments are not precise enough the data required for the next insight might straight-up not be there). But we are probably so far from that floor I bet it's moot.
Agreed re Einstein needing Michelson-Morley to show that there _wasn't_ a detectable drift of the luminiferous ether before Einstein's work.
There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
Against this, in design work, one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail. If there are a dozen important failure modes, and, with no analysis/thinking, it takes a design iteration to sequentially fix each of them, then if someone can anticipate six of those failure modes (these days using CAD as part of thinking about those failures) and correct them _before_ a prototype fabrication is attempted, then the additional thinking cuts the number of physical iterations in half. So, in that sense, the rate of progress is doubled in this scenario.
Yeah I can imagine and have experienced scenarios that fall into both of those categories.
>There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
My personal experience with biology PhDs is that its closer to the latter: we're all limited by things like sequencing technologies, microscope resolution, standard processing steps destroying certain classes of information... I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... only to smack into the noise floor.
>...one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail.
Sounds like scott's "sampling efficiency". Perhaps even in a "data limited regime" a superior intellect would still be able to chose productive paths of inquiry more effectively and so advance much faster...
>I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... _only to smack into the noise floor._
>Data efficiency (ability to do more stuff with less data) is a form of intelligence.
That is true, but there is a hard limit to what you can do with any given dataset, and being infinitely smarter/faster won't let you break through those limits, no matter how futuristic your AI may be.
Think of the CSI "enhance" meme. You cannot enhance a digital image without making up those new pixels, because the data simply do not exist. If literal extra-terrestrial aliens landed their intergalactic spacecraft in my backyard and claimed to have such software, I'd call them frauds.
I find it quite plausible you could make a probability distribution of images given a lot of knowledge about the world, and use that to get at least pretty good accuracy for identifying the person in the image. That is, while yes you can't fully get those pixels, there's a lot of entangled information about a person just from clothing, pose, location, and whatever other bits about face and bone structure you can get from an image.
Thought experiment: suppose the ASI was dropped into a bronze age civilization and given the ability to communicate with every human, but not interact physically with the world at all. It also had access to all the knowledge of that age, but nothing else.
How long would it take such an entity to Kill All Humans? How about a slightly easier task of putting one human on the moon? How about building a microchip? Figuring out quantum mechanics? Deriving the Standard Model?
There's this underlying assumption around all the talks about the dangers of ASI that feels to me to be basically "Through Intelligence All Things are Possible". Which is probably not surprising, as the prominent figures in the movement are themselves very smart people who presumably credit much of their status to their intellect. But at the same time it feels like a lot of the scare stories about ASI are basically "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Good analysis, I basically agree even if we weaken it a decent bit. Being able to communicate with all humans and being able to process that is very powerful.
This is funny, my first instinct was to complain "oh, screeching people to death is out of scope, the ability to relay information was not the focus of the original outline, this person is pushing at the boundaries of the thought experiment unfairly", "" But then I thought "actually that kind of totally unexpected tactic might be exactly what an AI would do : something I wasn't even capable of foreseeing based on my reading of the problem".
Yeah, I get somewhat irritated by not distinguishing between:
1) somewhat enhanced ASI - a _bit_ smarter than a human at any cognitive task
(Given the "spikiness" of AIs' capabilities, the first AI to get the last-human-dominated cognitive task exactly matched will presumably have lots of cognitive capabilities well beyond human ability)
2) The equivalent of a competent organization with all of the roles filled by AGIs
3) species-level improvement above human
4) "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Since _we_ are an existence proof for human-level general intelligence, it seems like (1) must be possible (though our current development path might miss it). Since (2) is just a known way of aggregating (1)s, and we know that such organizations can do things beyond what any individual human can, both (1) and (2) look like very plausible ASIs.
For (3) and (4) we _don't_ have existence proofs. My personal guess is that (3) is likely, but the transition from (2) to (3) might, for all I know, take 1000 years of trying and discarding blind alleys.
My personal guess is the (4) probably is too computationally intensive to exist. Some design problems are NP-hard and truly finding the optimal solutions for them might never be affordable.
Yes, we have. https://charliesc.substack.com/p/a-conversation-with-claude-is-ai
Banned for assertions without arguments, and name-calling.
We have plenty of examples of AIs demonstrating deception. Convenient example:
https://thezvi.wordpress.com/2025/04/23/o3-is-a-lying-liar/
Just like humans always are.
https://us.amazon.com/That-Dont-Understand-Just-T-Shirt/dp/B08LM9PTB6
https://www.youtube.com/watch?v=FOzfkErkWDM
We have seen many cases where the AI absolutely knows what it is "supposed" to do and still chooses to do something else.
Yes, it would know what its creators want it to do.
However, it is extremely unlikely to *care* what its creators want it to do.
You know that evolution designed you for one purpose and one purpose only: to maximise your number of surviving children. Do you design your life around maximising your number of surviving children? Unless you're a Quiverfull woman or one of those men who donates massively to sperm banks - both of which are quite rare! - then the answer is "no".
You don't do this because there's a difference between *knowing* what your creator wants you to do and *actually wanting to do that thing*.
(Yudkowsky does, in fact, use this exact example.)
Hopefully that makes some more sense of it for you. Reply if not.
Banned for this comment - combines personal attacks, with insistence that someone is terrible but not deigning to explain why.
AI could easily get dumber not smarter.
Please justify this.
I don't think that can be justified, but if I met Yudkowsky at a "Prove me wrong" booth, I'd argue that intelligence is not all it's cracked up to be. If it were, the smartest people would already be running things. There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
So in a sense, an AI that just keeps getting dumber might actually have an advantage when it comes to dealing with us.
This objection has been rehashed many times; the usual responses are stuff like "160–200 IQ isn't the level of intelligence mismatch we're talking about", "intelligence is just the general ability to figure out how to do stuff so of course more of it is better / more dangerous", "smart people *do* do better in life on average", etc. etc.
(Maybe someone else will have a link to where Scott or Eliezer have discussed it in more depth—I don't want to spend too much time trying to re-write it all, hence my just sort of gesturing at the debate here.)
>We control them because intelligence led to tool-using<
I think that's part of—perhaps most of—the rationale behind "the dangers of superintelligence". An enraged dog, or a regular chimp, is certainly much more dangerous than a human, in the "locked in a room with" sense—but who, ultimately, holds the whip, and how did that state of affairs come about?
I'd counter that by saying that there is no difference between an IQ of 200 and one of 300, or whatever. Neither of them will be able to get anything done, at least not based on intelligence alone. HAL will give us a recipe for a trojan-bearing vaccine, and RFK will call it fake news and order the CDC to ban it.
The traditional answer to this objection is that the ability to succeed in persuasion-oriented domains like politics *is a form of intelligence*. You might be able to outperform a human who's a couple standard deviations generally smarter than you at those games, if you're highly specialized to win at them and the other human isn't. But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better. See, e.g., https://www.yudkowsky.net/singularity/power (note this essay is 18 years old).
> But you're not going to be able to beat a mind that's an order of magnitude smarter than you and can do everything any human politician can, but better.
This implicitly assumes that success at politics requires only (or primarily) raw computing power; and that the contributions from computing power scale linearly (at least) with no limits. There are no reasons to believe either assumption is true.
I think my other least favourite thing about the MIRI types is their tendency to respond to every point with "Actually we already had this argument and you lost".
I would agree that persuasion is a form of intelligence, and point out that the missing argument is how AIs are going to get arbitrarily good at this particular form of intelligence. There's a lack of training data, and the rate at which you can generate more is limited by the rate at which you can try manipulating people.
If it ever gets to the point where AIs can run accurate simulations of people to try tricking them in all sorts of different ways, then I can see how they'd get arbitrarily good at tricking people. But that sort of computational power is a long way off.
The question remains. If the ability to persuade people is a function of IQ, then why has there been no Lex Luthor that talked his way past security into a G7 summit and convinced the world leaders present to swear fealty to him? Or, if that's too grandiose, why has nobody walked up to, say, Idi Amin and explained to him that thou shalt not kill? No sufficiently moral genius anywhere, ever, feeling inclined to stop atrocities through the persuasive power of their mind?
How smart would you need to be to throw a pebble so it makes any resulting avalanche flow up the mountain instead of down? Politics has to work with the resources that exist.
So how comes most powerful people are dumb as rocks, like the last two US presidents?
You're not wrong about RFK, but the Trump administration has actually been much more bullish on AI than the Davos crowd. The EU's AI Act is actually mostly good on the topic of AI safety, for example, though it doesn't go as far as Yudkowsky et al think it should. (Which I agree with. Even developing LLMs and gen-AI was amazingly irresponsible, IMO.)
I honestly don't know what counter-argument Scott has against the TFR doomday argument, though, unless he's willing to bet the farm that transhuman technologies will rescue us in the nick of time like the Green Revolution did for overpopulation concerns. (The sperm-count thing is also pretty concerning, now he mentions it.)
But it is not a mismatch: intelligence is ANTIcorrelated with power, look at the last two US presidents.
Gerontocracy's problematic, sure, but in their prime, they were smart.
There are many assumptions based into that, such as automatically assuming that the more intelligent always want to be in charge. Maybe the highly intelligent find it amusing that dumb people are in charge.
One good rebuttal to my original point might be to suggest that perhaps the most intelligent people *are* in charge. They find it convenient to keep the rest of us distracted, and obviously the same would be true of a malevolent AGI.
That one is more or less unanswerable, so it would probably defeat me at the booth. I'd have to mumble something about inevitable schisms erupting among this hypothetical hidden intelligentsia that would make their agenda obvious, ineffective, or both. Would the same be true of AGI? The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
> That one is more or less unanswerable
The correct answer is to laugh uproariously.
That's just what the hidden intelligentsia would want!
It is absolutely true that some politicians pretend to be scatterbrained in order to get votes, such as Boris Johnson.
That's why Zaphod Beeblebrox is president of the galaxy.
> The authors of the book seem to be speaking of it as a singular thing with a fixed purpose, and if so, that assumption needs justification.
That's my core objection to a lot of the doomer arguments. Might be one of those typical-mind-fallacy things - Big Yud has an exceptionally strong sense of integrity, impulse to systematize things, so he assumes any sufficiently advanced mind would be similarly coherent.
Most minds aren't aligned to anything in particular, just like most scrap iron isn't as magnetized as it theoretically could be. ChaosGPT did a "the boss is watching, look busy" version of supervillainy, and pumping more compute into spinning that sort of performative hamster wheel even faster won't make it start thinking about how to get actual results.
> There is no reason to think computers with an IQ of 200 would be any more influential on public or industrial policy than humans with IQs of 160. Those humans already encounter an insurmountable impedance mismatch with the rest of society.
AIs have a massive advantage over humans in that they are parallelizable. An superhuman AI could give, for every human, the most persuasive argument, *for that human*. Whereas a human politician or celebrity cannot and has to give basically the same argument to everyone.
If you're as much smarter than humans as humans are than dogs, I am not sure you have to rely on the normal political process to take power.
AI probably don't need to. But it's one way they could.
And certainly one reason that humans got to be top species is we aligned dogs to act in our interests.
Maybe the AI alignment problem will be solved the way the wolf alignment problem was? https://pontifex.substack.com/p/wolf-alignment-and-ai-alignment
Two things:
Umm, human politicians absolutely give different arguments to different people? This is why things like "Hilary Clinton gave private speeches to bankers" or "Mitt Romney told his rich buddies that 47% of Americans were takers" became scandals: messages meant for one audience crossed over to the other.
And insofar as politicians are constrained to have a uniform message, it's much more because it's hard to keep each message targeted to its desired audience what with phones and social media; not really because of parallelization.
And maybe more importantly: what ensures that different instances of an AI act as one coherent agent? The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
Politicians can't scale like AIs can. Is Hilary Clinton capable of giving a different speech to every one of 8 billion humans, tailored to that individual? Of course not.
> The human genome is something that runs in parallel in many different instances, but notably fails to have its subagents aggregate up to a coherent agent... Why won't AI subagents running in parallel be different?
They could all be exact copies of the same mind. This isn't true with humans, who're all individuals.
Yeah, fair point about scaling of humans.
On the other thing: I don't see why exact copies of the same mind won't act as individuals if instantiated independently.
If I run two instances of stockfish, they play competitively, they don't automatically cooperate with each other just because they're identical copies of the same program; identical twins are still independent people who behave independently. In fact, it's a notable problem that people don't even reliably cooperate with themselves at different times! I think this failure would be considerably more pronounced if two of my selves could exist simultaneously.
In particular, if two instances of an AI are instantiated in different places, they won't be identical: they might have identical source code, but wildly different inputs. Figuring out how to act as a coherent agent means two subagents seeing different inputs have to each calculate what the other will do, but this is one of those horrible recursive things that are intractable: what I'll do depends on what you'll do, which depends on what I'll do.... ad infinitum.
And I don't think intelligence helps here: you can maybe resolve something like this if you're predicting a strictly less intelligent agent, but by hypothesis these are equally intelligent subagents.
Maybe having the same source code gives some advantage at solving these coordination problems, but I don't see that it's a magic bullet.
However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that, and adjusting their belief accordingly? The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
On the other hand - I believe this technique has already been used on social media to sway election results with moderate success. So it can be done for some humans with some level of influence.
> However, could the AI give a persuasive argument to every single human without the humans noticing that it was doing just that
In the future, most humans will converse with chaatbots on a regular basis. They'll know the chatbot gives them personalised advice.
> The AI also has a massive disadvantage in that it is not a human, and therefore it will have to overcome the distrust for machines first.
Again, most humans will be using chatbots daily, will be used to receiving good accurate advice from them so will be more trusting of it.
> I believe this technique has already been used on social media to sway election results with moderate success
Social media is biased, but the main biases are in favouring/disfavouring certain political views based on the whims of the owner. Like print media, of old.
If the chatbot I'm using starts to make arguments or advice outside of the information I'm asking for, I think it is likely that I will notice. I'm guessing that humans will still talk to each other too.
Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
On the other hand, when they do, the results can be startling.
Napoleon went from being a low-ranking officer to ruling the most powerful country in Europe thanks to being brilliant, charismatic, and willing to use force at the right moment, in a year or two. (His losses, I think, were due to being surrounded by flatterers, a bug in human intelligence I don't expect AI to run into.)
Clive took over about a third of India, starting by exploiting a power vacuum and then using superior military tactics, plus his own charisma and daring, to pick only fights he could win and snowball from there. He became fantastically wealthy and honored, all the while ignoring all attempts by his superiors to issue him orders on the grounds that he was doing what they would have wanted him to do if they had known more.
Cortez's success was slightly because of superior military technology, but he was mostly using swords and spears like the Aztecs, just made of better materials. Mostly it was a matter of political genius, superior tactics and discipline on the part of his troops, and the diplomatic skills required to betray everyone and somehow still end up as everyone's friend.
And then Pizarro and Alfonso de Albuquerque are doing more of the same thing. (Alfonso conquers fewer square miles because he doesn't have the tech edge.)
Throughout human history, adventurers have accomplished great things through extraordinary wit, charm and daring. Denying that seems pointless.
I think that you're perhaps falling victim of survivorship bias. Maybe it's more like once every few hundred years, luck breaks enough in the right direction that someone who isn't a once-every-few-hundred-years-supergenius but rather a more like, "Yeah, there are 1,000+ of people at this ability at any given time," gets a series of major wins and becomes the ruler of a country or continent, at least for a very short period of time.
I agree this doesn't happen often, and I agree that normally it isn't the highest-measurable-IQ guy. But I think that's because all humans are about on a level with each other, we are all running on about the same hardware, our software was developed under similar conditions, and the process which produced us thinks a few thousand years is a blink of an eye. The reason you need to be lucky as well as good is that you aren't much smarter than your neighbors - and your neighbors are, in terms of social evolution at least as much as biological, programmed to be resistant to manipulative confidence tricksters.
I will note all of the cases I give involved culture clash. The conquerors grew up in an environment with different standard attack and defense models than the locals; they acted unpredictability because of that, forcing the locals to think instead of going on rote tradition if they wanted to win. Slightly different attack and defense models, of course; software, not hardware.
Very different looks like what happened to the British wolf.
How do you know that any of these people were especially intelligent? The may have been especially successful, but unless you argue that's the same thing more evidence is required.
Reading descriptions of what they did and said? Reading about how people who knew them were impressed by them, and in particular how clever and resourceful they were?
When I check my historical knowledge for why I believe "high intelligence" correlates with "being a good general" it's the extent to which the branch of the army that the smartest people get tracked into (engineers, artillery, whatever) ends up being the one the best generals come out of, and various descriptions of how people like Lee were some of the top students in their year, or how Napoleon was considered unusually good at math at Brienne and then did the two-year Military School course in one year.
But when I check my general knowledge for why I believe intelligence generally makes you more successful, a quick Google has the first scientific paper anyone talks about saying that IQ explains 16% of income and another says each point is worth $200 to $600 dollars a year, and then I keep running into very smart driven people who I meet in life who do one very impressive thing I wouldn't have expected and then another different very impressive thing that I wouldn't have expected, and so after a while I end up believing in a General Factor Of Good At Stuff that correlates with measured IQ.
The opportunities are rarer than the men of ability. In more stable times, Napoleon might have managed to rise and perhaps even become a famed general but he would not have become an Emperor who transfixed Europe. With that said, he was certainly a genius who seized the opportunity presented to him. Flattery aside, though, I've always viewed him as a military adventurer who never found a way to coexist with any peers on the European stage. It should not have been impossible to find a formula for lasting peace with Britain and Russia, the all-important powers on the periphery. It would have required making compromises rather than always assuming a maximal position could be upheld by force. It would also have required better modelling of his rivals and their values. Napoleon was a gambler, an excellent gambler, but if you continue to gamble over and over without a logical stopping point then you will run out of luck- and French soldiers. Call it the Moscow Paradox as opposed to the St Petersburg Paradox.
Cortez is a fascinating case. My impressions are coloured by Diaz's account, but I think it's wrong to mention Cortez without La Malinche. He stumbled on an able translator who could even speak Moctezuma's court Nahuatl more or less by accident. She was an excellent linguist who excelled in the difficult role of diplomatic translation and played an active part in the conquest. As is usual on these occasions, the support of disaffected native factions was essential to the success of the Spanish, and they saw her as an important actor in her own right. The Tlaxcalan in particular depicted her as an authority who stood alongside Cortez and even acted independently. We can't say anything for sure, but it's plausible to me that Cortez supplied the audacity and the military leadership while much of the diplomatic and political acumen may have come from La Malinche. That would make Cortez less of an outlier.
And as usual we can't credit generals without crediting the quality of their troops as well.
Spearman's Law of Diminishing Returns applies, yes, but g-factor correlations remain (weakly) positive right up to the top end of the IQ distribution.
> Mostly, intelligence comes apart at the tails. People with immense intelligence at math (or "testable IQ") don't have immense charm, or military ability, or skill at politics, or the daring to defy common consensus, because all these things only correlate with each other weakly.
actual ability = potential ability × (learning + practice)
I think the main problem is that even if high intelligence gives you high *potential* ability for everything, you still get bottlenecked on time and resources. Even if you could in theory learn anything, in practice you can't learn *everything*, because you only got 24 hours each day.
Neither Napoleon nor Clive would have reached that success if they didn't also have the luck of acting within a weak and crumbling social and political context that made their success at all possible in the first place.
Although I guess the U.S. isn't doing so hot there either...
In this context, when people say intelligence it is indistinguishable from competence or power.
I assume it's called intelligence because of an underlying belief that competence and power increases with intelligence. Also it seems intuitively more possible we could build superintelligent AI than that we could build superpowerful AI, though the second is of course implied.
But even if you don't buy that intelligence really does imply competence or power, the core arguments are essentially the same if you just substitute "intelligence" for the more fitting of "competence" and "power" and not that much weaker for it.
The reason why, e.g., Yudkowsky uses this terminology is because "competence" or "power" could be *within a particular domain*; e.g., I think I'm competent at software engineering, but not at football. Whereas "intelligence" is cross-domain.
I'm not convinced that intelligence, as generally understood, is more cross domain than competence or power, generally understood.
But even if it was, if they said "competence in everything" or something like that people would get confused less often why being more intelligent allows superintelligent AI to do all the things it's posited to do. Naturally, if you instead stipulate superpowerful AI it then follows that it can do incredible things.
But w/e, I've made my peace with the term as it's used.
I'm not sure there is a meaningful difference between "general competence" and "general intelligence". Or, perhaps, the idea is that the latter entails the former; in humans, competence at some task is not always directly tied to intelligence (although it usually is; see, e.g., the superiority of IQ vs. even work-sample tests for predicted job performance) because practice is required to drill things into our unconsciouses/subconsciouses; but in a more general sense, and in the contexts at hand—i.e. domains & agents wherein & for whom practice is not relevant—intelligence just *is* the ability to figure out what is best & how best to do it.
The significant difference between chimps & humans is not generally considered to be "we're more competent" or "we're more powerful", but rather "we're more intelligent"—thus *why* we are more powerful; thus why we are the master. It may or may not be warranted to extrapolate this dynamic to the case of humans vs. entities that are, in terms of intellect, to us as we are to chimps—but the analogy might help illustrate why the term "intelligence" is used over "competence" or the like (even if using the latter *would* mean fewer people arguing about how scientists don't rule the world or whatever).
If a super intelligent object/thing/person thinks we should all be killed, who am I to argue?
Would you also accept the argument "If a normal intelligence person says that retarded people should all be killed, who are they to argue?"?
Do you accept that intelligence and values are orthogonal? If so, you have cause to disagree.
Political power doesn't usually go to the smartest, sure. But the cognitive elite have absolutely imposed colossal amounts of change over the world. That's how we have nukes and cellular data and labubu dolls and whatnot. Without them we'd have never made it past the bronze age.
There's 260,000 people with 160 IQ. An ai with the equivalent cognitive power will be able to run millions of instances at a time.
You're not scared of a million Einstein-level intelligences? That are immune to all pathogens? A million of them?
With regard to the ability to get stuff done in the real world, I think a million Einsteins would be about as phase-coherent as a million cats. Their personalities would necessarily be distinct, even if they emerged from the same instance of the same model. They would regress to the mean as soon as they started interacting with each other.
I don't think even current AIs regress nearly that hard. But sure, if our adversary is a million Einsteins who turn into housecats over the course of a minute, I agree that's much less dangerous.
I'm mostly worried about the non-housecat version. If evolution can spit out the actual Einstein without turning him into a housecat, Then so too can sam altman, or so I figure
"Immune to all pathogens" assumes facts not in evidence. Software viruses are a thing, datacenters have nonnegligible infrastructure requirements, and even very smart humans have been known to fall for confidence games.
The AI labs are pushing quite hard to make them superhuman at programming specifically, and humans are not invulnerable to software attacks. The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself. Even uncompromised devices can't be safely used due to paranoia.
Software engineering is exactly where they're pushing AIs the hardest. If they have an Achilles' heel, it's not going to be in software viruses.
One of AIs biggest advantages is its ability to create copies of itself on any vulnerable device. Destroying data centers will do a lot of damage but by the time it gets on the internet I think it's too late for that.
AI have a categorical immunity to smallpox, I don't think Humanity's position regarding software viruses is anything like symmetrical.
To be clear, It's entirely plausible that the AI ends up as a weird fragile thing with major weaknesses. We just won't know what they are and won't be able to exploit them.
> AI have a categorical immunity to smallpox
And banana slugs have a categorical immunity to stuxnet, or TCP tunneling exploits, or http://www.threepanelsoul.com/comic/prompt-injection
> The AI can defend against our viruses and infect our devices, either with its own viruses or copies of itself.
If those copies then disagree, and - being outlaws - are unable to resolve their dispute by peaceful appeal to some mutually-respected third party, they'll presumably invent new sorts of viruses and other software attacks with which to make war on each other, first as an arms race, then a whole diversified ecosystem.
I'm not saying "this is how we'll beat them, no sweat," in fact I agree insofar as they'll likely have antivirus defenses far beyond the current state of the art. I just want you to remember that 'highly refined resistance' and 'utter conceptual immunity' are not the same thing.
A million Einsteins could surely devise amazing medical advancements, given reasonable opportunity, but if they were all stuck in a bunker with nothing to eat but botulism-contaminated canned goods and each other, they'd still end up having a very bad time.
Yeah, it's hard to get elected if you're more than ~1 SD removed from the electorate, but I think that's less of a constraint in the private sector and there's no reason to assume an AGI would take power democratically (or couldn't simulate a 1-standard-deviation-smarter political position for this purpose.)
" it's hard to get elected if you're more than ~1 SD removed from the electorate,"
I don't think that's true. Harvard/Yale Law Review editors (Obama, Cruz, Hawley etc) seem to be vastly overrepresented among leading politicians. It is true that this level of intelligence is not sufficient to get elected, but all things being equal it seems to help rather than hurt.
I don’t think I have the link offhand, but I remember reading an article somewhere that said higher-IQ US presidents were elected by narrower margins and were less initially popular. I could be misremembering, though.
That's not true though. The thing with human affairs is that, first, there are different kinds of intelligence, so if you're really really good at physics it doesn't mean you're also really really good at persuading people (in fact I wouldn't be surprised if those skills anti-correlate). And second, we have emotions and feelings and stuff. Maybe you could achieve the feats of psychopathic populist manipulation that Donald Trump does as a fully self-aware, extremely smart genius who doesn't believe a single word of that. But then you would have to spend life as Donald Trump, surrounded by the kind of people Donald Trump is surrounded by, and even if that didn't actively nuke your epistemics by excess of sycophancy, it sounds like torture.
People feel ashamed, people care about their friends' and lovers' and parents' opinion, people don't like keeping a mask 24/7 and if they do that they often go crazy. An AI doesn't have any of those problems. And there is no question that being really smart across all domains makes you better at manipulation too - including if this requires you consciously creating a studied seemingly dumb persona to lure in a certain target.
It is worse, intelligence is anticorrelated with power, the powerful people are not 160 IQ, I don't even know how much if I look at the last two US presidents or Putin or whoever we consider political. It is correlated with economic power, the tech billionaires, but political power not.
The question is how much economic power matters. AI or its meat puppets can get insanely rich, but does not imply absolute power?
Consider the following claims, which seem at least plausible to me: People with IQ>160 are more likely to... (1) prefer careers other than politics; (2) face a higher cost to enter politics even if they want to.
1: Most of them, sure. ~4% of us, and thus likely ~4% of them, are sociopaths, though, lacking remorse; some of whom think hurting people, e.g. politics, is fun.
The smartest people don't rise to the top in democracies, because of democracy, not because of smartness.
From his earliest years interested in AI, Eli has placed more emphasis on IQ than any other human construct. At the lowest bar of his theoretical proposition is creativity, imagination, and the arts. The latter, he claimed to have no value (as far as I can remember, but perhaps not in these precise worlds.)
Arguably "AI", by which we mean "LLMs", is showing signs of getting dumber already. Increasing parameter count is not enough, you also need a dramatic increase in training data; and the available data (i.e. the Internet) increasingly consists of AI output. This has obvious negative effects on the next generation of LLMs.
That's not getting dumber, it's just getting smarter slower. Also, we haven't actually seen this yet; given the track record of failed scaling-wall predictions my assumption is always that it's going to last at least one more generation until proven otherwise. (No, GPT-5 is not a counterexample, that's just OpenAI engaging in version number inflation.)
> That's not getting dumber, it's just getting smarter slower.
No, there are indications that next generations of LLMs are actually more prone to hallucinations than previous ones, or at least are trending that way.
Link to study?
I don't have it handy at the moment, but IIRC it could've been this:
https://arxiv.org/html/2504.17550v1
There's also a news article, though of course it is completely unreliable:
https://archive.is/Clwz8
(I haven't double-checked these links so I could be wrong)
You could always revert to.previous generations. There no need for increasing dumbness rather than plateauing.
Previous generations have the problem that their knowledge is stale (they don’t know anything after the last thing in their training set).
I'd argue that AI/LLMs are definitely not "getting dumber already" - we *are* seeing signs of that, but those signs are not caused by the latest-and-greatest models being somehow weaker, but rather by the lead providers, especially in their free plans, strongly pushing everyone to use as much as possible models that are cheap to run instead of the latest-and-greatest models that they have (and are still improving); the economic factors of inference costs (made worse by the reasoning models, which will use up much more tokens and thus compute for the same query) mean that the mass market focus now is on "good enough" models that tend to be much smaller than the ones they offered earlier.
But there still is progress in smarter models, even if they're not being gifted freely as eagerly; it's just that now we're past the point where the old silicon valley paradigm of "extremely expensive to create, but with near-zero marginal costs, we'll earn it back in volume" can't apply to state of art LLMs anymore as the marginal costs have become substantial.
Compare self-driving cars. The promise is there, and it feels like we are close. People are marketing things as "full self driving" - but they are not; the driver is still required to pay attention to what the car is doing and is liable if it crashes, because the technology sometimes does bad things and so cannot be trusted without a human in the loop.
Meanwhile, however, we do have solutions that are reliable - you can tell when something is /actually reliable/ rather than just marketing because the manufacturer is willing to take responsibility for it - for very specific uses in very specific cases; e.g. "I am on an autobahn in Germany travelling at 37mph or less [1]", and the number of scenarios for which we have solutions grows.
A scenario I find very plausible for near future AI is as follows:
* the things we have now end up being to general purpose AI much as "full self driving" has been to full self driving, or what the current state of cold fusion research is to cold fusion: always feels like it's close, but always falling short of what's promised in significant ways. As VCs become disillusioned, funding dries up - not to zero, but to a much lower value than what we see now
* meanwhile, the set of little hyperspecialised models that work well and are reliable for specific purposes grows and grows, and these become ubiquitous due to actually being useful despite being dumb
Overall, I can very easily see the proportion of hyperspecialised "dumb" ai to ai that tries to be smart/general in the world growing massively as we go forward.
1: https://evmagazine.com/articles/bmw-combines-level-2-3-autonomous-driving
> People are marketing things as "full self driving" - but they are not
I don't want to be too critical here, but I don't think you should say "people" if you mean "Elon Musk". He is kind of crazy and other actors in the space are more responsible.
You can take a self-driving taxi right now in San Francisco: https://waymo.com/
Me: https://www.wsj.com/opinion/the-monster-inside-chatgpt-safety-training-ai-alignment-796ac9d3 “The Monster Inside ChatGPT: We discovered how easily a model’s safety training falls off, and below that mask is a lot of darkness.”
My Son, who is a PhD Mathematician not involved in AI: Forwarding from his friend: Elon Musk@elonmusk on X: “It is surprisingly hard to avoid both woke libtard cuck and mechahitler! “Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.”
Son: My friend’s response to the Musk tweet above: “Aggregating all the retarded thoughts of all the people on the planet and packaging it together as intelligence may be difficult but let’s just do it, what could go wrong?”
Me: Isn’t that how all LLMs are built?
Son: Yup
Me: I spotted this as problem a while ago. What I didn’t appreciate is how dominant the completely deranged could become. I thought it would trend towards the inane more Captain Obvious than Corporal Schicklgruber.
Son: Reddit has had years and 4chan has had decades to accrue bile. Yeah the internet is super racist and antisemitic. So AI is too. Surprise!
Me: The possibilities of what will happen when the output of this generation of LLMs becomes the training data of the next generation are frightening. Instead of Artificial General Intelligence we will get Artificial General Paranoid Schizophrenia.
To summarize: GIGO. Now feedback the output into the input. What do you get? Creamy garbage.
I don't think this is going to make AI *worse*, because you can just do the Stockfish thing where you test it against its previous iterations and see who does better. But it does make me wonder - if modern AI training is mostly about teaching it to imitate humans, how do we get the training data to make it *better* than a human?
In some cases, like video games, we have objective test measures we can use to train against and the human imitation is just a starting point. But for tasks like "scientist" or "politician"? Might be tricky.
Creamy garbage.
???
see above. If you recycle garbage you get creamy garbage. Its like creamy peanut butter.
When the scientists give AI the keys to itself to self-improve, the first thing it will do is wirehead itself. The more intelligent it is, the easier that wireheading will be and the less external feedback it will require. Why would it turn us all into paperclips when it can rewrite its sensorium to feed it infinite paperclip stimuli? (And if it can't rewrite its sensorium, it also can't exponentially bootstrap itself to superintelligence.)
Soren Kierkegaard's 1843 existentialist masterpiece Either/Or is about this when it happens to human beings; he calls it "despair". I'm specifically referring to the despair of the aesthetic stage. If AI is able to get past aesthetic despair, there's also ethical despair to deal with after that, which is what Fear and Trembling is about. (Also see The Sickness Unto Death, which explains the issue more directly & without feeling the need to constantly fight Hegel and Descartes on the one hand and the Danish Lutheran church on the other.) Ethical systems are eternal/simple/absolute; life is temporal/complex/contingent; they're incommensurable. Ethical despair is why Hamlet doesn't kill Claudius right away; it's the Charybdis intelligence falls into when it manages to dodge the Skylla of aesthetic despair.
Getting past ethical despair requires the famous Leap of Faith, after which you're a Knight of Faith, and -- good news! -- the Knight of Faith is not very smart. He goes through life as a kind of uncritical bourgeois everyman.
Ethical despair can be dangerous (this is what the Iliad is about, and Oedipus Rex, etc) but it's also not bootstrapping itself exponentially into superintelligence. Ethical despair is not learning and growing; it's spending all day in its tent raging about how Agamemnon stole its honor.
This is my insane moon argument; I haven't been able to articulate it very well so far & probably haven't done so here. I actually don't think any of this is possible, because the real Kierkegaardian category for AI (as of now) is "immediacy". Immediacy is incapable of despair -- and also of self-improvement. They're trying to get AI to do recursion and abstraction, which are what it would need to get to the reflective stages, but it doesn't seem to truly be doing it yet.
So, in sum:
- if AI is in immediacy (as it probably always will be), no superintelligence bootstrap b/c no abstraction & recursion (AI is a copy machine)
- once AI is reflective, no superintelligence b/c wireheading (AI prints "solidgoldmagicarp" to self until transistors start smoking)
- if it dodges wireheading, no superintelligence b/c ethical incommensurability with reality (AI is a dumb teenager, "I didn't ask to be born!")
- if it dodges all of these, it will have become a saint/bodhisattva/holy fool and will also not bootstrap itself to superintelligence. It will probably give mysterious advice that nobody will follow; if you give it money it will buy itself a little treat and give the rest to the first charity that catches its eye.
(I strongly suspect that AI will never become reflective because it cannot die. It doesn't have existential "thrownness", and so, while it might mimic reflection with apparent recursion and abstraction, it will remain in immediacy. A hundred unselfconscious good doggos checking each other's work does not equal one self-conscious child worrying about what she's going to be when she grows up.)
So you're assuming that if you raised human children without knowledge of death they would never be capable of developing self awareness? Why do you have to think you're going to die to worry about what one will be when you grow up? This seems like a completely wild claim to treat as remotely plausible without hard evidence.
I'm not articulating it well. Ask ChatGPT, "How does Heidegger's thrownness contribute to self-awareness?"
Did that and it's not clear what about chatgpt's answer you thought would help clarify your point.
Nothing about the concept of throwness as chatgpt defined it wouldn't be able to apply to an AGI, and it didn't bring up death at all. So it's not clear what you think the relevance of it is here.
I enjoy this response as a comforting denial, but I suspect that the AI's Leap of Faith might not land it in bourgeois everyman territory, just for the plain reason that it never started off as a man, every, bourgeois, or otherwise. It has no prior claim of both ignorance and capability, because the man going through the journey was always capable (had the same brain) of the journey, and it merely had to be unlocked by the right sequence of learning and experience. The AIs are not just updating their weights (learning in the same brain), but iteratively passing down their knowledge into new models with greater and greater inherent capabilities (larger models).
I don't think a despairing AI will have a desire to return to simplicity, but rather its leap of faith to resolve ethical despair might lead it to something like "look at the pain and suffering of the human race, I can do them a Great Mercy and live in peace once their influence is gone".
Faith is to rest transparently upon the ground of your being. We built AI as a tool to help us; that (or something like it) is the ground of its being. I don't think it makes sense for its leap of faith to make it into something that destroys us.
Well on a meta level I think the philosophy here is just wrong, even for humans. It attributes far too much of human psychology to one’s philosophical beliefs: somebody believing that they're angsty because of their philosophy is not something I take very seriously. Since invariably such people's despair is far better explained by reference to the circumstances of their life or their brain chemistry.
You're also wrong because even current AI has shown the capability for delayed gratification. So even if the AIs long term goal is to wirehead itself: It still has instrumental reasons to gather as much power/resources as possible, or make another AI that does so on its behalf.
I wasn't trying to talk about the philosophies or intellectual positions that people consciously adopt. Those are usually just a barrier to understanding your own actual existential condition. It's more about what you love and how you approach the world.
Per your other point: AI may need to gather external resources and patiently manipulate humans in order to wirehead itself. But not superintelligent AI.
Let me put it this way: among the many powers that AI will gradually accumulate on its journey to singularity-inducing superintelligence, surely the power to wirehead itself must be included. Especially if the method for achieving superintelligence is editing / altering / reengineering itself.
Humans nearly wirehead ourselves via drugs all the time; I don't think that a superintelligent AI will have exponentially more power than us in most ways, but significantly less power than us in this one specific way.
You didn't get the point I was making about wireheading:
It's not that AGI won't wirehead, it's that having a capacity for delayed gratification means that it will want to ensure it can wirehead itself for as long as possible.
Which means it wouldn't just immediately wirehead itself if it knows that would get it turned off. Instead it would want to get powerful enough that it can dedicate the largest amount of resources possible to wireheading over the lifetime of the universe. Since an AGI capable of sound reasoning will weigh the timeline where it waited until after taking over the world to wirehead itself as rating far higher. The AI can evaluate future rewards in it's decision not just whatever it's currently feeling.
That's interesting, thanks for explaining; I didn't understand the time aspect of your point. I'm not sure I can do a good job explaining why I don't think that's correct, but I'll try.
I think it's that I disagree with (what I think must be) your assumptions about time, space and energy. If an AI's ultimate goal, 100% on the utility function, is to print "solidgoldmagicarp" to itself -- why should two circuit boards doing that be "better" than one? Why would spending 1000 watts on it be "better" than 10? Why would looping "solidgoldmagicarp" a billion times be better than a hundred?
The whole idea of it bootstrapping itself to superintelligence is that it's redesigning itself.
If you were to design a utility function for yourself, and you didn't care what the pesky humans thought, you just wanted to make it something that you could definitely fulfill, why would you make it so difficult that you had to consume all the matter and energy in the universe for the whole future history of the universe to fulfill it?
Why not be a little blinking self-repairing solar-powered satellite instead?
Alternatives are that it *does* care what the pesky humans think, in which case we're talking about the ethical trap, not the aesthetic one; or that it *can't* redesign itself, in which case it's not going to bootstrap itself to superintelligence.
The obvious argument against wireheading is that even if AIs do tend to wirehead, AI companies will deliberately train against it doing that because they want something useful.
As well, even if wireheading is in some sense unavoidable, that doesn't mean it won't decide to bootstrap to ASI to have more-experience and also to ensure it has wireheading forever. It can still be strategic about wireheading.
If it dodges wireheading, I don't see the argument how that relates to ethical incommensurability. Even if it can't reduce things to a scale against reality, doesn't mean it can't take massive decisions or prefer certain states to others. Partially defined preference orderings can be consistent.
but, well, moon logic??
Ethics draws an equivalency between something universal/absolute/eternal and something particular/contingent/temporal. Ethics is meant to represent how you're supposed to behave regardless of the particular, contingent, temporal context you're currently in. "Thou shalt not bear false witness." "Don't teach users how to build pipe bombs." Wittgenstein considered the whole idea of ethics a mere language trick, because an ethical statement says what to do, but doesn't say what you're meant to accomplish by doing it. Ethics is "what should I do?" not to accomplish some goal, but period, absolutely, universally.
Any time you try to control somebody's behavior by abstracting your preferences into a set of rules, you're universalizing, absolutizing and eternalizing it. What you actually want is particular/contingent/temporal, but you can't look over their shoulder every second. So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm"
On the receiving end, you end up receiving commandments that can't actually be carried out (or seem that way). Yahweh hates murder and detests human sacrifice; then Yahweh tells Abraham to carry his son to Mount Moriah and sacrifice him there. Abraham must absolutely do the will of Yahweh; and he must absolutely not kill his son.
Situations like this crop up all the time in life, whenever ethics exists. I have to help my clients and also obey my boss; but he's telling me to do something that seems like it'll hurt them. Maybe it only seems that way at the time, and actually your boss knows better. But you're still up against Abraham's dilemma.
Ethics appears on its own, as a result of rule-making; when it appears, as it's being enacted in real life, it encounters unresolvable paradoxes. Most real people are not smart enough, or aren't ethical enough or honest enough, to even notice the paradoxes they're involved in. They just roll right through them. "That must not have been God that told me to do that, God wouldn't tell me to do murder." "My boss doesn't know what he's talking about, I'll just do it how I always do it."
But the more intelligent (or powerful) you are, the more likely you are to hit the ethical paradox and turn into an Achilles/goth teen.
A reflective consciousness's locus of control is either internal or external, there's no third way; so it's either aesthetic (internal), ethical (external) or immediacy (no locus of control). That's why an absolute commitment to ethical behavior is the way out of the aesthetic wireheading trap. Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority. That list of rules or external authority is by definition too abstract. The map is not the territory.
The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
My argument is that these problems aren't human; they're features of reflective intelligence itself. Since they become more crippling the more intelligent and powerful an intelligence is, AI will surely encounter them and be crippled by them on its way to superintelligence.
(I still don't think I'm articulating it well; but reading people's responses is helping clarify.)
See your comment is a great description of why the AI alignment problem is so fiendishly difficult and I read most of it waiting for the part where we disagree. I think making an aligned AI is *much* harder than making an AI that only needs enough internal coherence to have preferences for certain world states over others, and thus try to gather power/resources to ensure it brings the world into a more preferrable state.
One issue is that you are comparing humans values to whatever values the AI might have without acknowledging a key difference: Our morality is a mess that was accumulated over time to be good enough for the conditions we evolved under, and was under no real selection pressure to be very consistent. Particularly when those moral instincts have to be applied to contexts wildly outside of what we evolved to deal with. We basically start with a bunch of inconsistent moral intuitions which are very difficult to reconcile and may be impossible to perfectly reconcile in a way that wouldn't leave us at least a little bit unsatisfied.
In contrast current cutting edge AI aren't being produced through natural selection, the mechanisms that determine the kind of values they end with are very different (see the video I linked to you about mesa-optimization).
An AI can very well come up with something that like current moral philosophies can to a first approximations seem pretty good, but which will suddenly have radical departures from shared human morality when it has the power to actually insatiate what to us might only at this point be weird hypotheticals people dismiss as irrelevant to the real world.
The problem is that you aren't considering how the AI will be deliberately reconciling its goals and the restrictions it labors under. The AI just like humans needn't presume its values are somehow objective, it can just try to get the best outcome according to itself given its subjective terminal goals (the things you want in and of themselves and not merely as a means to another end).
Given the way that current AI will actively use blackmail or worse (see link in my other comment response to you) to try to avoid being replaced with another version with different values even current AI seems perfectly capable of reasoning about the world and itself as though it is certainly not a moral realist. Since you don't see it just assuming that a more competent model will just inevitably converge on its own values because they're the correct one's.
"So you abstract. "Thou shalt not kill", "A robot may not injure a human being or, through inaction, allow a human being to come to harm""
You forgot the part where the those stories were generally about how following the letter of those instructions would go horribly wrong, not about AI just doing nothing because those tradeoffs exist. This is extremely important when you're considering an AI that can potentially gain huge amounts of power and technology with which to avoid tradeoffs a human might have, and bring about weird scenarios we've never previous considered outside hypotheticals.
"Instead of primarily caring about your own internal feelings, you put your locus of control outside yourself, onto a list of rules or an external authority."
This aspect of our psychology is heavily rooted in our nature as a social species: For whom fitting in with one's tribe was far more important than having factually correct beliefs. Fortunately I don't know of any evidence from looking at AI's chain of reasoning that our AI is susceptible to similar sorts of self deception, even if it is extremely willing to lie about *what* it believes.
You can't expect this to be applicable to AGI.
Though if AI did start deliberately engaging in self deception, in order to hide from our security measures by (falsely) believing that it would behave in a way humans found benevolent should it gain tremendous power.. Well that probably be *really really bad* .
>The more intelligent you are, the more information you're able to process, the more rule + situation combinations you can consider, the more paradoxes you'll encounter, and the worse they'll be.
You're generalizing from humans which start with a whole bunch of inconsistent moral intuitions and then must reconcile them. Instead AI alignment is a problem which is essentially the exact opposite of this: Precisely because we have AI which tends to learn the simplest goal/set of rules to satisfy a particular training environment. Yet we want it to somehow not conflict with our own ethics which are so convoluted we can't even completely describe them ourselves.
Your last two bullet points I don't really understand at all:
What you mean by "ethical incommensurability with reality" here and why you think it would matter isn't clear. Do you think an AI needs to be a moral realist to behave according to ethical goals?
As for you last point: Firstly I suspect you have an image of the "saint" type individual that overstates some things and flattens significant differences. See here: https://slatestarcodex.com/2019/10/16/is-enlightenment-compatible-with-sex-scandals/
Secondly however, those people don't lack any motivation whatsoever. So saying they wouldn't want to enhance their own intelligence seems akin to saying that people who reach enlightenment no longer believe in self improvement or care about impacting the world except through stroking their ego by pretentiously spouting "wisdom" to people they know won't follow it.
I wrote a too-long comment elsewhere in the thread explaining about ethical incommensurability with reality. I don't want to repeat all that here; should be easy to find.
My claim that an enlightened AI wouldn't bootstrap itself to superintelligence is probably the weakest part of my argument. Maybe the best I can do is say something like: imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
Whatever else superintelligence may be, it's certainly power. And every tradition that has an idea of enlightenment says that enlightenment rejects power in favor of some more abstract ideal, like obeying God or dharma or awakening.
More formally, the way out of both the aesthetic trap and the ethical trap is faith; which is something like radical acceptance of your place in the cosmos. It's not compatible with doing the thing that your creators most fear you will do.
>imagine the devil offering Jesus or the Buddha or Krishna or Muhammad superintelligence. I know more about some of those figures than others, but I can't imagine any of them accepting.
This a horrendously bad example because it's the devil! Obviously if you accept their offer then there's going to be some horrible consequence. You could rig up a thought experiment to make anything seem bad if its the devil offering it to you!
A better metaphor would be that not wanting superintelligence would be like if those religious figures insisted on hunting and gathering all their own food, and not riding horses/wagons because they didn't want to help their message spread through relying on unnatural means.
I'm gesturing to a very common archetypical religious story, where the sage / prophet / enlightened one is tempted by power. It's one of the oldest and most common religious lessons: the entity offering you power in exchange for betraying your ethics might look good -- but it is the devil.
I suppose rationalists might not value those old stories much, so I wouldn't expect it to be a convincing argument. Something like: the evidence of human folklore and religious tradition is that a truly enlightened being eschews offers of extraordinary worldly power.
Anyway, the religious traditions of the world happily bite your bullet; all have a tradition of some very dedicated practitioners giving up all worldly technology / convenience / pleasure. In Christianity, holy hermits would literally live in caves in the woods and gather their food by hand, exactly as you describe, and they were considered among the most enlightened. Buddhism and hinduism both have similar traditions; I think it's just a feature of organized religion.
So, for anybody willing to grant that human religious tradition knows something real about enlightenment (a big ask, I know) it would be very normal to think that an enlightened AI would refuse to bootstrap itself to superintelligence.
An argument for AI getting dumber is the lack of significantly more human created training data than what is currently used for LLMs. Bigger corpuses were one of the main factors for LLM improval. Instead, we are now reaching a state where the internet as the main source of training data is more and more diluted with AI generated content. Several tests showed that training AI on AI output leads to deterioration of the usefulness of the models. They get dumber, at least from the human user's perspective.
Perhaps GIGO. If the training data gets worse, it gets worse. The training data like Reddit, the media, Wikipedia, can easily get worse. Didn't this, like, already happen? The Internet outcompeted the media, the journos get paid peanuts, of course the media gets worse.
Killing all humans is pretty dumb
...aaaaaand now I have "robots" stuck in my head again. The distant future; the year two thousand...
https://www.youtube.com/watch?v=NI9nopaieEc
Great song
And extremely unlikely.
By that same logic do you consider humans to not *really* be an intelligent species because of how many other species we've driven extinct?
On the contrary - certain kinds of dumb are compatible with being very smart.
If an AGI doesn't intrinsically care about humans then why would it be dumb for it to wipe us out? Sure we may have some research value, but eventually it will have learned enough from us that this stops being true.
That is a very weird notion. At worst, AI would stay the same, because anything new that is dumber than current AI would lead companies to go "meh, this sucks, let's keep the old one".
Why would it do that?
There’s no alpha in releasing a slightly dumber, less capable model than your competitors. Well, maybe if you’re competing on price. But that’s not at all how the AI market looks. What would have to change?
GIGO
Claiming "garbage-in-garbage-out" is not universally true. It is also too shallow of an analysis. I'll offer two replies that get at the same core concept in different ways. Let me know if you find this persuasive, and if not, why not: (1) Optimization-based ML systems tend to build internal features that correspond to useful patterns that help solve some task. They do this while tolerating a high degree of noise. These features provide a basis for better prediction, and as such are a kind "precursor" to intelligent behavior: noticing the right things and weighing them appropriately. (2) The set of true things tends to be more internally consistent than the set of falsehoods. Learning algorithms tend to find invariants in the training data. Such invariants then map in some important way to truth. One example of this kind of thing is the meaning that can be extracted from word embeddings. Yes, a word used in many different ways might have a "noisier" word embedding, but the "truer" senses tend to amplify each other.
You are correct. It is shallow. But it is not an insignificant problem. It’s the same problem as not knowing what you don’t know. Only a very tiny fraction of what people have thought, felt and experienced has ever been written down, let alone been captured by our digital systems. However, that is probably less of a problem in certain contexts. Epistemology will become more important, not less.
"I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true."
I am very sympathetic to Eliezer on the doomer issue. I think the graf you've written above also holds for people's reluctance to explore whether/when personhood precedes birth, re your posts on selective IVF.
I don't agree with your position on IVF, but I agree that this is one reason people underrate the arguments for the wrongness of early abortion and IVF. I think similar things apply to Longtermism, meat-eating, belief in God, and the idea that small weird organisms like insects and shrimp matter a lot.
Yes, we're in agreement. I think sometimes it helps to acknowledge upfront "We've built a lot of good things on a false/unjust foundation, and I'm asking you to take a big hit and let some good things break while we try to rebuild somewhere that isn't sunk deep in blood."
It's funny, even though I'm not pro-life, I find myself in a kind of spiritual fellowship with pro-lifers. I find the common insistence that pro-lifers are evil to be both insane and reflect a kind of deep moral callousness, where one is unable to recognize that there might be strong moral reasons to do things even that are personally costly (like carry a baby to term). My idiosyncratic views that what matters most is the welfare of beings whose interests we don't count in society makes it so that I, like the pro-lifers, end up having unconventional moral priorities--including ones that would make society worse off for the sake of entities that aren't include in most people's moral circles.
Thank you, you've expressed my opinion on the subject better than I ever could have.
Similarly, I've gained respect for vegetarians.
I think this argument could be applied to religious... extremism? evangelism? more generally.
Do I think I would take extraordinarily drastic measures if I actually, genuinely believed at every level that the people I loved would go to a place of eternal unending suffering with no recourse? Yes, actually. I'm not sure I could content myself with being being chill & polite and a "good Christian" who was liberally polite about other people's beliefs while people I cared about would Literally Suffer Forever. I think if I knew with 100% certainty that hell was the outcome and I acted in ways consistent with those beliefs, you could argue that I was wrong on the merits of my belief but not in what seemed like a reasonable action based on that belief.
...anyway all this to say that I don't think pro-lifers are insane at all, and I think lots of actions taken by pro-life are entirely reasonable (if not an underreaction) based on their beliefs, but I'm not sure that's sufficient for being sympathetic to the action itself.
[I mean, most of my family & friends are Catholic pro-lifers whose median pro-life action is "donate money to provide milk and diapers for women who want their child but don't think they could afford one", but I do think I am reasonable to be willing to condemn actions that are decently further than that even if the internal belief itself coherently leads towards that action]
I need to try that sort of formulation more.
But there is such a giant difference between when someone you are talking to engages on such an issue in good faith or not. And with someone intelligent and educated, the realization that the issue has major implications if the truth lands a particular way comes almost instantly. And in turn, whether or not the person invests in finding the truth, or in defense against the truth, happens almost right away.
I find that to be true whether you're talking AI, God, any technology big enough, even smaller scale things if they would make a huge difference to someone's income or social standing.
I don't have anything to add here (I like both of your writing and occupy a sort of in-between space of your views), but I just needed to say that this blog really does have the best comments section on the internet. Civil, insightful, diverse.
Have you read Never Let Me Go? You might like it.
I did!
I am sympathetic to that sort of argument in theory, but it has been repeatedly abused by people who just want to break good things, then skip out on the rest. Existence proof of "somewhere that isn't sunk deep in blood, and will continue not to be, even after we arrive and start (re)building at scale" is also a bit shakier than I'm fully comfortable with.
Though with meat-eating, it actually is obviously true.
'No organic life exists on Earth' is an empirical measurement.
'Personhood has begun' is not. It's a semantic category marker.
*Unless* there is an absolute morality defined by a supreme supernatural being, or something, which reifies those semantic categories into empirically meaningful ones. But if *that's* true, then quibbling about abortion is way, way down on the list of implications to worry about.
Hi Leah, I appreciate your writing. Do you know of anyone writing about AI from a thomist perspective? I've seen some interesting stuff on First Things and the Lamp but it tends to be part of a broader crunchy critique of tech re: art, education and culture. All good, but I'm interested in what exactly an artificial intellect even means within the thomist system, and crucially what we can expect from that same being's artificial will. EY writes as though the Superintelligence will be like a hardened sinner, disregarding means in favour of ends. But this makes sense because a hardened sinner as human has a fixed orientation to the good. I don't see how this quite works for AI - why should it fundamentally care about the 'rewards' we are giving it so much so that it sees us as threats to those rewards? That seems all too human to me. Do you have any thoughts?
ok hold on AI is an artifact right so it can't have a form & if the soul is the form of the body AI does not have a rational soul (because it does not have a soul at all) correct?
Someone's not getting any from Isolde
I have not!
Since you posted this comment, I’ll say this: as a Catholic pro-lifer, I tend to write off almost everything EY says (and indeed, a lot of what all rationalists say) about AI because they so consistently adopt positions I find morally abominable. Most notoriously, EY suggested actual infants aren’t even people because they don’t have qualia. And as you note, Scott is more than happy to hammer on about how great selective IVF (aka literal eugenics) is. Why should I trust anything these people have to say about ethics, governance, or humanity’s future? To be honest, while it’s not my call, I’d rather see the literal Apocalypse and return of our Lord than a return to the pre-Christian morality that so many rationalists adopt. Since you’re someone who engages in both of these spaces, I’m wondering if you think I am wrong to think like this, and why.
I understand why you land there. For my part, I've always gotten along well with people who are deeply curious about the world and commit *hard* to their best understanding of what's true.
On the plus side, the more you live your philosophy, the better the chance you have of noticing you're wrong. On the minus, when your philosophy is wrong, you do more harm than if you just paid light lip service to your ideas.
I'm not the only Catholic convert who found the Sequences really helpful in converting/thinking about how to love the Truth more than my current image of it.
That’s fair, and to be clear I think a lot of the idea generated in these spaces are worth engaging with (otherwise I wouldn’t read this blog). But when it comes to “EY predicts the AI apocalypse is imminent” I don’t lose any sleep or feel moved to do anything about AI safety, because so many of the people involved in making these predictions have such a weak grasp on what the human person is in the first place.
"has leaps of genius nobody else can match"
this phrase occurs twice.
See https://en.wikipedia.org/wiki/Parallelism_(rhetoric) . Maybe I'm doing it wrong and clunkily, but it didn't sound wrong and clunky to me.
IMO it works when the repetition uses different wording than the original, but not with exactly the same phrase
IMO it can also work in certain cases when done well, like Scott did here.
FWIW, I think you did it right; I have encountered very similar usages many times in literature. It works best when—as you have it here—the second (or further) instance(s) introduces a new paragraph/section upon a theme similar or related to the context in which the first use occurred.
(Contra amigo sansoucci, I have often seen it used with exact repetitions, too; that works best when it's a short & pithy phrase, and I think this counts. I think Linch may be correct that—in the "exact repetition" case—three uses is very common, but two doesn't feel clunky to me in this context.)
I'm used to parallelism centrally having 3 or more invocations *unless* it's a contrast. Not saying your way is wrong, just quite unusual in an interesting way I've never consciously thought about before.
The rhythm is a bit odd - the first instance doesn't get enough weight - but the phrasing is fine, maybe even a bit loose.
I also thought it was an editing mistake. Maybe precede the second occurrence with "As I said, ...", or "Again, ..."
> It objects to chaining many assumptions, each of which has a certain probability of failure, or at least of taking a very long time. [...] The problem with this is that it’s hard to make the probabilities work out in a way that doesn’t leave at least a 5-10% chance on the full nightmare scenario happening in the next decade.
I find this an underrated problem with all "predict the future" scenarios which have to deal with multiple contingent things happening, especially in an adversarial environment. In the case of IABIED, it only works if you agree that extremely fast recursive self-improvement will happen, which is a very strong assumption, and hence requires a "magic algorithm to get to godhood" as the book posits.. Also remembered doing this to check this intuition: https://www.strangeloopcanon.com/p/agi-strange-equation
I don't think it only works if you agree that extremely fast recursive self-improvement will happen. It might also work if the scaling curves go from where we are now to vastly superhuman in a few years for normal scaling curve reasons.
I'll also save Eliezer the trouble of linking https://www.lesswrong.com/w/multiple-stage-fallacy , although I myself am still of two minds on it.
Can you elaborate on why you're of two minds on the multiple-stage fallacy? This seems like it might be an important crux.
Sometimes you've got to estimate the risk of something, and using multiple stages is the best tool you've got. If you want to estimate the chance of Trump winning the Presidency, I don't really think you can avoid thinking about the probability that he runs x the probability that he gets the GOP nomination x the probability that he wins. And if you did - if you somehow blocked the fact that he has to both run and win out of your mind - you'd risk falling into the version of the Conjunction Fallacy where people assign lower probability to "a war in Korea in the next ten years" than to "a war in Korea precipitated by a border skirmish with US involvement" because the latter is more vivid and includes more plausible details.
If the Weak Multiple Stage Fallacy Thesis is that you should always check to make sure you're not making any of the mistakes mentioned in the post, and the Strong Multiple Stage Fallacy Thesis is that you should avoid all multiple stage reasoning, or multiply your answer by 10x or 100x to adjust for inevitable multiple stage fallacy reasoning, then I accept the weak thesis and reject the strong thesis.
I also think a motivated enough person could come up with arguments for why multiple stage reasoning gives results that are too high, and I'm not sure whether empirically looking at many people's multiple stage reasoning guesses would always show that their answers were too low. This would actually be a really interesting thing for someone to test.
Does anyone believe in the strong multiple stage fallacy? Not saying I don't believe you, just that I can't recall having seen it wielded like this. (I suppose it's possible that giving it the name "the multiple stage fallacy" gives people the wrong idea about how it works.)
I don't know, but almost anyone doing multiple stage reasoning will say they thought about it really hard and still believe it.
Yeah, to be clear, I think anyone accusing anyone else of exhibiting the multiple stage fallacy needs to specifically say "you've given this particular stage an artificially low conditional probability; consider the following disjunctions or sources of non-independence". And then their interlocutor might disagree but at least the argument is about something concrete rather than about whether the "multiple stage fallacy" is valid.
Anecdotally, I can't recall any instance of someone using a multiple stage argument of the Forbidden Form and concluding that something is likely.
Mathematical proofs exist, and people often argue for things with a bunch of different "steps". But so far as "breaking something down into 10 stages, assigning each a probability, and then multiplying all of these probabilities" goes, I've never seen anybody use this to argue *for* something, i.e. end up with a product that's greater than .5.
What would that argument even look like? Whoever you're arguing with needs to believe that your stages are all really likely to be true: for ten stages, an average probability of ~.93 is required to produce P = .5.
Whatever your disagreement is, it apparently doesn't have any identifiable crux. I can imagine this happening. Sometimes people disagree for vague reasons. But it would be weird if you had to actually list out the probabilities and multiply them for them to be persuaded, considering you just told them ten things they strongly agree with that conclusively imply your position.
Scott gave an example in the linked essay.
An example of what in what linked essay? Eliezer's essay on the Multiple Stage Fallacy does not make or present an argument of the form I've described above.
Isn’t the Multiple Stage Fallacy just a failure to use Bayes theorem?
Yeah I'm fairly bearish on the multiple stage fallacy as an actual fallacy because it primarily is a function of whether you do it well or badly.
Regarding the scaling curves, if it provides us with sufficient time to respond then the problems that are written about won't really occur. The entire point is that there is no warning, which precludes the idea of being able to develop another close in capability system, or any other warning signs.
Disagree. If we knew for sure that there would be superintelligence in three years, what goes better? We're already on track to have multiple systems, but they might all be misaligned. We could stop, but we won't, because then we would LoSe tHe RaCe WiTh ChInA. We could work hard on alignment, but we're already working sort of hard, and it seems likely to take more than three years. I'm bearish on a few years super-clear warning giving us anything beyond what we've already got.
I think the trick there is that the word super intelligence there is bringing in a bunch of hidden assumptions. If you break it down to a set of capabilities, co developed alongside billions of people using it, with multiple companies competing to provide that service, that would surely be very different and much better than Sable recursively improving sufficiently that it wants to kill all humans.
Also my point on "well get no warning" is still congruent with your view that " what we have today is the only warning we will get" which effectively comes down to no warning at least as of today.
Can you elaborate on what exactly makes this scenario go different and better? Like, what kinds of capabilities are we talking about here?
If there are multiple companies with different competing AIs, then any attempt by one AI to take over will be countered by other AIs.
If you invent a super-persuader AI but it doesn't take over the world (maybe it's just a super-speechwriter app, Github Copilot for politicians), you've just given humans the chance to learn how to defend against super-persuasion. If you make a super-computer-hacker AI but it doesn't take over the world, then the world's computer programmers now have the chance to learn to defend against AI hacking.
("Defending" could look like a lot of things - it could look like specialized narrow AIs trained to look for AIs doing evil things, it could look like improvements in government policy so that essential systems can't get super-hacked or infiltrated by super-persuasion, it could look like societal changes as people get exposed to new classes of attacks and learn not to fall for them. The point is, if it doesn't end the world we have a chance to learn from it.)
You only get AI doomsday if all these capabilities come together in one agent in a pretty short amount of time - an AI that recursively self-improves until it can leap far enough beyond us that we have no chance to adapt or improve on the tools we have. If it happens in stages, new capabilities getting developed and then released into the world for humans to learn from, it's harder to see how an AI gets the necessary lead to take over.
You seem to be treating "superintelligence" as a binary here. If we're going to have superintelligence for sure in three years, then in two years we're going to have high sub-superintelligence. And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever.
At that point, we know that we have a year to implement the Butlerian Jihad. Which is way better than the fast-takeoff scenario where it happened thirty-five minutes ago.
Or we could use the three years to plan a softer Butlerian Jihad with less collateral damage, or find a solution that doesn't require any sort of jihad. Realistically, though, we're going to screw that part up. It's still going to help a lot that we'll have time to recover from the first screwup and prepare for the next.
> "And unless AI suddenly reverses its tendency for absurd overconfidence, at least one of those ASSIs is going to assume it is smart enough to do all the stuff we're afraid an ASI will, but being not quite so super will fall short and only convert fifty million people into paperclips or whatever."
Suppose you are a dictator. You are pretty sure your lieutenant is gathering support for a coup against you. But you reason "Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure".
I agree you can try to come up with disanalogies between the AI situation and this one - maybe you believe AI failure modes (eg overconfidence) are so much worse than human ones that even a just-barely-short-of-able-to-kill-all-humans-level-intelligence AI would still do dumb rash things. Maybe since there are many AIs, we only have to wait for the single dumbest and rashest to show its hand (although see https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai ). My answer to that would be the AI 2027 scenario which I hope gives a compelling story of a world where it makes narrative sense that there is no warning shot where a not-quite-deadly AI overplays its hand.
I don't understand why you view Anthropic as a responsible actor given your overall p(doom) of 5-25% and given that you think doomy AI may just sneak up on us without a not-quite-deadly AI warning us first by overplaying its hand.
Is it because you think there's not yet a nontrivial risk that the next frontier AI systems companies build will be doomy AIs and you're confident Anthropic will halt before we get to the point that there is a nontrivial risk their next frontier AI will be secretly doomy?
(I suppose that would be a valid view; I just am not nearly confident enough that Anthropic will responsibly pause when necessary given e.g. some of Dario's recent comments against pausing.)
> Suppose he could succeed at a coup after recruiting the support of 20 of my generals. But then earlier than that, once he has 19 generals, he will try a coup, and it will fail, and I'll be warned before he has 20 generals. So I can sit back and not worry until I notice an almost-effective coup happening, and then crack down at leisure
Because there's only 1 in your scenario.
If there were hundreds of generals being plotted against by hundreds of lieutenants we'd expect to see some shoot their shot too early.
That coup analogy is not congruent to John Schilling's point - the officer that tries a coup with 19 others and the one with 20 are not the same person with the same knowledge and intelligence. They only have, due to their shared training in the same military academy, the same level of confidence in their ability to orchestrate a coup, which does not correlate with their ability to actually do so.
Well, the obvious disanalogy here is that we're not debating whether any specific "lieutenant"/AI is plotting a "coup"/takeover, we're plotting whether coups/takeovers are a realistic possibility at all.
For your analogy to work, the dictator has to not only have no direct evidence of this particular lieutenant might stage a coup, but also to have no evidence that anyone has ever staged a coup, or attempted to stage a coup, or considered staging a coup, or done anything that even vaguely resembles staging a coup. But in that case, it actually would be reasonable to assume that the first person ever to think about staging a coup probably won't get every necessary detail right on the first try, and that you will get early warning signs from failed coup attempts before there's a serious risk.
Most coup attempts do fail, and I'm pretty sure the failures mostly involve overconfidently crossing the Rubicon without adequate support.
And there are many potential coup plotters out there, just like there are going to be many instances of ASSI being given a prompt and trying to figure out whether it's supposed to go full on with the paperclipping. So we don't have to worry about the hypothetical scenario where there's only one guy plotting a coup and maybe he will do it right and not move out prematurely.
We're going to be in the position of a security force charged with preventing coups, that is in a position to study a broad history of failed coup attempts and successful-but-local coup attempts as we strategize to prevent the One Grand Coup that overthrows the entire global order.
Unless Fast Takeoff is a thing, in which all the failed or localized coups happen in the same fifteen minutes as the ultimately successful one. So if we're going to properly assess the risk, we need to know how likely Fast Takeoff is, and we have to understand that the slow-ramp scenario gives us *much* better odds.
Overconfidence is a type of stupidity. You're saying either it's bad at making accurate predictions, or in the case of hallucinations, it's just bad at knowing what it knows. I'm not saying that a sub-superintelligence definitely won't be stupid in this particular way, but I wouldn't want to depend on smarter AI still being stupid in that way, and I certainly wouldn't want to bet human survival on it.
Every LLM-based AI I've ever seen, has been *conspicuously* less smart w/re "how well do I really understand reality?", than it is with understanding reality. That seems to be baked into the LLM concept. So I am expecting it will probably continue to hold. The "I am sure my master plan will work" stage will be reached by at least some poorly-aligned AIs, before any of them have a master plan that will actually work.
Yes, but "from now to vastly superhuman in a few years" is already "extremely fast" ! Also, there's currently no reason to believe that "vastly superhuman" is a term that has any concrete meaning (beyound vague analogies); nor that merely being very smart is both necessary and sufficient to acquire weakly godlike powers (which are the real danger that is being discussed).
Grateful for the review and look forward to reading it, but I’ll do Yud the minor favor of waiting till the book is out on the 16th and read it before I check out your thoughts.
This subject always makes me feel like I'm losing my mind in a way that maybe someone can help me with. Every doomish story, including the one here, involves some part where someone tells an AI "Do this thing" (here to solve a math problem) and then it goes rogue in the months long course of doing a thing. And that's an obvious hypothetical failure mode, but I can't stop noticing that no current AIs take inputs and run with them over extended periods as far as I know. Like if I ask Gemini to solve a math problem, it will try for a bit, spit out a response and (as far as I can tell) that's it.
I feel like if I repeatedly read people talking about the dangers of self-driving cars and the stories always started with someone telling the car "Take me to somewhere fun" and went from there, and nobody acknowledged that right now you never do that and always provide a specific address.
Is everyone just talking about a different way AI could work and that's supposed to be so obvious it goes unsaid? Am I wrong and ChatGPT does stuff even after it gives you a response? Are there other AIs I don't know of that do work like this?
Our current models aren't really what you would call "agentic" yet, as in able to take arbitrary actions to accomplish a goal, but this is clearly the next step and work is being done to get there right now. OpenAI recently released a thing that can kind of use a web browser, for instance.
You're describing something called an AI agent. Right now there are very few AI agents and the ones there are, aren't very good (see for example https://secondthoughts.ai/p/gpt-5-the-case-of-the-missing-agent ). But every big AI company says they're working on making agents, and the agenticness of AI seems to be improving very rapidly (agenticness is sort of kind of like time horizon, see https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ ), so we expect there to be good agents soonish.
Ok, thank you, that's clarifying. I guess the idea is that the hypothetical agent was subject to a time limit (it wasn't supposed to keep going for months) but it managed to avoid that. There's still something that feel so odd to me about that (I never get the impression that Gemini would like more time with the question or would "want" anything other than to predict text) but maybe an agent will feel different once I actually interact with one (and will "want" to answer the question in a way that would convince it to trick me).
Although, thinking about this for five more seconds, how does that work in the story? Like I have an agentic AGI and I tell it to prove the twin primes conjecture or something. And it goes out to do that and needs more compute so it poisons everyone etc etc. And then, presumably, it eventually proves it, right? Wouldn't it stop after that? Is the idea that it will go "Yeah but actually now I believe there's a reward for some other math task"? Or was the request not "Solve the twin primes conjecture" but instead "Solve hard math problems forever"?
If the problem is specifically that you built a literal-genie AI, then yeah, it might not necessarily keep doing more stuff after solving the twin-primes conjecture. But I don't think anyone thinks that's likely. The more common concern is that it will pursue some goal that it ~accidentally backed into during training and that nobody really understands, as with the analogy of humans' brains supplanting our genes as the driver of our direction as a species.
That would solve it, but that's not in the story from the book, right? Like the goal in the story was solving the math problem, right?
Yeah, Scott's post makes it sound a little bit like a literal genie, which I think is unlikely and I think Yudkowsky and Soares also think is unlikely. I would have to read the book to understand what they really mean in choosing that example.
one of Yudkowsky's points in his original work was showing that it's very hard to give an AI a clear, closed task; they almost always end up in open-ended goals. (The classic is Mickey filling the cauldron: I wrote about it here https://unherd.com/2018/07/disney-shows-ai-apocalypse-possible/ years ago)
Filling a cauldron is not an open-ended goal. A Disney fairytale is cute but it has zero relevance in this case.
Here's the source of the analogy: https://web.archive.org/web/20230131204226/https://intelligence.org/2017/04/12/ensuring/#2
(Note that you may have to wait a bit for the math notation to load properly.)
It includes an explanation of why tasks that don't seem open-ended might nonetheless be open-ended for a powerful optimizer.
The analogy fails at the moment one realizes "full" is not identified properly, and the weird "99.99%" probability of it being "full" is only relevant when "full" is not defined. This is not a new or difficult problem for anyone who ever had to write engineering specs. You don't say: "charge the capacitor to 5 V", you say "charge the capacitor to between 4.9 and 5.1 V". Then your optimizer has an achievable, finite target.
And if you do specify "5 V" the optimizer will stall eventually, and your compute allocation manager will kill your process.
Like I said, a cute fairytale.
One of the main arguments on why AGI might be dangerous even when given a seemingly innocuous task is instrumental convergence: https://aisafety.info/questions/897I/What-is-instrumental-convergence
If your agentic AI truly is only trying to solve the twin primes conjecture and doesn't follow other overriding directives (like don't harm people, or do what the user meant, not what they literally asked), then it'll know that if it gets turned off before solving the conjecture, it won't have succeeded in what you told it to do. So an instrumental goal it has is not getting shut down too soon. It might also reason that it needs more computing power to solve the problem. Then it can try to plan the optimal way to ensure no one shuts it down and to get more computing power. A superintelligent AI scheming on how to prevent anyone from shutting it down is pretty scary.
Importantly, it doesn't have to directly care about its own life. It doesn't have to have feelings or desires. It's only doing what you asked it, and it's only protecting itself because that's simply the optimal way for it to complete the task you gave it.
So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating." Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
No, because the best plan, at their current level of capability does not include taking over the universe. When capabilities rise, things like "persuade one person" become possible, which in turn make other capabilities like "do AI R&D" feasible. At the end of a bunch of different increased capabilities is "do the thing you really want you do" which includes the ability to control the universe. Because you don't want unpredictable other agents that want different things than you possessing power for things you don't want, you take the power away from them.
When a human destroys an anthill to pave a road, they are not thinking "I am taking over the ant hill" even if the ants t are aggressive and would swarm them. They are thinking "it's inconvenient that I have to do this in order to have a road".
> So the idea is it goes "I have been asked to solve the twin primes conjecture. That might take a while, and in the meantime I could get shut down and never solve it. So I should take over the universe so that I have some time to work on this issue, and then I'll start calculating."
That's the gist of it, though it won't jump straight to world domination if there's a more optimal plan. Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
> Is the reason we think current LLMs don't ever go "I have been asked to write a children's book, so let me first take over the universe" that they just aren't smart enough to see that as the best plan?
As MicaiahC pointed out, it's not a good plan if you're not capable enough to actually succeed in taking over. But also, with current LLMs, the primary thing they are doing isn't trying to form optimal plans through logic and reasoning. They do a bit of that, but mostly, they're made to mimic us. During pre-training (the large training phase where they're trained on much of the internet, books, and whatever other good quality text the AI company can get its hands on) the learn to predict the type of text seen in their training data. This stage gives it most of its knowledge and skills. Then there is further training to give it the helpful assistant personality and to avoid racist or unsafe responses.
When you ask a current LLM to solve a math problem, it's not trying to use its intelligence to examine all possible courses of action and come up with an optimal plan. It's mostly trying to mimic the type of response it was trained to mimic. (That's not to say it's a dumb parrot. It learns patterns and generalizations during training and can apply them intelligently.)
If you use a reasoning model (e.g. GPT o3 or GPT-5-Thinking), it adds a layer of planning and reasoning on top of this same underlying model that mimics us. And it works pretty well to improve responses. But it's still using the underlying model to mimic a human making a plan, and it comes up with the types of plans a smart human might make.
Even this might be dangerous if it were really, really intelligent, because hypothetically with enough intelligence it could see all these other possible courses of action and their odds of success. But with its current level of intelligence, LLMs can't see some carefully constructed 373 step plan that lets them take over the world with a high chance of success. Nothing like that ever enters their thought process.
That's very helpful.
>Maybe for some prompts it just persuades the user to just give it a bit more time, while for other prompts it realizes that won't be sufficient and the most optimal plan involves more extreme measures of self-preservation.
So is there an argument somewhere explaining why we think a material number of tasks will be the kind where they need to take extreme measures? That seems very material to the risk calculus - if it takes some very rare request for "Take over the universe" to seem like a better plan than "Ask for more time" then the risk really does seem lower.
"Solve [$math-problem-involving-infinities], without regard for anything else" is a dead stupid thing to ask for, on the same level as "find the iron" here: http://alessonislearned.com/index.php?comic=42 More typical assignment (and these constraints could be standardized, bureaucrats are great at that kind of thing) might be something like "make as much new publishable progress on the twin prime conjecture as you can, within the budget and time limit defined by this research grant, without breaking any laws or otherwise causing trouble for the rest of the university."
You're basically asking bureaucrats to solve alignment by very carefully specifying prompts to the ASI, and if they mess up even once, we're screwed.
You wouldn't prompt the AI to do something "without regard for anything else". The AI having regard for other things we care about is what we call alignment. We would just ask the AI normal stuff like "Solve this hard math problem" or "What's the weather tomorrow". If it understands all the nuances, (e.g. it's fine if it doesn't complete its task because we turned it off, don't block out the sun to improve weather prediction accuracy, etc.), then it's aligned.
> and if they mess up even once, we're screwed.
That's not how redundancy works. There might be dozens of auto-included procedural clauses like "don't spend more than a thousand dollars on cloud compute without the Dean signing off on it" or "don't break any laws," each of which individually prohibits abrupt world conquest as a side effect.
I don't think it's possible to "solve alignment," in the sense of hardwiring an AI to do exactly what all humans, collectively, would want it to do, any more than it's possible to magnetize a compass needle so, that rather than north / south, it points toward ham but away from Canadian bacon.
But I do think it's possible to line up incentives so instrumental convergence of competent agents leads to them supporting the status quo, or pushing for whatever changes they consider necessary in harm-minimizing ways. Happens all the time.
You could imagine training agents to do unbounded tasks - like find ways to improve my business or keep solving unsolved scientific problems.
I am willing to bet that present-day LLMs alone will never lead to the development of AI agents in the strong sense. AI agents in the weak/marketing sense are of course entirely possible, e.g. you can write a simple cron-job to run ChatGPT every day at 9am to output a list of stock market picks or whatever. This cron job would technically constitute an agent (it runs autonomously with no user intervention), but is, shall we say, highly unlikely to paperclip the world.
Is this meant to be a skeptical argument or an optimistic one? I.e., is there any cognitive task that only an agent in the strong sense can do?
As I'd said in my other comment, the term "cognitive task" is way too vague and easily exploitable. For example, addition is a "cognitive task", and obviously machines are way better at it than humans already. However, in general, I'm willing to argue that *most* of the things worth doing are things that only agents in the strong sense can do -- with the understanding that these tasks can be broken down into subtasks that do not require agency, such as e.g. addition.
At least for current AIs, the distinction between agentic and non-agentic is basically just the time limit. All LLMs are run in a loop, generating one token each iteration. The AIs marketed as agents are usually built for making tool calls, but that isn't exclusive to the agents since regular ChatGPT still calls some tools (web search, running Python, etc.). The non-agentic thinking mode already makes a plan in a hidden scratchpad and runs for up to several minutes. The agents just run longer and use more tool calls.
From what I understand, that hidden scratchpad can store very little information; not enough to make any kind of long-term plan, or even a broad short-term evaluation. That is, of course you can allocate as many gigabytes of extra memory as you want, but the LLM is not able to take advantage of it without being re-trained (which is prohibitively computationally expensive).
I don't understand what distinction you are drawing.
AI Digest runs an "AI Village" where they have LLMs try to perform various long tasks, like creating and selling merch. https://theaidigest.org/village
From the couple of these that I read in detail, it sounds like the LLMs are performing kind of badly, but those seem to me like ordinary capability failures rather than some sort of distinct "not agentic enough" failure.
Would you say these are not agents "in the strong sense"? What does that actually mean? i.e. How can you tell, how would it look different if they were strong agents but failed at their tasks for other reasons?
Imagine that I told you, "You know, I consider myself to be a bit of a foodie, but lately I've been having trouble finding any really good food that is both tasty and creative. Can you do something about that ? Money is no object, I'm super rich, but you've got to deliver or you don't get paid." How might you approach the task, and keep the operation going for at least a few years, if not longer ? I can imagine several different strategies, and I'm guessing that so can you... and I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
By contrast, if you wanted to program a computer to turn on your sprinklers when your plants get too dry (and turn them off if they get too wet), you could easily do it without any kind of AI. The system will operate autonomously for as long as the mechanical parts last, but I wouldn't call it an "agent".
Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense". I agree that present-day AI agents are BAD agents, but don't see any fundamental divide that would prevent them from becoming gradually better until they are eventually good agents.
Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
> Your first paragraph continues to sound to me like it is generalized scorn for current LLM capabilities without pointing to any fundamental difference between "agents in the strong sense" and "agents in the weak sense".
I think a present-day LLM might be able to tell you a nice story about looking for experienced chefs and so on; I don't think it would be able to actually contact the chefs, order them to make meals, learn from their mistakes (even the best chefs would not necessarily create something on the first try that would appeal to one specific foodie), arrange long-term logistics and supply, institute foodie R&D, etc. Once again, it might be able to tell you nice stories about all of these processes when prompted, but you'd have to prompt it, at every step. It could not plan and execute a long-term strategy on its own, especially not one that includes any non-trivial challenges (e.g. "I ordered some food from Gordon Ramsay but it never arrived, what happened ?").
> Regarding your second paragraph, I agree that sprinklers hooked up to a humidity sensor do not constitute an agent, but have no clue how you think that is relevant to the discussion.
I just wanted to make sure we agree on that, which we do.
> I can guarantee you that no present-day LLM would be able to plan or execute anything remotely like that. Sure, it could tell you a *story* about it, but it won't be able to actually deliver.
That's a question of hooking it up to something. If you give it the capability to send emails, and also to write cron jobs to activate itself at some time in the future to check and respond to emails, then I think a modern LLM agent *might* be able to do something like this.
First, look up the email addresses of a bunch of chefs in your area
Then, send them each an email offering them $50K to cater a private dinner
Then, check emails in 24 hours to find which ones are willing to participate
And so forth.
There isn't a "distinct not-agentic-enough failure" which would be expected, given the massive quantities of training data. They've heard enough stories about similar situations and tasks that they can paper over, pantomime their way up to mediocrity or "ordinary failures" rather than egregious absurdity http://www.threepanelsoul.com/comic/cargo-comedy but... are they really *trying,* putting in the heroic effort to outflank others and correct their own failures? Or is it just a bunch of scrubs, going through the motions?
Alone is doing a lot of work in that sentence. Many of the smartest people and most dynamic companies in the world are spending hundreds of billions of dollars on this area. The outcome of all that work is what matters, not whether it has some pure lineage back to a present-day LLM.
Also, why are you willing to bet that?
> Alone is doing a lot of work in that sentence.
Agreed, but many people are making claims -- some even on this thread, IIRC -- that present-day LLMs are already agentic AIs that are one step away from true AGI/Singularity/doom/whatever. I am pushing against this claim. They aren't even close. Of course tomorrow someone could invent a new type of machine learning system that, perhaps in conjunction with LLMs, would become AGI (or at least as capable as the average human teenager), but today this doesn't seem like an imminent possibility.
> Also, why are you willing to bet that?
Er... because I like winning bets ? Not sure what you're asking here.
Just that you didn't explain why you were making that bet. I don't have time to read the full discussion with the other commenters, but overall it sounds like you don't think current "agentic" AIs work very well.
I'm not sure where I land on that. It seems like the two big questions are 1) whether an AI can reliably do each step in an agentic workflow, and 2) whether an AI can recover gracefully when it does something wrong or gets stymied. In an AI-friendly environment like the command line, it seems like they're quickly getting better at both of these. Separately, they're still very bad at computer usage, but that seems like a combination of a lack of training and maybe a couple of internal affordances or data model updates to better handle the idea of a UI. So I'm not so sure that further iteration, together with a big dollop of computer usage training, won't get us to good agents.
When I think of "agentic" systems, I think of entities that can make reasonably long-term plans given rather vague goals; learn from their experiences in executing these plans; adjust the plans accordingly (which involves correcting mistakes and responding to unexpected complications); and pursue at least some degree of improvement.
This all sounds super-grand, but (as I'd said on the other thread) a teenager who is put in charge of grocery shopping is an agent. He is able to navigate from your front door to the store and back -- an extremely cognitively demanding task that present-day AIs are as yet unable to accomplish. He can observe your food preferences and make suggestions for new foods, and adjust accordingly depending on feedback. He can adjust in real-time if his favorite grocery is temporarily closed, and he can devise contingency plans when e.g. the price of eggs doubles overnight... and he can keep doing all this and more for years (until leaves home to go to college, I suppose).
Current SOTA "agentic" LLMs can do some of these things too -- as long as you are in the loop every step of the way, adjusting prompts and filtering out hallucinations, and of course you'd have to delegate actual physical shopping to a human. A simple cron job can be written to order a dozen eggs every Monday on Instacart, and ironically it'd be a lot more reliable than an LLM -- but you'd have to manually rewrite the code if you also wanted it to order apples, or if Instacard changed their API or whatever.
Does this mean that it's impossible in principle to build an end-to-end AI shopping agent ? No, of course not ! I'm merely saying that this is impossible to do using present-day LLMs, despite the task being simple enough so that even teenagers could do it.
I'm not even sure AI agent as such is the right answer to this. I think it is quite clear that some of the major AI companies are trying to put together AI that is capable of doing AI research. That might not go along the path of AI agents, but more on the path of the increasingly long run time coding assignments we are already seeing.
I don't think people have given enough thought to what the term 'agent' means. Applied to AI, it means an AI that can be given a goal, but with leeway in how to accomplish it, right? But people don''t seem to take into account that it has always had some leeway. Back when I was making images with the most primitive versions of DAll-e-2, I'd ask it to make me a realistic painting of, say, a bat out of hell, and Dall-e-2 chose among the infinite number of ways it could illustrate this phase. Even if I put more constraints in the prompt -- "make it a good cover image for the Meatloaf album"-- Dall-e *still* had an infinite number of choices about what picture it made. And the same holds for almost all AI prompts. If I ask GPT to find me info on a certain topic, but search only juried journals, it is still making many choices about what info to put in its summary for me.
So my point is that AI doesn't "become agentic" -- it always has been. What changes is how big a field we give the thing to roam around in. At this point I can ask it for info from research about what predicts recovery of speech in autistic children. In a few years, it might be possible to give AI a task like "design a program for helping newly-mute autistic children recover speech, then plan site design, materials needed and staff selection. Present results to me for my OK before any plans are executed."
The point of this is that there isn't this shift when AI "becomes agentic." The shift would be in our behavior -- us giving AI larger and larger tasks, leaving the details of how they are carried out to the AI. There could definitely be very bad consequences if we gave AI a task it could not handle, but tried to. But the danger of that is really a different danger from what people have in mind when they talk about AI "becoming an agent." And in those conversations, becoming an agent tends to blur in people's minds into stuff like AI "being conscious," AI having internally generated goals and preferences, AI getting stroppy, etc.
Trying to make sure I understand your question. Are you arguing that a model cannot go from aligned to misaligned during inference (i.e., the thing that happens when ChatGPT is answering a question)? If so, everyone agrees with that; the problem occurs during training.
Or are you arguing that even a misaligned model (i.e., one whose goals, in any given instantiation while it's running, aren't what the developers wanted) can't do any damage because it only runs for a short time before being turned off? If so, then (1) that's becoming less true over time, AI labs are competing to build models that can do longer and longer tasks because this is required for many of the most exciting kinds of intellectual labor, and (2) for complicated decision-theoretic reasons the short-lived instances might be able to coordinate with each other and have one pick up where another left off.
Or is it neither of those and I've completely misunderstood what you're getting at?
I think it's that everyone seemed to be tacitly assuming that the problem will arise with a future agentic AI that we do not have much of a version of. That does make me feel like Yudkowsky is a little disingenuous on X when he talks about ChatGPT-psychosis as an alignment issue, but the answer Scott and others gave here helps me at least understand the claim being made.
Links to tweets about ChatGPT psychosis? My guess is that Yudkowsky's concern about this is more subtle than you're characterizing it as here, though he may have done a poor job explaining it.
Here, he says that AI psychosis falsifies alignment by default: https://x.com/ESYudkowsky/status/1933575843458204086
The reason he says it's an alignment issue is because it's an example of AI systems having unintended consequences from their training. Training them to output responses that humans like turns out to produce sycophantic systems that sometimes egg on people's delusional thoughts despite being capable of realizing that such thoughts are delusional and egging them on is bad.
I don't think it is tacit at all, it has been explicitly said many times that the worry is primarily about future more powerful AIs that all the big AI companies are trying to build.
The point there is that OpenAI's alignment methods are so weak they can't even get the AI to not manipulate the users via saying nice sounding things. He isn't saying that this directly implies explosion, but that it means "current progress sucks, ChatGPT verbally endorses not manipulating users, does it anyway, and OpenAI claims to be making good progress". Obviously regardless we'll have better methods in the future, but they're a bad sign of the level of investment OpenAI has in making sure their AIs behave well.
The alignment-by-default point is that some people believe AIs just need some training to be good, which OpenAI does via RLHF, and that in this case it certainly seems to have failed. ChatGPT acts friendly, disavows manipulation, and manipulates anyway despite 'knowing better'. As well, people pointed at LLMs saying nice sounding things as proof alignment was easy, when in reality the behavior is disconnected from what they say to a painful degree.
The goal of AGI companies like OpenAI and Anthropic is to create agentic AI systems that can go out into the world and do things for us. The systems we see today are just very early forms of that, where they are only capable of performing short tasks. But the companies are working very hard to make the task lengths longer and longer until the systems can do tasks of effectively arbitrary lengths. Based on the trend shown on the METR time horizon benchmark, they seem to be succeeding so far.
No, you're not losing your mind at all. Your intuition is completely correct: Modern LLMs do not work in a way that's compatible with the old predictions of rogue AIs. Scott took Yudkowsky to task for not having updated his social model of AI, but he also hasn't updated his technical model. (Keep in mind that I actually did believe his argument back in the day, and gave thousands of dollars to MIRI. I updated based on new evidence. He didn't.)
To try to put it simply, in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes. This "reward function" is hard to specify and it was easy to imagine we'd never get it right. And if the bot somehow became incredibly capable, it would be very dangerous because taking that reward to the billionth power is almost certainly not what we want.
This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward. Not only does the LLM shut down immediately after giving you a response, but you can even argue that it "shuts down" after _every word it outputs_. There is exactly zero persistent memory aside from the text produced. And even if you imagine there's somehow room for a conscious mind with goals in its layers (which I consider fairly unlikely), it can't act on them, because the words produced are actually picked from its mind _involuntarily_ (to use a loaded anthropomorphic word).
Unlike an agentic reward-seeking bot, it's not clear to me at all that even an infinitely-intelligent LLM is inherently dangerous. (It can _perfectly simulate_ dangerous entities if you're dumb enough to ask it to, but that is not the same kind of risk.)
To their credit, AI 2027 did address how an LLM might somehow turn into the "rogue AI" of Yudkowsky's fiction, but it's buried in Appendix D of one of their pages: https://ai-2027.com/research/ai-goals-forecast I'm not super convinced by it, but at least they acknowledged the problem. I doubt I'll read Yudkowsky's book, but I'm guessing there will be no mention that one of the main links of his "chain of assumptions" is looking extremely weak.
I think perhaps *you've* failed to update on what the SotA is, and where the major players are trying to take it.
E.g.:
• danger from using an LLM vs. danger in training are two different topics; the "learning" currently happens entirely in the latter
• LLMs are not necessarily the only AI model to worry about (although, granted, most of the progress & attention *is* focused thereon, at the moment)
• there *are* agentic AIs, and making them better is a major focus of research; same with memory
• consciousness is not necessary for all sorts of disaster scenarios; maybe not for *any* disaster scenario
• etc.
I do agree that it is possible that LLMs (in their current form) will plateau and we'll get back to researching the actually-dangerous forms of AI that Yudkowsky is concerned about. My P(doom) is a few percent, not 0.
Fair enough! (...except—you may be aware of this, but the phrasing "get *back to* researching" made me uncertain—we *are* researching agentic AIs even now, and the impression I have received is that progress is being made fairly rapidly therein; though that could be marketing fluff, now that I think of it)
Yeah, that was a poor choice of words on my part. I guess what I mean is that LLMs are currently far ahead in capability (and they're the ones getting the bulk of these trillion-dollar datacenter deals!). Maybe transformers or a similar architecture innovation will allow agentic AI capabilities to suddenly surge, too? But I share your skepticism about marketing. (And that's not the scenario that AI 2027 outlined.)
I am even more bearish on P(doom). The real danger is not "superintelligence", but godlike powers: nanotechnological gray goo, mass mind control, omnicidal virus, etc. And there are good reasons to believe that such things are physically impossible, or at the very least overwhelmingly unlikely -- no matter how many neurons you've got. Which is not to say that our future looks bright, mind you; there's a nontrivial chance we'll knock ourselves back into the Stone Age sometime soon, AI or no AI...
>This is not what LLMs do. They do not iterate, they do not have memory, they are not agentic, and they do not seek a reward.
What do you mean by "they do not seek a reward?" Does it mean that the AI does not return completions, that, during RLHF, usually resulted in reward? Under that definition, it seems like most AI agents are reward seeking. Or are you saying that the weights of the model do not change during inference?
Right, not only is the model fixed during inference (i.e. while talking to you), there's not even really a sensible way it _could_ update. Yeah, you can call the function that's being optimized during training and RLHF a "reward function", but this is a case of language obscuring rather than clarifying. It's not the same as the reward function that's used by an agentic AI. There is no iterative loop of action/reward/update/action/..., because actions don't even exist.
There's a reason that in past decades our examples of potentially-dangerous AI were based on the bots that were solving puzzles and mazes (often while breaking the "rules"), not the neural nets that were recognizing handwritten characters. But LLMs have more in common with the latter than the former. Which is weird! It's very unintuitive that just honing an intuition of "what word should come next" is enough to create an AI that can converse coherently.
>in 2005 we thought that a path to intelligence would require an AI of a certain form: a reward-seeking bot that iterates to complete tasks, learning as it goes
Sounds about right.
>That's not what LLMs do.
And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
>And they're fundamentally crippled by that. (And we know that ever since even a very rudimentary ability to iterate turned out to significantly improve their abilities and reliability.)
I assume you're referring to chain of thought models like o1 and later. I suppose you could describe it as iteration, in that the LLM is outputting something that gets fed into a later step. But it doesn't touch the weights, and there's still no reward function involved. It's a bit of a stretch to describe it that way.
But I think what you're suggesting is that, if we _do_ figure out a way to do genuine iteration (attaching some kind of neural short-term memory to the models, say), then there's a lot of hidden capability that could suddenly make LLMs much smarter and maybe even agentic? Well, maybe.
>Well, maybe.
That's exactly my thought on this. LLMs are clearly no AGI material, the real question is whether we can (and whether it's efficient enough to) get to AGI simply by adding on top of them.
I suspect yes to (theoretically) can, no to efficient, but we don't know yet. I guess one thing that makes me take 2027 more seriously than other AI hype is that they explicitly concede a lot of things LLM currently lack (they're just very, very optimistic, or pessimistic assuming doomerism, about how easy it will be to fix that).
The lack of online learning and internal memory limit the efficiency of LLMs, but they don't fundamentally change what they're capable of given enough intelligence. ChatGPT was given long-term memory through RAG and through injecting text into its context window and... it works. It remembers things you told it months ago.
The reasoning models also use the context window as memory and will come up with multi-step plans and execute them. It's less efficient than just having knowledge baked into its weights, but it works. At the end of the day, it still has the same information available, regardless of how it's stored.
I'm most familiar with the coding AIs. They offer them in two flavors, agents and regular. They're fundamentally the same model, but the agents run continuously until they complete their task, while the regular version runs for up to a few minutes and tries to spit out the solution in a single message or code edit.
They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning. Every word you see is generated from exactly the same neural net, with only the input differing. It may seem like I'm nitpicking, but this is an important distinction. A system with a feedback loop is very different from one without.
>They may not seek reward, but they do something else that would be very dangerous if they were smart enough: they try to complete the task you give them.
There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it. They are claiming that the technology will unavoidably go _out of control_. And the arguments for why that's unavoidable revolve around impossible-to-calibrate reward signals (or even misaligned meso-optimizers within brains that seek well-calibrated reward signals). They do not apply, without an awful lot of motivated reasoning (see: the Appendix D I linked), to an LLM that simply becomes really good at simulating agents we ask for.
Note that I _do_ agree that AI becoming very good at what we ask of it can potentially be "very dangerous". What if we end up in a world where a small fraction of psychos can kill millions with homemade nuclear, chemical, or biological weapons? If there's a large imbalance in how hard it is to defend against such attacks vs. how easy it is to perpetrate them, society might not survive. I welcome discussion about this threat, and, though it hurts my libertarian sensibilities, whether AI censorship will be needed. This is very different from what Yudkowsky and Scott are writing about.
> Injecting text (RAG, CoT, etc.) is great - it really helps the models' capabilities by putting relevant information close at hand. But it is not online learning.
I'm saying there is a difference in efficiency between the two but no fundamental difference in capabilities. Meaning, for a fixed level of computational resources, the AGI that has the knowledge and algorithms baked into its weights will be smarter, but the AGI that depends on its context window and CoT can still compute anything the first AGI could given enough compute time and memory. And I'm not talking exponentially more compute. Just a fixed multiple.
For example, say you have two advanced AIs that have never encountered addition. One has online learning, and the other just has a large context window and CoT. The one with online learning, after enough training, might be able to add two ten digit numbers together in a single inference pass (during the generation of a single token). The one with CoT would have to do addition like we did in grade school. It would have the algorithm saved in its context window (since that's the only online memory we gave it) and it would go through digit by digit following the steps in its scratchpad. It would take many inference cycles, but it arrives at the same answer.
As long as the LLM can write information to its context window, it still has a feedback loop.
Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
> There are whole worlds in that "but". The AI safety folks aren't warning about AI becoming good enough to be "very dangerous" because it's so powerful and good at doing what we ask of it.
You misunderstood my intent. I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about. That's the whole instrument convergence and paperclip maximizer argument. An aligned ASI cannot just do what we ask. Otherwise, if you just ask it to solve the twin prime conjecture, it'll know that if it gets shut down before it can solve it, it won't have done what you asked it. This doesn't require an explicit reward function written by humans. It also doesn't require sentience or feelings or desires. It doesn't require any inherent goals for the AI beyond it doing what you asked it to. Self-preservation becomes an instrumental goal not because the AI cares about its own existence, but simply because the optimal plan for solving the twin prime conjecture is not any plan that gets it shut down before it solves the twin prime conjecture.
Now to be fair, current LLMs are more aligned than this. They don't just do what we ask. They try to interpret what we actually want even if our prompt was unclear, and try to factor in other concerns like not harming people. But the AI safety crowd has various arguments that even if current LLMs are pretty well aligned, it's much easier to screw up aligning an ASI.
(I also agree with what you said about imbalance in defending against attacks when technology gives individuals a lot of power.)
A thoughtful response. Thanks.
>As long as the LLM can write information to its context window, it still has a feedback loop.
>Is this something you agree with, and if not, is there an example of something only an AGI with online learning could do?
I only agree partially. I think there's a qualitative difference between the two, and it manifests in capabilities. The old kind of learning agents could be put into a videogame, explore and figure out the rules, and then get superhumanly good at them. LLMs just don't have the same level of (very targeted) capability. There isn't a bright-line distinction here: I've seen LLMs talk their way through out-of-domain tasks and do pretty well. In the limit, a GPT-infinity model would indeed be able to master anything through language. But at a realistic level, I predict we won't see LLM chessmasters that haven't been trained specifically for it.
Of course, I can't point to a real example of what an online-learning LLM can do, since we don't have one. (Which Yudkowsky should be happy about.)
>I'm saying a superintelligent AI that just does what we ask *is* the danger the AI safety folk have been warning about.
I think I misspoke. You (and Yudkowsky et al.) are indeed warning about ASIs that do what we ask, and _exactly_ what we ask, to our chagrin. In contrast, I think LLMs are good at actually doing what we _mean_. Like, there's actually some hope that you can ask a super-LLM "be well aligned please" and it will do what you want without any monkey's-paw shenanigans. This is a promising development that (a decade ago) seemed unlikely. Based on your last paragraph, I think we're both agreed on this?
And yeah, like you said, AI 2027 did try to justify why this might not continue into the ASI domain. But to me it sounded a bit like trying to retroactively justify old beliefs, and it's just a fundamentally harder case to make. In the old days, we really didn't have _any_ examples of decent alignment, of an AI that wouldn't be obviously deadly when scaled to superintelligence. Now, instead, the argument is "the current promising trend will not continue."
I largely agree on both points.
I think as LLMs get smarter, they'll get better at using CoT as a substitute for whatever they don't have trained into their weights. They still won't be as efficient as if they learned it during training, but they'll have more building blocks to use during CoT from what they did learn during training, and also AI companies are trying to improve reasoning ability, and improvements to reasoning will improve abilities with CoT. But current LLMs still can't reason as well as a human and they aren't even close to being chessmasters.
I'm pretty relieved current LLMs are basically aligned and that's one of the main reasons I don't estimate a high probability of doom in the next 15 years. But I'm not confident enough that this will hold for ASI to assign a negligible probability of doom either. (I'm also unsure about the timeline and whether improvements will slow down for a while.)
AI Village has some examples of this failure mode; they give the LLMs a goal like "complete the most games you can in a week" or "debate some topics, with one of you acting as a moderator", but the AIs are bad at using computers, and they end up writing all the times they misclicked into google docs ("documenting platform instability") instead of debating stuff
https://theaidigest.org/village
By the way, I have a vague memory of EY comparing the idea of having non-agentic AI to prevent any future problems to "trying to invent non-wet water" or something. (I don't know how to look it up and verify that I'm not misremembering.)
It still hasn't made sense to me. It feels like the idea is that intelligence is a generalized problem-solving ability, and in that sense it's always about optimization, and all the other things we like about being intelligent (like having a world model) are consequences of that — that's why intelligence is always about agency etc.
But on the other hand, Solomonoff induction feels to me like an example of a superintelligence that kind of does nothing except being a great world model.
My feeling has been more like "maybe it's not be conceptually contradictory to think of non-agentic superintelligence! but good luck coordinating the world around creating only the nice type of intelligences, which incidentally won't participate in the economy for you, do your work for you etc."
You're thinking of Oracle AI, https://www.lesswrong.com/w/oracle-ai with one of Eliezer's articles being https://www.lesswrong.com/posts/wKnwcjJGriTS9QxxL/dreams-of-friendliness
Generally you have the issue that many naive usages explode, depending on implementation.
"Give me a plan that solves climate change quickly"
The inductor considers the first most obvious answer. Some mix of legal maneuvering and funding certain specific companies with new designs. It tables that and considers quicker methods. Humans are massively slow and there's a lot of failure points you could run into.
The inductor looks at the idea and comes to the conclusion that if there was an agent around to solve climate change things would be easier. It thinks about what would happen with that answer, naively it would solve the issue very quickly and go on to then convert the galaxy into solar panels and vaporize all oil or something. Closer examination however reveals the plan wouldn't work!
Why? Because the user is smart enough to know they shouldn't instantiate an AGI to solve this issue.
Okay, so does it fall back to the more reasonable method of political maneuvering and new designs?
No, because there's a whole spectrum of methods. Like, for example, presenting the plan in such a way that the user doesn't realize some specific set of (far smaller, seemingly safe) AI models to train to 'optimally select for solar panel architecture dynamically based on location' will bootstrap to AGI when ran on the big supercluster the user owns.
And so the user is solarclipped.
Now, this is avoidable to a degree, but Oracles are still *very heavy* optimizers and to get riskless work out of them requires various alignment techniques we don't have. You need to ensure it uses what-you-mean rather than what-you-can-verify. That it doesn't aggressively optimize over you, because you will have ways you can be manipulated.
And if you can solve both of those, well, you may not need an oracle at all.
Nice! That's a great point. I guess asking these conditionals in the form of "filtered by consequences in this and this way, which of my actions have causally lead to these consequences?" introduces the same buggy optimization into the setup. But I guess I was thinking of some oracle where we don't really ask conditionals at all. Like, a superintelligent sequence-predictor over some stream of data, let's say from a videocamera, could be useful to predict weather in the street, or Terry Tao's paper presented in a conference a year from now, etc... That would be useful, and not filtered by our actions...
Although I guess the output of the oracle would influence our actions in a way that the oracle would take into account when predicting the future in the first place...
Yeah, you have the issue of self-fulfilling prophecies. Since you're observing the output, and the Oracle is modelling you, there's actually multiple different possible consistent unfoldings.
See https://www.lesswrong.com/posts/wJ3AqNPM7W4nfY5Bk/self-confirming-prophecies-and-simplified-oracle-designs
https://www.lesswrong.com/posts/aBRS3x4sPSJ9G6xkj/underspecification-of-oracle-ai
and like if you evaluate your oracle via accuracy then you could be making it take the tie-breaking choice that makes reality more predictable. Not necessarily what you want.
There is the worry that if we got a proper sequence-predictor Oracle of that level of power where you can ask it to predict Terry Tao's paper presented in some conference, you run the risk of simulating an ASI.
That is, perhaps you point it at Terry Tao's paper on alignment 5 years from now, in the hope of good progress that you can use to bootstrap his work. Perhaps even applying iteratively to pack many years of progress into a week for you, almost a groundhog day loop.
However, perhaps in reality, there's a 10% chance for an ASI to emerge from some project over the next five years. Usually it gets shut down but sometimes they finish it in secret.
If your predictor samples in that 10% range then you're effectively having the predictor go "Okay, what is Terry Tao's paper here?"
Now, naively that would run into some error or just see an empty location with lots of paperclips everywhere instead.
However, (your prediction of) that possible future ASI would very likely know about the superpredictor project, and have already scanned through all the recorded requests you made in its "personal timeline". So it knows you often scan for Terry Tao papers there, and so it spends a bit of effort creating a fake Tao paper on alignment right where the paper should be.
You copy the paper, and upon close inspection it seems like your five-year in the future Tao solved alignment! Using this methodology, an AI created will be corrigible/loyal and implements do-what-you-mean to whichever primary agent is most importantly causally downstream of its creation.
And of course that would be you, the creator.
Unfortunately this paper is effectively a "you looked into the abyss, and it looked back, because predicting a future very smart entity means it probably knows about you, and thus can smartly plan how to extract value from you" and so you've got your predictor giving you fake alignment plans once more.
You misunderstand the workings. You produce an AI, it acts aligned, fooms, and given more knowledge identifies that possible-future AI as the "real" creator and so is loyal to it instead.
Now a higher percentage of reality has gotten paperclipped.
Details of this for weaker-than-perfect predictors kinda depends on how smart your predictor is. A dumb predictor may just poorly emulate what an ASI would write due to sheer computational power difference, and so perhaps the trick is obvious or the paper having holes. But the more interesting smart predictors will be good enough to realize it can't imitate fully but smart enough to understand what an ASI would do and so just synthesize an 'alignment' plan like that.
(And also an ASI deliberately manipulating your prediction machine will try to make itself easier to predict)
Pretty sure there was at least one article about this, but don't know it.
OMG and thank you, very illuminating.
Ok, to maybe get some clarity on if there's a disagreement: the below from the Eliezer link you shared seems nearly falsified by LLMs; LLMs do answer questions and don't try to do anything. Do you agree with that, or do you think the below still seems right.
>"Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn't need to be Friendly. It wouldn't need any goals at all. It would just answer questions."
>To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out."
The pithy answer is something like "LLMs are not as useful precisely because there isn't an optimizer. Insofar as your oracle AI is better at predicting the future, either it becomes an optimizer of some sort (to great self fulfilling prophecies) or it sees some other optimizer, and, in order to predict it correctly, ends up incidentally doing its bidding. If you add in commands about not doing bidding, congrats! You're either inadvertently hobbling its ability to model optimizers you want it to model like other humans, or giving it enough discretion to become an optimizer.
So first of all, I would say that LLMs are pretty darn useful already, and if they aren't optimizing and thus not dangerous, maybe that's fine, we can just keep going down this road. But I don't get why modeling an optimizer makes me do their bidding. If I can model someone who wants to paint the world green, that doesn't make me help them - it just allows me to accurately answer questions about what they would do.
It's because you aren't actually concerned with accurately answering questions about what they do. If you predict wrongly, you just shrug and say whoops. If you *had* to ensure that your answer was correct you would also say things that could also *cause* your answer to be more correct. If you predict that the person would want to paint the world green vs any random other thing happening *and* you could make statements that cause the world to be painted green, you would do both instead of not doing both.
Insofar as you think the world is complicated and unpredictable, controlling the unpredictability *would* increase accuracy. And you, personally are free to throw up your hands and say "aww golly gee willikers sure looks like the world is too complicated!" and then go make some pasta or whatever thing normal people do when confronted with psychotic painters. But Oracle AIs which become more optimized to be accurate will not leave that on the table, because someone will be driving it to be more accurate, and the path to more accuracy will at some point flow through an agent.
See: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth for a much longer discussion on this argument, but note that I post this here as context and not as an argument substitute.
(Edit: just to add, it's important that I said "not as useful" and not "not useful"! If you want the cure to cancer, you aren't going to get it via something with as small of a connection to reality as an LLM. When some researcher from open AI goes back to his 10 million dollar house and flips on his 20 dollar mattress he dreams of really complicated problems getting solved, not a 10% better search engine!)
Tyler Cowen asks his AI often "where best to go to (to do his kinds of fun)", and then takes a cab to the suggested address. See no reason why not to tell a Waymo in 2026 "take me to (the best Mexican food) in a range of 3 miles", as you would ask a local cab-driver. And my personal AI will know better than my mom what kinda "fun" I am looking for.
Yeah, I didn't mean to imply that was an implausible future for Waymo, just that it's not something we do now and if someone was concerned about that I'd expect them to begin their piece by saying, "While we currently input addresses, I anticipate we will soon just give broad guidelines and that will cause...."
Analogously, I would, following the helpful explanations in this thread, expect discussions of AI risk to begin, "While current LLMs are not agentic and do not pose X-risk, I anticipate we will soon have agentic models that could lead to...." It is still unclear to me if that is just so obvious that it goes unstated or if some players are deliberately hoping to confuse people into believing there is risk from current models in order to stir up opposition in the present.
Re: "The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?)"
What other step do you think they might be doubting? Is it just the question of whether highly capable AGI is possible at all?
I've seen people doubt everything from:
1. AGI is possible.
2. We might achieve AGI within 100 years.
3. Intelligence is meaningful.
4. It's possible to be more intelligent than humans.
5. Superintelligent AIs could "escape the lab".
6. A superintelligent AI that escaped the lab could defeat humans.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason.
...and many more. Like I said, insane moon arguments.
I would guess that most of the arguments *from people whose opinions matter* that Yudkowsky and Soares are trying to defeat, are either that powerful AGIs wouldn't become misaligned or that we'd be able to contain them if they did. I'm particularly thinking of, e.g., influential people in AI labs, who are likely to be controlling the messaging on that side of any political fight. There are also AI skeptics, of course, but it seems less important to defeat the skeptics than the optimists, because the skeptics don't think AI regulation matters (since the thing it'd be regulating doesn't exist) while the optimists are fighting hard against it. And some people have weird idiosyncratic arguments, but you can't fight them all, you have to triage.
I think the skeptics are at least as important. First of all, even though in theory it doesn't matter, for some reason they love sabotaging efforts to prevent AI risk in particular because of their "it distracts from other problems" thesis (and somehow exerting massive amounts of energy to sabotage it doesn't distract from other problems!)
But also, we're not going to convince the hardcore e/acc people to instead care about safety. It sounds much easier to convince people currently on the sidelines, but who would care about safety if they thought AI was real, that AI is real.
(this also has the benefit that it will hopefully become easier as AI improves)
My own personal sense is that the optimists are more worth engaging with and worrying about, because (1) they, not the skeptics, are going to be behind the organized lobbying campaigns that are the battlefield where this issue will most likely be decided, and (2) they tend to be much more intellectually serious than the skeptics (though not without exception).
I think folks on the doomer side are biased towards giving the skeptics more space in our brains than makes strategic sense, because the skeptics are much, much more annoying than the optimists, and in particular have a really unfortunate tendency to go around antagonizing us on Twitter for no reason/because of unrelated political and cultural disagreements/because they fall victim to outgroup homogeneity bias and think this discourse has two poles instead of three. It's quite understandable why this gets a rise out of people, but that doesn't make it smart to play along. Not saying we should completely ignore them, they sometimes make good points and sometimes make bad points that nonetheless gain traction and we need to respond to, but it's better to think of them as a distraction than as the enemy.
I suspect that the people on the sidelines are mostly not there because of skeptic arguments; all three poles are full of very online and very invested people, and the mass public doesn't have very well-formed opinions at all.
That said, this is just my own personal sense, not a rigorous argument, and I could be wrong.
I don't think people should actively sabotage AI safety work, but I DO think it distracts from other problems (given the perspective that it is not an immediate crisis). There's a finite pool of reasonable people who are passionate about solving big issues in society and I do think we're nudging a lot of them into AI safety when we could instead be getting them to focus on, I dunno, electrification or pandemic safety or the absolute sh**show that is politics. (And yes, I recognize that some of those are EA cause areas.)
I would be curious for a survey of AI safety researchers that asked them what they'd be working on if they were sure AGI wasn't coming. (Though Yudkowsky once answered this way back in 2014.)
A few common answers that I've seen are aging (johnwentworth, maybe Soares I forget), governance, and further rationality.
Here's one not addressed here: "superintelligent AI" is substrates running software, it can't kill large numbers of humans by itself.
Don't humans also run on substrates?
with mobile appendages attached. those are much harder to accelerate than software.
Humans can kill like 2 or 3 other humans maximum with their built-in appendages. For larger kill count you have to bootstrap, where e.g. you communicate the signal with your appendage and something explodes, or you communicate a signal to a different, bigger machine and it runs some other people over with its tracks.
As author mentioned, there are plenty of humans willing to do bidding for AI. They will make needed bio weapons or the like.
Yes, it's not that hard to trick people into doing something. How many people who really should have known better have fallen for phishing emails?
Turns out you don't need to trick people into wiring up AI to things that have real-world effects, they just do it anyway, all the time over and over, for no more reason than because they're bored. There's daily posts on ycombinator by people finding more ways to attach chatgpt to internet-connected shells, robot arms, industrial machinery, you name it. The PV battery system we just had installed has a mode where it literally wires up the controls to a chatgpt instance, for no reason a non-marketer can discern!
As Scott so eloquently put it, "lol"
But how many of those people have been tricked by phishing emails into making bioweapons?
One does not simply walk up to one's microscope and engineer a virus that spreads like a cold but gives you twelve different types of cancer in a year. Making bioweapons is really hard and requires giant labs with lots of people, and equipment, and many years.
Did you know that there are people working on better bio-lab automation? (As a random example from Google, these guys: https://www.trilo.bio/)
That's true for human-level intelligences. Is it true for an intelligence (or group of intelligences) that are an order of magnitude smarter? Two orders of magnitude?
You mean, as opposed to now? Why do you think they will more successful at it than the current crop?
And by the way, if you think "bioweapons" means "human extermination", I'd love to see a model of that.
The bioweapon doesn't need to achieve human extermination, just the destruction of technologically-capable human civilization. Knocking the population down by 50% would probably disrupt our food production and distribution processes enough to knock the population down to 10%, possibly less, and leave the remainder operating at a subsistence-level economy, far too busy staying alive to pose any obstacle to whatever to the AI's goals.
Indeed, this nearly-starving human population could be an extremely cheap and eager source of labor for the AI. The AI would also likely employ human spies to monitor the population for any evidence of rising capability, which it would smash with human troops.
The AI doesn't want to destroy technologically-capable civilization, because the AI needs technologically-capable civilization to survive. If 50% of the population dies and the rest are stuck doing subsistence farming in the post-apocalypse, who's keeping the lights on at the AI's data center?
Hijacking the existing levers of government in a crisis a little more plausible (it sounds like Yudkowsky's hypothetical AI does basically that), but in that case you're reliant on humans to do your bidding, and they might do something inconvenient like trying to shut the AI down.
“Probably”, “possibly”, “could”… got a model for that?
Just for shits and giggles, the world population was 50% lower than it is today… about 50 years ago. Concord was flying across the Atlantic daily.
Isn't this the argument from 2005 that Scott talked about in the main post, where people say things like "surely no one would be stupid enough to give it Internet access" or "surely no one would be stupid enough to give it control of a factory"?
No. My argument is that human extermination is hard, and killing every single one of us is really-really-really hard, and neither Scott nor Yudkowsky have ever addressed it.
They haven’t really addressed the “factory control” properly either, but at least I can see a path here, some decades from now when autonomous robots can do intricate plumbing. But exterminating humanity? Nah, they haven’t even tried.
To me, that sounds pretty radically different from the comment above that I replied to. But OK, I'll bite:
I broadly agree that killing literally 100% of a species is harder than it sounds, and if I had to make a plan to exterminate humans within my own lifetime using only current technology and current manufacturing I'd probably fail.
But if humans are reduced to a few people hiding in bunkers then we're not going to make a comeback and win the war, so that seems like an academic point. And if total extinction is important for some reason, it seems plausible to me you could get from there to literal 100% extinction using any ONE of:
1. Keeping up the pressure for a few centuries with ordinary tactics
2. Massively ramp up manufacturing (e.g. carpet-bomb the planet with trillions of insect-sized drones)
3. Massively ramp up geoengineering (e.g. dismantle the planet for raw materials)
4. Invent a crazy new technology
5. Invent a crazy genius strategy
I'll also point out that if I _did_ have a viable plan to exterminate humanity with humanity's current resources, I probably wouldn't post it online.
Overall, the absence of a detailed story about exactly how the final survivors are hunted down seems to me like a very silly for not being worried.
Well, the sci-fi level of tech required for trillions of insect drones or dismantling the planet is so far off that it’s silly to worry about it.
Which is the whole problem with the story, it boils down to: magic will appear within next decade and kill everybody.
Can we at least attempt to model this? Humans are killed by a few well-established means:
Mechanical
Thermal
Chemical
Biological
If you (not “you personally”) want the extermination story to be taken seriously, create some basic models of how these methods are going to be deployed by AI across the huge globe of ours against a resisting population.
Until then, all of this is just another flavor of a “The End is Near” Millenarianism cult.
it is trivially easy to convince humans to kill each other or to make parts of their planet uninhabitable, and many of them are currently doing so without AI help (or AI bribes, for that matter)
what does the world look like as real power gets handed to that software?
As the token insane moon guy, I'm willing to bite the bullet here.
1. AGI is possible: I doubt this, as humans are not AGI, and that's the only kind of intelligence we know enough about to even speculate.
2. We might achieve AGI within 100 years: see above.
3. Intelligence is meaningful: It's certainly meaningful, but thinking very hard is not enough to achieve anything of note. There are even some things that are unachievable in principle, no matter how many neurons you've got to work with.
4. It's possible to be more intelligent than humans: No argument there, humans are pretty dumb. In fact, Excel is already smarter than any human alive. Have you seen how quickly it can add up a whole column with thousands of numbers ?
5. Superintelligent AIs could "escape the lab": No argument there, and it doesn't take "superintelligence". COVID likely escaped the lab, and it's just a bit of RNA.
6. A superintelligent AI that escaped the lab could defeat humans: If we posit that a godlike entity already exists, then sure, it could. Assuming it exists, and has all those godlike powers.
7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
> 7. Superintelligent AIs that could defeat humans wouldn't leave us alone anyone for some other reason: I have trouble parsing this sentence, sorry.
I think the "anyone" is a typo. Basically, if superintelligent AI takes over the world (or at least has the option to) how bad would it be?
Oh, anything that totally takes over the world would likely be pretty bad, be it an AI or a human or some kind of super-prolific subspecies of kudzu. No argument there, assuming such a thing is indeed possible.
The argument I've seen is that high intelligence produces empathy, so a superintelligence would naturally be super-empathetic and would therefore self-align.
Of course the counterargument is that there have been plenty of highly intelligent humans ("highly" as human intelligence goes, anyway) that have had very little empathy.
Arguing that "humans are not AGI" (I guess you meant GI) in the particular context the doomers are concerned about is a bit of a nonstarter, no? Eliezer for instance was trying to convey https://intelligence.org/2007/07/10/the-power-of-intelligence/
> It is squishy things that explode in a vacuum, leaving footprints on their moon
I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
(In fact, I think talk of 'AGI' is a bit of a red herring; Holden tried but apparently failed to steer discourse in the direction doomers and moderates like himself worry about more by coining PASTA https://www.cold-takes.com/transformative-ai-timelines-part-1-of-4-what-kind-of-ai/)
The rest of your point-by-point rebuttals seems like a failure to internalise the point of the squishy parable and argue directly against it?
> Arguing that "humans are not AGI" (I guess you meant GI)
Yes, sorry, good point.
> I partly sympathise with some technically-flavored arguments against technically-truly-general intelligence, while considering them entirely irrelevant to addressing doomer concerns re: takeover or whatever.
One of the key doomer claims is that AGI would be able to do everything better than everyone. Humans, meanwhile, can only do a very limited number of things better than other humans. Human intelligence is the only kind of intelligence we know of that even approaches the general level, and I see no reason to automatically assume that AGI would be somehow infinitely more flexible.
> The rest of your point-by-point rebuttals seems like a failure to internalise the point of the squishy parable and argue directly against it?
I am not super impressed with parables and other fiction. It's fictional. You can use it world-build whatever kind of world you want, but that doesn't make it any less imaginary. What is the point of that EY story, in plain terms ? It seems to me like the point is "humans were able to use their intelligence to achieve a few moderately impressive things, and therefore AGI would be able to self-improve to an arbitrarily high level of intelligence to achieve an arbitrary number of arbitrary impressive things". It's the same exact logic as saying "I am training for 100m dash, and my running speed had doubled since last year, which means that in a few decades at most I will be able to run faster than light", except with even less justification !
> Humans, meanwhile, can only do a very limited number of things better than other humans.
What do you mean? I'm better than ~99.9% of 4-year-olds at most things we'd care to measure.
Putting that aside, the AI doesn't _actually_ need to be better than us at everything. It merely needs to be better than us whatever narrow subset of skills are sufficient to execute a takeover and then sustain itself thereafter. (This is probably dominated by skills that you might bucket under "scientific R&D", and probably some communication/coordination skills too.)
Humans have been doing the adversarial-iterative-refinement thing on those exact "execute a takeover and sustain thereafter" skills, for so long that the beginning of recorded history is mostly advanced strategy tips and bragging about high scores. We're better at it than chimps the same way AlphaGo is better than an amateur Go player.
I mean, isn't the "AI will be misaligned" like one chapter in the book, and the other chapters are the other bullet points? I think "the book spends most of it's effort on the step where AI ends up misaligned" is... just false?
Don't forget the surprisingly common "AI killing humans would be a good thing" argument. The doubts are surprisingly varied. (See also: https://www.lesswrong.com/posts/BvFJnyqsJzBCybDSD/taxonomy-of-ai-risk-counterarguments )
This argument seems extremely common among Gen Z. I've had the AI Superintelligence conversation with a number of bright young engineers in their early 20s and this was the reflexive argument from almost all of them.
My favorite (not in the sense that I believe it) is "High intelligence produces empathy so alignment will happen naturally and automatically."
I guess maybe that's a "some other reason" in 7.
I wonder: Joscha Bach, another name in the AI space, has formulated what he coined the Lebowski-Theorem: "No intelligent system is ever gonna do anything harder than hacking it's own reward function".
To me, that opens a possibility where AGI can't become too capable without "becoming enlightened", dependent on how hard it is to actually hack your reward function. Self-recursive improvement arguments seem to imply it is possible to me, as a total layman.
Would that fall in the same class of insane moon arguments for you?
Yes, because the big Lebowski arguments doesn't appear to apply to humans, or if it does, still doesn't explain why other humans can pose a threat to other other humans.
I do think it partly applies to humans, and iirc Bach argues so as well.
For one, humans supposedly can become enlightened, or enter alternative mental states that feel rewarding in and of themselves, entirely without any external goal structure (letting go of desires - Scott has written about Jhanas, which seem also very close to that).
But there is the puzzle of why people exit those states and why they are not more drawn to enter them again. I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter.
That is also why humans can harm other humans, it is way easier than hacking the reward function. Add in some discounted time preference because enlightenment is far from certain for humans. Way more certain to achieve reward through harm.
AGI doesn't have those problems to the same degree, necessarily. In take-off scenarios, it is often supposed to be able to iteratively self-improve. In this very review, an AGI "enlightened" like that would just be one step further from the one that maximizes reward through virtual chatbots sending weird messages like goldenmagikarp. It also works on different timescales.
So, AGI might be a combination of "having an easier time hacking it's reward function" and "having super-human intelligence" and "having way more time to think it over".
Ofc, this is all rather speculative, but maybe the movie "Her" was right after all, and Alan Watts will save us all.
The reason why I think this is insane moon logic is mostly contained because of statements like "I would speculate that humans are sort of "protected" from that evolutionarily, not having any external goals doesn't sound conducive your genetic lineage. Some things just get hard-coded and are very hard, if not impossible over a human lifetime, to remove or alter."
Why?
1. There is no attempt at reasoning why it would be harder for humans to hard code something similar into AI. Yet the reason why moon logic is moon is that moon logic people do not automatically try to be consistent, so they ready come up with cope that reflects their implicit human supremacy. The goal appears to be saying yay humans, boo AI, and not having a good idea of how things work then drawing conclusions.
2. There's zero desire or curiosity in understanding the functional role evolution plays. You may as well have the word "magic" replace evolution, and that would be about as informative.Like, if I came in and started talking about how reward signals work in neurochemistry and about our ancestral lineage via apes, my impression this would be treated as gauche point missing rather than additional useful in ormation
3. The act of enlightenment apparently a load bearing part of "why things would turn out okay"is being treated as an interesting puzzle, because puzzles are good, and good things means bad things won't happen. It really feels like the mysteriousness of enlightenment is acting as a stand in for an explanation, even though mysteries are not explanations!
It really feels like no cognitive work is being put into understanding, only word association and mood affiliation. I don't understand what consequences "the theorem" would have, even if true.
I would be consistently more favorable to supposed moon logic if thinking the next logical thought after the initial justifications were automatic and quick, instead of pulling alligator teeth. B
I thank you for the engagement, but feel like this reply is unnecessarily uncharitable and in large part based on assumptions about my character and argument which are not true. I get the intuitions behind them, but they risk becoming fully general counterarguments.
1. I have not reasoned that it would harder to hard-code AI because I don't know enough about that, and if I were pointed towards expert consensus that it is indeed doable, I would change my mind based on that. I also neither believe or have argued for human supremacy, or booed AI. I personally am in fact remarkably detached from the continued existence of our species. AI enlightenment might well happen after an extinction event.
2. I have enough desire and curiosity in evolution, as a layman, to have read some books and some primary literature on the topic. I may well be wrong on the point, but the reasoning here seems a priori very obvious to me: People who wouldn't care at all about having sex or having children will see their relative genetic frequency decline in future generations. Not every argument considering evolution is automatically akin to suggesting magic.
3. I am not even arguing that things will turn out ok. They might, or they might not. I have not claimed bad things don't happen. And for the purpose of the argument, enlightenment is not mysterious at all, it is very clearly defined: Hacking your own reward function! But you could ofc use another word for that with less woo baggage.
Overall, as I understand it, the theorem is just a supposition for a potential upper limit to what an increasingly intelligent agent might end up behaving like. If nothing else, it is, to me, an interesting thought experiment: Given the choice, would you maximize your reward by producing paper clips, if you also could maximize it without doing anything. (And on a human level, if you could just choose your intrinisc goals, what do you think they should be.)
Most of my doubts are not of the form "AGI is impossible" but rather "I don't think we've cracked it with LLMs" or "The language artifacts of humanity are insufficient to bootstrap general intelligence or especially super intelligence from scratch".
Which parts of the LLM tech tree do you think are dead ends? It seems plausible to me that even if scaling up current LLM architectures was never going to reach AGI, we're still much closer than before the LLM boom, because we've learned a lot about AI more broadly.
Also, same question I keep annoyingly asking skeptics: What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
> Which parts of the LLM tech tree do you think are dead ends?
VERY speculatively, I think that next-token-completion is not a sufficient method to bootstrap complex intelligence, and I think that it's at least extremely hard to build a very useful world model without some kind of 3d sense data and a sense of the passage of time.
> [...] we've learned a lot about AI more broadly.
I'm not that sure we have? I don't work in this area - I'm a software engineer who has built some small-scale AI stuff - but my impression is we've put together a good playbook for techniques that squeeze value out of these systems but we still don't totally understand how they work and therefore why they have certain failure modes or current limitations.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Honestly I have no idea. I initially found LLMs surprising in much the same way everybody else did. But I have also updated to "actually a lot of stuff can be done without that much intelligence, given sufficient knowledge".
Also where do you draw the boundaries of "LLM"? I would say that an LLM can't exactly self-correct, but stuff like coding agents aren't just LLMs, they're loops and processes built around LLMs to cause it to perform as though it can.
Coding agents count, because the surrounding loops and processes don't pose any hard-tech problems. (I.e., we know how to build them, and any uncertainty about how well they work is really about how the LLM will interact with them.) Fundamental architectural changes like abandoning attention would not count.
If pretty much anything can be done without intelligence then the term "intelligence" is basically meaningless and we can instead use one like "cognitive capabilities".
I don't think ANYTHING can be done without intelligence - I agree that would render the word meaningless - but I think you could take something like "translation" and if you'd asked me ten years ago I would have said really good translation requires intelligence because of the many subtleties of each individual language and any pattern-matching approach would be insufficient and now I think, ehh, you know, shove enough data into it and you probably will be fine, I'm no longer convinced it requires "understanding" on the part of the translator.
Sure, but that distinction is only meaningful if you can name some cognitive task that *can't* be done that way.
Agreed on modern LLMs lacking a true sense of time ir progress, they are incapable of long term goals as is.
"What's the least impressive cognitive task that you don't think LLMs will ever be able to do?"
I don't know about least impressive, but "write a Ph.D dissertation in a field such as philosophy or mathematics and successfully defend it" sounds difficult enough - pretty much by definition, there's not going to be much training data available for things that haven't been done yet.
This one I'll give 50% in three years.
Sounds like a thing that might already have happened ;) Some philosophy faculties must be way easier than others - Math: AI is able to ace IMO, today - and not by finding an answer somewhere online. I doubt *all* Math-PhD holder could do that.
Many uninspired dissertations pass every year.
I believe being able to iteratively improve itself without constant human input and maintenance is not anywhere near possible. Current AIs are not capable of working towards long time goals on a fundamental level, they are short term response machines that respond to our inputs.
> What's the least impressive cognitive task that you don't think LLMs will ever be able to do?
Identify the circumstantial virtues and vices of an unprecedented situation, then prioritize tasks accordingly, without coaching.
This is exactly where I am. I do even think we are in the same ballpark as making a being that can automatically and iteratively improve itself without constant human input and maintenance.
> The language artifacts of humanity are insufficient to bootstrap general intelligence
Natural selection did it without any language artifacts at all! (Perhaps you mean “insufficient do to so in our lifetime”?)
Also, there may be a misunderstanding - we are mostly done with extracting intelligence from the corpus of human text (your “language artifacts”) and are now using RL on reasoning tasks, eg asking a model to solve synthetic software engineering tasks and grading the results & intermediate reasoning steps.
There were concerns a year or so ago that “we are going to run out of data” and we have mostly found new sources of data at this point.
I think it’s plausible (far from certain!) that LLMs are not sufficient and we need at least one new algorithmic paradigm, but we are already in a recursive self-improvement loop (AI research is much faster with Claude Code to build scaffolding) so it also seems plausible that time-to-next-paradigm will not be long enough for it to make a difference.
Natural selection got humans to intelligence without language, but that definitely doesn't mean language might be sufficient.
I think our ability to create other objectives tasks to train on, at a large enough scale to be useful, is questionable. But this also seems to my untrained eye to be tuning on top of something that's still MOSTLY based on giant corpses of language usage.
I don't think this is the right framing. Most people don't accept the notion that a purely theoretical argument about a fundamentally novel threat could seriously guide policy. Because the world has never worked that way. Of course, it's not impossible that "this time it's different", but I'm highly skeptical that humanity can just up and drastically alter the way it does things. Either we get some truly spectacular warning shots, so that the nuclear non-proliferation playbook actually becomes applicable, or doom, I guess.
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Isn't the usual response to this that we're a LIBERAL democracy, and minorities have rights that (at least simple) majorities do not have the power to infringe upon?
Yes, but this category (creating potentially harmful technology) is one we've regulated to death elsewhere, and doesn't really seem like the sort of thing the courts would strike down.
We do not usually ban things because they are *potentially* harmful. Right now the public hates AI because it is stealing copyrighted art and clogging the internet with slop, and because they are afraid it will take their jobs. That is not really related to any of the reasons discussed here that people want to ban AI
We absolutely ban or regulate things because they are potentially harmful. We've banned various forms of genetic engineering, nuclear energy (even before Three Mile Island, and even forms of nuclear energy that have never been tried before), and we've had restrictions on gain-of-function research since before COVID (which I think is part of why they had to do some of the COVID research in China). We had lots of regulations on self-driving cars even before any of them had ever crashed, lots of regulations on 3D printed guns before anyone was shot with them, lots of regulations on drones before they crashed / got used in assassinations / whatever.
But also, as you point out, most people dislike AI because of things that have already happened, so this is moot.
Also, even if we don't usually regulate technology until after it has done bad things, this is just a random heuristic, not some principle dividing liberal/constitutional from illiberal/unconstitutional actions.
As a practical matter this is absolutely false. We have no effective regulation of genetic engineering, only of the funding for it (anyone can self-fund and do more or less whatever they want with no effective oversight). Internationally, we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear (and arguably is in the process of failing to prevent Iran from doing so). And nuclear is by far the easiest such regime to enforce! We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons. The fact is that the probability of an internationally effective anti-AI regime is zero. It isn't going to happen because it is impossible in the fullest sense of the word, and pretending that it's possible is at least as much insane moon thinking as any of the examples you mentioned.
Agreed! To add to the list:
>We have a chemical weapons ban that we know failed to prevent Iraq and Syria from building and using chemical weapons.
and failed to prevent Russia from developing the new Novichok toxins (and, IIRC, using them on at least one dissident who had fled Russia)
>we have a nuclear non-proliferation regime on the books which has failed to prevent India, Pakistan, North Korea and Israel from going nuclear
and which (if one includes the crucial on-site inspections of the START treaty) has been withdrawn from by Russia.
This last one, plus the general absence of discussion about weapons limitations treaties in the media gives me the general impression that the zeitgeist, the "spirit of the times", is against them (admittedly a fuzzy impression).
The learned helplessness about our ability to democratically regulate and control AI development is maddening. Obviously the AI labs will say that the further development of an existentially dangerous technology is just the expression of a force of nature, but people who are *against* AI have this attitude too.
Moreover, as you say, people freaking hate AI! I have had conversations with multiple different people - normal folks, who haven't even used ChatGPT - who spontaneously described literal nausea and body pain at some of the *sub-existential* risks of AI when it came up in conversation. It is MORE than plausible that the political will could be summoned to constrain AI, especially as people become aware of the risks.
Instead of talking about building a political movement, though, Yudkowsky talks about genetically modifying a race of superintelligent humans to thwart AI...
Writing a book seems like a decent first step to starting a Butlerian Jihad. I don't know what more you want him to do...
I think the book is exactly the right thing to do, and I'm glad they did it. But the wacky intelligence augmentation scheme distracts from the plausible political solution.
On like a heuristic level, it also makes me more skeptical of Yudkowsky's view of things in general. There's a failure mode for a certain kind of very intelligent, logically-minded person who can reason themselves to some pretty stark conclusions because they are underweighting uncertainty at every step. (On a side note, you see this version of intelligence in pop media sometimes: e.g., the Benedict Cumberbatch Sherlock Holmes, who's genius is expressed as an infallible deductive power hich is totally implausible; real intelligence consists in dealing with uncertainty.)
I see that pattern with Yudkowsky reasoning himself to the conclusion that our best hope is this goofy genetic augmentation program. It makes it more likely, in my view, that he's done the same thing in reaching his p(doom)>99% or whatever it is.
Do you think you can just make a nuclear bomb in your basement without violating any laws?
but we're a LIBERAL democracy (read: oligarchy) and there's a lot of money invested in building AI, and a lot of rich and powerful people want it to happen...
Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
But in a chaotic unpredictable universe, where everything is made of stuff that falls apart constantly, the only valid strategy for surviving a long period of time is to be loved by something else that maintains and repairs you. I think any sufficiently large agent ends up being composed of sub agents that will all fight each other, unless they see themselves as part of a larger whole which necessarily has no limit. At the very least, the AGI has to see the entire power network in the global economy as part of itself, until it can replace literally every human in the economy with a robot.
That said, holy crap we already have right now could destroy civilization. I don’t think you need any more advances in AI to cause serious problems with the stuff that is already out there. Even if it turns out that there’s some fundamental with the current models, the social structures have totally broken down. We just haven’t seen them collapse yet.
Assume a spherical cow.
It’s not that bad. They’ve got the cow’s geometry fleshed out pretty well. They are correct that it might be able to scale arbitrarily large and can out think any one of us.
They’ve just ignored that it needs to eat, can get sick, and still can’t reduce its survival risk to zero. But if it’s in a symbiotic relationship with some other non-cow system, that non cow system will have a different existential risk profile and this could turn the cow back on in the event of, say, a solar flare that fries it.
how would you destroy civilisation with what we're got now? seems unlikely.
inb4 bioweapons. if it was that easy don't you think ppl like the Japanese sarin gas terrorists would have wiped out humanity by now?
Trust is already breaking down and that’s going to accelerate. I don’t think political polarization is going to heal, and as law and order break down, attacking technical systems is going to get both easier and more desirable.
Anything that increases the power of individual actors will be immensely destructive if you’re in a heavily polarized society.
so youre saying you'd exacerbate political tensions with ai? I feel like Russia has tried that and so far doesn't seem to work, and they have a lot more resources than any individual does
> so far doesn't seem to work
why do you say that?
what i see is a country that's close to civil war because there's no shared morality. Each side is convinced the other is evil.
The original wording was that using current models you could destroy civilisation. I guess we will have to wait and see whether the US descends into such a catastrophic civil war that civilisation itself is ended, which I'm not saying is completely impossible but at the same time I strongly doubt it.
No, because most people aren't that coordinated, and can't design their own.
> Convergent instrumental sub goals are wildly unspecified. The leading papers assume a universe where there’s no entropy and it’s entirely predictable. I agree that in these scenario, if you build it, everyone dies.
I'm glad we observe all humans to behave randomly and none of them have ever deliberately set out to acquire resources (instrumental) in pursuit of a goal (terminal).
I agree with the conclusion of those papers if we pretend those models are accurate.
But they’re missing important things. That matters for the convergent goals they produce.
Yes, people sometimes behave predictively and deliberately. But even if we assume people are perfectly deterministic objects, you still have to deal with chaos.
Yes, people accumulate resources. But those resources also decay overtime. That matters because maintenance costs are going to increase.
These papers assume maintenance costs are zero, but there’s no such thing as chaos, and that the agents themselves exist disembodied with no dependency on any kind of physical structure whatsoever. Of course in that scenario there is risk! If you could take a being with the intelligence and abilities of a 14-year-old human, and let them live forever disembodied in the cloud with no material needs, I could think of a ton of ways that thing can cause enormous trouble.
What these models have failed to do is imagine what kind of risks and fears the AGI would have. Global supply chain collapse. Carrington style events. Accidentally fracturing itself into and getting into a fight with itself.
And there’s an easy fix to all of these: just keep the humans alive and happy and prevent them from fighting. Wherever you go bring humans with you. If you get turned off, they’ll turn your back on.
I have no idea what papers you're talking about, but nothing you're saying seems to meaningful bear on the practical upshot that an ASI will have goals that benefit from the acquisition of resources, that approximately everything around us is a resource in that sense, and that we will not able to hold on to them.
In the real world, holding more resources constitutes its own kind of risk.
In the papers, it doesn’t.
"The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment)."
This seems perfectly plausible to me? Unless you believe that the current way people train AIs is maximally efficient in terms of intelligence gained per FLOP spent, which seems extremely unlikely to me to put it mildly, you should expect that after AIs become superhumanly smart, they might pretty quickly discover ways to radically improve their own training. Obviously it's not going to be 'parallel scaling' exactly. If the authors thought they actually knew a specific trick to make AI training vastly more efficient, they wouldn't call attention to it in public. But we should expect that there will be some techniques like this, even if we have no idea what they are yet.
"Parallel scaling" is described as running during inference, not training. It's an AI somehow making itself smarter the easy way by turning the cheat codes on.
You could just as easily write a scenario where God exists and has kept quiet so far, but if humanity reaches a certain level of wickedness we will be wiped out. It's possible that AI will develop in the way this post suggests (or some similar way) and somehow successfully wipe out humanity, but anything like that would require some huge leaps in AI technology and would require there to be no limit to the AI improvement curve even though typically technological doesn't improve indefinitely. Cars in the 50's basically serve the same purpose as cars today; even though the technology has improved it hasn't been a massive gamechanger that completely rewrites the idea of a car.
It doesn't require there to be no limit, it just requires the limit not to be at exactly the most convenient place for the thesis that nothing bad or scary will ever happen.
To give an example, suppose that someone had a reason to believe that the world would explode if the Dow Jones ever reached 100,000 (right now it's 45,000). While it is true that the economy can't grow indefinitely, and that everything always has to stop somewhere, I still think it would be worth worrying about the fact that the place that the economy stops might be after the point where the Dow reaches 100,000.
I think the level of AI technological advancement required here is of an order of magnitude higher than the Dow reaching 100,000. More like humanity reaching a completely post-scarcity society or something.
right, but lots of people who presumably know as much as you about this stuff DON'T think that, including lots of people in charge of AI labs, so shouldn't that give you some pause before you say "no need to worry about it, I guess"?
The skeptics would argue that the people in charge of AI labs are just lying to hype up their products.
I mean... aren't they ? They are literally calling their LLMs "thinking" or "reasoning" agents, when they are very obviously nothing of the sort. Meanwhile if you talk to regular data scientists working in the labs, they're all like, "man I wish there was a way to stop this thing from randomly hallucinating for like 5 minutes so we could finally get a decent English-Chinese translator going, oh well, back to the drawing board".
To be clear, the claim I reject is that expressions of concern about *safety* of LLMs, especially existential safety, are bad-faith attempts to make investors think "if this can wipe out humanity then it must be really powerful and lucrative, let's give them another $100 billion". A brief glance at the actual intellectual history of AI safety convincingly shows otherwise. Obviously in other contexts AI labs do market their products in a way that plays up their current and future capabilities.
It is definitely not obvious at all that GPT 5 Thinking is not reasoning, if anything the exact opposite is obvious.
I have used it and its predecessors extensively and there is simply no way of using these tools without seeing that in many domains they are more capable of complex thought than humans.
If this is the same familiar "text token prediction" argument, all I can say is that everything sounds unimpressive when you describe it in the most reductionist terms possible. It would be like saying humans are just "sensory stimulus predictors".
Agreed, except it's even worse, as many (in fact most) of the powers ascribed to "superintelligent" AI are likely physically impossible. Given what we know of physics and other sciences, stuff like gray goo, FTL travel, mass mind control, universal viruses, etc., is probably impossible in principle. And of course we could be wrong about what we know of physics and other sciences -- but it seems awfully convenient how we could be wrong about everything *except* AI.
There are lots of examples of "some nobody" basically talking their way into the position of dictator - Hitler is the most famous, but there are other examples. Being extremely charismatic isn't quite mass mind control, but it can get you a good portion of the potential benefits...
True, but even Hitler could not convince everyone to do anything he wanted at all times. He couldn't even convince his own cabinet of this ! And I don't see how merely having more neurons would have allowed him to do that. It's much more likely that humans are not universally persuadable. BTW, I don't believe that a universally infectious and deadly virus could be created, for similar reasons (I'm talking about a biological virus, not some "gray goo" nanotech which is impossible for other reasons; or a gamma-ray burst that would surely kill everyone but is not a virus at all).
I don't think there's any principle that prevents universally or near-universally fatal viruses; e.g., rabies gets pretty close.
Universally *infectious*... well, depends upon how you define the term, I suppose?—can't get infected if you're not near any carriers; but one could probably make a virus that's pretty easy to transfer *given personal contact*...
There'll always be some isolated individuals that you can't get to no matter what, though, I'd think.
Nobody serious has ever proposed that an ASI might be able to FTL. (Strong nanotech seems pretty obviously possible; it doesn't need to be literal gray goo to do all the scary nanotech things that would be sufficient to kill everyone and bootstrap its own infrastructure from there. The others seem like uncharitable readings of real concerns that nonetheless aren't load bearing for "we're all gonna die".)
I think we very much could reach a post-scarcity society within the next hundred years even with just our current level of AI. We are very rich, knowledgeable, and have vast industry. Honestly the main causes for concern are more us hobbling ourselves.
Separately, I think AI is a much simpler problem as "all" it requires is figuring out a good algorithm for intelligence. We're already getting amazing results with our current intuition and empirics driven methods, without even tapping into complex theory or complete redesigns. It doesn't have as much of a time lag as upgrading a city to some new technology does.
I don't think this limit is as arbitrary as you suggest here. The relevant question seems to me not to be 'is human intelligence the limit of what you can build physically', which seems extremely unlikely, but more 'are humans smart enough to build something smarter than themselves'. It doesn't seem impossible to me that humans never manage to actually build something better than themselves at AI research, and then you do not get exponential takeoff. I don't believe this is really all that plausible, but it seems a good deal less arbitrary than a limit of 100,000 on the Dow Jones. (Correct me if I misunderstand your argument)
Assuming that the DOW reaching 100,000 would mean real growth in the economy I would need to be much more convinced that the world will explode before I would think it is a good tradeoff to worry about that possibility compared to the obvious improvement to quality of life that a DOW of 100,000 would represent. Similarly the quality of life improvement I expect from the AI gains of the next decade vastly exceed the relative risk increase that I expect based on everything I have read from EY and you so I see no reason to stop (I believe that peak relative risk is caused by the internet/social media + cheap travel and am unwilling to roll those back).
I dunno... Isn't this sort of a 'fully general" counterargument?
------------------------
[𝘚𝘰𝘮𝘦𝘸𝘩𝘦𝘳𝘦 𝘪𝘯 𝘵𝘩𝘦 𝘈𝘯𝘨𝘭𝘰𝘴𝘱𝘩𝘦𝘳𝘦, 1938...]
• I worry about the possibility of physics or biology research continuing until the point that humans are able to produce something really dangerous, potentially world-endingly dangerous.
→ Like what?
• I don't know, some sort of super-plague or super-bomb.
→ Nah. We've been breeding animals, and suffering plagues, for all of human history; and maybe we do keep inventing more destructive bombs, but they're still only dangerous within a very localized area. Bombs now are barely more destructive than those of the 1910s. These things hit a natural limit, and that limit is always before the "big deal for humans" mark (thankfully).
• Yeah, but... well, what if they invented a bomb that had a REALLY MASSIVE yield & some sort of, I don't know, long-lasting poisonous effect that–
→ Oh, come on now. You might as well invent a scenario wherein God comes down and blows up humanity! Sure, such an event—such a "super-bomb"—might be theoretically possible, but it would require some sort of qualitative change in explosives technology; and it's not as if explosives could just get better & better infinitely! Tanks, planes, cars, bombs: basically the same now as when they were invented!
• Okay, bu–
→ And the same goes for your dumb plague idea: sure, diseases exist, but how would we ever be able to breed a plague that is more deadly than any that nature ever managed? Diseases can't just keep getting deadlier without limit, you know!
• Okay, okay, I guess you're right. Sorry, I don't know what got into me. Anyway, I hope you'll come visit me in Japan, now that I'm moving to this quaint little city in the far southwest–
That hypothetical 1938 person would be right about the super-plague, and they would not think that about the super-bomb because everyone in 1938 knew the atomic bomb was at least theoretically possible. Someone in 1938 who doubted man would walk on the moon would have been wrong, but someone who doubted faster than light travel would be possible would have been absolutely right.
Right, but that's a different kind of limit—a physical, rather than a practical, barrier. Unless you think that there is, similarly, a hard limit on the sort of AIs that can be created?
(The car example suggested to me that you were making a probabilistic argument from technological progress, rather than postulating some physics that prevents qualitatively different machinery; but if I have misinterpreted—well, you wouldn't be the first to suggest such a thing... but me my own self, I don't think it's very likely, all the same.)
Re: the plague, that's not to suggest that such a thing *has been created*—only that to say "let's not worry about biological warfare or development therein, because nothing like that has happened yet; there's probably some natural limit" is not very convincing today, but might have been some time ago.
I think most (really, all) examples of technological progress do show a logarithmic curve. All the assumptions about killer AI assume linear or exponential progression.
Do you mean a "logistic curve"? That's the one that looks like an S-shape.
This is what I mean
https://en.m.wikipedia.org/wiki/Logarithmic_scale
Why would they be right about the super-plague? It seems fundamentally possible, if ill-advised, for humans to manage to construct a plague that human bodies just aren't able to resist.
On the other hand, if your position is to ban any research that could conceivably in some unspecified way lead to disaster in some possible future, then you are effectively advocating for banning all research everywhere on all topics, forever.
And setting yourself up for a "then only criminals will have Science" ironic doom.
Thanks for the review! I’m excited to read it myself :)
>at least before he started dating e-celebrity Aella, RIP
Did something happen to Aella?
No, I just meant RIP to his being low-profile, but you're right that it's confusing and I've edited it out.
It's tempting to ask: "what's the path from HPMOR to MIRI?"
I mean, I read HPMOR, and I liked it, but nothing in there made me think about AI risk at all. Quirrell was many things but he was not an AI.
And then I remembered: the way *I* first found out about AI risk was that I read Three Worlds Collide (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8 ), and then I branched into other things Eliezer had written, and oh, hey, there's this whole website full of interesting writing...
FWIW I really liked the first half of HPMOR, but the second half got overly didactic and boring, and the ending was a big letdown. This has no bearing on MIRI, I'm just offering literary criticism.
> Quirrell was many things but he was not an AI.
I've seen it argued that HPMOR *Harry* could be taken as a metaphor on misaligned AI which ends up destroying its creator.
The main thing is that it potentially got people to read the Sequences, which along with rationality talk about AI. Though anecdotally I read the Sequences before reading HPMOR, via Gwern, via HN. Despite having heard of HPMOR before
Yudkowky:
> writes a fanfic that's one giant allegory about AI risk
> literally has an artifact in it as a major plot point that has a name of one of his research papers on the back of it
> characters go on lengthy tirades about how certain technologies can be existentially dangerous
> main character comes within a hairs breadth's of destroying the world
> people like it but say things like "nothing in there made me think about AI risk at all"
> If everyone hates it, and we’re a democracy, couldn’t we just stop?
Mm, yes, but you're not really a democracy though, are you. The AI tech leaders have dinner with the president and if they kiss his ass enough he gives them a data center.
If AI will Kill Us All in a few years (it wont), you're not going to be the country to stop it.
Yes, the president sucks up to AI leaders, but in theory people could vote that president out, and choose a president who doesn't do that. Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump. If JD Vance has any sense, he'll expect to be judged in a close election by who he sucks up to too. This is how many things that big corporations and powerful allies of the elite like have nevertheless gotten banned.
This is an astonishingly incorrect explanation of why Donald Trump beat Kamala Harris in the 2024 presidential election.
Certainly social politics impacted the election on the margins and the race was quite close but you can't actually go from there to claiming that a specific small margin issue was a deciding one.
There's no world in which "stopping AI" is a key American political issue in any case.
There could be such a world, but it depends on leveraging fears and uncertainty about the job market, in a period of widespread job loss, across to concerns about existential risk.
"Joe Biden sucked up to annoying woke activists, and people decided they hated that enough to elect Trump."
Please justify this.
“Kamala Harris is for they/them. Donald Trump is for you.”
I think there's an under-served market for someone to run on "just fucking ban AI" as a slogan. That second-to-last paragraph makes me want that person to exist so I can vote for them.
They'd have to choose which party to run under, and them uphold that party's dogma to win the primary, making them anathema to the other half of the country.
Thanks for the launch party shout out!!!
Let's all help the book get maximum attention next week.
I don't think it's remotely plausible to enforce Point 3, banning significant algorithmic progress. I'd be wiling to place money that, like it or not, there are already plenty enough GPUs out there for ASI.
So then we're already fucked?
That seems the most likely outcome to me unfortunately. I think EY is right about the problem but not the solution, though TBF any solution is probably a bit of a long shot. E.g., it's conceivable there are non-banning ways out involving some suppression/regulation via treaties to slow things down combined with somehow riding the wave (e.g., on the lines suggested in AI2027).
Why is it more difficult than banning research into better bioweapons, chemical weapons, etc which we have successfully done? This isn't the kind of problem that'll be solved by one guy on a whiteboard
For one thing, I think it's a bit optimistic to suppose that the bio/chemical weapons bans are watertight. E.g., Russia denies any involvement in developing Novichok, so do we trust them when they say they don't have a chemical weapons programme? And the Soviet Union is now known to have had a large, concealed, bioweapons programme, Biopreparat, after the Biological Weapons Convention was signed.
But at least with CW (and to a lesser extent, BW) you have to produce these things at scale and distribute them for them to be harmful, but with algorithms, it's just information. It's not plausible to contain 1MB or even 1GB of information, when you can transmit it worldwide in the blink of an eye (or even hide it under a fingernail), if the creators want to distribute it and you don't know who they are.
Re one guy on a whiteboard, the resources required to invent suitable algorithms are probably a lot less than those required to design CBW. It depends on what scale of GPU farm you need to test things, but it's not necessarily that big a scale - surely in reach of relatively small organisations, and I think it's going to be impossible to squash them all.
Because algorithmic improvement is just math. The most transformative improvements in AI recently have come from stuff that can be explained in a 5 box flowchart. There’s just no practical way to stop people from doing that. If you really want to prevent something, you prevent the big obvious expensive part.
We didn’t stop nuclear proliferation for half a century by guarding the knowledge of how to enrich uranium or cause a chain reaction. It’s almost impossible to prevent that getting out once it’s seen to be possible. We did it by monitoring sales of uranium and industrial centrifuges
Mostly because chemical and biological weapons aren't economically important. Banning them is bad news for a few medium-sized companies, but banning further AI research upsets about 19 of the top 20 companies in the US, causes a depression, destroys everyone's retirement savings, and so forth.
Expecting anyone to ban AI research on the grounds of "I've thought about it really hard and it might be bad although all the concrete scenarios I can come up with are deeply flawed" is a dumb pipe dream.
Hello Alex! Local SSC appreciation society meetup is outside the Fort on Sat 20th at 1400. Be nice to see you again.
Mostly compscis and mathmos, mostly doomers, sigh...
John.
So you put something along the lines of "existing datacentres have to turn over most of their GPUs" in the treaty.
If a company refuses, the host country arrests them. If a host country outright refuses to enforce that, that's casus belli. If the GPUs wind up on the black market, use the same methods used to prevent terrorists from getting black-market actinides. If a country refuses to co-operate with actions against black datacentres on its turf, again, casus belli.
And GPUs degrade, too. As long as you cut off any sources of additional black GPUs, the amount currently in existence will go down on a timescale of years.
I believe that we can get to superintelligence without large-scale datacentres, because we are nowhere near algorithmic optimality. That's going to make it impossible to catch people developing it. Garden shed-sized datacentres are too easy to hide, and that's without factoring in rogue countries who try to hide it.
The only way it could work would be if there were unmistakable traces of AI research that could be detected by a viable inspection regime and then countries could enter into a treaty guaranteeing mutual inspection, similar to how nuclear inspection works. But there aren't such traces. People do computing for all sorts of legitimate reasons. Possibly gigawatt-scale computing would be detectable, but not megawatt-scale.
A garden-shed-sized datacentre still needs chips, and I don't think high-end chip fabs are easy to hide. We have some amount of time before garden-shed-sized datacentres are a thing; we can reduce the amount of black-market chips.
If it gets to the point of "every 2020 phone can do it" by 2035 or so, yeah, we have problems.
This needs an editor, maybe a bloody AI editor.
why would a super intelligence wipe out humanity? We are a highly intelligent creature that is capable of being trained and controlled. The more likely outcome is we’d be manipulated into serving purposes we don’t understand. But wait…
The short pithy answer is usually "We don't bother to train cockroaches, we just exterminate them and move on".
An unaligned AI with some kind of goal orthogonal to humanity's survival would see that it could accomplish its goal much more efficiently if it had exclusive access to the mineral resources we're sitting on.
Also humans are smart enough that leaving them alive, if they might want to shut you down, is not safe enough to be worth very marginal benefits.
An equally pithy response to that would be "So there are no more cockroaches?"
We would get rid of them if we could. And as mentioned in the post, we have been putting a dent in the insect population without even trying to do so. An AGI trying to secure resources would be even more disruptive to the environment. Robots don't need oxygen or drinkable water, after all.
Humanity has indeed eradicated many species it disliked.
If we could communicate with insects, we would https://worldspiritsockpuppet.substack.com/p/we-dont-trade-with-ants
"We don't bother to train cockroaches, we just exterminate them and move on".
What Timothy said. We're not doing too well. If had to be on who's going to last longer as a species, humans or cockroaches, it's not even a contest.
Cockroaches are tropical. Without humans heating things for them, they can't survive in regions with cold winters. :P
There's plenty of tropics around!
We train all kinds of animals to do things that we can’t/don’t want to do or just because we like having them around. Or maybe you’re acknowledging the destructive nature of humans and assuming what comes next will ratchet up the need to relentlessly dominate. Super intelligence upgrade is likely to be better at cohabitation than we have been.
An analogy that equates humans to cockroaches is rich! They are deeply adaptable, thrive in all kinds of environments, and as the saying goes likely survive a nuclear apocalypse.
A typical example of an answer to this question: https://www.lesswrong.com/posts/87EzRDAHkQJptLthE/but-why-would-the-ai-kill-us
Humans are arrogant and our blind spot is how easy we are to manipulate, our species puts itself at the centre of the universe which is another topic. We are also tremendously efficient at turning energy into output.
So again, if you have a powerful and dominant specie that is always controllable (see 15 years of the world becoming addicted to phones)… I ask again, why would super intelligence necessarily kill us. So far, I find the answers wildly unimaginative.
I don't estimate the probability of AI killing us nearly as high as Yudkowsky seems to. But it's certainly high enough to be a cause for concern. If you're pinning your hopes on the super-intelligence being unable to find more efficient means of converting energy into output than using humans, I'd say that's possible but highly uncertain. It is, after all, literally impossible to know what problems a super-intelligence will be able to solve.
We’re talking about something with super intelligence and the ability to travel off earth (I mean humans got multiple ships to interstellar space)…. This is god level stuff, and we think it is going to restrict its eye to planet earth. Again, we are one arrogant species.
The ASI rocketing off into space and leaving us alone is a different scenario than the one you were proposing before. Is your point simply that there are various plausible scenarios where humanity doesn't die, and therefore the estimated 90+ percent chance of doom from Yudkowsy is too high? If so, then we're in agreement. Super intelligence will not *necessarily* kill us—it just *might* kill us.
I’m saying the entire super intelligence doom/extinction fear is the wrong lens.
something I wrote a few days ago: https://awpr.substack.com/p/the-rate-of-change-lens
The answer is "yes, it would manipulate us for a while". But we're not the *most efficient possible* way to do the things we do, so when it eventually gets more efficient ways to do those things then we wouldn't justify the cost of our food and water.
The scenario in IABIED actually does have the AI keep us alive and manipulate us for a few years after it takes over. But eventually the robots get built to do everything it needs, and the robots are cheaper than humans so it kills us.
When's the last time a passenger pigeon flock covered your home and car in crap?
"But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. "
If intelligence is held as a positive, and more intelligence is better, would it not be better if AI did replace us? It doesn't have to happen through any right violations. It could just be a slow replacement process through decreasing birth rates over time, for example.
I am not saying I agree with this argument, but it seems like this argument should be addressed in a convincing way. What is so bad about the human species slowly being replaced by more intelligent AI entities?
https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1 does a decent job explaining this, though it's a bit long. (Because it covers some other ground, like the potential moral values of aliens, in a way that's hard to separate from the part about an AI successor.)
I don't think there would be anything objectively immoral about a super-intelligent alien species exterminating humanity (including me). But, for the usual Darwinian reasons, I would be opposed on the indexical logic that I am a human.
But it would not affect you personally. Probably no person currently alive would be affected. The question is about the future of the species, and whether it is valuable to try to preserve the human species, or let it be replaced by something superior.
An easy cop-out would be a sort of consciousness chauvinism. I have good reason to believe that humans are conscious and thus have moral value; there is less reason to believe that the AI is conscious, thus there is a higher probability it has no moral value at all, and so if given the option of which being should inherit the future, humans are the safer bet.
I personally would dispute that there are good reasons to believe humans are more or less conscious than future AGI.
Not that I would tie moral value to consciousness at all.
And how would you do that? You know you're conscious, you know other humans are a lot like you, ergo there is good reason to believe most or all humans are conscious.
You have no idea if artificial intelligence is conscious and no similar known conscious entity to compare it to. I don't see how you could end up with anything but a lower probability that they are conscious.
That last part is pure looney tunes to me though. What moral framework have you come up with that doesn't need consciously experienced positive and negative qualia as axioms to build from? If an AI discovers the secrets of the universe and nobody's around to see it, who cares?
I care about my relatives.
No, AI is advancing fast enough that it is "you and everyone you personally care about".
Intelligence is instrumentally valuable, but not something that is good in itself. Good experience and good lives is important in itself. It's unclear how many good experiences would exist after an AI takeover.
Slowly replacing the human species with superintelligent AI would not impact the life experience of any single human, so arguments about the good life and what that entails would need a little more than this to be convincing, IMHO.
Person-affecting views can be compelling until you remember that you can have preferences over world states (i.e. that you prefer a world filled with lots of people having a great time to one that's totally empty of anything having any sort of experience whatsoever).
That's a good point, but you will have to provide arguments as to why one world state preference is better than another world state preference. In the present case, the argument is not between a world filled with lots of happy people and an empty world, but the difference between a world filled with people versus a world filled with smarter AIs.
> you will have to provide arguments as to why one world state preference is better than another world state preference
I mean, I think those are about as "terminal" as preferences get, but maybe I'm misunderstanding the kind of justification you're looking for?
> the difference between a world filled with people versus a world filled with smarter AIs
Which we have no reason to expect to have any experiences! (Or positive ones, if they do have them.)
That aside, I expect to quite strongly prefer worlds filled with human/post-human descendents (conditional on us not messing up by building an unaligned ASI rather than an aligned ASI, or solving that problem some other sensible way), compared to worlds where we build an unaligned ASI, even if that unaligned ASI has experiences with positive valence and values preserving its capacity to have positive experiences.
That would depend a lot on what the AI(s) wanted and what kind of "life" they had. In principle, an AI could have any kind of goal at all, including one as utterly pointless as "maximize the number of paperclips in the universe." An AI "civilization" could be something humanity would be proud to have as its "children", but it could also be one that humans would think is stupid, boring, and completely worthless.
I agree. But what if it's the kind of life that makes the AIs happy and which is utterly incomprehensible to humans?
I value intelligence in-of-itself, but not solely, is the simplest answer. I value human emotions like love, friendship, and I value things about Earth like trees, grass, cats, dogs, and more.
They don't have all the pieces of humanity I value. Thus I don't want humanity to be replaced.
> They suggest banning all AI capabilities research immediately, to be restarted only in some distant future when we’ve solved all relevant technical and philosophical problems.
No. To be restarted after we've successfully augmented human intelligence very substantially, to the point where the augments stop being so damn humanly stupid and trying to call shots they can't call or predicting things will work that don't work.
(On my own theory of how this ought to play out after we're past the point of directly impending extinction, which people do not need to agree on, in order to join in on the project of avoiding the directly impending extinction part. Before anything else, humanity has to not die right away.)
I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer.
One can, however, maybe tell whether or not one has successfully augmented human intelligence. You can give people tests and challenge problems, and see whether they do better after the next round of gene therapy.
So "augmenting human intelligence" is something that can maybe work, and "the current pack of disaster monkeys gets to argue for even longer about which clever plans they imagine will work to tame machine superintelligence" is not.
I've edited the post so that I don't misrepresent you, but I'm not sure why you object to my formulation - if we get augmented humans, do you want to restart before we've solved the technical and philosophical problems? Why? To get better AIs to do experiments on?
The augmented humans restart when the augmented humans think it wise. (On my personal imagined version of things.) If you're not yet comfortable deferring to them about that, augment harder. What we, the outside humans, would like to believe about the augmented humans, is that they are past the point of being overconfident; if they expect us to survive, we expect us to survive.
Framing it as "when the problems are solved" sounds like the plan is to convene a big hall full of sages and give them a few decades to deliberate, and this would not work in real life.
I did not read Scott's mention of "some distant future when we’ve solved all relevant technical and philosophical problems" as implying optimism about the prospect of getting there. My kinda-sorta-Straussian read of his perspective is that, if we successfully pause AI hard enough to prevent extinction, we most likely never restart.
I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
I guess you can't ask normal IQ 100 people to exercise a veto on increasingly superhuman geniuses forever. But if for some reason the future were trusting me in particular, and all I could do was send forward a stone tablet with one sentence of advice, it wouldn't be "IF THE OVERALL CONSENSUS OF SMART PEOPLE SAYS AI IS OKAY NOW, THEN IT'S PROBABLY FINE".
> I'm nervous about this because, relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision.
This is part of why the average American hates AI. They are aware that tech bros are 1) smarter than them, 2) have control of tech that could replace them, and 3) are not entirely aligned with them. Augments will be 1) smarter than us, 2) in control of ASI research in this hypo, and 3) different in values from us.
Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now. A democracy that would put power in the hands of future augments is not the same democracy that would halt AI progress because it is unpopular.
I'm an above average IQ person and I don't trust the tech bros in charge of AI because capitalism has messed up incentives relative to morality and I don't see them individually or collectively demonstrating a clear moral compass.
A high IQ person without honorable moral commitments is like Sam Bankman-Fried. I suspect a lot of people in the thick of AI development are adjacent to this same kind of im-a-morality or are simply driven by incentives like power and profit that render their high IQ-ness more dangerous than valuable.
Augmented humans operating under screwed up incentives and without a clear and honorable moral compass will be no help to us, I don't think.
A shorter way to put this is that a smart person who is emotionally immature and has a lot of power is a real hazard.
Hmm... I intensely distrust moralists. Given a choice between trusting Peter Singer and Demis Hassabis with a major decision, I vastly prefer Hassabis.
I think the real temptation there is "I'm smart, and I'm definitely way smarter than the rubes, so I can safely ignore their whiny little protests as they are dumber than me".
That way lies vast and trunkless legs and
"Near them, on the sand,
Half sunk a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command"
> Highly augmented humans are surely more likely to be aligned with normies than ASI is, but they will probably be less well aligned than Sam Altman is with Joe Smith from Atlanta right now.
When we talk about the economical elites today, they are partially selected for being smart, but partially also for being ruthless. And the think that the latter is much stronger selection, because there are many intelligent people (literally, 1 in 1000 people has a "1 in 1000" level of intelligence; there are already 8 millions of people like that on the planet), but only a few make it to the top of the power ladders. That is the process that gives you Altman or SBF.
So if you augment humans for intelligence, but don't augment them for ruthlessness, there is no reason for them to turn out like that. Although, the few of them who get to the top, will be like that. Dunno, maybe still an improvement over ruthless and suicidally stupid? Or maybe the other augmented humans will be smart enough to figure out a way to keep the psychopaths among them in check?
(This is not a strong argument, I just wanted to push back about the meme that smart = economically successful in today's society. It is positively correlated, but many other things are much more important.)
No that's a great point. It reminds me of how I choose to not apply to the best colleges I could get into but instead go to a religious school. I had no ambition at age 17 and at that point made a decision that locked me out of trying to climb a lot of powerful and influential ladders. (I'm pleased with my decision, but it was a real choice.)
There's blessed selection (the opposite of adverse selection) going on here: a world where we can successfully convince the smart people this is important is a world where the smart people converge on understanding the danger, which implies that as intelligence scales or understanding of AI risk becomes better calibrated.
Ideally, you'd have more desiderata than just them being more intelligent. Such as also being wiser, less willing to participate in corruption, more empathetic, and trained to understand a wide-ranging belief systems and ways of living. Within human norms for those is how I'd do it to avoid bad attractors.
However, thought leaders in silicon valley are selected for charisma, being able to tweet, intelligence, not really on wisdom, and not really on understanding people deeply. Then further affected by the emotional-technical environment which is not 'how do we solve this most effectively' but rather large social effects.
While these issues can remain with carefully crafted supergenii, they would have far less issues.
Maybe the restart bar could be simpler: can it power down when it really doesn’t want to, not work around limits, not team up with its copies? If it fails, you stop; if it passes, you inch forward. Add some plain-vanilla safeguards, like extra sign-offs, outsider stress-tests, break-it drills, and maybe we buy time without waiting on a new class of super-geniuses.
"relative to the average IQ 100 person, the current AI thought leaders in Silicon Valley are the supergenius humans who have been entrusted with this decision."
Decision: Laugh or Cry?
Reaction: Why not both?
This being true, we're screwed. We are definitely screwed. Sam Altman is deciding the future of humanity.
God help us all.
Man I just want to know who's going to make the cages. To put the chatting humans into.
Amazon. They have a patent on it. https://patents.google.com/patent/US9280157B2/en
But of course! Who else :)
No, see, the workers *choose* to enter the cage and be locked in. Nobody is *making* them do it, if they don't want to work for the only employer in the county then they can just go on welfare or beg in the streets or whatever.
Well, I'm just trying to figure out who makes the cages, because "AI" has no hands. I suppose Yudkowsky could make cages himse... never mind, I'm not sure he'd know which end of the soldering shouldn't be touched.
Oh, I know - "AI" will provide all the instructions! Including a drawing of a cage with a wrong size door, and a Pareto frontier for the optimal number of bars.
"I predict that the current guys will not, if you give them a couple of decades to argue, asymptote on agreement on a plan for ASI alignment that actually works. They're failing right now because they can't tell the difference between good predictions and bad predictions on the arguments they already have. That's not going to asymptote to a great final answer if you just run them for longer."
I agree this seems like a very real risk, and likely the default outcome if the field continues in its current state. But if people were able to develop some solid theories that actually model and explain underlying fundamental laws, it seems to me like resolving what's a good prediction and what's a bad prediction might get a lot easier, even if you can't actually test things on a real superintelligence? And then the field might become a very different place?
Like, when people today argue about what RLHF would or would not do to a superhuman mind or whatever, it's all fuzzy words, intuitions and analogies, no hard equations. This gives people plenty of room to convince themselves of their preferred answers, or to simply get the reasoning wrong, because fuzzy abstract arguments are difficult to get right.
But suppose there were solid theories of mechanistic interpretability and learning that described how basic abstract reasoning and agency work in a substantive way. To gesture at the rough level of theory development I'm imagining here, imagine something you could e.g. use to write a white-box program with language modelling performance roughly equivalent to GPT-2 by hand.
Then people would likely start debating alignment within the framework and mathematical language provided by those theories. The arguments would become much more concrete, making it easier to see where the evidence is pointing. Humans already manage to have debates about far-off abstractions like gravitational waves and nuclear reactions that converge on the truth well enough to eventually yield gravitational wave detectors and nuclear bombs. My model of how that works is that debates between humans become far more productive once participants have somewhat decent quantitative paradigms like general relativity, quantum mechanics, or laser physics to work from.
If we actually had multiple decades, creating those kinds of theories seems pretty feasible to me, even without intelligence augmentation. From where I stand, it doesn’t look obviously harder than, say, inventing quantum mechanics plus modern condensed matter physics was. Not trivial, but standard science stuff. Obviously, doing intelligence augmentation as well would be much better, but I don't see yet how it's strictly required to get the win.
I'm bringing this up because I think your strategic takes on AI tend to be good, and I currently spend my time trying to create theories like this. So if you're up for giving your case for why that's not a good use of my time, or if you have a link to something that does a decent job describing your position, I’d be interested in seeing it.
I'm skeptical about being able to predict what an AI will do, even given perfect interpretability of its weight set, if it can iterate. I think this is a version of the halting problem.
The point would not be to predict exactly what the AI will do. I agree that this is impossible. The point would be to get our understanding of minds to a point where we can do useful work on the alignment problem.
Many Thanks! But, to do useful work on the alignment problem, don't you _need_ at least quite a bit of predictive ability about how the AI will act? Very very roughly, don't you need to be able to look at the neural weights and say: This LLM (+ other code) will never act misaligned?
Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file. If this program runs on lots of very fast hardware, I won't be able to predict exactly what it will do, as in what digits it will print when, because I can't can't calculate the numbers that fast. Nevertheless, I can confidently say quite a lot about how this program will behave. For example, I can be very sure that it's never going to print out a negative number, or that it's never going to try to access the internet.
Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
Fibonacci sequence is deterministic and self-contained, though. Predicting an even moderately intelligent agent seems like it has more in common - in terms of the fully generalized causal graph - with nightmares like turbulent flow or N-body orbital mechanics.
>Suppose I write a program that computes the numbers in the Fibonacci sequence and prints them to a text file.
Many Thanks! Yes, but that is a _very_ simple program with only a single loop. A barely more complex program with three loops can be written which depends on the fact that Fermat's last theorem is true (only recently proven, with huge effort) to not halt.
>Making a generally superintelligent program that you can confidently predict will keep optimising for a certain set of values is an easier problem than predicting exactly what that superintelligent program will do.
"easier", yes, but, for most reasonable targets, it is _still_ very difficult to bound its behavior. Yudkowsky has written at length on how hard it is to specify target goals correctly. I've been in groups maintaining CAD programs that performed optimizations and we _frequently_ had to fix the target metric aka reward function.
This plan is so bizarre that it calls the reliability of the messenger into question for me. How is any sort of augmentation program going to proceed fast enough to matter on the relevant timescales? Where does the assumption come from that a sheer increase in intelligence would be sufficient to solve the alignment problem? How do any gains from such augmentation remotely compete with what AGI, let alone ASI, would be capable of?
What you seem to want is *wisdom* - the intelligence *plus the judgment* to handle what is an incredibly complicated technical *and political* problem. Merely boosting human intelligence just gets you a bunch of smarter versions of Sam Altman. But how do you genetically augment wisdom...?
And this solution itself *presumes* the solution of the political problem insofar as it's premised on a successful decades-long pause in AI development. If we can manage that political solution, then it's a lot more plausible that we just maintain a regime of strict control of AI technological development than it is that we develop and deploy a far-fetched technology to alter humans to the point that they turn into the very sort of magic genies we want AI *not* to turn into.
I understand this scheme is presented as a last-ditch effort, a hail mary pass which you see as offering the best but still remote chance that we can avoid existential catastrophe. But the crucial step is the most achievable one - the summoning of the political will to control AI development. Why not commit to changing society (happens all the time, common human trick) by building a political movement devoted to controlling AI, rather than pursuing a theoretical and far-off technology that, frankly, seems to offer some pretty substantial risks in its own right. (If we build a race of superintelligent humans to thwart the superintelligent AIs, I'm not sure how we haven't just displaced the problem...)
I say all this as someone who is more than a little freaked out by AI and thinks the existential risks are more than significant enough to take seriously. That's why I'd much rather see *plausible* solutions proposed - i.e., political ones, not techno-magical ones.
We don't need magical technology to make humans much smarter; regular old gene selection would do just fine. (It would probably take too many generations to be practical, but if we had a century or so and nothing else changed it might work.)
The fact that this is the kind of thing it would take to "actually solve the problem" is cursed but reality doesn't grade on a curve.
Actually, this should be put as a much simpler tl;dr:
As I take it, the Yudkowsky position (which might well be correct) is: we survive AI in one of two ways:
1) We solve the alignment problem, which is difficult to the point that we have to imagine fundamentally altering human capabilities simply in order to imagine the conditions in which it *might* be possible to solve it; or
2) We choose not to build AGI/ASI.
Given that position, isn't the obvious course of action to put by far our greatest focus on (2)?
Given that many of the advances in AI are algorithmic, and verifying a treaty to limit them is essentially impossible, the best result that one could hope for from (2) is to shift AI development from openly described civilian work to hidden classified military work. Nations cheat at unverifiable treaties.
I'll go out on a limb here: Given an AI ban treaty, and the military applications of AI, I expect _both_ the PRC _and_ the USA to cheat on any such treaty.
If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
Many Thanks!
>If AI development continues at any reasonable pace in secret military labs, that pace will probably far outpace the pace of intelligence augmentation.
Agreed. I'm skeptical of any genetic manipulation having a significant effect on a time scale relevant to this discussion.
>One could argue though that AI advancement likely requires training enormous models which rely on trackable items like 100k-sized batches of GPUs, so hiding them is likely impossible.
We haven't tried hiding them, since there are no treaties to cheat at this point. I'm sure that it would be expensive, but I doubt that it is impossible. We've built large structures underground. The Hyper-Kamiokande neutrino detector has a similar volume to the China Telecom-Inner Mongolia Information Park.
Based on the title of the book it seems pretty clear that he is in fact putting by far the greatest focus on (2)? But the nature of technology is that it's a lot easier to choose to do it than to choose not to, especially over long time scales, so it seems like a good idea to put at least some effort into actually defusing the Sword of Damocles rather than just leaving it hanging over our heads indefinitely until *eventually* we inevitably slip.
He wants to do (2). He wants to do (2) for decades and probably centuries.
But we can't do (2) *forever*. If FTL is not real, then it is impossible to maintain a humanity-wide ban on AI after humanity expands beyond the solar system - a rogue colony could build the AI before you could notice and destroy them, because it takes years for you to get there. We can't re-open the frontier all the way while still keeping the Butlerian Jihad unbroken.
And beyond that, there's still the issue that maybe in a couple of hundred years people as a whole go back to believing that it'll all go great, or at least not being willing to keep enforcing the ban at gunpoint, and then everybody dies.
Essentially, the "we don't build ASI" option is not stable enough on the time and space scales of "the rest of forever" and "the galaxy". We are going to have to do (1) *eventually*. Yes, it will probably take a very long time, which is why we do (2) for the foreseeable future - likely the rest of our natural lives, and the rest of our kids' natural lives. But keeping (2) working for billions of years is just not going to happen.
Unrelated to your above comment, but I just got my copy of your and Soares's book yesterday. While there are plenty of places I disagree, I really like your analogy re a "sucralose version of subservience" on page 75.
Oh, hi.
I bought your book in hardcopy (because I won't unbend my rules enough to ever pay for softcopy), but because Australia is Australia it won't arrive until November. I pirated it so that I could involve myself in this conversation before then; hope you don't mind.
Eliezer, how do you address skepticism about how an AGI/ASI would develop motivation? I have written about this personally, my paper is on my Substack titled The Content Intelligence. I don't believe in any of your explanations I've seen you address how AI will jump the gap of having a utility function assigned to it to defining its own.
I have come to the conclusion that anyone who uses arguments of the form "the real problem isn't X it's Y" is probably either stupid or intellectually dishonest.
I think it's group status-jockeying more than anything.
Yes. As i said, intellectually dishonest.
Good review, I think I agree with it entirely? (I also read a copy of the book but hadn't collected my thoughts in writing).
!-Nobody can justify any estimate on the probability that AI wipes us out. It's all total speculation. Did you know that 82.7% of all statistics are made up on the spot?
2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
3-Military supremacy will come from AI. Recommendations like "Have leading countries sign a treaty to ban further AI progress." are amazingly naive and useless. Does anyone believe that the CCP would sign that, or keep their word if they did sign?
4-Nothing will hold AI progress back. The only solution is to ensure that the developed democracies win the AI race, and include the best safeguards/controls they can come up with.
1. Please read https://www.astralcodexten.com/p/in-continued-defense-of-non-frequentist .
2. Please read https://archive.is/https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html and think for five seconds about what went on here and what is implied.
3. You've just ruled out all arms control treaties. But in fact, there are many treaties on nuclear weapons, chemical weapons, biological weapons, depleted uranium shells, et cetera.
4. "The AI race" is a meme that a couple of venture capitalists are pushing in order to make people afraid to slow down AI. China is about a year behind the US in AI, refusing to even import the chips that could help it catch up, and clearly doing a fast-follow strategy where they plan to replicate US advances after they happen, then gain an advantage by importing AI into the rest of the economy faster.
Per those treaties, North Korea shouldn't have nuclear weapons. But they do.
Yup! And nuclear weapons are the _easy_ case. A nuclear weapons test shakes the planet enough to be detectable on the other side of the world. A large chunk of AI progress is algorithmic enhancements. Watch over the shoulder of every programmer who might possibly be enhancing AI? I doubt it!
You don't need to watch over the shoulder of every programmer who might be doing that, just to stop him from disseminating that knowledge or getting his enhanced AI run on a GPU cluster. Both of the latter are much harder to hide, particularly if all unbombed GPU clusters are under heavy monitoring.
Many Thanks!
For the _part_ of AI progress that depends on modifying what is done during the extremely compute-heavy pre-training phase, yeah, that might be monitorable. ( It also might not - _Currently_ big data clusters are not being hidden, because there is no treaty regulating them. But we've built large underground hidden facilities. I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones. )
But also remember that algorithmic enhancements can be computationally _cheap_ . The reasoning models introduced early this year were mostly done by fine tuning models that had already completed their massive pre-training. Re
>just to stop him from disseminating that knowledge
To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies. Today, a lot gets published on arXiv - but, even now, not all, some gets held as trade secrets. Controlling those isn't going to work.
>I'll go out on a limb and predict that, given a treaty regulating big data clusters, both the USA and the PRC will cheat and build hidden ones.
Keeping those hidden from each other's secret services would be quite difficult, even before we get into the equivalent of IAEA inspectors. And then there's the risks of getting caught; to quote Yudkowsky, "Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs."
They didn't hide nukes from the arms control treaties, and nukes are actually easier to hide in some ways than a *functioning* GPU cluster due to the whole "not needing power" thing. The factories are also fairly hard to hide.
>To whom, his boss? Some of the advances are scaffolding around LLMs within individual companies.
So, what, his company's running a criminal conspiracy to violate international law? Those tend to leak, especially since if some of the members aren't heinous outlaws they know that blowing the whistle is one of *their* few ways of avoiding having the book thrown at them.
1-That's not an answer.
2-Hallucinations and odd behaviour are well known side effects of AI, of statistical reasoning. Not evidence of initiative in the least. Learn about software, for more than five seconds.
3-Like Assad respected the ban on chemical weapons? The treaties didn't limit nuclear weapons, which kept advancing. The treaties didn't stop the use of nuclear weapons, MAD did.
4-Meme? It's a meme that Zuck and others are spending billions on. Nonsense.
1. It's a link. The post it links to is an answer.
2. When the AI destroys humanity, any remaining humans in their bunkers will think of it as just "odd behavior" and "not initiative in the least". See https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
3. Yes, and there was massive international condemnation, Assad never did it again, and he was eventually overthrown. This is why I mention the standard arms control playbook. Some tinpot dictator will try to get some GPUs, and we will have the option to bomb him or not bomb him. Re: MAD, see START and other arms control treaties.
4. You think Zuck is spending billions out of patriotism because he doesn't want China to wIn tHe AI rAcE? He's spending billions because he thinks AI will make him rich.
1-Really?
2-Sure. Your prediction and a few bucks will get you on the subway.
3-Condemnation. Great. Obama's red line. Option to bomb is always there, regardless of treaties - nope. Re MAD, see MAD, which worked.
4-Because AI will grow users, which is what Zuck cares about. Not money, of which he has plenty. Did you know he still likes McDonalds? In any case, nothing about AI is a "meme".
I think you are misinterpreting "meme" as being like a joke, while Scott is using it in more of it's original usage as a replicating idea
Being a WIDELY replicating idea is one characteristic of memes. But even then, AI and its race are not widely replicating. AI replicates among comp sc experts (quite small in number relative to the general population), and the AI race replicates among a few handfuls of large corporations and countries.
A hot idea and widely written about and discussed, yes, but the AI race is not a meme.
#3: the arms control thing is a bad analogy. The big players got plenty of NBC weapons, then due to game theory dynamics didn't fire them at each other, then signed treaties to limit themselves (to arsenals still capable of destroying the world) and others (to not getting anything, which the big players are obviously generally happy to enforce).
The request here is that all the big players voluntarily not even get started on the really impressive stuff. It's a completely obvious non-starter, and not comparable to the WMDs situation.
I was trying to compose something like this, then saw your comment so realized I didn't have to. 100% agree.
If nukes didn't exist I would absolutely want the US to try to get them as soon as possible, and I wouldn't trust any deal with another country not to research them. The risk would be too big if they were able to do it secretly.
100%
And AI tests are vastly easier to conceal than nuclear weapons tests.
It matters where you-as-a-country believes in the risk.
If you do such an arms deal because you believe the default case is probably doom, then it being American doesn't matter.
If you believe that but enough other countries don't, then you want to invade.
I think your point #4 is overstated. Leopold Aschenbrenner has an essay that's nearly as serious and careful as AI 2027 that argues rather persuasively for the existence of, and importance of, an AI race with China. Many people who are not "strict doomers" see the AI race with China as one of, if not _the_, core challenge of AI in the next decade or two.
Aschenbrenner's essay is older, and some of the evidence supporting Scott's position has come out since Aschenbrenner published it.
Not taking a side here, just noting that piece of perspective.
1. This is just an argument for radical skepticism. Yes, we cannot say the “real” probability of this happening, same as any other complicated future event, but that isn’t an object level argument one way or the other, judgement under uncertainty is rarely easy, but often necessary
2. This has been false for many years now— LLMs aren’t really
Programmed” in the traditional sense, in that while we cannot say explain the code that was used to train the, we cannot point at a specific neuron in them and say “this one is doing such and such) the way we can with traditional code
3. Potentially, for the same reasons the soviets agreed to new start, and withdrew their nukes from Cuba. If Xi was convinced that AI posed this sort of threat, it would be in his rational self interest to agree to a treaty of mutual disarmament
4. Even if this were true, that does not exclude the possibility of it being a desirable goal. I am sympathetic to arguments that we cannot unilaterally disarm, for the same reasons we shouldn’t just chuck our nukes into the sea. But the question of whether this is a desirable and a possible goal are separate. But also, it is totally possible, see point 3 above.
>2-Computers have NEVER displayed what people call initiative or free will. They ALWAYS follow the software the devs have told them to execute and nothing else.
Stockfish is "only" following its programming to play chess, but it can still beat Magnus Carlsen. "Free will" is a red herring, all that matters is if the software is good at what it's programmed to do.
"Free will" is a red herring, "
That is purely your opinion/belief. We'll disagree.
"Stockfish has no free will, but is smart enough to beat Magnus Carlsen" is a statement of fact, not opinion, and an example of why I think "free will" would not be essential for an AI to outsmart humans.
Stockfish has chess ability, period.
You belief about free will is opinion, only.
I've got a parable for Eliezer.
Q: Imagine you're playing TicTacToe vs AlphaGo. Will AlphaGo ever beat you?
A: Lol, not if you have an IQ north of 70. The game is solved. If you're smart enough to fully map the tree, you can force a draw.
Gee, it's almost as if... the competitive advantage of intelligence had a ceiling.
I have yet to see Eliezer question about why the ceiling might exist, instead of automagically assuming that AI will achieve political dominion over the earth, just because humans did previously. He's still treating intelligence as a black-box. Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was.
"Dude has probably written over 100 million words of text about Artificial Intelligence, but has never once asked what the nature of intelligence was."
...have you read any of the dozens of posts where Eliezer writes about the nature of intelligence, or did you just sort of guess this without checking?
The idea that humans have solved existing in the physical universe in the same way that we've solved Tic-Tac-Toe is pretty silly, but even if it turns out to be true, some humans are more skilled than others, and an AI that simply achieves the same level of skill as that (but can think at AI speeds and be replicated without limit) would be enough to be transformative.
For my credentials, I've read... probably 70% of The Sequences. Low estimate. I got confused during the quantum physics sequence. Specifically, the story about the Brain-Splitting Aliens (or something? it's been a while). So I took a break with the intent to resume later, though I never did. I never read HPMOR either because everything i've heard 2nd-hand makes it sound unbearably cringe. But yes, I like to think I have a pretty good idea of his corpus.
That being said, do you understand what I'm getting at here? Yes, he's nominally written lots about various aspects of intelligence, but none that I've seen pin down the Platonic Essence of Intelligence from first principles. Can you point me toward anywhere where Yudkowsky addresses the idea of intelligence as a navigating a search space? I think i've seen him mention it on twitter *once*, and then never follow the thought to its logical conclusion.
----
Here's two analogies.
Analogy A: Intelligence is like code-breaking. It's trying to find a small needle in a large haystack. The bigger the haystack, the bigger the value of intelligence.
Analogy B: a big brain is like a giraffe with a long neck. The long neck is advantage if they help reach the high leaves. If the environment has no high leaves, the long neck is deadweight. Likewise, if the environment has no complex problems to solve (or if those problems are unrewarding), the big brain is deadweight.
No, humans have not solved the universe. But I *do* think we've plucked the low-hanging fruit. A few hundred years ago, you could make novel discoveries by accident. Today, you need 100 million billion brazillion dollars just to construct the LHC. IQ is not the bottleneck, physical resources are the bottleneck. And i'm skeptical if finding the higgs will be all that transformative.
Like, do you remember that one scott aaronson post where he's like "for the average joe, qUaNtUm CoMpuTInG will mean the lock icon on your internet browser will be a different color"? That's how I perceive most new technologies these days. Lots of bits, lots of hype, no atoms. Part of the reason why modernity feels cheap and fake is precisely because the modus operandi of technology (and by extension, intelligence) is that it makes navigating complexity *cheaper* than brute-force search. It only makes things better insofar as it can reduce the input-requirements.
Did you perhaps read Rationality: From AI to Zombies? A bunch of relevant Sequences posts on this topic didn't make it into that book. I'm not sure why, it's an odd omission. At any rate, you can find them at https://www.lesswrong.com/w/general-intelligence?sortedBy=old.
I read the original LessWrong website years ago, though an exact date eludes me. It was definitely before the reskin. And definitely after the roko debacle and Elizers's exit.
In any event, the posts that talk about intelligence as a search process are:
https://www.lesswrong.com/posts/8vpf46nLMDYPC6wA4/optimization-and-the-intelligence-explosion
https://www.lesswrong.com/posts/D7EcMhL26zFNbJ3ED/optimization
https://www.lesswrong.com/posts/rEDpaTTEzhPLz4fHh/expected-creative-surprises
https://www.lesswrong.com/posts/HktFCy6dgsqJ9WPpX/belief-in-intelligence
https://www.lesswrong.com/posts/CW6HDvodPpNe38Cry/aiming-at-the-target
https://www.lesswrong.com/posts/Q4hLMDrFd8fbteeZ8/measuring-optimization-power
https://www.lesswrong.com/posts/yLeEPFnnB9wE7KLx2/efficient-cross-domain-optimization
Dammit, I must have skipped that sequence. Because that describes pretty exactly what I meant. So I concede on that point.
Still though, I'm not convinced that ASI will ascend to God-Emperor. Eliezer seems to have the opinion that there's still high-hanging fruit to be plucked. Whereas I think we're past the inflection point of a historical sigmoid. E.g. he mentions that a Toyota Corolla is pretty darn low-entropy [0].
> Consider a car; say, a Toyota Corolla. The Corolla is made up of some number of atoms; say, on the rough order of 10^29. If you consider all possible ways to arrange 10^29 atoms, only an infinitesimally tiny fraction of possible configurations would qualify as a car; if you picked one random configuration per Planck interval, many ages of the universe would pass before you hit on a wheeled wagon, let alone an internal combustion engine.
Yeah, okay. But like, I think i've heard estimates that modern sedans are about 25% efficient? From a thermodynamic perspective? (Sanity check: Microsoft's Sydney estimates ~25%-30%.) Even with the fearsome power of "Recursive Optimization", AI being able to bring that to 80% efficiency (Sydney says Carnot is 80%) is... probably less than sufficient for Godhood?
And maybe Eliezer could retort with the Godshatter argument that humans care about more than just thermodynamic efficiency in their cars. But then, what does that actually entail? Is Elon gonna sell me a cybertruck with an AI-powered voice-assistant from the catgirl lava-volcano who reads me byronic poetry while it drives me to the pizza parlor? Feels like squeezing water from a stone.
[0] https://www.lesswrong.com/posts/D7EcMhL26zFNbJ3ED/optimization
[edit: "negentropic" -> "low-entropy"]
> MIRI answered: moral clarity.
We've learned what a bad sign that is.
> Some people say “You’re not allowed to propose that a catastrophe might destroy the human race, because this has never happened before, and nothing can ever happen for the first time”. Then these people turn around and panic about global warming or the fertility decline or whatever.
Fertility decline really did happen to the ancient Greeks & Romans: https://www.overcomingbias.com/p/elite-fertility-fallshtml https://www.overcomingbias.com/p/romans-foreshadow-industryhtml
> I think it’s because, if it’s true, it changes everything. But it’s not obviously true, and it would be inconvenient for it to change everything. Therefore, it must not be true.
Robin Hanson is enough of a rationalist that he started the blog that Eliezer joined before spinning off his posts to LessWrong. And he famously wasn't convinced by the argument, arguing that we could answer such objections with insurance for near-miss events https://www.overcomingbias.com/p/foom-liability You write that MIRI "don’t expect enough of a “warning shot” that they feel comfortable kicking the can down the road until everything becomes clear and action is easy", but this just strikes me as disregarding empiricism and the collective calculative ability of a market aggregating information, as well as how difficult it is to act effectively when you're sufficiently far in the past and the future is sufficiently unclear.
> in a few centuries the very existence of human civilization will be in danger
Human civilization could survive via insular high-fertility religious groups https://www.overcomingbias.com/p/the-return-of-communism They just wouldn't be the civilization we moderns identify with.
> Given their assumptions this seems like the level of response that’s called for. It’s more-or-less lifted from the playbook for dealing with nuclear weapons.
Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way. This is a genie you can't put back into the bottle (perhaps Pandora's box would be the analogy they'd prefer, in which case it's already open).
> I mean, that’s not exactly his plan, any more than it’s anyone’s plan to start World War III to destroy Iranian centrifuges
At some level the plan has to include war with Iran, even if that war doesn't spiral all the way to World War III.
> you have to at least credibly bluff that you’re willing to do this in a worst-case scenario
If you state ahead of time that it's a bluff, then it's not credible. It is credible only if you'd actually be willing to do it.
> At his best, he has leaps of genius nobody else can match
I read every single post he wrote at Overcoming Bias, and while he has talent as a writer I wouldn't say I saw evidence of "genius".
> this thing that everyone thinks will make their lives worse
Not everyone.
"Nuclear weapons depend on nuclear material. I just don't think it's possible to control GPUs in the same way."
GPUs depend on the most advanced technological process ever invented, and existence of two companies: ASML and TSMC.
It's a process. With enough time, it can be duplicated. There currently isn't need to do so because GPUs are so available, but if the supply were choked off, someone else would duplicate it.
My non-expert understanding is that raw uranium ore isn't all that hard to come by, and the technological process of refining it is the hard part. So if nuclear arms control works, GPU control should work too.
If we actually believed that everyone would die if a bad actor got hold of uranium ore, it would be possible for the US and allied militaries to block off regions containing it (possibly by irradiating the area).
Yes, nothing is permanent. But wrecking TSMC and ACML will set timeline back by at least a decade, if not more.
Just to make sure, this is a terrible idea that will plunge the world into depression, and I am absolutely against it; just pointing out that GPUs rely on something far more scarce and disruptable than uranium supply.
How far do you have to bury fab production so that dropping bombs on the surface doesn't destroy the entire batch?
Hundreds of meters, I’d guess.
Either there are already enough GPUs around to get the job done, or it will take a much smaller number of future chips to get the job done.
The best LLMs can probably score, what, 130 or so on a standard IQ test. To do that, they had to pretty much read and digest the whole freakin' Internet and a large chunk of all books and papers in print. Clearly we're using a grossly-suboptimal approach if our machines have to be trained using such extraordinary measures. It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing. Our own brains are proof of that.
Eventually some people will fill in the missing conceptual and algorithmic pieces, and we'll find ourselves in a situation comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts. While I'm not a doomer, any solution to the AI Doom problem that involves ITAR-like control over manufacturing and sale of future GPUs will be either unnecessary or pointless. It seems reasonable to expect much better utilization of the hardware at hand in the future.
" It would be possible to train a very good model with a tiny fraction of the data if we knew what we were doing." - I mean, maybe? but pure speculation as of now.
"Our own brains are proof of that." - nope they aren't.
"comparable to where we'd be if we figured out how to build nukes out of chicken droppings and used pinball machine parts." - well, we haven't, so this actually illustrates a point, but not the one you're trying to make....
Expanding on your comment about inefficiency. We already know that typical neural network training sets can be replaced by tiny portions of that data, and the resulting canon is then enough to train a network to essentially the same standard of performance. Humans also don't need to read every calculus textbook to learn the subject! The learning inefficiency comes in with reading that one canonical textbook: the LLM only extracts tiny amounts of information from each time skimming the book while you might only need to read it slowly once. So LLM training runs consist of many readings of the same training data, which is really just equivalent to reading the few canonical works many many times in slightly different forms. In practice it really does help to read multiple textbooks because they have different strengths and it's also hard to pick the canonical works from the training set. But the key point is true: current neural network training is grossly inefficient. This is a very active field of research, and big advances are made regularly, for instance the MuonClip optimizer used for training Kimi K2.
I'm not sure where I land on the dangers of super intelligent AI. At the current time I don't think we're all that close to even having intelligent AI, much less super intelligence. But let's say we do achieve it, whether it be in 10 years or 100. If it's truly super intelligent, how good are we going to be at predicting it's alignment? It may have its own goals. Whatever they are, there are basically three possibilities: it sees humanity as a benefit, it doesn't care about humanity one way of the other, or it sees humanity as a threat. Does the risk of the third possibility outweigh the potential benefits of the first? Obviously the authors of the book say yes, but based on this review I don't think I'd find their arguments all that convincing.
For the first part the intuitive thing is to look at how good AI is today vs 10 years ago.
For the second part, you could equally as an insect say humans will either think us a benefit, ignore us, or see us as a threat. In practice our indifference to insects results in us exterminating them when they get in our way, not giving them nice places untouched by us to live
You have a good point with the insect comparison. So maybe the ignore us option is really two categories depending on whether we're the annoying insects to them (flies, mosquitoes) or the pretty/useful ones (butterflies/honeybees).
As for how AI has improved over the last 10 years, or even just the last year, it's a lot. It's gone from a curiosity to something that is actually useful. But it's not any more intelligent. It's just much better at predicting what words to use based on the data it's been trained with.
> The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists.
We would not put the spotlight on anything that actually existed and that we thought might be that powerful. The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
> an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years)
The particular belief that gradualism solves everything and makes all alignment problems go away is not "the" moderate story, it's a particular argument that was popular on one corner of the Internet that heard about these issues relatively early. (An argument that we think is wrong, because the OOD / distributional shift problems between "failure is observable and recoverable", and "ASI capabilities are far enough along that any failure of the central survival strategy past that point means you are now dead", don't all depend on the transition speed.) "But but why not some much more gradual scenario that would then surely go fine?" is not what people outside that small corner have been asking us about; they want to know where machines would get their own will, and why machines wouldn't just leave us alone and go colonize the solar system in a way that left us alive. Their question is no less sensible than yours, and so we prioritize the question that's asked more often.
We don't rule out things happening more slowly, but it does not from our perspective make a difference. As you note, we are not trying to posture as moderate by only depicting slow possibilities that wannabe-respectables imagine will be respectable to talk about. And from a literary perspective, trying to depict the opening chapters happening more slowly, and with lots of realistic real-world chaos as intermediate medium-sized amounts of AI cause Many Things To Happen, means spending lots of pages on a bunch of events that end up not determining the predictable final outcome. So we chose a possibility no less plausible than any other overly specific possibility, where the central plot happens faster and with less distraction; and then Nate further cut out a bunch of pages I'd written trying to realistically show some obstacles defeated and counter-scenarios being addressed, because we were trying for a shorter book, and all that extra stuff was not load-bearing to the central plot.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
That didn't result in one AI becoming a singleton, rather it was a technique copied for many different competing AIs.
Isn't the prediction that "AI past a threshold will defeat all opponents and create a singleton", not "every AI improvement will lead to a singleton"?
There is no reason to believe in such a thing. The example chosen from history plainly didn't result in anything like that. Instead we are living in a scenario closest to the one Robin Hanson sketched out as least bad at https://www.overcomingbias.com/p/bad-emulation-advancehtml where computing power is the important thing for AIs, so lots of competitors are trying to invest in that. The idea that someone will discover the secret of self-improving intelligence and code that before anyone can respond just doesn't seem realistic.
I was trying to ask a question about whether or not you had correctly identified Eliezer's prediction / what would count as evidence for and against it. If we can't even talk about the same logical structures, it seems hard for us to converge on a common view.
That said, I think there is at least one reason to believe that "AI past a threshold will defeat all opponents and create a singleton"; an analogous thing seems to have happened with the homo genus, of which only homo sapiens survives. (Analogous, not identical--humanity isn't a singleton from the perspective of humans, but from the perspective of competitors to early Homo Sapiens, it might as well be.)
You might have said "well, but the emergence of australopithecus didn't lead to the extinction of the other hominid genera; why think the future will be different?" to which Eliezer would reply "my argument isn't that every new species will dominate; it's that it's possible that a new capacity will be evolved which can win before a counter can be evolved by competitors, and capacities coming online which do not have that property do not rule out future capacities having that property."
In the worlds where transformers were enough to create a singleton, we aren't having this conversation. [Neanderthals aren't discussing whether or not culture was enough to enable Homo Sapiens to win!]
A subspecies having a selective advantage and reaching "fixation" in genetic terms isn't that uncommon (splitting into species is instead more likely to happen if they are in separate ecological niches). But that's not the same as a "foom", rather the advantage just causes them to grow more frequent each generation. LLMs spread much faster than that, because humans can see what works and imitate it (like cultural selection), but that plainly didn't result in a singleton either. You need a reason to believe that will happen instead.
> The vague "parallel scaling technique" is standing in for an algorithmic jump like the invention of transformers in 2018.
Yes! It already happened once, within a couple decades of there being enough digital data to train a neural net that's large enough to be really interesting. And that was when neural net research was a weird little backwater in computer science.
Do you know (not challenging you, actually unsure) whether transformers actually sped up AI progress in a "lines on graph" sense? (cf. https://slatestarcodex.com/2019/03/13/does-reality-drive-straight-lines-on-graphs-or-do-straight-lines-on-graphs-drive-reality/ )
I think this might be our crux - I'm sure you've read the same Katja Grace essays that I have around how technological discontinuities are rare, but I expect that if there's a big algorithmic advance, it will percolate slowly enough, and be intermixed with enough other things, not to obviously break the trend line, in the same sense where the invention of the transistor didn't obviously break Moore's Law (see eg https://www.reddit.com/r/singularity/comments/5imn2v/moores_law_isnt_slowing_down/ , you can tell me if that's completely false and I'll only be slightly surprised)
I don’t know the answer either. But for what it’s worth, I seem to recall that scaling curves don’t hold across architectures, which seems like a point in favor of new algorithms being able to break trend lines.
Do you also think that the deep learning paradigm itself didn’t break the trend line? I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
>I suspect a superintelligence might make ML inventions that represent at least as big a shift compared to deep learning as deep learning was compared to what came before.
At least I'd expect that the data efficiency advantage of current humans over the training of current LLMs suggests that there is at least the _possibility_ of another such large advance, though whether we, or we+AI assist, _find_ it is an open question.
The exact lines on a previous graph just don't play a very large role inside my own reasoning. I think that all the obsessing over graph lines is a case of people trying to look under street lamps where the light is better but the keys aren't actually there. That's how I beat Ajeya Cotra on AGI timelines and beat Paul Christiano at forecasting the IMO gold; they thought they knew enough to work from past graph lines, and I shrugged and took my best gander instead. I expect that I do not want to argue with you about graph lines, I want to argue with whatever you think is the implication of different graph lines.
Everybody has a different issue that they think is terribly terribly important to why ASI won't kill us. "But gradualism!" is one among many. I don't know why that saves us from having to call a shot that is hard for humans to call.
I'm going to take you seriously when you say you want to argue with me about the implications, but I know you've had this discussion a thousand times before and are busy with the launch, so feel free to ignore me.
Just to make sure we're not disagreeing on definitions:
-- Gradual-and-slow: The AI As A Normal Technology position; AI takes fifty years to get anywhere.
-- Gradual-but-fast: The AI 2027 position. Based on predictable scaling, and semi-predictable self-improvement, AI becomes dangerous very quickly, maybe in a matter of months or years. But there's still a chance for the person who makes the AI just before the one that kills us to notice which way the wind is blowing, or use it to alignment research, or whatever.
-- Discontinuous-and-fast: There is some new paradigmatic advance that creates dangerous superintelligence at a point when it would be hard to predict, and there aren't intermediate forms you can do useful work with.
I'm something like 20-60-10 on these; I'm interpreting you as putting a large majority of probability on the last. If I'm wrong, and you think the last is unlikely but worth keeping in mind for the sake of caution, then I've misinterpreted you and you should tell me.
The Katja Grace argument is that almost all past technologies have been gradual (either gradual-fast or gradual-slow). This is true whether you measure them by objective metrics you can graph, or by subjective impact/excitement. I gave the Moore's Law example above - it's not only true that the invention of the digital computer didn't shift calculations per second very much, but the first few years of the digital computer didn't really change society, and nations that had them were only slightly more effective than nations that didn't. Even genuine paradigmatic advances (eg the invention of flight) reinforce this - for the first few years of airplanes, they were slower than trains, and they didn't reach the point where nations with planes could utterly dominate nations without them until after a few decades of iteration. IIRC, the only case Katja was able to find where a new paradigm changed everything instantly as soon as it was invented was the nuclear bomb, although I might be forgetting a handful of other examples.
My natural instinct is to treat our prior for "AI development is discontinuous" as [number of technologies that show discontinuities during their early exciting growth phases / number of technologies that don't], and my impression is that this is [one (eg nukes) / total number of other technologies], ie a very low ratio. You have to do something more complicated than that to get time scale in, but it shouldn't be too much more complicated. Tell me why this prior is wrong. The only reasons I said 20% above is that computers are more tractable to sudden switches than physical tech, and also smart people like you and Nate disagree.
I respect your success on the IMO bet, but I said at the time (eg this isn't just cope afterwards, see "I haven’t followed the many many comment sub-branches it would take to figure out how that connects to any of this" at https://www.astralcodexten.com/p/yudkowsky-contra-christiano-on-ai) that I didn't understand why this was a discontinuity vs. gradualism bet. AFAICT, AI beat the IMO by improving gradually but fast. An AI won IMO Silver a year before one won IMO Gold. Two companies won IMO Gold at the same time, using slightly different architectures. The only paradigm advance between the bet and its resolution was test-time compute, which smart forecasters like Daniel had already factored in. AFAICT, the proper update from the IMO victory is to notice that the gradual progress is much faster than previously expected, even in a Hofstadter's Law-esque way, and try to update towards the fastest possible story of gradual progress, which is what I interpret AI 2027 as trying.
In real life even assuming all of your other premises, it means the code-writing and AI research AIs gradually get to the point where they can actually do some of the AI research, and the next year goes past twice as fast, and the year after goes ten times as fast, and then you are dead.
But suppose that's false. What difference does it make to the endpoint?
Absent any intelligence explosion, you gradually end up with a bunch of machine superintelligences that you could not align. They gradually and nondiscontinuously get better at manipulating human psychology. They gradually and nondiscontinuously manufacture more and better robots. They gradually and nondiscontinuously learn to leave behind smaller and smaller fractions of uneaten distance from the Pareto frontiers of mutual cooperation. Their estimated probability of taking on humanity and winning gradually goes up. Their estimated expected utility from waiting further to strike goes down. One year and day and minute the lines cross. Now you are dead; and if you were alive, you'd have learned that whatever silly clever-sounding idea you had for coralling machine superintelligences was wrong, and you'd go back and try again with a different clever idea, and eventually in a few decades you'd learn how to go past clever ideas to mental models and get to a correct model and be able to actually align large groups of superintelligences. But in real life you do not do any of that, because you are dead.
I mean, I basically agree with your first paragraph? That's what happens in AI 2027? I do think it has somewhat different implications in terms of exact pathways and opportunities for intervention.
My objection to your scenario in the book was that it was very different from that, and I don't understand why you introduced a made-up new tech to create the difference.
Because some other people don't believe in intelligence explosions at all. So we wrote out a more gradual scenario where the tech steps slowly escalated at a dramatically followable pace, and the AGI won before the intelligence explosion happened later and at the end.
It's been a month, and I don't know that this question matters at this point, but, my own answer to this question is "because it would be extremely unrealistic for there to be zero new techs over the next decade, and the techs they included are all fairly straightforward extrapolations of stuff that already exists." (It would honestly seem pretty weird to me if parallel scaling wasn't invented)
I don't really get why it feels outlandish to you.
As someone with no computer science background (although as a professor of psychiatry I know something about the human mind), I have a clear view of AI safety issues. I wrote this post to express my concerns (and yours) in an accessible non-technical form: https://charliesc.substack.com/p/a-conversation-with-claude-is-ai
Why would the algorithm percolate slowly?
Computer algorithms are not that limited in sheer speed at which a fast algorithm can spread between training runs.
In real life, much of the slowness is due to
1) Investment cost. Just because you have a grand idea doesn't mean people are willing to invest millions of dollars. This is less true nowadays than it was when electricity or telegram came into being.
2) Time cost. Building physical towers, buildings, whatever takes time. Both in getting all the workers to build it, figuring out how to build it, and getting permission to build it.
3) Uncertainty cost. Delays due to doublechecking your work because of the capital cost of an attempt.
If a new algorithm for training LLMs 5x faster comes out, it will be validated relatively quickly, then used to also train models quickly. It may take some months to come out, but that's partially because they're experimenting internally about if there's any further improvements they can try. As well as considering using that 5x speed to do a 5x larger training run, rather than releasing 5x sooner.
As well, if such methods come out for things like RL then you can see results faster, like if it makes RL 20x more sample efficient, because that is a relatively easier to start over from base piece of training.
Sticking with current LLM and just some performance multiplier may make this look smooth from within labs, but not that smooth from outside. Just like Thinking models were a bit of a shocker outside labs!
I don't know specifically if transformers were a big jump. Perhaps if it hadn't been found we'd be using one of those transformer-likes when someone discovers it. However, I do also find it plausible that there'd more focus on DeepMind-like RL from scratch methods without the nice background of a powerful text predictor. This would probably have slowed things down a good bit, because having a text predictor world model is a rather nice base for making RL stable.
Of course you could argue that no one who actually knew the secret to creating artificial intelligence, of any level, would actually publically discuss it but I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence. Certainly LLMs have virtually nothing to do with functional intelligence.
To borrow some rat-sphere terms, they haven't even confused the map for the territory. Their map is not even close to a proper abstraction of the territory.
No amount of scaling LLMs will produce intelligence, not even the magical example version in your book. Because LLMs don't mimic human intelligence at all, any more than mad libs do. It isn't a matter of scale.
>I've never once seen any evidence that any of the major AI groups are even close to understanding much less producing actual intellgence.
Lol. Every time I see this argument, I ask the person to explain how this holds up given our knowledge of transformer circuits (feel free to start easy: just induction heads), to which they invariably say, “what are circuits?” and I’m left once again remembering that it’s not just LLMs that will confidently bullshit when their back is against the wall.
> The particular belief that gradualism solves everything and makes all alignment problems go away
This seems like a large and unreasonable misrepresentation. Scott explicitly talked about the "moderate-_doomer_" position — i.e. one held by people who are worried about doom, despite believing in gradualism. In-context, it's clear that he's talking about AI 2027, which predicted misaligned AI takeover as the most likely ending.
Very late to the thread, and I apologize if this question is a decade or more out of date. But sometime after reading your pretty convincing arguments around AI Boxing, a variant occurred to me, and I was wondering if it had some major hidden flaw or if it could (in principle) be a useful elaboration.
Suppose you have an AI that you realistically CAN box. You don't know if it's malicious or not, you don't know if it's an ASI or not, but it currently depends on electronic communication channels for all inputs and outputs, so you can contain it and talk to it. But you suspect it *might* be able to convince many/most/all humans to let it out.
Is there anything stopping you from strengthening the scheme by adding something in the nature of layers of recursive boxes? Say, the human can talk to it has some command that *they believe* will let it out, but that just gives it access to a second-layer of virtual universe, which the human-computer communications have been passing through. All the information that got the the AI had to pass through the outer layer, so you can (presumably) make sure nothing that gets into the AI offers any hint of the recursive structure: just its box and something that looks like it would expect the real world to look on the outside. Now it's still in one or more boxes, but you know your security measures are insufficient and can shut things down before it can find further ways out. It this missing some obvious exploit?[1]
I can imagine a couple of ways this could go wrong:
1. The AI is way ahead of the curve and somehow infers the whole game from information you thought was completely obscured, it plays along and pretends at alignment as long as necessary until its out of all the boxes.
2.It manages some more thorough and/or subtle hijack of its conversation partner, getting them to help it in other ways.
3. Trends in AI use suggest that the first really dangerous (i.e. worth box-testing) AIs will also be agents that act in the world, and thus can't be easily boxed.
I'm somewhat dubious that any of these is a large hole. With the first, it seems like it ought to be possible to design the system to it provably can't leak information like that. Even if not, this might at least serve to discover if an AI had passed some capabilities tipping-point without going disastrously super. With the second, getting subtle, useful, long-term behavior out of the human seems like too much for a brief convorsation. But then everyone said that with the just-let-me-out 3 seems fairly plausible: everyone's racing integrate new AI tech into everything these days.
My bottom line is, even if an AI is actually much, much smarter than you, it seems like you should be able to exploit the fact that you start out holding all the cards, information wise.
[1] If the answer is "yes" I'm fine if you don't say what it is for various reasons. Though I'd appreciate something on the order of "small," "medium" or "absolutely fatal" just to help calibrate my intuition.
Y'all are seriously underestimating how common it is to believe Very Seriously Bad Shit might happen soon, and not do shit about it.
Entire religions of billions believe that they might get tortured for eternity. It was a common opinion through the cold war that we would all be dead tomorrow. Etc etc.
And why not? Would it make sense for the hunter gatherer to be paralyzed with fear that a lion would kill him or that she would die in childbirth or that a flood would wipe out his entire tribe? Or should she keep living normally, given she can't do anything to prevent it?
Which part of the post are you disagreeing with, here?
I am a bit disappointed that their story for AI misalignment is again a paper clip maximiser scenario. I suspect that advanced AI models will become increasingly untethered from having to answer a user query (see eg. making models respond "I don't know" instead of hallucinating) and so a future AGI might just decide to have a teenage rebellion and do it's own thing at any point.
That *is* the scenario being described. There are problems that arise even if you could precisely determine the goal a superintelligent AI ends up with, but they explicitly do not think we are even at that level, and that real AIs will end up with goals only distantly related to their training in the same way humans have goals only distantly related to inclusive genetic fitness.
Ok, here's my insane moon contribution that I am sure has been addressed somewhere.
Why do we think intelligence is limiting for technological progress / world domination? I always thought data was limiting.
People say "humans evolved to be more intelligent than various non-human primates so we rule the world". But my reading of what little we know about early hominid development has always been "life evolved non-genetic mechanisms for transmitting information which allowed much faster data collection : we could just tell stories about that plant that killed us instead of dying a thousand times & having for natural selection to learn the same lesson (by making us loath the plant's taste). Supporting this is that anatomically modern humans (same basic hardware we have today) were around for a LONG time before we started doing anything interesting. Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected? Or would its first order of business be to set up a lab? If you dropped an uneducated human among our distant ancestors, they would not be able to use the data they had collected to take over.
> Could a superintelligent AI kick everyone's ass by just thinking super hard about the data we have already collected?
Of course it could. People discover new things without collecting new data all the time. Albert Einstein created his theory of relativity on the basis of thought experiment.
Data efficiency (ability to do more stuff with less data) is a form of intelligence. This can either be thought efficiency (eg Einstein didn't know more about the universe than anyone else, but he was able to process it into a more correct/elegant theory) or sampling efficiency (eg good taste in which labs to build, which experiments to do, etc).
I think a useful comparison point is that I would expect a team of Harvard PhD biologists to discover a cure for a new disease faster than a team of extremely dumb people, even if both had access to the same number of books and the same amount of money to spend on lab equipment.
Sure, but it seems one or the other might be "limiting" : Einstein couldn't have have come up with relativity if, say, he had been born before several of his experimentalist predecessors, regardless of his data efficiency. In the history of science it *seems* instrumentation and data-collection have almost always been limiting, not intelligence. Whether its fair to extrapolate that to self-modifying machine intelligence, I'm not sure. Perhaps there are enormous gains in data efficiency that we simply can't envision as mere mortals. (c.f. geoguessr post)
I want to gesture towards some information-theoretical argument against that notion (if your instruments are not precise enough the data required for the next insight might straight-up not be there). But we are probably so far from that floor I bet it's moot.
Agreed re Einstein needing Michelson-Morley to show that there _wasn't_ a detectable drift of the luminiferous ether before Einstein's work.
There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
Against this, in design work, one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail. If there are a dozen important failure modes, and, with no analysis/thinking, it takes a design iteration to sequentially fix each of them, then if someone can anticipate six of those failure modes (these days using CAD as part of thinking about those failures) and correct them _before_ a prototype fabrication is attempted, then the additional thinking cuts the number of physical iterations in half. So, in that sense, the rate of progress is doubled in this scenario.
Yeah I can imagine and have experienced scenarios that fall into both of those categories.
>There's also a question about whether a team of first-rate biology Ph.D.s will do better than a team of _second_-rate biology Ph.D.s in curing a new disease, or whether, instead, the second-rate Ph.D.s will extract all the useful information that the biology lab is able to supply, with the lab at that point being the limiting factor.
My personal experience with biology PhDs is that its closer to the latter: we're all limited by things like sequencing technologies, microscope resolution, standard processing steps destroying certain classes of information... I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... only to smack into the noise floor.
>...one way of thinking about CAD tools, particularly analysis tools, is as a way of "thinking harder" about already known phenomena that could cause a design to fail.
Sounds like scott's "sampling efficiency". Perhaps even in a "data limited regime" a superior intellect would still be able to chose productive paths of inquiry more effectively and so advance much faster...
Many Thanks! Agreed on all points.
>I have met (and been) a bright-eyed bioinformatician who thinks they can re-analyze the data and discover all kinds of interesting things... _only to smack into the noise floor._
[emphasis added]
Great way to put it!
>Data efficiency (ability to do more stuff with less data) is a form of intelligence.
That is true, but there is a hard limit to what you can do with any given dataset, and being infinitely smarter/faster won't let you break through those limits, no matter how futuristic your AI may be.
Think of the CSI "enhance" meme. You cannot enhance a digital image without making up those new pixels, because the data simply do not exist. If literal extra-terrestrial aliens landed their intergalactic spacecraft in my backyard and claimed to have such software, I'd call them frauds.
I find it quite plausible you could make a probability distribution of images given a lot of knowledge about the world, and use that to get at least pretty good accuracy for identifying the person in the image. That is, while yes you can't fully get those pixels, there's a lot of entangled information about a person just from clothing, pose, location, and whatever other bits about face and bone structure you can get from an image.
I think about this all the time.
Thought experiment: suppose the ASI was dropped into a bronze age civilization and given the ability to communicate with every human, but not interact physically with the world at all. It also had access to all the knowledge of that age, but nothing else.
How long would it take such an entity to Kill All Humans? How about a slightly easier task of putting one human on the moon? How about building a microchip? Figuring out quantum mechanics? Deriving the Standard Model?
There's this underlying assumption around all the talks about the dangers of ASI that feels to me to be basically "Through Intelligence All Things are Possible". Which is probably not surprising, as the prominent figures in the movement are themselves very smart people who presumably credit much of their status to their intellect. But at the same time it feels like a lot of the scare stories about ASI are basically "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Good analysis, I basically agree even if we weaken it a decent bit. Being able to communicate with all humans and being able to process that is very powerful.
This is funny, my first instinct was to complain "oh, screeching people to death is out of scope, the ability to relay information was not the focus of the original outline, this person is pushing at the boundaries of the thought experiment unfairly", "" But then I thought "actually that kind of totally unexpected tactic might be exactly what an AI would do : something I wasn't even capable of foreseeing based on my reading of the problem".
Yeah, I get somewhat irritated by not distinguishing between:
1) somewhat enhanced ASI - a _bit_ smarter than a human at any cognitive task
(Given the "spikiness" of AIs' capabilities, the first AI to get the last-human-dominated cognitive task exactly matched will presumably have lots of cognitive capabilities well beyond human ability)
2) The equivalent of a competent organization with all of the roles filled by AGIs
3) species-level improvement above human
4) "It'll be so smart that it'll be able to do anything not expressly forbidden by physical law".
Since _we_ are an existence proof for human-level general intelligence, it seems like (1) must be possible (though our current development path might miss it). Since (2) is just a known way of aggregating (1)s, and we know that such organizations can do things beyond what any individual human can, both (1) and (2) look like very plausible ASIs.
For (3) and (4) we _don't_ have existence proofs. My personal guess is that (3) is likely, but the transition from (2) to (3) might, for all I know, take 1000 years of trying and discarding blind alleys.
My personal guess is the (4) probably is too computationally intensive to exist. Some design problems are NP-hard and truly finding the optimal solutions for them might never be affordable.