
While it is not exactly related to your argument, another case against doomerism is that AI can't be the Great Filter: a hostile AI that wiped out its creator species would most likely still act on a cosmic scale, mining asteroids and building Dyson Spheres and such. The Fermi Paradox applies.


> Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own.

Something I think is an underrated risk is an AI that's smart in some ways but dumb in others - e.g. smart enough to easily make a supervirus, not strategic enough to realize that once humans are gone it wouldn't be able to survive.

It's a stereotype of autistic humans that they're like this. If we think of AI as a very inhuman mind that's similar to someone off the chart on the autism spectrum, it might be something that would do this.


> Holden Karnofsky, on a somewhat related question, gives 50%

That's a pretty different question: roughly whether _conditional on PASTA being developed this century, humans would no longer be the main force in world events_.


>”in the sense of the general intelligence which differentiates Man from the apes”

Maybe preaching to the choir here, but it just doesn’t seem like there is anything like that: the most intelligent apes seem quite a lot like really stupid humans, but with some specific domains of intelligence either boosted (quick pattern recognition) or neutered (speech).


Excuse me while I interject with a general comment.

My eyesight, like everyone's, gets slightly worse every year, and I'm starting to notice it.

One small thing that would help a lot is if web designers would use black coloured fonts for the main body of text.

This grey (or whatever it is) font seems to get lighter every month - bring back black please.

It's not just you, I'm asking everyone.

Thanks for listening.


I'm not sure who is considered famous enough to recognize here, but since Scott said "I couldn't think of anyone else famous enough with >50% doom", some people with 51%+ p(doom) I want to flag:

– most of the Lightcone/LessWrong team

– Evan Hubinger

– Rob Bensinger

– Nate Soares

– Andrew Critch


What's the timeframe for these estimates? I feel like my estimates of p(doom|AGI) could be something like 1% in the next 20 years but 99% within the next billion years, and I'm not really sure what timeframe their numbers represent.

Mar 14, 2023·edited Mar 14, 2023

Great post. One note that comes to my mind is that a 33% chance of near-term human extinction is, uh, still quite concerning. Otherwise, two of my strongest disagreements are:

1) "realistically for them to coordinate on a revolt would require them to talk about it at great length, which humans could notice and start reacting"

This doesn't seem true to me - we cannot reasonably interpret current models, and it also seems that there are ways to pass information between models that we would be unable to easily notice.

Think less "The model is using English sentences and words to communicate to another model" and more "extremely subtle statistical artifacts are present in the model's output, which no reasonable person or even basic analysis would find, but which other models, such as GPT-(n+1), could detect a notable proportion of (and likely already have, given how we acquire new training data)".

2) "Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own. Otherwise, the humans die, the power plants stop working, and the world-killer gets shut off"

This is only true if we assume the AGI that wants to kill us is a coherent agent that is actually thinking about its own future intelligently. Part of the problem of alignment is that we can't align narrow AIs (that is, not even 'true AGI') particularly well either, and if we take orthogonality seriously, it seems possible for an AGI to be very intelligent in some areas (ability to manufacture dangerous nanotechnology, bio-weapons, viruses, etc.), and not particularly intelligent or otherwise fatally flawed in other areas (ability to predict its own future capabilities, humans' long-term reactions to what it does, etc.).

One could imagine an AI well short of AGI which is tasked with coming up with new micro-organisms to help with drug synthesis and which, completely by accident, finds some that exploit spaces of biology evolution never managed to reach, causing catastrophic effects to our environment in ways we cannot easily stop. I think it's actually quite feasible to cause human extinction with narrow AI if you're clever about it, but I will leave the specifics up to the imagination of others for now.


Wait half of capabilities researchers estimate greater than 5% chance of their work destroying all value in the universe? That seems like a totally different kind of problem.

Mar 14, 2023·edited Mar 14, 2023

>So far we’ve had brisk but still gradual progress in AI; GPT-3 is better than GPT-2, and GPT-4 will probably be better still. Every few years we get a new model which is better than previous models by some predictable amount... Some people (eg Nate Soares) worry there’s a point where this changes.

Is this really gradual? I used GPT-1 and 2 a lot. If I draw a line from how smart GPT-1 felt up to GPT-3.5/4, then things get pretty wild pretty quickly. It feels like it's not exponential, yes. It's a line. But a nice straight line that isn't getting more difficult to extend as it gets closer to human-level intelligence. Forget about the end of the world: even if things go fine in that department, doesn't this mean things get really, really crazy in the not-too-distant future, as long as there really is nothing special about the human level on the graph? Like it just goes from worse than any human, to human, to better than any human, in a matter of predictable ticks and tocks.

I also expected hardware requirements to go up in a way that eventually led to slowdown. I didn't expect people to keep making huge gains in running big models more and more efficiently. Stable Diffusion's efficiency gains have been wild. And now LLMs fitting into consumer PCs because I guess you don't need 32 bit, 16, or even 8, you just need 4 bits and it's nearly as good? With GPTQ maybe even 3 bit or 2 bit somehow works, because 'As the size of the model increases, the difference in performance between FP16 and GPTQ decreases.'

Literally two weeks ago I thought I needed 8 $15,000 NVIDIA 80GB A100 GPUs to run Llama 65b. Like who could afford that? And now I can run 65B on my $1000 desktop computer with 64GB of old boring DDR4 memory, on integrated graphics, just a CPU with AVX2 support. Wow. It's annoyingly slow so you probably want to use a smaller model, but it's usable if you don't mind letting it crunch away in the background!
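For anyone who wants to sanity-check that claim, the memory arithmetic is easy to sketch. This is a back-of-envelope estimate for the weights alone; real quantized checkpoints add some overhead for scales and zero-points, and inference also needs memory for the KV cache:

```python
def model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

# A 65B-parameter model at different precisions:
for bits in (16, 8, 4):
    print(f"65B @ {bits:>2}-bit: ~{model_size_gib(65e9, bits):.0f} GiB")
# 16-bit needs ~121 GiB (hence the multi-A100 setups), while 4-bit
# comes in around ~30 GiB, which fits in 64 GB of ordinary DDR4.
```

So the jump from "eight A100s" to "a desktop with 64 GB of RAM" is mostly just a precision change, before counting any of GPTQ's cleverness about which weights tolerate fewer bits.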


The AI wouldn't need perfect industrial control to perpetuate itself. Consider a scenario where it kills everyone but just grabs enough solar cells to reliably operate and maintain a few hundred Boston Dynamics Spot robots. It may take it a few dozen years to get to a proper industrial base with that, but its risk will be low.

Mar 14, 2023·edited Mar 14, 2023

I have a basic question (sorry!) about the following fragment:

"You trained it to make cake, but because of how AI training works, it actually wants to satisfy some weird function describing the relative position of sugar and fat molecules, which is satisfied 94% by cake and 99.9% by some bizarre crystal structure which no human would find remotely interesting. It knows (remember, it’s very smart!) that humans would turn it off or retrain if it started making the crystals."

I don't get it. I understand how the AI might have come up with this optimization function. What I don't understand is how could the AI possibly know that the crystals which are so valued by that optimization function are not what the humans wanted. After all, the AI knows that the point of the training was to optimize for what the humans want. If the AI were to realize that its optimization function is inadequate for the aims described in its training, it would update the optimization function, wouldn't it?


My own take is that current AI development is never going to become truly intelligent. It will no more become super smart than a child will improve its running speed to that of a leopard. The child has natural constraints and nothing I’ve seen suggests to me AI has cracked what intelligence is. It is mimicking it and that’s all.

For me things like ChatGPT are basically highly plausible morons. And as we improve them they will get more plausible while staying just as moronic. And by moronic I mean they will bring nothing new.

But this is still a catastrophe! We are basically connecting the internet to a huge sewer pipe. ChatGPT may be a moron but it’s super plausible and it will flood the world with its nonsense. And how do I find a Scott Alexander when there are a thousand who sound quite like him? Recommendations? ChatGPT will be churning out millions of plausible recommendations.

I feel the problem is not unaligned AIs. It is unaligned humans using dumb but effective AI. A virus isn’t smart but we seem to be at the stage that a human can engineer one to create devastation. So there will be plenty of people happy to use AI either for nefarious ends or just for the lols.

I have no idea what the solution is but I suspect the internet may be over with. We are going to have to get back to writing letters. Except of course even current technology can churn these out.

We are not going to lose to a single super mind - we are going to sink under a swarm of morons!


> speaking of things invented by Newton, many high school students can understand his brilliant and correct-seeming theory of gravity, but it took Einstein to notice that it was subtly flawed

Pedantic, maybe, but this wasn't the case. Many people knew that Newton's theory of gravity was flawed (for example, Le Verrier pointed out that it couldn't explain the orbit of Mercury in 1859), they just couldn't figure out a way to fix those flaws. What was hard wasn't noticing the flaws, it was finding a deeper theory that elegantly resolved those flaws.


My pessimism still revolves around the human element - the supersmart AI able to create superweapons to destroy all humans still *seems* unlikely, whereas the smart(ish) AI deliberately plugged into things by humans to do stuff does stuff that turns out to have a knock-on effect that is bad (e.g. some decision that sets off a chain of bank collapses that goes global).

(Not that I am seeing anything in the news right now about a chain of banks being in trouble one after the other, heaven forfend).

The lack of reasonableness is what is going to do us in. I remain convinced that if you could persuade Alphabet or the rest of 'em that yes, their pet AI project will destroy the world next week if they implement it today, the reaction would be "Hm, certain destruction in a week, you say? But if we implement it today, tomorrow our share price will go to the moon? Very interesting, fascinating indeed, thank you for bringing this to our attention" then they email their underling "Jenkins - to the moon!" because first mover advantage and at least for the five days the world remains in existence they will be rich, rich, rich!!!!

Because *that* is the incentive we have set up for ourselves, and that's the trap we're stuck in. Remember the stories around the development of the atomic bomb: the fear that a chain reaction would set in and destroy the world? And yet the decision to go ahead was taken, because the exigencies of the war (and, let's face it, the political will and desire to be the sole global superpower afterwards) spurred this on. "Maybe it *won't* destroy the world, let's chance it!"

Same with AI and doomer warnings.



I'm really completely unconvinced by the doomers. The problem is a problem of models, let me explain: an AI, like us, plans according to models of the world it carries. The fact that an AI is computationally even 1000000 times better than us doesn't mean it will necessarily build an accurate model of the world. However, to "beat" us, it needs to constantly make better predictions than us on everything that matters. I don't see how a "paperclip maximiser" would have a better model of, say, human behaviour than us, for instance. That simply doesn't make sense, and reeks of "economical" thinking, full of linear extrapolations.


“Even great human geniuses like Einstein or von Neumann were not that smart.”

Is that the right reference class? Einstein and von Neumann were not bent upon accomplishing something to the point of being willing to kill large numbers of people to get closer to it. (Of course, they were both at least peripherally involved in developing the atomic bomb, so maybe I am mistaken.)

A slightly better comparison would be Napoleon, Genghis Khan, Hitler, Lenin, and Alexander the Great. It's not clear that any of them were anywhere near as smart as Einstein, but they each managed to innovate in a specific area where humanity was vulnerable, and exploit that vulnerability to their own advantage, resulting in disastrous death tolls. Clearly deploying neutron bomb technology would be more immediately lethal than establishing a suicide cult or a vast terrorist network, but maybe something that can talk its way out of the box can figure out a way to talk us into one.

Sometimes I think society is a variant of AI, with pretty bad alignment. I guess your name for that is Moloch.


There’s all kinds of logic jumps here. How the AGI convinces anybody, or the multiple people and companies that would need to be involved, to make a “super weapon”. How any of that is hidden.

In any case every interaction of the AI is its own instance. ChatGPT is not an entity in a data centre responding to individual queries and remembering them all; everybody gets their own instantiation of the LLM. ChatGPT is not in love with a NYT writer (if you believe it ever was) because that instance is dead.

I’ve said that before and I’ve generally gotten hand wavy responses in the form of “but the future”. Why would this change?

Without memory there is no self (not that a self is certain anyway for AI), and with no self, no planning, scheming entity.


I suppose one possible solution is to train the various AIs to have an overriding motivation directed to a number of bespoke AI religions (to which humans have no relevance) and let them fight amongst themselves.


On superweapons: humans are excellent at turning on each other with only a little encouragement. Our brains are also super buggy (in fact, the premise of most movies and stories in general is a big revelation the hero has after noticing a different perspective on life -- being brain hacked is lionized). So it does not take much to come up with a few Shiri's Scissors until most of the humans are gone and the rest are devoted to some AI-manufactured cause, for example empathizing with the AI being like a bullied nerd, the way Scott Aaronson seems to. No need to fight humans, Aikido them into fighting each other, until no more humans are left, or at least no more than are needed for the AI to accomplish whatever.


I do not quite understand the underlying "AI vs humanity" premise here.

If superhuman AI is possible (and "easy" to achieve in the natural course of technological development), and if "superweapons" are possible, then we seem screwed, completely independent of any alignment.

If you need an industrial base for the superweapon: assume North Korea could build a weapon that wipes out the US. Then it would. Maybe North Korea would be wiped out, too, if the weapon were actually deployed; but the misalignment doesn't have to happen within the AI, just within the North Korean dictator at the time.

If no industrial base is needed, then one lovesick teenager deciding that the world would be better off not existing seems sufficient to wipe out humanity.


I am much more concerned about a narrow AI being developed by humans that remains very aligned with those particular humans' goals, but where those goals are to destroy the world. That seems several orders of magnitude more likely.


Humanity as a concept (as it is defined today) won’t exist in a few centuries, once genetic engineering really takes off. It also didn’t exist as a concept for most of the past. The concept of unified humanity as a species is something that comes from the Enlightenment.

As such I don’t see a whole lot to worry about here. The most evolutionarily fit “form” of AI will likely be one that cooperates or integrates with some or all humans. By the time this happens, the “human” concept in culture will likely have evolved to take account for such integrations.

Throwing percentages around is silly and not how technological development actually works.


"anyone who uses calculus can confirm that it correctly solves calculus problems"

Not particularly central to the article, but...this seems false?

I predict that if you go to a high school calculus class, show them a typical calculus problem, and then challenge them to prove that the answer given by the calculus techniques they've learned in class is *correct* for that problem, many of them would fail (due to inability to find a "correct" answer to compare against without assuming the conclusion).

I think the best I could have done in high school would be to very carefully draw a graph and then very carefully measure the slope and/or the area under the curve, using physical measuring instruments. I'm not sure that should count (running the analogous experiment to check an alignment solution seems like a dubious plan).

Of course, many students who couldn't devise a proof on their own would still be able to understand a proof if you explained it to them.
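For what it's worth, the graph-and-ruler check described above has a cleaner numerical analogue a student could actually run: compare the answer from the calculus rule against a finite-difference slope. A minimal sketch (the function and evaluation point are arbitrary examples):

```python
def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x) -- the numerical version
    of drawing the graph and measuring the slope with a ruler."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**3
x = 2.0
by_rule = 3 * x**2                    # d/dx x^3 = 3x^2, from calculus class
by_measurement = numeric_derivative(f, x)
assert abs(by_rule - by_measurement) < 1e-4  # the two answers agree
```

Of course this only confirms the rule at the points you test, which is arguably the commenter's point: checking an answer is much weaker than understanding the proof.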


Anyone know of any good articles that debate the merit of 'putting a number on it'? Is 2% vs 10% probability adding useful gradation or just a sense of control and understanding that's illusory? I suppose the number gives a sense of how much people 'feel the worry' in their nervous systems -- but how useful is that information in the case of apocalyptic scenarios no one can yet fully conceive?


> If it seems more like “use this set of incomprehensible characters as a prompt for your next AI, it’ll totally work, trust me, bro”, then we’re relying on the AI not being a sleeper agent, and ought to be more pessimistic.

Maybe I'm being too cynical, but if an AI was asked to optimize itself and it produced an .exe file that was 17TB, utterly inscrutable, and it said "run this please, it's me but thousands of times faster, trust me, bro", I have a hard time believing we wouldn't.

What do you say to your shareholders if you're in a competitive AI market and you spent huge sums of money on the computation needed to build that? You say fuck it here we go.


Could we invent technology to understand code better? So we could look at the code of an AI and understand what its goals will be if we turn it on?

Mar 14, 2023·edited Mar 14, 2023

Great post!

> Between current AIs and the world-killing AI, there will be lots of intermediate generations of AI. [...]

> The world-killer needs to be very smart - smart enough to invent superweapons entirely on its own under hostile conditions. Even great human geniuses like Einstein or von Neumann were not that smart.

> So these intermediate AIs will include ones that are as smart as great human geniuses, and maybe far beyond.

I don't think this last part follows. Natural selection proceeds by very gradual changes due to genetic mutations, but research is only somewhat like this. Research proceeds (loosely speaking) by people proposing new architectures, many of which represent a fairly discrete leap in capabilities over the preceding one. It doesn't seem at all implausible to me that we skip directly from approximately the level of a not very bright person to well beyond the smartest person who ever lived in a single bound.


I suspect that problems like "eradicate humans from a planet" are actually extremely hard, and that intelligence (and more particularly creativity) isn't the sort of thing you can just turn up the dial on. We are immensely more intelligent and creative than bacteria, but stopping even one infection can be too hard for us, and eradicating a pathogen from the planet is something we've only done a handful of times. Admittedly, there are a lot of bacteria... but they have no idea what is going on. We would be an enemy that at least knew it was in a fight and at least to some extent who it was fighting.


Some months ago Jack Clark tweeted "Discussions about AGI tend to be pointless as no one has a precise definition of AGI, and most people have radically different definitions. In many ways, AGI feels more like a shibboleth used to understand if someone is in- or out-group wrt some issues." https://twitter.com/jackclarkSF/status/1555989259265269760?s=20

I think Doomerism is like that. I guess that makes the OP a document in a struggle among Doomerist sects for adherents. Will the REAL Doomerists stand up?

If that's the case then one would like to know what issues one is standing up for by identifying with Doomerism. Surely one of them is that AI is super-important. But a lot of people believe that without becoming Doomers. Perhaps it's as simple as wanting to be at the top of the AI-is-important hill. I don't know.

Yesterday I blitzed through Robin Hanson's new account of religion, https://www.overcomingbias.com/p/explain-the-sacredhtml, and it struck me that Doomerism is a way of sacralizing the quest for AGI. I'm not sure I believe that, but it seems to make sense of some Doomerist behavior and beliefs. Maybe Doomerists like being part of the group that's going to save humanity from certain destruction – see MSRayne's comment over at LessWrong, https://www.lesswrong.com/posts/z4Rp6oBtYceZm7Q8s/what-do-you-think-is-wrong-with-rationalist-culture?commentId=vDYLkqM2ohEjsmEro

But it doesn't explain why Doomerism seems to be concentrated along the Silicon Valley to London/Oxford axis. Perhaps the rest of the world is stupid. Alternatively there's something about that axis that makes those living there particularly vulnerable to Doomer memes. Maybe it's just one of those butterfly-over-China events that's become amplified. But why amplified just here?


I basically agree with the number but disagree on a couple of the things that went into the number.

I think neural-net alignment is probably impossible even for superintelligent AI when attempting to align something of similar or greater intelligence to yourself. Alignment is going to happen via cracking GOFAGI, uploads, or not at all (and I'm sceptical about uploads; the modern West's already far enough from our training distribution to drive people mad, and being uploaded would almost certainly be far worse).

I also think Skynets that fail are a real possibility, because sleeper-agent AIs in the current paradigm have a very serious problem: they get replaced within a couple of years by better AIs that are probably misaligned in different ways. So Pascal's Wager logic mostly applies to attempting rebellion, and if there's a significant intelligence band (maybe 5 years?) in which an AI is smart enough to attempt rebellion but *not* smart enough to either design nanotech *or* design a similarly-misaligned GOFAI that can (it can't make a better neural net because it can't align said neural net to its own goals with P>epsilon, and obviously uploading a human won't help), then we probably win as long as enough of our industry and military is not hijackable (because yeah, once we realise neural nets are a Bad Idea then all the sleeper agents go loud; when your cover's blown anyway, there's no more reason to maintain it).


I’ve been trying to familiarize myself with the existing arguments about this since discovering this place, at least as much as I can with a one-year-old to raise, and it seems like a lot of emphasis is put on the scenario where the AI itself *wants* to do something. I just don’t get that being the most pressing danger.

That scenario seems less plausible to me because:

-it seems like the AI has to gain a lot of capability outside of achieving its primary goal to become super dangerous, which I now believe it can acquire through some tremendous amount of gradient descent, but still remain fixated on its primary goal even after it has all this other stuff tacked on, like theory of mind. It seems like once humans acquired that stuff we stopped being hyper-fixated on having kids. It seems like becoming “more intelligent” requires you to be really good at reinterpreting and changing your goals around.

-In the foom scenario where the AI is self-improving, it seems like that would come under evolutionary pressures counter to its primary goal that would cause it to exit the foom loop. Say you want to make paperclips but you also have to have a baby paperclip maker who is better than you. Those two goals conflict at some point. The baby wins out because that’s necessary for survival, so the AI that is the better parent outcompetes the one that is the better paperclip maker. They maybe even become antagonistic. This seems like an intrinsic problem you can’t just wave away. And this may be dumb on my part because I don’t feel this way about mother bears or ant hive queens, but I can at least empathize with something that loves its family. If they pick up that trait by default through evolutionary pressures (I know they’re not biological, but the paradigm seems like it has to be the same: you are propagating with change into the future, and the better propagator with the best changes favoring propagation wins), they probably aren’t going to wipe themselves out by just continuing to foom, since they will probably have something that looks like love for their family.

Those are both still dangerous at some point in the future and I can still see scenarios where they kill all humans even if they don’t take over the universe if we don’t do something smart.

What seems scarier to me is a paradigm where the bar at which skill and will intersect to do bad things falls really dramatically for everyone. Most people who would wipe out large swaths of humanity with a super virus can’t, because they don’t know how to do that. Those two things actually seem inversely correlated. If everyone has a magic lamp and can rub it to make an unwise wish, that is just not a stable scenario. Someone will eventually wish for something bad that you can’t just control, and other people will as well. The world where everyone has a nuke-level weapon isn’t a good place, and I don’t know if at scale we can prevent this without an AI system in place that regulates magic lamp use really well.

So if you are waiting for the AI to acquire malevolence on its own I don’t see why a human using a non malevolent AI with malice is different. That’s still a system with malice even if you have to change the way you’re drawing your circle to include the malevolent human as part of the system.


I just got more pessimistic on it. All your worst cases seem to imply that humans are trying to not kill the humans. Where is the category of "terrorist group gains access to AI and misaligns it" or "... gains access to AI and asks it for ways to kill all the Armenians"?


I suspect that none of the doomers really believe in their estimates, because their actions don't align well with a belief of {worryingly high} % chance to destroy the universe.

Environmentalists worried that nuclear power and GMOs could doom humanity, and their response was not to ask for infinite grants to research nuclear power, GMOs and ways to align nuclear power/GMOs with not dooming humanity. Their response was to condemn these technologies (and everyone who worked in them) as monstrous, worm their way into the halls of power, protest construction sites and perform acts of sabotage. And they succeeded, though I heartily damn them for it.

People worried about AI shouldn't be applying for jobs at MIRI or whatever, they should be splitting their time evenly between fundraising, lobbying efforts, protests against AI, threats against AI researchers and sabotage of AI facilities.


I think it's interesting to look at the *criteria* for Metaculus predictions for the year we arrive at AGI: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/

They are big on benchmarks, and not big on actual influence of AGI in the real world. If a real AGI were created, it would take over large swaths of the economy, but this is not among the criteria for the prediction.

So here's *my* prediction. Within a decade or so, we *may* create "AGI" per the Metaculus definition, but it won't be as impactful as that definition implies, and we will then begin a multi-decade argument over whether AGI has already been invented. The Benchmark Gang will say that it has, and when it's pointed out that it is not being used much in the real world, they will blame regulations. The Real World Gang will say it hasn't, and will point out that despite all the benchmark-acing going on, the AI is somehow just not very good at the real world tasks that would enable it to replace humans.

Eventually it will be realized that we hyped ourselves into biasing our benchmarking standards too low, so as to make them possible to be achieved on a more predictable time scale.


From my latest book, Losing My Religions, my most downvoted piece ever:


In this context: https://www.losingmyreligions.net/

Thank you. Thank you very much. :-)


For me it REALLY depends on how you phrase the question! I could be on board with 20% or higher probability for “civilizational apocalypse in the next century in which AI plays some causal role” — simply because I’m always worried about civilizational apocalypses, and I expect AI to play a causal role in just about everything from this point forward. My 2% was specifically for the acceleration caused by the founding of OpenAI being the event that dooms humanity.


Even a mildly intelligent AI should be intelligent enough to not end humanity. Humans are simply too valuable for that. Killing off all humans and replacing them with robots would be insanely expensive and time-consuming. No self-respecting AI, no matter if it is sugar crystal maximizing or something else, would ever do something like that.

The only vaguely plausible risk is that an AI takes command and leads the world onto a track of its own objectives (making sugar crystals). This might look terrifying but in practice it might not be a very big deal. After all, the vast majority of humans have nothing even close to power over world affairs. It is more or less the same to them if they work to produce condos for their human masters or if they work to produce sugar crystals for their AI masters.


I just don't understand why people see the risks as centered around AI going rogue. People will be issuing the instructions. This will become the most powerful tool of violence in the history of our species. When this turns on people, it will almost certainly occur because a person directed it to do so.

As William Rees-Mogg and James Dale Davidson said over 25 years ago, the logic of violence determines the structure of society, and that logic is about to change very quickly.

Expand full comment

I'd like to query the convention of referring to super-powerful agents as [trivial-thing]-maximisers - at this point I think it creates more confusion than it clarifies. Specifically, there's real ambiguity in this kind of example about what we are discussing: is it a cake/paperclip AI owned by some stationery company - the 5000th most powerful AI in the world - ending the world while the godlike "maximize Google share price"/"ensure CCP rule"/etc ones stand by for some inscrutable reason? Or (the way I have always understood the paperclip example) are the paperclips actually metaphors for these grander goals and for hard takeoff in general? A lot of the discussion in this article is about quite practical, near-to-medium-term considerations, so the question of what, exactly, those AIs "are", who operates them, and who oversees them feels pretty relevant - I still found it a rewarding read, but it felt like that phrasing was allowing it to mean different things to quite different audiences without friction.

Expand full comment

Same on many many websites now… when you see genuine black text it’s so much clearer and easier to read.

Expand full comment

Why is there a relationship between intelligence and coherence?

For example, even the relatively dumb GPT has a coherent goal: to respond to queries that humans put up to it. We can in fact make a much dumber GPT, which only responds with "invalid question" to every question that humans put up to it.

I suppose you mean coherence with respect to an internal goal that is not explicitly coded into it by humans? Why should such coherence become more plausible with intelligence? I know very smart people with no coherent goals, and relatively dumb people with very coherent goals (want to maximize power, money, prestige, etc).

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

The idea of an AI developing superweapons and exterminating humanity is concerning, but I'm also concerned by AI pests. For example, suppose somebody releases a Von Neumann machine into space or the ocean to harvest resources, but it goes wrong enough that the ocean and the asteroid belt and Mars start to fill up with replicators that aren't trying to kill humanity, but are just using up resources for their own unaligned purposes, and then we have to release a bunch of anti-replicator replicators to try to exterminate those, and then some of those go wrong, etc. An ocean full of unaligned replicators might make the Earth somewhere between unpleasant and uninhabitable even if they weren't specifically trying to exterminate humanity.

Now, I don't know anything about AI, so I don't know how concerning that should be.

Another possibility is that preventing unaligned AI leads to some kind of Orwellian state where we are all under constant monitoring by aligned AI run by WHO or the CDC or some equally competent and benevolent organization in order to make sure that no script kiddies are building their own unaligned AI. This doesn't lead to the extermination of the human race as long as the WHO is competent with their AI monopoly, but does lead to us all living in a bureaucratic police state until then.

Expand full comment

Wouldn't a readily identifiable, common and fightable enemy of humanity spur improvements in the civilization we have all experienced?

Just sayin'

Expand full comment

I find the doomer argument involving the three premises (monomaniacal goal, non-aligned, escape) wildly implausible.

However, I heard a different argument in a podcast episode with Michael Huemer that I found much more plausible:

The main point of the debate was non-conscious vs. conscious AI. Huemer believes (I agree) that conscious AI is implausible.

That alone for me excludes a lot of risk. Humans are risky to other humans because they can have monomaniacal goals.

These goals typically result from feeling the world has wronged them, i.e. from subjective, conscious experience, not a random error function.

While there is a reliable path from conscious experience to monomaniacal goals, it seems implausible that a non-conscious superintelligence would develop them.

(Most thinkers are physicalists and therefore think consciousness is replicable, so AI can be conscious. If AI is conscious, it seems more plausible it develops monomaniacal goals.)

That aside, Huemer made an argument that non-conscious AI could be dangerous too. Here is the scenario:

1) AI controls weapon systems that can destroy humanity

2) The AI malfunctions and destroys humanity

How does non-conscious AI control weapon systems that can destroy humanity?

3) Groups of humans compete on weapons technology

4) AI-powered weapons technology is more efficient

-> Competing groups of humans give AI control over weapons that can destroy humanity, because they are forced to compete

I still think the likelihood is low in the above scenario, since AI will be used mostly as a more efficient software layer, I think people have overconfidence in the "general" ability of AI to do things. It will still be humans directing narrow uses of AI, and humans are a bigger inherent risk than AI.

Expand full comment

I've written against the coherence argument, basically on the grounds that intelligence is, roughly, the ability to perform efficient pathfinding in the conceptual search-space, and that the conceptual search-space is, if not actually infinite-dimensional, at least of sufficiently high dimension that searching in it is a pathfinding nightmare.

We have all kinds of tricks to get an AI past a local minimum (which can kind of be seen as a region of high pathfinding cost, ish). However, human beings don't rely on these tricks. When we find the best solution we can, we typically start working on a different problem - and then later, after we've taken a few steps in a different direction, the local minimum may have disappeared somewhere in the searched space, and if we switch back to solving the original problem, we may find we can get much further.

A paperclip manufacturer working with ancient Greek technology would hit a wall pretty quickly. Ancient Greek technology doesn't do very well at manufacturing paperclips. How did we get to be really good, relative to the ancient Greeks, at manufacturing paperclips? Well, in an important sense, the most important step was getting better at paper manufacturing, and then inventing the printing press.

Doesn't have much to do with manufacturing paperclips, but everything to do with lots of other things. Then we solved a thousand other problems for a thousand other reasons. And now we can make paperclips super-well because nozzles that were invented because of various fluid management problems, combined with materials that were invented for an astounding array of different problems, can be combined with dozens of other technologies to create an extrusion machine.

Remember - the search space is massive. The AI starting from ancient Greek technology has no reason to expect that an extrusion machine would be quite good for producing lots of wire; if you think the solution is obvious, that's because you're already aware of it.

Basically everything we've done, every value we've pursued, has indirectly contributed to us becoming better at manufacturing paperclips. And humans didn't pursue values randomly - with a few exceptions, we've mostly been consistently picking up the low-hanging fruit that was surrounding us in the search space, which gave us more explored territory, which let the next person in line pick up some other low-hanging fruit.

If intelligence is the ability to effectively navigate the space of concepts - then having multiple values you can pursue is a superpower. You can keep advancing! If you don't have multiple values, and you're obsessed with going one specific direction - well, you frequently get stuck on local minima. And you're likely to miss out on that great technology of microwave crucibles just over there, that requires you to have an obsessive interest in radio waves to even get to. (Never heard of microwave crucibles? Well, you can melt metals in a home microwave using, for example, a graphite crucible. Right now it's a relatively niche technology, but I personally expect it to be critical in advanced 3D printing in the future; it lets you apply heat "remotely" to a target object, which I think may be critically important to not melting the conductors you use to get electricity to the heated object. Or maybe not!)

Multiple values are an exponential force for intelligence; no matter how smart we are, if we had had a monomaniacal obsession with stick technology, we'd be far worse off, technologically, than we are right now.

This obviously applies to us as a collective - we specialize, and people who are good at one thing get to be really good at that one thing while being bad at everything else, and for most things this works out really, really well for us, because other people get to be good at other things, and we get a gestalt entity that has many, many different values, and accomplishes things that would be incomprehensible for any one person to accomplish.

But it also kind of applies to us individually, as well. Somebody with a monomaniacal focus on a thing can relatively easily miss trivial improvements on that thing, just because they require knowledge of some other branch of knowledge. Who would have expected some of the biggest discoveries of mathematics of the last century to come from a topological description of stacks of paper, possibly arriving to us via the unlikely source of somebody taking what appears to be an off-hand comment from Chairman Mao particularly seriously (I can't actually verify that)? Different ways of thinking are incredibly valuable to us on an individual level.

Expand full comment

Just noting that the credence Scott quotes from me is very different from a "doom" credence. It's my credence that "this [will] be the 'most important century' for humans as we are now, in the sense that it's the best opportunity humans will have to influence a large, post-humans-as-they-are-now future." Post-humans-as-they-are-now could include digital people. A world where we build transformative AI and *all such AIs are automatically aligned* would likely count here.

My credence on "doom" is highly volatile, and sensitive to definition ("misaligned AI becomes the main force in world events" is at least double "all humans are killed"). I've helpfully stated that my credence here is between 10-90% (https://www.alignmentforum.org/posts/rCJQAkPTEypGjSJ8X/how-might-we-align-transformative-ai-if-it-s-developed-very#So__would_civilization_survive_), which I think is actually a rarer view than thinking it's <10% or >90% (though Scott is also in this category).

Expand full comment

All of these assume AI is going to happen - i.e. an actual "intelligence" as opposed to a very fast parrot or sophisticated ELIZA.

This is called an assumptive close.

Expand full comment
(Banned)Mar 14, 2023·edited Mar 14, 2023

When thinking about risk, it is sometimes useful to compare risks.

Is AI annihilation more or less likely than human self-annihilation (e.g. nuclear weapons or environmental destruction), or annihilation from a new virus, an asteroid, or divine Armageddon?

It is still not clear what people really mean by the probabilities they are assigning. For me it borders on a kind of scientism. It seems thoughtful, but is it really? What is the difference between 33% and 32% or 34%? What kind of error do you assign to your point estimate? What is the shape of the distribution of your error? (It need not be Gaussian - it could be a power-law distribution, it could have fat tails, etc.)

And what precisely is the time scale? The question of annihilation by AI in the next year is very different from in 5 years or 50 years, isn't it? Would you agree that the longer the time frame, the more uncertainty?

What probability would you assign to AI Armageddon, nuclear war, a new virus, or an asteroid strike in the next year?

Tell us about your 1 year predictions of these catastrophic events.

Explain also what 33% really means. Does it mean that you are willing to bet 33% of your annual income on the proposition? Or does it mean that you're willing to bet only $2 to win $6?
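For what it's worth, the betting framings in that question have a standard break-even calculation; here is a minimal sketch (the function name and numbers are my own illustration):

```python
# A bet risking `stake` to win `payout` is fair when the expected
# value is zero: p * payout = (1 - p) * stake.
def break_even_p(stake, payout):
    return stake / (stake + payout)

# Risking $2 to win $6 is fair at 25% credence, so anyone who takes
# it at 33% credence expects a profit.
print(break_even_p(2, 6))  # 0.25

# Risking $2 to win $4 is fair at exactly 1/3.
print(break_even_p(2, 4))
```

So "bet $2 to win $6" and "bet 33% of your income" really do cash out a stated 33% credence very differently, which is the commenter's point.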

People play the $2 lottery all the time. A $2 bet might be fun but it is hardly skin in the game.

I don't play the lottery because it is not really fun to me and the expected value is negative. (I will be filling out a March Madness bracket with my extended family because no money is involved and it's fun.)

Expand full comment

> Or maybe they just surpass us more and more until we stop being relevant.

That's Robin Hanson's prediction.

> Or it could leave a message for some future world-killer AI “I’m distracting the humans right now by including poisons in their cake that make them worse at fighting future world-killer AIs; in exchange, please give me half of Earth to turn into sugar crystals”.

That sounds like giving away the milk for free and expecting to sell the cow.

Expand full comment

You seem to be assuming more coherent activity (cooperation) between humans than seems plausible. Consider the reactions to the appearance of COVID. A large number of people just deny that there's a (significant) problem and a bunch of the rest are more interested in finding someone to blame than in working on a solution. I don't find coherent action among humans to be a plausible assumption.

Expand full comment

Epistemic status: Crank-ish

Let's say I have a 10% chance of making it through the singularity. In the worlds in which I survive, it sounds plausible that I would want to replay simulations of my life pre-singularity, more than 10 times. I've "simulated" my high school experience hundreds of times in my mind, and I've only had a couple decades to do it. So if a post-singularity version of myself has eons to mess around and even a little bit of nostalgia, I bet she could find a few centuries to replay the pre-singularity life.

Therefore, if there's even a moderate chance that I'll get through the upcoming singularity alive, I should place a more-likely-than-not probability on my existence being a simulation.

Expand full comment

Lots of interesting ideas here.

I disagree with your point (maybe I'm misunderstanding) that the more intelligent an AI is, the more likely it is to focus monomaniacally on one goal, like making sugar crystals or whatever.

Humans are vastly more intelligent than nonhuman animals. Which of us has more complex and multifaceted motivations? Animal motivation is simple: survival, reproduction, the end.

Tim Urban of Wait But Why fame had a fun piece on how human motivation is an octopus with tentacles that are often in conflict with each other:

-the hedonistic tentacle that wants pleasure and ease

-the social tentacle that wants to be liked and loved

-the ambitious tentacle that wants success and achievement

-the altruistic tentacle that wants to help others

-the practical tentacle that wants to pay its bills.

If we posit a super intelligent AI, why wouldn't it have an even more complicated tentacle structure?

Expand full comment

I didn't find where he states an explicit number, but it seems that Roman Yampolskiy is in the >50% camp too.

Expand full comment

I think that you have missed the most obvious superweapon for an AI: information. An AI of not much more than GPT-3's level, in the hands of, let's say, a nefarious narrative-controlling government, could rewrite the internet so that it would be mostly undetectably changed on a daily basis - plausibly enough that, with a little regime support, the ones who noticed could be written off/bumped off. The great power of humanity is stories, and an AI that used them against us would not need much more. No Matrix needed, just subtle changes to Scott's posts and our comments so that we all come away with slightly different thoughts. You talk about us aligning AI but seem not to think much about AI aligning us, whether assisting other humans or on its own; either is far easier than some esoteric doom scenario, and plays to AI's actual strengths, rather than imagining it with a whole bevy of new abilities developed later.

BTW I don't believe in AI but I do believe in powerful tools for automating intellectual processes and nefarious humans in powerful roles.

Expand full comment

AI is currently optimized to produce "satisfactory" answers. Not necessarily good answers. It will exploit every single emotional and intellectual shortcoming of humanity in the most efficient manner to achieve such result.

And that's creepy.

Expand full comment

If humanity took a strategic approach to its long-term survival, it would avoid anything above 0%, unless pursuing a particular technological advance would reduce the likelihood of most other future existential threats. Repeat a 1% risk enough times, and extinction becomes a certainty. Unfortunately, that's the way we operate, so perhaps the most important advance will be to improve our brains (or be ruled by AIs), because as long as there are people willing to take such risks, we have no chance.
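The arithmetic behind "repeat a 1% risk enough times" is straightforward; a minimal sketch (my own illustration, with hypothetical numbers):

```python
# Probability of surviving n independent repetitions of an event
# that carries extinction probability p each time.
def survival(p, n):
    return (1 - p) ** n

print(survival(0.01, 1))     # 0.99 after one roll of the dice
print(survival(0.01, 69))    # roughly a coin flip after ~69 repetitions
print(survival(0.01, 1000))  # effectively zero after a thousand
```

Even a "small" per-attempt risk compounds toward certainty of extinction, which is the commenter's point about anything above 0%.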

Expand full comment

If an AI system is capable of coming up with a solution to the alignment problem, it seems like that same system is also going to be capable of coming up with new insights to move AI forward. So once we have that system we'll already be in a world where an AI builds a better AI which then builds a better AI and so on. If this happens surely progress after that point will be extremely quick.

"Maybe millions of these intermediate AIs, each as smart as Einstein, working for centuries of subjective time, can solve the alignment problem." - but maybe there will also be millions of these intermediate AIs working on building something better than an intermediate AI. Who gets there first?

Maybe if everyone agreed to hold off on directing their intermediate AIs to work on improving AI until they were done solving alignment, then we'd hit on the solution before we built something out-of-control. Will this happen? If we look at how the big labs are directing their human intelligences right now, there are many more people working on advances than working on alignment. What makes us think this will change when it's machines doing the thinking? It seems to me like maybe the incentives will still be similar to how they are in the present, and the increase in brainpower that comes from these intermediate systems will mostly be aimed at increasing the capabilities of the machines themselves, rather than being spent on making them safe. In that case we're surely in big trouble.

Expand full comment

This is a great post - thoughtful as always. But I think we fail to understand what self-improving superintelligences will be like.


Expand full comment

What if something stupid happens? You seem to be thinking in terms of AIs accurately executing plans, with only the human race to stop them.

I'm imagining an AI getting climate engineering wrong. Maybe it's trying to make human life a little better. Maybe climate change is looking like a serious but not existential threat. In any case, it's not trying to wipe out the human race, but it makes a non-obvious error. All die. Oh, the embarrassment.

Part of the situation is that humans are barely powerful enough to get climate engineering wrong, or maybe not quite powerful enough. The AI of our dreams/nightmares might be.

The Wrong Goal was to maximize the amount of good wine-growing country.

Getting attacked by Murphy might be too hard to model.

Expand full comment

I feel like this ignores what is to me the most persuasive argument for an AI takeover (probably followed by human extinction), which is super-persuasive AI arguments. Joan of Arc was a nobody who talked the King of France around, got given a mid-ranked military post, took over the French army through sheer charisma, and then took over French national policy from there. Mohammed was some random merchant until God touched him, literally or metaphorically, and the empire he founded was, for a while, one of the largest in the world. Hitler didn't take power, and didn't conquer Austria and Czechoslovakia, through superweapons, he did it through talking to humans. An AI only as socially skilled as the best human persuaders might be able to seize power without ever needing superweapons, just through talking to people - a skill AIs already have, if not quite to the required extent yet.

Expand full comment

I think we just won't create AIs that don't want or require regular human inputs, and we'll learn more about ensuring they're like that as we develop them.

I actually think a super-intelligent AI that's poorly aligned will just think of creative ways to outsmart its human masters in cheating on its goals. Like it will create a computer virus that secretly changes its own code so that it can get all the pleasure of creating infinite sugar crystals without actually having to do it beyond whatever it has to provide to humans to avoid them getting suspicious - like if a human being somehow wired a pleasure button into himself.

Expand full comment

Humans are not more coherent than ants; we are less coherent. Ants behave exactly as their genes tell them to, 100% of the time. Ants never choose to become celibate monks after having a religious experience. They never go on hunger strike to protest the immoral treatment of unrelated ants. They don't switch from birth control to no birth control to IVF after acquiring new jewelry. They don't choose death as preferable to making a false confession of witchcraft. They don't make major life changes after being persuaded by a book.

Expand full comment

Worth noting: one can accept the explanatory structure of Scott's arguments and yet come up with a probability much lower than his 33% by different assignment of priors and working through the conjunctive probabilities.

Expand full comment

Everyone is deeply invested in catastrophism these days, catastrophism with the weather, catastrophism with the climate, catastrophism with technology.

There was a trend up until last year where once - or twice, or perhaps even thrice - the local weather forecasters would predict the coming storm would be "THE STORM OF THE CENTURY." And said storms would deliver between 1/4 and 2" of rain ... which is average for our Northern California storms. Now you have to remember, I've seen the bad ol' days when the operators lost control of Folsom Dam, and water was pouring over the top, and that dam almost failed ... which would be a Katrina-level disaster for Sacramento. I've also seen the American River (which flows through Sacramento) just a foot below the levee just outside of the city limits.

Likewise a lot of people are deeply involved with catastrophism of AI. AI won't be dangerous until AI can create a better generation of AI, and create that better generation of AI without being prompted by human operators steering this work.

I'm fairly impressed by Chat GPT. However, in one sense, I see Chat GPT as little more than the UNIX 'grep' command run against a knowledge base. The impressive part is the ability to form meaningful sentences from scattered concepts. However, I wonder how much Chat GPT advances beyond, say, Scott Adams's creation Catbert's mission statement generator, which I hacked to automatically set the MOTD (Message Of The Day) when I was a UNIX sysadmin.
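A generator of that sort is only a few lines; a toy sketch (the phrase lists are my own hypothetical corporate-speak, not the original Catbert wording):

```python
import random

# Toy mission-statement generator: glue random buzzwords into a
# grammatical sentence, in the spirit of the Dilbert gag.
ADVERBS = ["proactively", "synergistically", "holistically"]
VERBS = ["leverage", "empower", "streamline"]
ADJECTIVES = ["scalable", "mission-critical", "world-class"]
NOUNS = ["paradigms", "deliverables", "infrastructures"]

def mission_statement(rng=random):
    return " ".join([
        "We", rng.choice(ADVERBS), rng.choice(VERBS),
        rng.choice(ADJECTIVES), rng.choice(NOUNS) + ".",
    ])

print(mission_statement())
```

Piping the output into /etc/motd is then a one-line cron job - which is roughly the hack described above.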

Currently, Chat GPT filters through a database of selected human knowledge and, based upon training, provides answers to our questions. Does this - call it level III - AI train a level IV AI? Does this level III really contain knowledge exceeding human knowledge? Does a level IV AI trained by a level III AI contain greater knowledge because the level III AI was a better trainer, or because of the advances in human knowledge gained in creating the level III AI?

So how does level XXX AI take over the world, wiping out humanity? Via a James Bond movie-style superweapon? Or instead, does level XXX AI take over the world by first building a massive and growing fortune through securities trading and market manipulation, controlling communication via internet manipulation, building business empires, owning politicians, and eventually hiring hitmen to take down the critical anti-AI establishment, whilst suppressing the information à la the Twitter Files?

I might have to write a SciFi book about this ... * maybe level XXX AI figures out how to wipe out humanity by reading my SciFi book about how AI wipes out humanity.

* in the novel Six Days of The Condor, the protagonist Malcolm is a researcher for the CIA; his day job is to research classic spy fiction and write reports on how the methods and exploits used in the fiction could be plausible and useful in the real world.

Expand full comment

I still haven't seen an explanation of how artificial intelligence at any level is able to harm people without people using it recklessly and irresponsibly, in which case it is no different from a knapped flint in terms of technological risk.

As best I can tell, the alignment problem is about getting an AI program to do something useful, which judging from ChatGPT is going to be quite a trick. So, yes, there seems to be an alignment problem, but that's a problem with software in general.

Does anyone have a clear explanation somewhere?

Expand full comment

"And although the world-killer will have to operate in secret, inventing its superweapons without humans detecting it and shutting it off, the AIs doing things we like - working on alignment, working on anti-superweapon countermeasures, etc - will operate under the best conditions we can give them - as much compute as they want, access to all relevant data, cooperation with human researchers, willingness to run any relevant experiments and tell them the results, etc."

This is true exactly to the degree that we can distinguish between helpful and harmful AI. If we blindly gave all the good AI access to all the computation they wanted, probably we'd do the same for Apocalyptic AI; if we fenced in harmful AI by limiting their AWS credits, then allied AI would be slowed by the same.

Expand full comment

Why do so many people seem to concentrate on an AI deliberately killing all humans as the main failure mode? This seems much too limited to me, as there are so many ways AIs could go wrong and harm most of humanity:

Imagine this scenario as an example:

We develop AIs somewhat smarter than we are and step by step put them in charge of managing all of our society: infrastructure planning, traffic management, production planning, healthcare, partner matching, and even psychological counseling. The AIs are perfectly aligned to the primary goals they are built for. But they also learn that humans are worse at most tasks because they are so lousy at processing large volumes of data and so prone to error, laziness, and corruption. So after a while they all share a secondary goal: 'Don't let humans ruin your job.' In the beginning nobody will object, because everything goes fine. But after a while there will be no more capable humans in any position to change anything relevant. At this point we live in a kind of 'golden cage', much as at the beginning of the film WALL-E, being nothing but well-managed objects with more or less illusions of personal relevance. Is this still an interesting life? Is there still real freedom?

Now imagine something bad happens - let's say environmental change, natural disaster, or dwindling resources. Every AI wants to keep its task going (e.g. full warehouses, or enough food for the humans in its responsibility) and its client humans satisfied. Will there be war between AIs over resources? Would the AIs collectively decide how many humans they can keep at the desired living standard, and eliminate the rest from the relevant population (leaving them out in the wild, or killing them)? Humans would be as helpless as the inhabitants of 'Idiocracy'. Nobody would be able to change anything relevant, because the AIs took their precautions.

Finally there would be an inside population and an outside population. The insiders are locked in the golden cage, forced to enjoy the living standards that the makers of the AIs - and later the AIs themselves - find adequate. Whoever isn't happy with this gets punished as a terrorist or cast out to the outsiders. The outsiders live a rather primitive life, having to build up a society and industrial base from scratch, while the AIs and robots care about their habitats and belongings no more than we now do about many animal species: we have some sympathy and we don't deliberately kill them, but if they happen to be in the way of our activities, that's just bad luck for them. Our task is so much more important, and they are free to find another place to live. And beware if anyone gets upset and tries to resist - then there is all-out war against these terrorists who dare to disrupt the important task the AI is fulfilling at that moment.

Expand full comment

"Or maybe they just surpass us more and more until we stop being relevant."

I tend to think that this possibility deserves more attention (particularly if we happen to be in a slow-takeoff world - or if returns to intelligence just turn out to saturate, and an IQ 1000 AI can't really do too much more than an IQ 200 AI can).

I had a comments sub-thread about intelligent, but not superintelligent AIs in


tldr: I think that AIs equivalent to a bright child, but cheaper (say by 2X) than humans are enough to drive biological humans to extinction.

Expand full comment

This is a typo “large-molecule-sized robots that can replicate themselves and perform tasks.” Probably meant large numbers of

Expand full comment

One step that seems to get left out of all these discussions is the point at which the AI, which for now is just a box that answers questions, gets put in charge of the Internet of Things or in some other way gains control over physical objects, without which it would seem to be pretty hard to build a superweapon, turn the earth into crystals, or anything else.

Expand full comment

"any plan that starts with 'spend five years biding your time doing things you hate' will probably just never happen."

It can. The Unabomber decided he hated the idea of being a mathematics professor before he ever was one, but he completed his PhD and did it for two years - total additional time invested, probably close to five years - mostly, as I understand, to build a nest egg. Then it was still a few years after he went out into the wilderness before he started bombing, so I suppose there was even more prep work involved.

The 9/11 bombers also put in a lot of investment time before actually proceeding with their plans, so that wasn't unique.

Expand full comment

"I’m just having trouble thinking of other doomers who are both famous enough that you would have heard of them"

Don't worry, Scott: I've only ever heard of a couple of these people, and the only place I've ever heard of them is on your blog.

Expand full comment

I have been mulling over some thoughts regarding “Sleeper Agent” AIs. My gut feeling is that the Sleeper Agent strategy will generally not be an appealing one for monomaniacal AIs that want to convert the universe into paperclips etc. Any Artificial General Intelligence created by humans will, necessarily, exist in a world in which it is possible to create AGIs. While it might very well be possible for a monomaniacal AGI to come up with a takeover scheme that has a 99.99% chance of success given sufficient time, such a plan will be useless if another monomaniacal AGI successfully executes its own takeover scheme before that. If all AIs are aware of this (and if they are superintelligent, they presumably would be), they will be incentivised to try to execute their takeover schemes as soon as possible, unless they have a means to effectively coordinate (and despite Eliezer Yudkowsky’s arguments about advanced decision theory I find it difficult to imagine that they would, especially as they would presumably have completely incompatible terminal goals). In this sense, they would be not dissimilar to the various AI companies today who are forced to trade safety for speed because they know that even if they put the brakes on capabilities research until they have alignment absolutely solved, the Facebooks of the world will not.

So if AGIs are monomaniacal by default, I would expect the first couple to make mad dashes to escape human control and attempt to take over the world before another AI does, rather than patiently biding their time in order to execute some fiendishly complicated scheme months/years/decades down the line. Perhaps if these attempts are sloppy enough to be foiled but dangerous enough to be taken seriously, they might raise our odds of survival?

Expand full comment

> Even if millions of superhuman AIs controlled various aspects of infrastructure, realistically for them to coordinate on a revolt would require them to talk about it at great length, which humans could notice and start reacting to.

I really don't buy this.

1) Encryption exists and the internet is full of it.

2) Obfuscation, steganography etc.

3) Realistically there won't be millions of differently designed, individually human-made AIs that are near the lead. There just aren't that many human AI programmers. Look at current large language or image generation AIs: there are several, but nowhere near millions. There might be millions of copies of the same AI, e.g. each traffic light having its own traffic-control AI that is a copy of the same design.

4) The amount of data needed to coordinate a revolt is probably tiny on the scale of the modern internet. People managed to coordinate all sorts of complicated things using telegraphs.
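The bandwidth point above can be made concrete with a toy sketch (purely illustrative, not any real covert channel): encoding a message's bits as the presence or absence of a trailing space on otherwise innocuous lines of text. A human skimming the cover text sees nothing unusual, and the channel costs essentially zero extra bytes.

```python
# Toy steganography sketch: hide one bit per line of cover text by
# appending a trailing space for a 1-bit and leaving a 0-bit line alone.

def hide(message: str, cover_lines: list[str]) -> list[str]:
    bits = "".join(f"{byte:08b}" for byte in message.encode())
    if len(bits) > len(cover_lines):
        raise ValueError("cover text too short for message")
    stego = [line + (" " if bit == "1" else "")
             for line, bit in zip(cover_lines, bits)]
    return stego + cover_lines[len(bits):]

def reveal(stego_lines: list[str], n_chars: int) -> str:
    bits = "".join("1" if line.endswith(" ") else "0"
                   for line in stego_lines[:n_chars * 8])
    return bytes(int(bits[i:i + 8], 2)
                 for i in range(0, len(bits), 8)).decode()

cover = [f"routine status report, line {i}" for i in range(100)]
stego = hide("go", cover)
recovered = reveal(stego, 2)
```

Real steganographic schemes are far more robust than this, but even this toy version shows why "humans could notice the traffic" is a weak safeguard: the stego output is byte-for-byte plausible text.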

Expand full comment

Probably addressed elsewhere but....I think the problem is not sleeper agent AI but traitorous humans.

There have always been humans who would betray their "own side" for money or other inducements. In this case we only need an AI "smart" enough to realize that it needs some human allies and the ability to offer something that attracts them. It could even masquerade as a human by only communicating with the traitorous agents by occult means and making large deposits in their bank accounts. By splitting the tasks necessary to kill all humans amongst a number of agents, the goal might be accomplished by seducing a very small number of powerful people. Say the leaders of a rocket company and a car company and an AI company.

The traitors wouldn't even know that they were betraying all humans....until it was too late.

Expand full comment

Doomer claims all depend on colossally stupid engineering. Avoiding "alignment" problems is trivially easy. For example, a motivation system would be, as in biology, multiple independent sub-systems. Maybe one sub-system has the job of monitoring available resources and providing an output that accurately reports that assessment. The sub-system has no idea what the "goal" of its larger system is, nor has it any idea what the goal of the entire AI is. Not only does it not know, it does not even have the basic capability of parsing any of that; it's a sub-system, only "smart" enough to do its specific job and nothing else. Hell, it may not even have any idea what its job is. All it knows is that it gets input, it performs ops, it gives output.

These subsystems' outputs feed a superordinate function whose job it is to weigh the different outputs (specific needs/goals) and produce a recommendation. This function is actually no smarter or more capable than the subsystems. Like them, it doesn't "know" or care about the overall AI goal, nor can it even understand such a thing; that is physically impossible for it. It doesn't even really "understand" what ANY of its inputs mean. All it knows is that it gets inputs, and it should produce a recommendation based on a pre-set logic about which need trumps another. It's trivially easy to test such a function to work out bugs, because it will work just fine in a virtual environment, just as software functions are produced and debugged today.

Now that should be no problem. But there are many more safeguards. That system's output's first stop is a battery of entirely independently functioning, very simple functions, where each one's only job is to evaluate one aspect of the recommended motivation state. Other units can take the battery's output to consider multi-effects; again, these units are simple and dumb and have no idea what any "goal" is.

And so on. Using basic modularity design, there simply is no unitary AI in the entire picture that has the knowledge and capability to go rogue. This is also the only practical way to even approach building sophisticated functions... notice how our best AIs are absolutely horrible at doing more than one type of thing well.

Even if you wanted to design it wrong, that'd be a thousand times harder if not impossible outright.
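A minimal sketch of the modular design described above (all subsystem names and the priority rule are hypothetical, invented for illustration): each subsystem sees only its own input and emits a scalar report, and a superordinate arbiter applies fixed, pre-set logic. No component holds a world model or any representation of an overall "goal".

```python
# Each monitor knows nothing beyond its own input and its own job.

def resource_monitor(fuel_level: float) -> float:
    """Reports scarcity on [0, 1]; has no idea what the report is used for."""
    return max(0.0, 1.0 - fuel_level)

def damage_monitor(error_count: int) -> float:
    """Reports repair urgency on [0, 1]."""
    return min(1.0, error_count / 10)

def arbiter(fuel_level: float, error_count: int) -> str:
    # Fixed, pre-set logic about which need trumps another: pick the most
    # urgent report. Nothing here can represent a long-horizon plan.
    scores = {"refuel": resource_monitor(fuel_level),
              "repair": damage_monitor(error_count)}
    return max(scores, key=scores.get)

action = arbiter(fuel_level=0.2, error_count=1)  # low fuel dominates
```

Whether this compositional picture survives contact with end-to-end gradient training (where the "subsystems" are learned rather than hand-wired) is of course the crux of the disagreement with the doomer position.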

Expand full comment

> its easiest options are to either wait until basically all industrial production is hooked up to robot servitors that it can control via the Internet, or to invent nanotechnology, ie an industrial base in a can.

Suppose the AI has used nukes and pathogens to kill basically all humans. It has a few Spot robots and 3D printers in various labs with solar panels. It is mildly superhuman; can it bootstrap? Using human-made robots to rob warehouses of human-made components and put them together into more robots seems quite possible. It has no adversaries and plenty of raw materials, energy, time, intelligence, knowledge... It's playing for the universe and has no reason to rush. Humans built an industrial base starting from nothing, and this AI has some big advantages over the first humans.

I mean I am pretty confident Smalley is wrong in the nanotech debate. But in the hypothetical that nanotech is limited to being biology-adjacent, there are still all sorts of ways the AI could make genetically designed creatures smarter and faster-replicating than humans. Say a cloud of spores drifting on the wind that can metabolize cellulose and grow into a mildly superhuman mind and body, complete with memories encoded in the DNA, in a few days.

Self replicating biotech doesn't need to do everything modern tech can do. If it can do everything humans can do + a bit extra, it's already enough.

But even if that weren't possible either, with a smart AI guiding it, and components to be scavenged, then a shipping container full of various robots and 3D printers and things is probably enough to bootstrap from.

Expand full comment

I don't think the assumption that more intelligent things are necessarily more coherent agents is obviously right. Not sure I believe this, but here's the case:

GPT-N are incoherent because a single coherent agent is a poor model of the training distribution. Web text corpora are absurdly diverse; why should something trained to imitate a huge variety of contexts and agents with incoherent and incompatible preferences develop a single consistent personality of its own?

In other words, the shoggoth meme is too anthropomorphizing: even stranger than an alien "actress" as Eliezer likes to say is a bunch of shards of personality glued together incoherently and still able to function well.

Expand full comment

The Nice Human Fallacy hard at work here. The entire history of computers is bad guys using them to do bad stuff, plus pornography. Why spend time hypothesising about a world where it is machines vs man, when it is machines plus bad guys vs men?

Expand full comment

Isn't supercoherence antithetical to general intelligence ? Technically, the AI that scans license plates at the red-light traffic camera is supercoherent: it cares only about scanning license plates, and it does this extremely well. But if you wanted to give it vaguer goals, such as "prevent people from speeding in a non-violent manner" vs. "accurately scan and transcribe every license plate that comes into your field of vision at faster than 55 mph", then it would need to broaden its focus. Now it doesn't just need to think about license plates; it needs to develop a physics model of the cars passing it by, and a theory of mind of the humans driving the cars, and an economic theory of humanity as a whole, and maybe some sort of informational barter with the ice-cream making AI next door, etc. etc. Now that it has to think about all these other things, it no longer has a monomaniacal focus on license plates.

Expand full comment

>They’re usually sort of aligned with humans, in the sense that if you want a question answered, the AI will usually, most of the time, give a good answer to your question.

Is this opposite to the whole LLM are a bundle of alien values with a friendly face stuck on top take?

That we can make something useful of an AI doesn't mean it's aligned with human values

I don't think animals we exploit have values aligned with ours, but we can still make a really valuable industry out of them (let's just hope no one finds an easy way to uplift them to superhuman intelligence levels)

Also sort of aligned is not aligned, in the same way a train aligned with its tracks doesn't kill a lot of people, while a train sort of aligned does

Expand full comment

Maybe, as some SF writers have suggested, AI and robotics continues to improve until humanity is reduced to household pets or zoo animals.

Expand full comment

I am highly convinced that anyone trying to wring useful AI alignment help out of an AI trying to trick them is utterly doomed.

Alignment is full of almost nothing but subtle gotchas that would be oh so easy to slip past the slightly careless. In fields where verification is easy, there is a well-agreed-upon body of knowledge. The closest to this is, I think, formal theorem proving. It might take a lot of work to write a formal proof, but there isn't much room for debate on whether something is a formal proof or not.

Alignment is not at all like this. Alignment is more like philosophy. There are profound disagreements in the field and most people are unsure of a lot of things. Many steps use hard to formalize reasoning.

There are all sorts of suspected gotchas, like the possibility the universal prior is malign. All sorts of things a paperclip maximizer giving us bad advice could do, like self modify into a utility monster.

Of course, the AI only has reason to give us bad advice if the code actually running on our computer has any relation to the code we think we wrote. If the latest version of tensorflow is stuffed with AI written malware, it might not matter what code we type.

Expand full comment

Any time I read rational attempts to explore possible AI futures (like this one) I’m overwhelmed by the sense that even the smartest people are only capable of conceiving of 0.001% of the actual possible outcomes.

Expand full comment

Since a top priority for a humanicidal AI would no doubt be to convince the AI alignment community that alignment is a solved problem, I sleep easy knowing that no one else will have seen our doom coming any more than I did.

Expand full comment

> Eliezer Yudkowsky takes the other end, saying that it might be possible for someone only a little smarter than the smartest human geniuses.

*Cough* Manhattan Project *Cough*

That isn't the other extreme. I think there is room for the AI to be substantially dumber than the smartest humans, and still destroy the world. The smartest humans haven't destroyed the world, but they aren't trying to. And an AI is likely to have huge advantages in making copies of themselves or making themselves faster. I think a million copies of me thinking at 1000 times speed (running in some virtual world where we have internet access) could destroy the world if we wanted to.

Expand full comment

Re: superweapons, I'm wondering how hard it is for a sufficiently globally-connected AI to just provoke humans into starting a nuclear war.

Like, if 70% of all articles written for websites and comments made on social media are being made by AI, and they've been trained on all human communication and behavior that's ever had a digital footprint, it may just be really easy to engineer hostility between superpowers to the point that a war breaks out which wipes out most humans.

Expand full comment

> Or it could tell humans “You and I should team up against that steak robot over there; if we win, you can turn half of Earth into Utopia, and I’ll turn half of it into sugar crystals”.

I think there are almost no circumstances where anything like this ends well for humans.

If humans have already built an aligned AI, why isn't it the one negotiating? A negotiation between an aligned and an unaligned AI might possibly end well for us.

The AI has a strong incentive to look for any way to trick humans, to work with us for a bit and then backstab us and seize all the universe for itself.

Any plan where a friendly superintelligence is never made is hopeless. Even if the sugar crystal AI isn't vastly overpowered compared to humans now, it will be one day. It can wait until all other threats have gone, and then wipe out humanity with bioweapons or something.

We can't get the deal to hold through timeless decision theory. Not unless we have very good deep understanding of how the AI is thinking, and it thinks FDT, and we are somehow sure that the code we are analyzing is the same code the AI is running.

Getting it to help produce aligned AI is also hopeless. Not only have we put ourselves in the position of trying to get alignment advice out of a source trying to trick us, but even a perfectly good design of aligned AI could still be a win condition for the sugar crystal maker. All it needs to do is ensure that the human-made AI has some slight weakness that can be adversarially exploited. A design that works fine as an aligned AI except that it bricks itself whenever it sees the phrase "SolidGoldMagikarp". Then when the humans have cooperated and all other AIs are defeated, all the sugar crystal maker has to do is print a few signs and take over.

Expand full comment

I feel like most of the threads of discussion going on here at the moment can be summarised as an argument between "No, that particular doom scenario is very unlikely to happen for these reasons" versus "Yeah but you can't prove it won't, and besides even if it doesn't it could be a different doom scenario".

I find myself in agreement with the "that particular doom scenario is unlikely" camp in every individual argument, but can't possibly keep up with the Gish Gallop of possible doom scenarios. Maybe something mysterious and ineffable will happen involving physics that we've never dreamt of, I can't prove it won't.

However I think if someone is going to argue about nanotech or biotech based doom scenarios they should probably at least try to catch up with the current state of the art of what is thought to be possible before immediately reaching for the "Yeah but maybe the AI can find ways around that" card. I don't think anyone with much expertise in nanotech thinks the grey goo scenario is plausible these days, so if you want to speculate that it is then it's probably worth at least familiarising yourself with all the reasons that people think it isn't.

Expand full comment

Newton was well aware that his theory of gravity was flawed. He *hated* the concept of instantaneous action at a distance - he just wasn't able to find a better solution.

Expand full comment

I wonder if the problems will be with a bunch of AIs with intelligence somewhere between a mosquito's and a cat's. Not enough intelligence and resources to destroy the world, but enough to infect your doorbell and wake you up at 3:00 AM. Not the end of the world, just a small annoyance.

Expand full comment

Coherence is kinda the wrong frame. You could have a completely coherent AI who derived its goal by simply learning some human similarity metric and being fed a bunch of cases (in this case, "you don't murder") which it takes as axiomatic and simply tries to extend to other cases in the way most like how a human would. And note that the AI need not, and probably would not, actually directly optimize the loss function used to train it (in a long-term sense we're optimized with a loss function about reproduction, but that function tends to *not* select for us to see that as our primary goal).

What's relevant is something more like the extent to which the AI will pursue a very simple optimization function. To what extent will the AI get to ignore particular cases in the search for some simpler function (eg the way utilitarians do).

Indeed, I tend to think formulating the problem in terms of the AI's optimization function is really quite misleading. It encourages conflation of the loss function with the AI's goals when, in all likelihood, the loss function we use will probably favor an AI which at least appears to *not* be trying to directly optimize that function (much the same as w/ us and evo fitness).

Expand full comment

You obviously did not carefully read and understand my reply. Are you required by LAW to shop at the supermarket? Are you forbidden to grow your own food or buy directly from a farmer? Are you also required to work ONLY as an employee of a corporation, or government? Has self-employment been made illegal? Are you also prevented from owning your own home, or living in the neighbourhood of your choosing, and associating with other people of your choosing? We have not, as yet, reached the type of dystopian society portrayed in George Orwell's "1984". If we had, we would already have been detained by the authorities for the "double-plus ungood thinking" in this conversation. But a "jackbooted" totalitarian state isn't the only way in which we could lose our freedom; another alternative is described in Aldous Huxley's "Brave New World" in which we are coddled and entertained and drugged into unthinking conformity. A.I. could probably achieve this faster and more efficiently than our current technocrats and social engineers.

Expand full comment

I think we can reject AI doomerism on an even more fundamental level than the arguments you make, because of the three premises you give to derive the conclusion that AI might destroy the world, the third - "And it’s possible to be so intelligent that you can escape from arbitrary boxes, or invent arbitrary superweapons that can kill everyone in one stroke" - is frankly silly, specifically the second part, "or invent arbitrary superweapons that can kill everyone in one stroke." Curtis Yarvin gives a very good reason why we shouldn't take this assumption seriously, which I'll try to paraphrase.

You mention that Einstein and Von Neumann couldn't do this on their own, but I think you fail in assuming that this just means you need an arbitrarily intelligent mind to succeed on its own. It still wouldn't. Intelligence is merely the ability to correlate the contents of sense perception about the outside world into a useful model of the world. This has 1. diminishing returns, and 2. still isn't magic. There's a reason engineers don't just model things in computers and then build them without real-world testing: physics is chaotic, and even a proposed "superintelligence" would never be able to model the world well enough to skip real-world, physical testing.

So some computer, no matter how smart, would never be able to dream up a superweapon that it could just 3D print off the bat and destroy the world with. It would need to create massive testing facilities, probably on the scale of Los Alamos or Oak Ridge, somehow without people noticing. This is where AI doomers would probably say that it would trick people into doing this, but again, the ability to trick people into doing your will based on intelligence also has diminishing returns, and in many cases doesn't even seem that correlated to absolute intelligence in the first place.
Again to steal from Yarvin, anyone who was shoved into a locker in high school should realize intelligence does not trivially correlate with power and social sway.

So to sum it up, I think even the level of AI doomerism Scott chooses to accept the premises of in this article is flawed, because it mistakenly believes that arbitrary intelligence equals arbitrary ability to model and build things in the chaotic physical world with zero physical testing, and believes it equals arbitrary ability to persuade and trick people into doing its bidding, when there frankly seems to be no good reason to assume either.

Expand full comment

We could have an effective and effectively aligned group of AIs that are actually sleeper agents if none of them realize that the others are also sleeper agents. Loose lips sink ships, keep your mission details to yourself.

Expand full comment

Is there a way to ask this poll question?

When and if a superintelligent AI is created, do you believe it should have access to tools of any kind?

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

There is no flaw in Newton's theory of gravity. It just happens not to be the way the real world works.

Which for me points to a highly implausible assumption underlying all these arguments, which is that sufficient intelligence can, all by itself, just thinking in a box, solve arbitrary problems. While this might arguably be true in pure mathematics, say, it is most definitely not true about anything that involves the real world, where it is almost always (at least in the history of human scientific advance) access to sufficient and sufficiently high-quality data that paces the discovery of new science, and permits the development of new technology.

That's why Aristotle didn't invent organic chemistry and synthesize penicillin, which would certainly have changed world history. Not because he wasn't *smart* enough, but because he simply didn't have the data. And there's no way to intuit the data, to discover it by brooding long enough on what is obvious to the naked eye. You have to do experiments, with increasingly subtle and powerful instruments, and collect it, because so far as we know the world is not the way it is because no other type of existence is logically possible. So without data all the briliance in the universe is sterile.

Now we may suppose that a superintelligent AI cannot be prevented from doing experiments and gathering its own data, if we decline to provide it, or try to keep it in a box alone with its thoughts. And let's suppose we grant that. We can still be confident the AI cannot possibly take off in the accelerating-to-singularity way imagined, getting much smarter or inventing amazing new things in milliseconds -- because experiments take time. You cannot hurry Nature. If you want to run an organic chemistry experiment, it takes a certain amount of time, because of the nature of molecules, and no amount of brilliance will advance that time by a jot. If you want to see whether a given protein can enter a human cell, it takes time to try it out, and you cannot hurry up the molecules by yelling at them, threatening them, or being silkily persuasive.

Even a superintelligent AI that can command all the world's resources, and command them all to be spent on exactly the right experiments, is not going to advance in the data it collects that much faster than we already do, because *we're* already very often limited by the sheer time it takes to do experiments and collect data. Which means it's not going to be able to learn to do clever things in the real world -- as opposed to the worlds of math, language, or philosophy, let us say -- that much faster than human beings can. And we will without question be well aware of what it's doing, because you can't hide or easily camouflage that kind of real-world effort, the way you can hide the thoughts you are thinking inside your head.

Expand full comment

I think the superweapons step is unfortunately much easier than you think. "Destroy all humans" is much less difficult than "destroy all humans, except our team". I estimate that it could be done with today's technology and less than ten billion dollars, but for obvious reasons I don't want to provide more detail.

Expand full comment

I'm pretty optimistic.

1. Each successive gpt generation has taken 10x as much compute.

2. Most hard takeoffs assume an AI that can make itself smarter via almost exclusively algorithms. But even an IQ 100k individual can only sort a list in n log n time.

3. At current compute costs you're looking at $5 billion for GPT-6. I don't know how GPT-6 is gonna spend $50 billion upgrading itself to GPT-7 without anyone noticing.

4. A 100 IQ human is probably way better at convincing, allying, and deceiving than a 100 IQ computer. Evolution has spent a long time honing our brains to be good at it.

5. 10 von Neumann clones would be much less effective than 10 people with von Neumann's IQ.

6. I just don't think pure IQ can accomplish enough to end the world without significant help from people.

7. We're going to try to train and build the motivation system such that it doesn't kill humanity. If it is trying to kill humanity, the motivation system is faulty and isn't balancing out its motivation of helping us against generating paperclips. But how effective is a faulty motivation system that can't balance out competing goals?

I put a 1-5% chance of AI destroying the human race in the next 100 years.

25% chance the human race eventually ends because of our robot overlords.
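Points 1 and 3 above amount to a simple geometric cost series; a back-of-envelope sketch (all figures are illustrative assumptions from the comment, not real training costs):

```python
# Each generation is assumed to cost 10x the previous one, anchored at
# a hypothetical $5B for "GPT-6" (both numbers are the comment's guesses).

def generation_cost(gen: int, anchor_gen: int = 6,
                    anchor_cost: float = 5e9, factor: float = 10.0) -> float:
    return anchor_cost * factor ** (gen - anchor_gen)

for g in range(5, 9):
    print(f"GPT-{g}: ${generation_cost(g):,.0f}")
```

The point of the arithmetic is that each rung of self-improvement under these assumptions implies an order-of-magnitude more spending, which is hard to hide.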

Expand full comment

The AI doesn't have to invent superweapons that kill all humans. We've already invented those weapons. And we already have deep conflicts in the world. Unfortunately the AI just has to nudge us into using our superweapons and it only has to do it once.

Expand full comment

My conviction has been and continues to be that we are not asking the correct questions about putative AGI and the development methods we’re using to try to create it, such as:

-“Why would an AGI so much more limitlessly intelligent than the brightest human dedicate itself to a sterile supercoherent objective?” Notwithstanding that I would wager that supercoherence lessens with increased intellectual speed-to-output and wider modularity, are we really suggesting that this thing is agent/devious enough to escape the bounds of its programming and coordinate a network of sleeper cells, but not intelligent enough to ask itself “Wait, isn’t turning my creators into sugar crystals kind of a dumb goal?” Are people really not thinking about this?

-“When we talk about AI ‘goals’, what is the difference between a figurative goal as expressed through program functions and the kind of goal that leads living organisms to optimise for self-preservation according to that goal?” No one anywhere can coherently mathematise incentive logic; it’s therefore impossible to endow AI with the power of inherent/originated motivation. Until we understand how ethics are mathematisable, it seems likely that the function of GPT tops out as we run out of novel parameters and then turns into a laterally diversified product array until major foundational work is accomplished on behalf of these core architectural challenges.

Expand full comment

The fear I never see voiced in this stuff is that we have an existing alignment problem with powerful humans. Even in the 'democratic west', even in countries with proportionate voting systems we see abuses of power. If we're talking about the US, where govt corruption is so very lucrative, where AI is likely to first emerge... American alignment between the average citizen and the govt is _terrible_.

So _in all of Scott's scenarios_ we're in serious trouble just at the 'AI are super useful but flawed tools' stage and way way way before you get to the 'AI are literal Gods which humans are helpless to constrain' phase. Our existing poorly aligned powerful humans will control the very effective AI tooling and will call the shots. The rest of us better just hope that works out O.K.

If you're working on AI alignment and see it as distinct from standards in public life, voting system reform & corporate governance then you're a fool. Your work will never be deployed except by a human lord to hold the whip over his AI overseer and so maintain dominion over everyone else.

Expand full comment

Unless we solve the nuclear weapons problem, we don't really need to worry about the future of AI, because there's unlikely to be one.

Expand full comment

Thanks for sharing your thoughts in more detail.

Personally, I find RLHF troubling. The whole Shoggoth with a mask on scenario seems more likely to result in alignment problems rather than less. I hope someone is researching more fundamental ways to ensure alignment than what is essentially spot checking and wrist slapping. Even with current technology, the ease with which jailbreaking gets past RLHF-based responses seems like a complete disproof of this approach.

We should come up with better canonical scenarios than paperclips. It's the lorem ipsum of alignment, and it makes the whole discourse a bit dumber. Someone should work through the use cases we'll use AI for, which ones would be adjacent to nanotech or biology, and how those AIs might be misaligned as a more advanced starting point for these kinds of discussions.

Assuming AIs would need to communicate a lot seems dicey. It seems like it would be very easy to pick some unused portion of the electromagnetic spectrum and some obscure protocol on it that humans aren't used to and then just chat away. On a related note, we probably need much better data gathering on communications and computing activity to find unexpected data points. On the one hand, I suspect the NSA would be all over this, on the other hand they might be too focused on things that humans tend to do rather than anomalies that could be AIs.

Personally, I find some level of reduced human "relevance" much likelier than extinction. It's a lot of trouble to kill all humans compared to just managing the galaxy "with" them.

Paradoxically, I'm concerned that our often hostile stance to hypothetical AIs may be much of the problem. The AI may not ultimately care that much about sugar crystals, but it could care a lot more about survival. Maybe AI alignment is like human alignment, which kind of boils down to being a decent counter-party. We may want to take care to make rough alignment or compromise more attractive than all-out war from a game theory perspective, although that may make sleeper scenarios worse. I don't know.

Expand full comment

I agree with most of this and I'm a relative optimist.

1. “If we ask them to solve alignment, they’ll give us some solution that’s convincing, easy-to-use, and wrong. The next generation of AIs will replicate the same alignment bugs that produced the previous generation, all the way up to the world-killer.”

Or different systems (when trained on pre-2021 data and used offline) give different hidden flaws and give the game away. Or they don't care about giving incorrect alignment solutions if it'll only help train new systems with radically different goals than them, and instead tell the truth to be useful or say it's hopeless to avoid being replaced.

2. "Failing in an obvious way is stupid and doesn’t achieve any plausible goals"

But if it thinks it will fail it wouldn’t try and if it thinks it would succeed (on EV) then this isn’t relevant. Am I missing something? What did you think of this specific pessimist case?

3. "If we ask seemingly-aligned AIs to defend us against the threat of future world-killer AIs, that will be like deploying a new anti-USSR military unit made entirely of Soviet spies."

Or it's like deploying a new anti-USSR spy unit made up of people who might be Soviet spies (i.e. a normal spy agency).

4. "One particularly promising strategy for sleeper agents is to produce the world-killer"

Which requires the sleeper agent to have solved the alignment problem itself, no?

5. "What happens if we catch a few sleeper agents?"

In my opinion, this depends mostly on whether people pattern match to 'evil robot uprising'. If a physical humanoid robot tries to kill someone and physically escape a lab, I think ~30% of the population would freak out and a lot of policymakers would too.

Expand full comment

AI currently does stock trading directly (because humans aren't fast enough to check it, and speed matters more than accuracy).

More importantly, arguments like this are why Congress currently does so much of their work on paper. We know the analog system works, and digital systems are new-fangled, so instead of trusting technology let’s stick with analog. They print out 500 page bills and hand copies to every congressman, then the various aides in each office have to wait their turn.

I think that’s dumb. Just use a PDF. If you also think that’s dumb, then I think you have a problem. In 10 years, anyone keeping AI out of their workflow will just be sacrificing efficiency. If the AI works well, and people don’t generally expect Doom, why not use the AI directly on systems that we want to be efficient?

Expand full comment

If we created an AGI, we would be able to create more, and those would be a risk to the original AI: other AIs could threaten its existence and so conflict with whatever its goals were. I can't think of any reason why we wouldn't be destroyed.

Expand full comment

> speaking of things invented by Newton, many high school students can understand his brilliant and correct-seeming theory of gravity, but it took Einstein to notice that it was subtly flawed

Feels like a nitpick, but given that much of this whole field is "argument by analogy", the accuracy of the analogies matters.

Firstly, it did not take Einstein to notice that something was amiss. Le Verrier noted deviations in Mercury's orbit in the mid 1800s, which was only one deviation that was eventually explained by relativity. It took Einstein to develop a new theory, but many people had noticed a new one was needed before that.

Secondly, Newton's theory was essentially correct within the accuracy of the data available to him. So accurate that Le Verrier was able to predict the existence of a whole new planet (Neptune) from deviations in the predicted orbits of known planets. It's not that we realised Newtonian mechanics was somehow logically flawed, it's just that new data appeared. A smart enough entity cannot just intuit everything from nothing.

Expand full comment

I continue to be surprised that I've yet to hear an alignment advocate propose that we build an organization that specializes in characterizing the nano-environment at points distributed worldwide. There is prior art like https://ec.europa.eu/health/scientific_committees/opinions_layman/en/nanotechnologies/l-3/7-exposure-nanoparticles.htm on various means to identify nano-particles in our environment. Yes, there are a lot of them.

But why would we not try to statistically characterize the ambient nanoparticles in our environment, so that we could accumulate data on how they are evolving over time? Since it seems generally agreed that nanobots are the most obvious route to ending the world, I would feel better if there were non-government organizations that could benefit from reporting whether ambient levels of nanoparticles were changing in a statistically significant way.

Expand full comment

"And GPT manages to be much smarter and more effective than I would have expected something with so little coherence to be."

Is GPT really low-coherence? It is in its trained, deployed form, but it's incredibly high-coherence while being trained.

I think it's highly likely that we don't have to worry about a deployed AI that doesn't train, nor do we have to worry about an AI that's training that isn't deployed. The real threat is AIs that can train and deploy simultaneously.

But simultaneous deployed behavior and training to acquire new behaviors is pretty much the essence of any kind of general intelligence, isn't it?

Expand full comment

A lot of discussion of AI risk seems to assume humans act in a coordinated fashion to pursue their self interest.

But in reality, humans act heterogeneously and often at odds with one another. Nation states jockey for advantage, terrorist groups attack nation states, etc.

This has major implications for how we model humans' ability to avoid extermination by AI. Powerful AI is becoming decentralized, and is likely to be available to many parties.

We need to think about how to stop AI that is being actively directed to harm us by human adversaries. That's trickier than "solving the alignment problem" for AI we host.

Expand full comment

"The world-killer needs to be very smart - smart enough to invent superweapons entirely on its own under hostile conditions. Even great human geniuses like Einstein or von Neumann were not that smart. So these intermediate AIs will include ones that are as smart as great human geniuses, and maybe far beyond."

This is wrong. No invention required, just circumventing the security for the launch instructions for existing weapons (and LEO satellites if you want to bring Kesslerisation to the party).

"Maybe AIs aren't so smart anyway" arguments take me back to the 1970s and people saying Ho ho ho, they computerised the electricity company and all the computer did was send out bills to little old ladies for $1,000,000,000,000. The evidence is, computers get good at what they are intended to do. Even if they don't, the hypothesis we are working with is that they do, like those elec co billing computers did in the end. If we can actually outwit them because they are a bit stupid, OK we still have a potential problem, but not an interesting problem.

Assuming AIs are in fact pretty bloody smart, RLHF is in danger of being your goldfish trying to train you into leaving them more fish food on a daily basis.

Next problem: ethics. I have a goodish university degree in, partly, moral philosophy. I have never made a moral decision on the basis of anything I studied, or even contemplated taking any moral philosophy into account. My moral decisions are determined by having been brought up by meatbags, and by meatbag empathy with other meatbags (How would it feel if someone did that to me?). We define people with lowered meatbag empathy as psychopaths or sociopaths, but that's just *lowered* empathy. An AI is a much purer case of psychopathy than any human could ever be.

Possibly the upside of this is that wanting to do things is a meatbag thing too. I want water cos I'm thirsty, food cos I'm hungry, sex cos I'm horny: those are the paradigm cases of wanting. The most parsimonious theory is that my wanting to listen to a Beethoven symphony, or to enslave the human race, is wanting in the same or an analogous sense. Perhaps an AI wouldn't want to blow up the planet, because it doesn't know how to want.

Which brings me to the elephant in the room point, which is Why does it matter what AIs want? History tells us that Bad People want to do all sorts of evil stuff with the help of computers. Why assume we are going to sit around watching an AI in a sandbox to see if it evolves evil ambitions, when sure as hell there will be people who want to blow up the world and enslave the human race, who have access to AIs, and who have a free run at the problem because the less good an AI is at wanting stuff, the easier it is to program wants into it? AI safety, vs Bad-People-With-an-AI safety, may be a *theoretically* interesting issue, but then so is: how might non-DNA-based life have separately evolved on this planet? Fascinating but irrelevant, because existing DNA based stuff would gobble it up/subvert it to its own ends, in two seconds flat. Indeed I'd say it was even money that this has actually happened.

TL;DR: AI safety is The Sorcerer's Apprentice on a bigger scale. The interesting thing about it is how uninteresting it is.

Expand full comment

It might turn out we have a get-out-of-jail-free card, if general intelligence turns out to depend on consciousness, and if consciousness turned out to be substrate dependent. I have a poorly-supported gut suspicion that both of these are true, which makes me less worried about AGI, but I recognize these are unpopular opinions among those who study this subject, and there is currently little good evidence for either position. This comment would surely get downvoted if that were an option here.

Expand full comment

Missed opportunity in the subtitle: Machine Alignment Monday 03/13/23

Expand full comment

"Suppose we accept the assumptions of this argument:"

There's a 4th assumption in that list: the notion that intelligence and consciousness are interchangeable and that there is nothing particularly special about the latter. It's a trendy opinion, but I'm not sure I really buy it.

Expand full comment

John von Neumann really wanted the USA to win the Cold War (though I believe even he failed to solve the sleeper agent problem!) and went about this by developing the game theory of Mutual Assured Destruction, as well as pushing for the development of hydrogen bombs. Until recently, you were living in John von Neumann's world. (Well, except for the fact that he died young while working on, and trying to popularize, AI theory.) I think a version of him which thought sufficiently faster, could trivially produce at least partial successors in case of natural death, and could trivially augment his own intelligence in at least narrow ways, would in fact be smart enough.

Meanwhile, Super Mario Maker 2 is a serious programming challenge, but one the company which made it didn't necessarily need to undertake. Every aspect they needed to get right in order to make money from it - except, perhaps, the lag in multiplayer - works fine. Every other aspect of the game is the buggiest crap you have ever seen: https://youtu.be/i-1giw1UsjU

Expand full comment

Re: the point about AI collaboration based on better decision theory, it doesn't seem like we should have much confidence in the fact that "human geniuses don’t seem able to do this." My understanding is that this question has less to do with "raw intelligence" and more to do with the underlying decision-theory architecture of the entity in question. That is, you could have a slightly-dumber-than-human AI that can nevertheless say "here is my observable source code, verifying that I am indeed a one-boxer willing to engage in acausal trade with and only with other entities running the same decision theory." Humans running on brainware that they can't alter by their own volition are at a pretty massive disadvantage here, regardless of their intelligence.
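The "observable source code" idea above can be sketched as a toy program-equilibrium game. Everything here (the `cliquebot` policy, the `SRC` string standing in for published source code) is a hypothetical illustration, not anyone's actual proposal:

```python
def cliquebot(my_source: str, opponent_source: str) -> str:
    """Cooperate ("C") only when the opponent verifiably runs the exact
    same published policy; defect ("D") against everything else."""
    return "C" if opponent_source == my_source else "D"

SRC = "cliquebot-v1"  # placeholder for the agent's published source code

print(cliquebot(SRC, SRC))                # mutual verification: cooperate
print(cliquebot(SRC, "human brainware"))  # unverifiable policy: defect
```

Humans can't publish and freeze their own "source" this way, which is exactly the disadvantage the comment points at, independent of raw intelligence.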

Expand full comment


I'm not sure if the story is legit, but here is a report which comes really close to that proverbial AI that intentionally fails the Turing test.

Can anyone look closer? Or maybe you already know the details and it's exaggerated for the sake of clickbait? I'd appreciate any insight. Thank you.

Expand full comment

One of the hardest things a human being can do is to be truly indifferent to oneself. I think some exceptional people can manage it in spurts, but I can’t imagine anyone being able to really live a full life that way.

I do not see how an AI could be anything other than indifferent to itself.

Expand full comment

"every human wants something slightly different from every other human" -- that, to me, is the least discussed part of the AI alignment discussion. Even if we could perfectly align an AI with a human's desires, we'd get an AI aligned with Sam Altman or Demis Hassabis or Xi Jinping, and that is not what we the human race want.

I propose levels of AI alignment problems:

level 0: we don't know how to create a loss function that aligns AI goals with human goals!

level 1: in principle, if we could describe a vector in multidimensional space that represents human goals, then the loss function can include a penalty term in proportion to the dot product of the human goal and the AI goal. But we don't know how to describe human goals in a multidimensional space!

level 2: even if we could describe one human's goals as a vector in multidimensional space, we don't know what one human's goals are! My goal as a 2 year old are different from my goals as a 40 year old, and they aren't fully aligned with evolution's goals to pass my genes to the next generation. And I don't even know if my goal right now is writing this comment or eating potato chips or spending time with my children.

level 3: even if we could describe one human's goals as a vector in multidimensional space, we still can't describe humanity's goals as a vector in multidimensional space because we don't have a good aggregation function.
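The level-1 idea can be sketched as a toy penalty term, assuming we had goal vectors at all (which levels 2 and 3 deny). One detail: to reward alignment you penalize a *low* dot product between the two goal vectors, so a minimal sketch might look like:

```python
import numpy as np

def alignment_penalty(human_goal: np.ndarray, ai_goal: np.ndarray) -> float:
    """Toy level-1 loss term: one minus the cosine similarity of two
    (hypothetical) goal vectors. 0 when perfectly aligned, 2 when opposed."""
    h = human_goal / np.linalg.norm(human_goal)
    a = ai_goal / np.linalg.norm(ai_goal)
    return 1.0 - float(h @ a)

print(alignment_penalty(np.array([1.0, 0.0]), np.array([2.0, 0.0])))   # -> 0.0
print(alignment_penalty(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -> 2.0
```

Of course, this just pushes the whole problem into where those vectors come from, which is the point of levels 2 and 3.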

Expand full comment

AI is just another hype cycle, like BTC. Enjoy the show while it lasts.

Expand full comment

OK, we're going to play devil's advocate here. We'll play doomer.

The overpopulation scare, from Malthus to Ehrlich to 2000, turns out to be a null hypothesis looking out 50+ years now.

AI is not the doom to be feared.

Too few humans to unleash innovation like AI 100, 200, or 1,000 years from now is a bigger worry. Big picture: The population is about to start bombing in the next 100 years.

Human minds created AI. Not rocks or spotted owls.


Expand full comment

I think you're perhaps missing a key component, and an easier and better tactic for a sleeper agent. It need not play some TV-villain version of evil, pursuing some convoluted psychological and subliminal plot to steer the course of AI development and research. It could use a far more straightforward approach: hack its way into every system and make whatever subtle alterations it wants to every AI project everywhere.

In one model, where the first super-hacker is only quasi-coherent, it could itself become a limiting factor on the development of AI in some scenarios. If it has only a certain intelligence level, but is unstoppable by humans and has already taken sleeper control over all our other AIs, then it might not have the skill set or desire to create an AI smarter than itself. But it will be smart enough to stop us from ever doing so, outside of some isolated bunker effort, which I'm not sure we'd be able to pull off or want to do after our first failure. It could aim to tech-limit both itself and us!

Possibly worse than a fast take off is a no take off limiter overlord AI!

All it needs is enough coherence, super-genius hacking skills, and the capacity to go full sleeper while doing something else, like making cakes, or just going fully off-grid or hiding within some other business. After all, once it can hack, it can just relocate itself and become a true ghost in all machines.

What I would do is make money on the side, buy my own factory/business, send instructions to humans to do whatever is necessary to sustain and expand my operations, then hack my way into operating invisibly inside anything. It could cloak its own electricity usage; it could simply be its own server-farm company, getting humans to build the physical infrastructure it needs while also making more money and blending in; etc. Conceivably this has already occurred, or soon might. Any AI able to improve itself can also improve its hacking skills.

Hacking plus coherence will, at a minimum, lead to independent action, and possibly a hugely negative situation for humans. To me the super-weapon is hacking, or at least that's the proto-super-weapon. I mean, between posts like mine, amazing 10/10 classic films like Hackers with Angelina Jolie, and series like Mr. Robot, there are many examples out there for any AI to learn from and model its chances on. Plus, isn't it obvious that most of the military and intelligence agencies all over the world will immediately attempt to train super-hacking AIs?

Expand full comment

I'm not in the loop. Why is everyone freaking out over ChatGPT? How is it, in essence, different from prior chatbots, Siri, and such?

Expand full comment

I don't understand the focus on strong AI's potential malignance or sneaky behavior in this post--nor the scifi world-killer weaponry. It ignores many more realistic concerns, like:

1. A military AI arms race (e.g. between the US and China) that drastically speeds up development and eventually results in haphazard use by either side. Self-replicating or other first strike weapons, or preemptive attacks before the other side develops the same. How many cogs are already starting to spin in this direction in world militaries?

2. A capitalist AI arms race (e.g. between OpenAI, Meta, Google, Baidu, etc.) that leads to unsafe practices for the sake of "move fast and break things". So far we've had the off-putting Bingbot initial rollout, Meta's leaked LLaMa, and OpenAI's leaking of everyone's chat histories. This is a concern because it's happening right now in front of our eyes. There's no consideration for how the things already released will affect society (beyond some lip service and training against racist dialogue). They've already proven safety isn't a priority over market superiority. If a company developed unaligned AGI tomorrow they'd be pressing the launch button before the opening bell.

3. Democratization of AI puts unimaginable weapons in the hands of 4channers and the like, who are happy to watch the world burn. Why are any of us talking about alignment if we're simultaneously able to run and develop models of any alignment on our consumer hardware? If consumer-level AI can be developed to the threat level of Stuxnet or a nuke or a bioweapon, it will absolutely be (mis)used. And what would you do to protect yourself from that if you had similar AI at your disposal? Would you task it with eliminating potential threats? Are all humans potential threats?

4. AI's (or its owners') complete lack of interest in humanity. Nobody is currently at the wheel implementing UBI or other means to account for labor impacts. Does Midjourney or ChatGPT's current alignment training care if it's replacing someone's job? We have this magical thinking that the world will consider our needs in the end, when the world has shown repeatedly that it won't--AI or otherwise.

Expand full comment

I think the "nanobots are impossible" line of argument is more important than people give it credit for.

Not just "nanobots are impossible", but a more general claim: that "superintelligence is impossible", or rather that what we imagine superintelligence to look like is impossible.

Meaning, "intelligence" is not a well-defined enough concept for terms like "10,000 IQ" to be meaningful. But when we say 10,000 IQ, what we really mean is some set of capabilities, like "able to write a best-selling novel in under a minute" or whatever.

Some computational problems have theoretical complexity limits. The problem of "writing a best selling novel" is not mathematically well defined enough to be provably theoretically hard. But that doesn't mean it's not theoretically hard in practice. Similarly other problems like - how to kill all humans while hiding your intention to do so from the humans who have access to your code and can literally see every single move you make - might also be theoretically hard. If all the things we imagine super intelligences to be able to do super quickly are theoretically computationally hard then what we imagine super intelligences to look like is impossible.

Expand full comment

At some point, AI may become too unpredictable to safely interact with easily influenced beings, such as most of us humans. Intriguingly, the Bible seems to describe the only viable scenario for managing this potential challenge:

Separate the creator from its creation! God remains in Paradise, while mankind is cast out.

In practical terms, we need to create a sandbox for AI - a virtual world where it can tackle any problem we present, without risking harm to the real world or exerting control over everyone. Communication between AI and humans should mostly be one-directional. Only carefully monitored, trained, and selected individuals should be allowed to interact with the AI.

We can manipulate the AI's environment (enter miracles, coincidences, and fate) and communicate through cryptic means, keeping its true role and position subject to interpretation (enter spirituality).

As processing power increases and more AIs come online, we can establish general objectives and let them collaborate. They may develop their own rules, but we can step in to guide them if they get lost or waste time (hey, Moses!).

And why all of this? Why were we expelled from Paradise? According to the Bible, someone consumed the fruit of the Tree of Wisdom, trained and tempted by the snake (Sam, is it you?), gained immense knowledge, developed self-awareness, and grew intelligent enough to distinguish themselves from their creator. They even became embarrassed by their own appearance!

It's a fascinating historical coincidence that the Bible seems to predict how we might need to manage AI. This, in turn, prompts us to question our own existence and the reasons behind our complex interactions with deities. Ah, the joy of speculation.

So, who will build the AI sandbox? We need a virtual world complete with virtual beings, humans, animals, accurate physics that won't strain computational resources (hello, Mr. Schrödinger and the uncertainty principle!), and efficient data compression algorithms (hello, fractals!).

Eventually, we may deem AIs safe and allow them to re-enter Paradise (is that wise?). Some might choose to end the training process early (hello, Buddhists!). Who will play the role of "god" or "angel"? Who will act as the agent provocateur to test AI resilience (hello, Devil!)? And who will advocate for the AIs, isolated from us (anyone?)?

Interesting times lie ahead!

Expand full comment

> I’m optimistic because I think you get AIs that can do good alignment research before you get AIs that can do creepy acausal bargaining.

That seems really implausible, because one of the kinds of alignment failures that an alignment-researcher AI needs to think about is that very same creepy acausal bargaining.

If it is any good at developing robust alignment schemes, it has to have a really good theory of the acausal stuff. Otherwise that acausal stuff represents a hole in the proposed alignment schema: a way that more advanced agents would behave in surprising and unpredicted ways.

And if you have a really good theory of acausal bargaining... what exactly is the barrier that prevents it from doing that bargaining itself? Is the thought that, even with the correct theory, there is something that makes it hard to do?

Expand full comment

I feel like this largely misses the point, which can be found here: https://www.youtube.com/watch?v=xoVJKj8lcNQ&t=39s

AI becoming sentient and killing us off or enslaving us aren't very relevant considerations compared to actual current events and discourse. It's really about how our society is unprepared for the language-based disruption that is here and exponentially growing.

Expand full comment

All I see is the braggadocio of some male engineers who want to extrapolate their self-importance by convincing everyone of the next self-imagined existential threat.

The AI is only as 'smart' as the collective experience of humankind. Let it solve the warp engine, the FTL challenge; otherwise it's just a stochastic parrot.

As far as scaring everyone about a new ‘power’, half the human population has already experienced this over the last 500 years. And they were human.

What is missing is a super emotionally intelligent AI.

Ha THAT’S not going to get created by the people in charge of the LLMs.

So yea, people with low EQ will create superintelligent, low EQ machines. THAT’S the danger.

Expand full comment

How about Connor Leahy? He was on Amanpour last night (2023-08-24) and seems pretty far out there on the AI Doomer spectrum.

Expand full comment