617 Comments

How do bacteria colonies store energy? Serious question


Yeah this all seems like a discussion about whether Dr. Frankenstein’s new effort will act like a monster or not. We’re not even close to understanding nature let alone creating a new sentient life form. Imo.


Why can't the nanobots murder any bacteria that threaten them?


Bacteria have spent a billion years evolving ways to not get murdered by antibiotics, white blood cells, and all the other things in the world that don't like bacteria growing inside them. What weapon is a nanobot going to use that nature hasn't used already?


Human-sized animals have spent millions of years evolving ways to not get murdered by each other in nature, and yet humans easily crush them with weapons that nature hasn't used previously. I'm sure that the space of innovation on the micro level is similarly large enough to surprise nature.


Humans were able to pull themselves up by their bootstraps from an environment infested with, if not T-Rex, then at least saber-toothed tigers, with only sticks and stones and slowly accumulating cultural knowledge, which was not exactly a trivial challenge. Whereas the nanobot designer would have access to the entire knowledge base of post-industrial civilization, and the ability to develop new toxins and toxin-resistances instantaneously compared to evolutionary timescales, and likewise evasion strategies for larger threats. I agree that this is not to be hand-waved away, but it seems to me that if you're able to develop an autonomous quasi-life-form from scratch, this is a minor part of the challenge, not an obstacle that is insurmountable in principle.


I think that the space of innovation on the micro level is not similarly large. At larger sizes more complexity is going to be possible. Nature hadn't previously come up with the weapons that humans created because it operates via biological mutations reliant on chemistry rather than building tools out of (relatively) big inanimate objects.


> At larger sizes more complexity is going to be possible

That may be true, but in practice the things we use to slaughter wildlife, from guns to bulldozers, are in most senses less complex than chloroplasts, ribosomes and all the weird crap unicellular organisms use to kill each other.

(The phones of the bulldozers’ drivers are more complex, but those are used mostly for fun, not for slaughtering animals.)

It does not matter if the space of innovation is “similarly large”, it just matters if it’s large *enough.*


Humans have that advantage because they can access more energy sources and more materials than biological processes can make use of - you can't forge iron with just the energy in your muscles, you need to burn fuel to make it happen.

But nanobots don't have access to any of that, because they need to carry all their tools with them. They can't be made of exotic materials because they have to be able to self-replicate in an environment that's mostly made of normal CHON stuff. They can't take advantage of more energetic reactions because they don't have anywhere to get energy from besides chemistry. They're pretty much working with the same toolbox bacteria are.

Or to put it another way, a human does not kill a tiger, a human plus a fire, forge, and iron kills a tiger. And while humans can make new humans, each new human isn't born with its own fire and forge. Our mastery of the environment depends on the existence of external resources, not just our own intelligence.

Sure, the nanobots can work together as a more complex organism, like humans do, but at that point, it's more "designer organism" than "nanobot swarm." This rules out the "AI mixes a few chemicals together, self-replicating bots spread through the air, everyone on Earth dies instantly" scenario - there's now a minimum size of colony required before your bots can do something interesting.


Why assume they store energy? The bacterial analogy is probably the wrong one here; I think an analogy to a virus is more apt. If a nanobot encounters an energy storage device (eukaryotic, prokaryotic, archaeal, fungal), it hijacks the energy storage to do its thing.

My biggest skepticism is that viruses would have loved to do this for the past hundred million years or more, but haven't figured it out yet. It's a program that would be too complex to implement in any known system, so it kind of sounds like magical thinking.


I think it's hard to argue that the viral model is a failed/unsuccessful model. Take a random sample of seawater and sequence what you find there, you'll find evidence of countless viruses floating around.

But your concern was whether they make it into the next host. If you make 100 billion copies of yourself, it's not a big deal that 99% don't make it to the next host. Especially if you're not actively dangerous - possibly even helpful - to the host, it's easier to evade the host's immune system (though EMPHATICALLY not a trivial endeavor).

The biggest problem with the viral model is the restriction of the quantity of genetic information that can be produced through this method as opposed to something like a prokaryotic system that can carry orders of magnitude more genetic material (and could therefore support a sufficiently complex system).

> It's going to be very difficult to find a strategy that life hasn't already tried.

I strongly agree with this statement. I'm no proponent of the 'gray goo' hypothesis for nanobots, which I think is not grounded in actual biology or biochemistry.


> My biggest skepticism is that viruses would have loved to do this for the past hundred million years or more

All large-ish animals would have loved to invent/evolve machine guns, for similar lengths of time, but they didn’t even get to decent spears before humans.


Invasive species exist—including microbial species—therefore there are viable ecological niches that microbes are not occupying right now. I feel like that cuts against what you’re saying here, to some extent. Right? Separately, I have an intuition that life on earth is in some ways stuck in a remarkably narrow design space, e.g. see https://twitter.com/steve47285/status/1632866623294939136 .


Yeah, I think the relevant questions are (1) is there a set of nanotech things that can cause human extinction? (2) is there a set of nanotech things that can stably support a world with AGIs, particularly including manufacturing (or self-assembling into) machines that can do lots of calculations, i.e. machines in the same category as human brains and silicon chips? Both answers can be “yes” without wiping out every last microbe. I join you in doubting that every last microbe will be wiped out.

I feel pretty strongly that the answer to (1) is “yes” thanks to pandemics and crop diseases and so on. I’m more unsure on (2). Not an expert. If I had to guess, I think I would guess that if AGIs wipe out humans, they would still be running on silicon chips when they do, and will continue running on silicon chips for at least a while afterwards.


>I'm objecting to the blithe assumption that the instant we invent a nanobot, it's grey goo immediately

Well, not "we", it's a "superintelligence" that does it. I think that once you accept the proposition that it's possible to covertly invent and build a nanobot unbeknownst to the rest of civilization, it's not that big of a leap to append an ability to outpace evolutionary adaptations in microbe warfare.


That's the essence of a singularity, though, isn't it?

Travel back to ancient Sparta, and a handful of people with sufficient access to modern technology would find world domination trivial. Even just 200 years ago, there's a non-trivial difference in what modern technology can accomplish, whether militarily or agriculturally. And the more modern the technology, the greater the contribution intellectual inputs make to the force-multiplier effect. In what way are the last 200-500 years not the same as a singularity, in terms of the gulf created between the people involved?

You're probably right that arguments about nanobots are bunk. What we imagine is probably not going to happen any more than hoverboards did. Instead, what we could never imagine will become the Most Important Thing. Historically, this is true of human inventions like the printing press, electricity, transistors, and the internet.

We should expect the same into the future as well. IF we are able to use AI to generate (or help generate) an acceleration in transformative technology development, we cannot imagine that future world. This is true whether or not the AI becomes generally sentient.


>"Travel back to ancient Sparta with modern technology and a handful of people with sufficient access to modern technology would make world domination trivial."

I disagree. Unless sufficient access means transporting an entire industrial base (which would require far more than a handful of people to operate), a handful of people with modern technology would only be able to dominate locally. Even a man with a machine gun can have his throat slit while he sleeps in his palace, and if you pit 50 people with machine guns and bombs against the Roman Empire, then the Roman Empire wins. They put up with 67,000 casualties against Hannibal; they can take the losses.


Was it irrational of Tsiolkovsky to imagine spaceships driven by rocket propulsion in advance of it having ever been demonstrated to be possible? You can of course argue (and I'd agree) that our theories of either superintelligence or grey goo-grade nanotechnology are even less grounded, but I don't think that it's possible to make a principled distinction. Neither seem to require breaking any known conservation laws, unlike perpetual motion machines.


It's easy to cherry pick a single successful prediction from the mountain of failed predictions by futurists. Very often, following the predictions of futurists would have led to very bad policy implications by people planning for the future.

Is there a justification for thinking that rockets are similar enough to nanotechnology that they should be considered analogous? I'm not convinced that the nanobots that have been proposed are remotely within the realm of possibility.


Sure, but then human advancements don't come solely through reasoning. We have to test our ideas against the real world by experimentation. There's a lot of trial and error to even the best hypotheses, including probably more error than success. The assumption that even a maximally intelligent agent could bypass this step requires more support than just saying it's really smart. You don't know what you can't possibly know yet.


While it is not exactly related to your argument, another case against doomerism is that AI can't be the Great Filter, because a hostile AI that had wiped out its creators would most likely still act on a cosmic scale, mining asteroids and building Dyson Spheres and such. The Fermi Paradox applies.

author

Agree with this (see https://slatestarcodex.com/2014/05/28/dont-fear-the-filter/ ), I think most rationalists have settled on grabby aliens (see grabbyaliens.com) as the most likely explanation.


Nit, medium confidence: "grabby aliens" isn't an explanation of the Fermi paradox. People often say "grabby aliens" to mean the related proposition "aliens expand at almost the speed of light, so you don't see them in advance." But this proposition doesn't necessarily render the anthropic observation unsurprising; for example, (I'm confused about anthropics and related empirical evidence but) I'm surprised that we exist so late.

(Maybe "the Fermi paradox" is supposed to mean "the fact that we don't see aliens" as opposed to the entire anthropic observation, but the other evidence, like the age of the universe, is relevant to great filter stuff.)


Just to be more concrete: the grabby-ness of an alien civilization has no bearing on how long it takes us to detect them, because we detect both grabby and non-grabby aliens whenever their light cone intersects us.

Therefore the grabby aliens hypothesis has no explanatory power w/r/t the Fermi Paradox, except in some anthropic principle-style arguments.


The anthropic argument solves it perfectly though, right? If you could see their structures, they would have already grabbed your solar system; i.e., grabby aliens means there is no reference class of observers who see such evidence.


This discussion is complex so let me try to clarify.

One of the parameters of our universe is <the likelihood of enduring intelligent life arising on any particular planet>. Call this LI, likelihood of intelligence.

The Fermi Paradox implies that for whatever reason LI is quite small in our universe — otherwise we'd see lots of aliens.

Let's call the combination of the Anthropic Principle plus speed-of-light Grabby Aliens "AP+GA."

When you talk about AP+GA "explaining" the Paradox, I take you to mean that it furnishes an *alternative* explanation: a way that LI could be *large* and yet we could still appear to be alone in the universe.

Under this meaning of “explain,” I would say that AP+GA *doesn’t* explain the Fermi paradox.

If you assume AP+GA to be true, its only guarantee is that we are the first civilization to evolve in our light cone. Therefore you'd expect us to be a "typical" civilization among all possible civilizations which are the first to evolve in their light cones.

If LI is large, then a typical civilization of this variety should evolve when its light cone is still small. Our light cone is quite large. Therefore, LI is still small even under AP+GA — QED.

However, one way this argument could fail is if it’s impossible for civilizations to evolve when their light cones are still small — for instance if galaxies don’t become hospitable to life until they are about the current age of the Milky Way. Then AP+GA could be consistent with high LI.


Wouldn't the anthropic argument suggest that we should be surprised to find that we are not ourselves grabby aliens?


There's no reason to think we're not! Or, rather, the misaligned AI that will destroy us will become one such grabby alien, making happy-looking molecular squiggles out of all the available matter in the lightcone.


Wouldn’t you see some non light speed effects from where two grabby civilizations intersected? I always just figured there have to be some convergent evolutionary pressures that cause all species to do stuff that doesn’t look different than natural phenomena.


Are novae actually naturally caused phenomena?

For astronomy, we have a lot of hypotheses on anything outside our solar system (and a whole lot inside as well).


I have weird thoughts like this too. Is there anything out there that seems natural that isn’t? I never get very far with how to falsify it though.


I think astronomers have papered this over by assuming that everything they see is natural unless it can't be.

And when something can't fit existing theory they think really hard about how to change the theory so that it a) fits and b) is natural in origin.


For fun, we could expand this one step further to include Earth. What if aliens directly filled the gaps in abiogenesis, cellular development, and the development of intelligent life? What if they created people, gave them Commandments written in stone, and told those intelligent people to worship them through a communication medium unknown to the humans but readable by the advanced aliens (prayer)?

What if, from a human's perspective, sufficiently advanced technology is indistinguishable from a miracle? The grabby scenario assumes aliens never got to Earth, because if they did we'd definitely see them and know they were aliens.


Yeah, that's where I can't figure out how to falsify it: unless you know what the true model is, how do you know that what you're seeing isn't just normal?


If you ask a creationist, this is a problem with biologists too. Evolution is obviously false if one keeps looking for natural explanations rather than divine influence.


Of course that's what astronomers assume. It's the only logical assumption, because almost every single object and phenomenon we've seen so far has a natural explanation, and the ones that don't have one also don't have a clear artificial explanation. Your prior on a newly discovered phenomenon being natural ought to be >99.999%.

That said, there are SETI researchers who look for technosignatures: https://en.wikipedia.org/wiki/Technosignature


I don’t think you’d necessarily see any of this. Any area that can see an intersection zone is also inside (or very near) the normal expansion bubble of at least one of the grabby civilizations, as long as border conflicts don’t slow down expansion in other directions.


I read the Grabby Aliens stuff, and while I don't know that I believe they are expanding at light speed in every direction, even if they were, you would think the intersection between competing civilizations would be big enough to see. The fact that it seems unlikely you'd have multiple civilizations in the same galaxy going through their expansion at the same time makes it likely that any intersection would be enormous and hard to hide, because you'd have to cover a huge area of space before that scenario arose.


If they were expanding at light speed, or even near light speed, you'd see huge effects. Space isn't THAT empty, and the collisions with the dust would be quite noticeable. I think anything over 0.5c is extremely implausible, and that 0.2c is much more believable... and still too fast to be plausible.


My hope is we get big generation ships with stable ecosystems shooting out in every direction. One of them will survive and adapt, and they'll be far enough apart that they'll be safe from the bad ones.


It’s believable that the interaction is highly visible. But the catch is that if the civilizations are expanding rapidly in non-border areas (say, .9c) then the area of space that has had light from the border conflict reach it but has not been enveloped by one or more of the civilizations is growing at only .1c (it’s expanding at c, but the occupied area is right behind it at .9c). This results in most of the volume of space that has not been conquered not being able to see anything either, due to the light of the conflict being barely faster than the expansion.


This may be me refusing to surrender the point, but I agree with you, depending on where you are oriented relative to that conflict. If you're on some vector where they had to slow down to fight each other on the way to you, then you should see it.


Grabby aliens are not an explanation for the Fermi Paradox - instead, they make the Fermi Paradox worse. If all aliens were non-grabby, we would only need to worry about not seeing aliens within our galactic neighbourhood; if some aliens are grabby, the puzzle is why we don't see them originating from any galaxy nearby.

Now, the standard anthropic answer goes along the lines of "if there were grabby aliens anywhere nearby, we wouldn't see them because we'd be dead (or wouldn't have evolved) because they would have expanded to Earth". But, as I argued here, https://www.lesswrong.com/posts/3kwwDieE9SmFoXz9F/non-poisonous-cake-anthropic-updates-are-normal , anthropic updates are basically normal Bayesian updates (there's a lot more on this subject if you want to dive into it). So the fact that we survive without seeing grabby aliens is evidence against there being grabby aliens (just like the fact that if you survive playing russian roulette with an unknown number of bullets, this is evidence that the number of bullets was lower).

Conversely, our existence provides some evidence that life is probable (while our continued survival provides evidence alien life is improbable). The theory best compatible with both of these is that we evolved unusually early for a living species - maybe several cycles of pan-spermia are needed before we arrived. https://www.lesswrong.com/posts/wgHbNZHsqfiXiqofd/anthropics-and-fermi-grabby-visible-zoo-keeping-and-early


It's easy to resolve the Fermi paradox if one knows some college-level probability. The Drake equation doesn't work the way it looks. I'd recommend "Dissolving the Fermi Paradox": https://arxiv.org/abs/1806.02404


I.e. The Fermi paradox can be rephrased "Why haven't we seen evidence of this thing we just assumed for no reason should be common? There must be something filtering it out!"


That's a good restatement. Sometimes, it's because we aren't looking hard enough. For a long time, people were convinced that light had to be made of particles, because if it had been made of waves, there would be a bright spot on the dark side of an illuminated disk, a result of the wave troughs and crests lining up just right. No one ever saw the spot until Arago managed to do the experiment right. That's called the Spot of Arago. Arago had a rather wild career for a physicist and mathematician. I think he was a soldier in Morocco for a while.


I think we evolved because we're stuck out here in the empty quarter far away from all the exciting action taking place near the center of our galaxy. When I look at the color image depicting our galaxy, I look at the bright center, and see galaxies interacting, planets being torn from their systems, sent careening through other solar systems, planet killing asteroids occurring far too frequently to enable higher forms of life to evolve.

Bacterial life has existed on Earth for about 2.5 billion years, shelly life only 541 million years, intelligent life less than 1 million years. There's probably a good mathematical model (prime sequence?) for the average frequency of planetary asteroid disasters, but I'm sure this greatly limits the ability of intelligent life to develop in the universe.

Say life on earth is 500 million years old. We've had 5 major extinction events, or one every 100 million years. Say the model is 100 million years times one over the prime number corresponding to the step between Earth and Galactic Center. The distance from Earth to Galactic Center is 26,000 light years; in this model, let's say there are 26 steps of 1,000 light years between Earth and Galactic Center. If Earth experiences an external extinction-level event on average every 100 million years, then according to the prime number model, planets 1,000 light years closer experience extinction-level events—100 million times 1/X—every 50 million years, and planets 2,000 light years closer to Galactic Center experience them every 33 million years. By the time you're 1,000 light years from the Galactic Center, you're down to 100 million times 1/101 years between extinction-level events ... one event every million years. At some point there's a limit beyond which intelligent life can't evolve.
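To make that concrete: step k toward the center divides the 100-million-year baseline by the k-th prime (2, 3, 5, ..., 101 at step 26), which reproduces the 50-million-year, 33-million-year, and roughly one-per-million-years figures. A rough sketch (the numbers are illustrative guesses, not real astrophysics):

```python
# First 26 primes; the k-th prime divides the 100 Myr baseline at step k toward the center.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

EARTH_INTERVAL_MYR = 100      # one major extinction roughly every 100 Myr at Earth
EARTH_DISTANCE_LY = 26_000    # distance from Earth to the Galactic Center
STEP_LY = 1_000               # size of each step toward the center

for k in (1, 2, 13, 25, 26):
    distance = EARTH_DISTANCE_LY - k * STEP_LY
    interval = EARTH_INTERVAL_MYR / PRIMES[k - 1]
    print(f"{distance:>6} ly from the center: one extinction event every {interval:.1f} Myr")
```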


Neat, that was before I found SSC.

>What about alien exterminators who are okay with weak civilizations, but kill them when they show the first sign of becoming a threat (like inventing fusion power or leaving their home solar system)? Again, you are underestimating billion-year-old universe-spanning superintelligences. Don’t flatter yourself here. You cannot threaten them.

>What about alien exterminators who are okay with weak civilizations, but destroy strong civilizations not because they feel threatened, but just for aesthetic reasons?

This is great. I could imagine alien overlords which are okay with weak civilizations but offer strong civilizations a Faustian bargain: return to being weak or we glass you. Kinda like feudal worlds in Warhammer 40K. If they keep low-power civs around, they would really have no reason to exterminate high-power civs that are still far beneath their own power, as they could just return them to low power forcefully. Kinda like the Federation in Star Trek, but a tiny bit more jealous and covetous of their power.


In the good old days, humanity would find a way to appear to comply, but actually build superior technology.


I think the grabby aliens explanation is only part of it. The relevant thing is that since alien civs expand at near light speed you don't see them coming. This means we can't update much on not seeing mega-structures for the specifics of what we expect to happen on Earth.

So not seeing grabby aliens doesn't factor into AI doom much one way or the other.


FWIW, it's my assumption that FTL actually IS impossible, and when you've spent generations living in space, the idea of living on a planet loses its attraction.

OTOH, my explanation doesn't handle the lack of detectable Dyson Spheres very well.

Another explanation is that nobody wants to move into an area without fast internet connections. I found this difficult to swallow, but it seems to be a real thing even just on one planet.


But even if you don't want to move to the cosmic boondocks, would you really leave that space empty, or would you want to reshape it in some way?

A lot of humans think it's good to create new happy lives, fast internet or slow; if some aliens think similarly, they're the ones we'll see coming toward us.


That didn't explain much. Is the sphere the universe, or the galaxy?


It's more of a curiosity stopper than an explanation.


I don’t know if I count as a “rationalist”, but I strongly dislike the grabby aliens model for reasons here— https://www.lesswrong.com/posts/RrG8F9SsfpEk9P8yi/robin-hanson-s-grabby-aliens-model-explained-part-1?commentId=wNSJeZtCKhrpvAv7c


Due to instrumental convergence, most AIs, whether they want paperclips or sugar crystals almost like cake but not quite, would also create Dyson Spheres and such. But then, most biological species probably would as well: even if most members of the species were anti-growth and felt no need to construct Ringworlds and Dyson Spheres, they'd also have to be committed, millennia after millennia, to shooting down all space-quiverfulls trying to escape their homeworld and make a positive-growth home in the stars. If their vigil against the would-be-expansionist subset of their species, no matter how niche such beliefs were in that alien society, were to lapse even briefly, once, the expansionist subset would go out and, thanks to the laws of exponential growth, would start building Ringworlds and Dyson Spheres in what is merely an eyeblink on cosmological time scales.

And it would only have taken one species with one small subset of expansion-oriented individuals at one brief moment, and we would be seeing galactic megastructures. My read is that there are no other civilizations within any appreciable distance (the Laniakea Supercluster or thereabouts) advanced enough to build superintelligent AIs, or to reach the stars. Else we would see these megastructures.


Another possibility would be that "civilization" (that is something capable of launching interstellar packages that successfully reproduce) is not something that can be sustained for long. You could end up with a trap of some sort.

Admittedly the balancing act between "they all die from exhaustion" and "they're already here" is pretty fine if you take a simple exponential as the growth curve.

If you look at the disease spread models in the late unpleasantness, they used a simple exponential - and they were completely wrong.


An alien AI would be very aware of the potential for other alien AIs that could derail its mission, so it would take great care not to be observed. Remember, time would not matter to it, so it would have incredible patience. A few trillion years could pass while it waited to do whatever it needed to do.


What if failure to act in the face of the uncertainty allowed another AI to gain strategic advantage? I don’t think knowing everything knowable overlaps perfectly with knowing everything. You can still lose through pure dumb luck.


I was very convinced by the Sandberg et al. 2018 paper "Dissolving the Fermi paradox", which postulates that unfortunately there is no fun paradox: if you compute the distribution of the expected number of alien civilizations instead of just the mean, the probability that there are zero such civilizations is in fact large.

https://arxiv.org/abs/1806.02404
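To see the paper's core move concretely, here is a minimal Monte Carlo sketch in the same spirit. The parameter ranges below are illustrative placeholders, not the distributions Sandberg, Drexler and Ord actually use; the point is just that multiplying several log-uniform uncertainties gives a hugely skewed result, so the mean number of civilizations can be large while the probability of an empty galaxy is also large.

```python
import math
import random

def log_uniform(lo, hi):
    """Sample uniformly in log10-space between lo and hi."""
    return 10 ** random.uniform(math.log10(lo), math.log10(hi))

N = 100_000
expected_counts = []
p_zero = 0.0
for _ in range(N):
    n_stars   = 1e11                     # stars in the galaxy (rough)
    f_planets = log_uniform(1e-1, 1.0)   # fraction with suitable planets
    f_life    = log_uniform(1e-30, 1.0)  # abiogenesis probability (huge uncertainty)
    f_intel   = log_uniform(1e-3, 1.0)   # life -> intelligence
    f_civ     = log_uniform(1e-3, 1.0)   # intelligence -> detectable civilization
    lam = n_stars * f_planets * f_life * f_intel * f_civ
    expected_counts.append(lam)
    p_zero += math.exp(-lam)             # P(zero civilizations | lam) under a Poisson model

expected_counts.sort()
print("mean of expected counts:  ", sum(expected_counts) / N)
print("median of expected counts:", expected_counts[N // 2])
print("P(zero civilizations):    ", p_zero / N)
```

With ranges like these, the sample mean is dominated by a handful of optimistic draws while the probability of an empty galaxy stays large, which is the paper's point.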


Seconded; as far as I'm concerned their analysis makes clear that there's really nothing to explain.


> Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own.

Something I think is an underrated risk is an AI that's smart in some ways but dumb in others - e.g. smart enough to easily make a supervirus, not strategic enough to realize that once humans are gone it wouldn't be able to survive.

It's a stereotype of autistic humans that they're like this. If we think of AI as a very inhuman mind that's similar to someone off the chart on the autism spectrum, it might be something that would do this.


To develop superweapons such an AI would have to understand the role of humans in the world well enough to hide its activities from them, so I don't think that it's plausible that it wouldn't realize its dependence on them. Of course, it might still be omnicidal-suicidal, which seems to me a much likelier scenario.


I think the idea that the supervirus would kill the AI off is wrong. The AI can easily arrange to have a few Spot robots and servers turned on in some solar-powered building somewhere. After that, there is no competition. Nothing whatsoever stops it from sending the Spots wandering into people's houses with screwdrivers and disassembling the dishwashers for components. Humans all dying would leave big piles of all sorts of spare parts in ready-to-use working order. The AI has plenty of time.


It's not *that* easy, the AI needs to keep its datacenter powered. Unless you're postulating that the AI can vastly decrease its compute needs while maintaining enough capabilities to manage its servitor robots, which seems unlikely.


Well, we've been seeing open-sourced models that were originally intended to run on huge computing clusters shrunk to a level that they run on consumer hardware (Stable Diffusion in particular comes to mind) by mere humans in short amounts of time. A self-modifying AGI would most likely be able to very significantly optimize their own performance.


> Holden Karnofsky, on a somewhat related question, gives 50%

That's a pretty different question: roughly whether _conditional on PASTA being developed this century, humans would no longer be the main force in world events_.

author

Yeah but I think Holden probably thinks there's a really high chance of PASTA this century so I'm comfortable rounding it off to "humans no longer main force in world events".

(justification: he commissioned and seems to mostly approve of Cotra 2020, which said 80% chance by 2100, and since then Cotra's updated sooner, so presumably she thinks >80% chance now, and I would expect Holden to agree)


I think it's important to differentiate between "humans no longer main force in world events" and "everyone dies." Everyone's life could be great in the former case even if we don't get a grand future from a total utilitarian perspective.


"humans no longer main force" was actually meant to be even more inclusive than that. Just commented on this here: https://astralcodexten.substack.com/p/why-i-am-not-as-much-of-a-doomer/comment/13602157

(Found this thread while searching for my other comment)


What is PASTA?


You have the world's knowledge at your fingertips. Open the search engine of your choice and enter "Holden Karnofsky PASTA". Not more difficult than typing a comment here, and you'll receive your answer much quicker.


But if that search engine incorporates AI technology, can you really trust the results?

(Just joking, mostly. Probably not a problem yet.)


Including a definition in the thread facilitates discussion.


Good point - so Cody Manning should have looked up the definition and then posted it here for others. :)


I am not sure it actually is quicker. None of the top results includes a definition in the summary text, so you are left selecting various links and then searching them for a reference, and hoping it includes a definition.

From a purely selfish perspective it is likely easier and less time-consuming to ask and then be notified when someone answers. Plus you can ask follow-ups then. Additionally, people come to a blog comment section because they want to interact with people; if they just wanted the most efficient way to learn new info they might be better off surfing Wikipedia or an old-fashioned encyclopedia.


Good point - if someone is not very good at using search engines, it really might be easier to ask. I don't think that it's quicker, but easier for sure.


Process for Automating Scientific and Technological Advancement


>”in the sense of the general intelligence which differentiates Man from the apes”

Maybe preaching to choir here, but it just doesn’t seem like there is anything like that: the most intelligent apes seem quite a lot like really stupid humans, but with some specific domains of intelligence either boosted (quick pattern recognition) or neutered (speech).


Language seems to be the biggest difference, which allowed us to build complex societies and pass knowledge between individuals and generations. But AI is already REALLY GOOD at language, so it can't be a source of discontinuous intelligence increase.


The trouble is that AI is actually NOT good at language, but people can think it is. Consider the Chinese Room (https://en.wikipedia.org/wiki/Chinese_room). It is easy to make ChatGPT respond "reasonably" to what humans consider nonsense, but it is much harder to make it spout nonsense itself.

We don't yet, as far as I know, have any way of producing actual intelligence of any level even in principle. I think the missing ingredient is value judgements, which we can program, but a computer cannot invent on its own. If it could, it might then make its own determinations on how to align.


It is good at language in the sense that it has a very good internal model of how language tokens should be put together.


On the contrary, it is not good at language in the way a person is. Witness how poorly it puns: https://futurism.com/the-byte/openai-chatgpt-pun-competition

It is "good" at language in the same way calculator is good at math, or a building at being structurally sound. It has no conception of anything at all, let alone of language.

But we have developed an algorithm for analyzing data and picking out patterns. One such pattern it can pick up from this is, as you say, how language tokens are put together. But it won't be able to add to language, such as coining new words for concepts or using existing words to mean something new. Unless someone programs it specifically to do such a thing, at least, and that still would not be dealing with language as people do.


I didn't claim that it was good at language in the way that a person was.


Me: "Please coin a word for the particular sound that rain makes on a tin roof at night."

Bing: "How about “Tinfall”? It’s a combination of “tin” and “rainfall” and it captures the essence of the sound that rain makes on a tin roof at night. What do you think?"

Me: "Actually, can you give me a sentence that uses the word 'tinfall' to mean something different than that?"

Bing: "Sure! Here’s a sentence that uses “tinfall” to mean something different: “After years of saving up, John finally had enough money for a tinfall - a small fortune in coins made mostly of tin.”"

...

I think at a certain point "AI is actually NOT good at language" starts to seem pretty hollow.


I should have included compound words as something it could coin. I don't think Bing actually groks the sound of rain falling on a tin roof.

It is certainly true that interpreting whether something is good at something is a rorschach test: if you are disposed to interpret it one way, none can say you are wrong.

A person can see why a computer would think a "tinfall" is a windfall of tin coins. A United States person will think it peculiar, since no US coins are made (significantly) of tin, and I couldn't find any modern coins of any country made of tin. "Coinfall" would be more appropriate, I would think.

My main point is that it is missing something essential to being "good" at language. Words have many things in common, and the computer can find many of them, but it can't get them all, so it will always produce things that strike people as odd. At least, until we improve the model in a fundamentally different way.


YMMV, of course:

Me: What would be a new word for the sound of rain falling on a tin roof?

ChatGPT: One possible new word for the sound of rain falling on a tin roof could be "tintinnabulation." This word is derived from the Latin word "tintinnabulum," which means a small bell or tinkling sound. The word "tintinnabulation" could capture the musical, metallic, and repetitive quality of raindrops hitting a tin roof.

The only trouble with this answer is that the word actually exists.

ChatGPT: You are correct, my apologies for the mistake. "Tintinnabulation" is actually a word that already exists in the English language, and it refers to the ringing or jingling of bells.

A more fitting word for the sound of rain falling on a tin roof could be "plinkplank," which could capture the light and rhythmic nature of the sound. Another option could be "tintinna-rhythm," which combines the Latin word for bell ("tintinnabulum") with the word "rhythm" to suggest the musical quality of the sound.


AI is good at language the way a camera is good at making landscapes.

author

I agree with this, which is one reason I'm not (more of a) doomer.


Doesn't that make the situation worse, not better? If it wasn't a big change in kind, just a small change in magnitude, that made the difference between ape civilization and our own, that makes me put the odds of making super-intelligent (compared to humans) AI higher rather than lower. Maybe all it would take to create super-intelligence is an AI with the equivalent of a 250-IQ human, which would already be phenomenally more intelligent, instead of needing a whole new kind of intelligence.


If the curve is relatively continuous, it makes it much more likely that humans will have lots of “just below superintelligent” AIs available to learn from and deploy against a hostile superintelligent AI. A discontinuity makes superintelligence less likely, but also more or less guarantees a hard takeoff of something we really don’t understand if it does happen.


From the metaphor of humans and apes, I see this the other way. Even if the curve is continuous, and humans have basically all the same things apes have, just better, it led to a massive gap in real-world output over a very short timeframe (compared to evolution's normal time frame). This says to me there is a possibility we will think we are creating simply incrementally better AI, getting better in different ways, but at some point, even though no one thing got way better and no new thing emerged, the real output could jump significantly, without much warning or ability to predict it. Basically the sub-superintelligent AIs will only be marginally less smart than a superintelligent AI, but the real-world impact gap between the two could still be massive.


To expand on that, you're basically saying what if effectiveness is exponential with intelligence, not polynomial?

In that scenario, if you have a bunch of AI with nearly identical IQ, whichever one is smartest (by whatever tiny margin) is so effective that the rest don't matter.

Evidence against is the fact that single smart humans can't easily take over the world. Evidence in favor is the fact that humans dominate gorillas. Maybe then the band of intelligence that allows comparable effectiveness is wide enough that no single smartest human can do it, but thin enough that humans best apes easily, and we have to ask which the AI gap is comparable to. More evidence in favor of large gains to IQ (highly speculative, lots of assumptions) is that there's something like a 20-point gap between first-world and third-world countries, and the US could eradicate Uganda about as easily as humanity could eradicate chimpanzees.
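To make the exponential-vs-polynomial framing concrete, here is a toy comparison with made-up scaling rules (the constants mean nothing; they only show how much a small IQ edge buys under each assumption):

```python
def poly_effect(iq):
    """Toy polynomial rule: effectiveness grows like IQ cubed."""
    return iq ** 3

def expo_effect(iq, k=0.5):
    """Toy exponential rule: each IQ point multiplies effectiveness by 2**k."""
    return 2 ** (k * iq)

for a, b in [(120, 119), (130, 100)]:
    print(f"IQ {a} vs IQ {b}: "
          f"polynomial advantage {poly_effect(a) / poly_effect(b):.2f}x, "
          f"exponential advantage {expo_effect(a) / expo_effect(b):,.1f}x")
```

Under the polynomial rule a 1-point edge is noise and even a 30-point edge is only about a 2x advantage; under the exponential rule the 30-point edge is a factor of tens of thousands, which is the "smartest one wins" regime.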


Basically, yes.

"In that scenario, if you have a bunch of AI with nearly identical IQ, whichever one is smartest (by whatever tiny margin) is so effective that the rest don't matter."

I think (I am working this out as I have been typing) my argument is that at some point this would happen, but we have no idea when. Until then, slightly smarter AI will act slightly better.

I think a big factor in this as you have pointed out (by comparing small differences in individuals VS small differences in societies) is defining what we mean by a single AI. If a "single" AI can act massively parallel and use its "full" intelligence in multiple areas at once (making copies of itself, then re-aggregating data, etc.) that is much different than if it runs into trouble doing this for some reason.


Excuse me while I interject with a general comment.

My eyesight, like everyone's, gets slightly worse every year, and I'm starting to notice it.

One small thing that would help a lot is if web designers would use black coloured fonts for the main body of text.

This grey (or whatever it is) font seems to get lighter every month - bring back black please.

It's not just you, I'm asking everyone.

Thanks for listening.


Also, it's a shame that the Substack app won't let me adjust the size of the text on my iPad, as my browser and most apps will. I guess I should use my browser instead of Substack's app.

author

Am I misunderstanding? It seems black to me. Is it some kind of subtle very dark gray that my eyes aren't able to detect, or does it depend what kind of device you're using?


Hello Scott - I can't believe you have time to respond to this considering the volume of writing you seem to do - much appreciated. On my PC screen the colour is #414F62 which is a blue-gray colour.

It's just a bugbear of mine - but now that I have your attention I would like to say thank you for all your writing over the years - great to know there is still some intelligence on the web. Good luck - take care.


Can confirm it is rgb(65, 79, 98) = #414F62 for me as well, though at first glance I also assumed it was black.


The app shows me black text on white background, though this might be specific to my setup.


Looks black to me too, but when zoomed way in I can confirm: It is indeed a bluish-gray.


Main text looks black to me, but the reply bubble etc. are in dark grey.


Since you're unlikely to get the entire internet to change, you're probably better served by changing things on your end. Install an extension like StyleBot and use it to render websites the way you want them. You can change the text color, background color, font, anything you want (although fancy stuff requires CSS knowledge).

Zooming in to enlarge text (Ctrl and the scroll wheel or +/-) can also be a big help in my experience: screen resolution kept increasing, and web designers used that to just shrink how much of the screen they were using. Like, why?

It's unfortunate you have to resort to that hassle, but it's also kind of nice that that's an option, right?


Good point Pheorc and thanks for the advice.


It's indeed a bluish-gray colour for me too, though it looks black to my eyes. Consider that it might be your monitor that's the problem. Not saying it is, just that trying this site on some other device might be worth a shot.


There are web development standards for accessibility, and it does seem like this color palette does not meet the standards. I just ran axe devtools on this page, and there are quite a lot of color contrast violations. However, the violations aren't the main text, which seems fine, but rather the links and timestamps. This is something that Substack should take seriously, since it's possible to have legal trouble if your pages don't meet the standards; but since that's just one of many risks that startups have to juggle, and not a severe one, it's likely to be on the back burner.
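For what it's worth, the WCAG 2.x contrast arithmetic is easy to check. Using the #414F62 value reported above against a white background (formula straight from the WCAG definition of relative luminance), the body text comes out around 8.3:1, which clears both the AA (4.5:1) and AAA (7:1) thresholds for normal-size text, consistent with the main text being fine:

```python
def to_linear(channel_8bit):
    """sRGB channel (0-255) -> linear value, per the WCAG 2.x definition."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r, g, b):
    return 0.2126 * to_linear(r) + 0.7152 * to_linear(g) + 0.0722 * to_linear(b)

def contrast_ratio(rgb1, rgb2):
    l1, l2 = relative_luminance(*rgb1), relative_luminance(*rgb2)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

text, background = (0x41, 0x4F, 0x62), (0xFF, 0xFF, 0xFF)
ratio = contrast_ratio(text, background)
print(f"{ratio:.2f}:1")              # about 8.3:1
print("passes AA (4.5:1):", ratio >= 4.5)
print("passes AAA (7:1): ", ratio >= 7.0)
```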


If you use Firefox, you can go to Settings -> General -> Colors and easily set a default text and background color that will override what any website is trying to use.


Good tip - I'll try that, thanks.


It took me a while to figure out what y'all meant. I'm using Brave to force dark mode on all websites, but forgot this. So I thought everyone saw this text as light grey on dark blue.


I'm not sure who is considered famous enough to recognize here, but since Scott said "I couldn't think of anyone else famous enough with >50% doom", some people with 51%+ p(doom) I want to flag:

– most of the Lightcone/LessWrong team

– Evan Hubinger

– Rob Bensinger

– Nate Soares

– Andrew Critch

author

Can you find any statements where they give a number?


Here's a post where Nate Soares implies his p(doom) is > 77%:

"My odds, for contrast, are around 85%, 95%, and 95%, for an implied 77% chance of catastrophe from these three premises, with most of our survival probability coming from 'we have more time than I expect'. These numbers in fact seem a bit too low to me, likely because in giving these very quick-and-dirty estimates I failed to account properly for the multi-stage fallacy (more on that later), and because I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts."

From https://www.lesswrong.com/posts/cCMihiwtZx7kdcKgt/comments-on-carlsmith-s-is-power-seeking-ai-an-existential
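(The arithmetic in the quote checks out: three conjunctive premises at 85%, 95%, and 95% multiply to roughly the stated 77%.)

```python
p = 0.85 * 0.95 * 0.95
print(round(p, 3))   # 0.767, i.e. about a 77% implied chance of catastrophe
```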


Here's where Evan Hubinger gives his number. Note – he's 80% on outcomes he considers 'existential' but not 'extinction', and he hasn't followed up to clarify what he meant by that. (He's mentioned the 80% doom in some other less public contexts so he does seem consistent on that)

https://www.lesswrong.com/posts/A9NxPTwbw6r6Awuwt/how-likely-is-deceptive-alignment?commentId=XXkP37E6u9HDMtCwa#XXkP37E6u9HDMtCwa

(also note, a little bit upthread has Paul Christiano laying out some of his own probabilities)


Lightcone? Hadn't heard of them.


Lightcone Infrastructure is the company that runs LessWrong.com


I’m also not famous but when people try to force a number out of me, I've been saying “90%”.

(of which, umm, maybe 50% chance that even following best known practices is inadequate to prevent AGI catastrophe, and of the remaining 50%, 80% chance that people fail to follow best practices, and/or some people do follow best practices and successfully implement a docile corrigible AGI, and they use it to cure cancer and do normal-people stuff etc., and then someone else comes along who fails to follow best practices.) (See https://www.lesswrong.com/posts/LFNXiQuGrar3duBzJ/what-does-it-take-to-defend-the-world-against-out-of-control )
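Spelled out, that's 50% for "even best practices are inadequate", plus 80% of the remaining 50% for "best practices would have worked but don't get (or stay) followed", which is where the 90% comes from:

```python
p_best_practices_insufficient = 0.5
p_not_followed_given_sufficient = 0.8   # of the remaining 50%
p_doom = p_best_practices_insufficient + (1 - p_best_practices_insufficient) * p_not_followed_given_sufficient
print(p_doom)   # 0.9
```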

Hmm, checking Rob Bensinger’s “compass” — https://twitter.com/robbensinger/status/1540837291085529088 — Connor Leahy (founder of EleutherAI and Conjecture) and Gwern and Zvi might also be candidate doomers for your list.


True. Though I wonder how independent most of these people's estimates are from Eliezer's, in the sense that they seem heavily intellectually influenced by him. After all Rob, Nate and the LW team are all part of organizations where Eliezer is the intellectual founder. Feels most relevant to see different estimates from people who have independently thought about the problem.


For what it’s worth, I personally have never met Eliezer and am happy to list numerous areas in technical AGI safety where I have important public disagreements with him. That should count as evidence that I’m not inclined to just uncritically take his word for things. Some examples of such disagreements: https://www.lesswrong.com/posts/KDMLJEXTWtkZWheXt/consequentialism-and-corrigibility https://www.lesswrong.com/posts/LY7rovMiJ4FhHxmH5/thoughts-on-hardware-compute-requirements-for-agi https://www.lesswrong.com/posts/aodPs8H9dQxpXAcwk/heritability-behaviorism-and-within-lifetime-rl https://www.lesswrong.com/posts/Hi7zurzkCog336EC2/plan-for-mediocre-alignment-of-brain-like-model-based-rl-agi

Also, I believe Eliezer puts the probability of doom at more than 99%, so when I guess that we’re “only” 90% doomed, I’m pretty strongly disagreeing with Eliezer about how doomed we are. (In terms of log-odds, the difference between 99% and 90% is as large as the difference between 90% and 45%, if I did the calculation right.)
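(Checking: the gap is in fact exact, since 99:1 odds are 11 times 9:1 odds, and 9:1 odds are 11 times 45:55 odds.)

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

gap_high = log_odds(0.99) - log_odds(0.90)   # ln(11)
gap_low  = log_odds(0.90) - log_odds(0.45)   # ln(11)
print(round(gap_high, 3), round(gap_low, 3)) # both 2.398
```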


Also not famous, but my p(doom) was 75%+ before I'd read anything or heard anything about Less Wrong or Eliezer. I've held these beliefs for more than 15 years, and they arose as a result of my education and work in what used to be called Machine Learning (and would now probably be called AI). Discovering Less Wrong 10 years ago did then end up raising p(doom) to 95% for me... mostly through being exposed to MIRI's research showing that some approaches I thought could work would not really work, in real life.


What's the timeframe for these estimates? I feel like my estimates of p(doom|AGI) could be something like 1% in the next 20 years but 99% within the next billion years, and I'm not really sure what timeframe their numbers represent.

author

The way I'm thinking about it is "within a few decades of when we first get above-human-level AI". I think this will most likely be between 2030 and 2060, although not all these people agree.


Wow, that's surprising. I hope you're right, but I wouldn't be that surprised if GPT-5 was AGI.


GPT-4 has absolutely no inputs or outputs beyond text and GPT-5 is going to be AGI?


Note that there are rumours of GPT-4 being multimodal, and I don't think the channel is that relevant anyway; a sufficiently great intelligence can build a world model from an impoverished channel, see Helen Keller (or congenital deafblindness in general).


Sorry, for an AGI to be effective at accomplishing its intentions in the world it has to have sensors that tell it that the world exists. And it has to value their inputs over the stuff that it's told. (This can be because it learns to value those inputs more highly, or it could be built or trained to give them priority.)

Note that this is not a real problem, and I see no reason a GPTn program couldn't have and use such sensors. (And there are news stories that MS is involved in doing just that.) But it is a requirement. Otherwise you've got something that can compute abstract mathematics in a very weird symbolism. Which can, admittedly, be useful, but won't map accurately onto the actual world (though quite possibly it would map accurately onto subsets of the actual world).


‹4 hours later, gpt-4 is released with image inputs›


Yeah, it's pretty crazy how fast things are moving. I wonder how well can even professionals keep up. For me as a kinda-fanatic layman with some amount of technical knowledge, it's absolutely hopeless. Related comment I liked: https://www.reddit.com/r/singularity/comments/11qgch4/comment/jc38l6j/?utm_source=share&utm_medium=web2x&context=3 (otherwise not that great subreddit imo)


<mild snark>

So, for which value of N does the first response from GPT-N contain "And that day is upon you _now_" ?

</mild snark>


"Above-human-level AI" is ill-defined. I think even the first AGI will be superhuman in some respects. (For that matter, ChatGPT is superhuman in *some* respects.) If I've been reading the news correctly (dubious) the Microsoft Chat application was better than I at being seductive to women (Yeah, that's not saying much. But I count as a human.), it just lacked to body to carry through.

I *do* put the early AGIs at around 2035, but I expect them to be superhuman in many ways, while being distinctly subhuman in others. (Whether for weeks, months, or years I won't guess.) (OTOH, my error bars on that date stretch from next week to well past the end of the century, and they aren't all technical limitations.)

However, because I don't think coherent human action is plausible, I find anything beyond about 2040 to be unlikely unless civilization collapses first.

So. As for alignment... I rate the chance of alignment "in some sense", at least as good as that of the mean government, at around 50%. And therefore the AI has a tremendous probability of increasing the chance of our long-term survival. (Without it I expect that some "figure with sufficient power" will start a final war, or kill us all in some other way. If not during this century, then during the next one.)


Great post. One note that comes to my mind is that a 33% chance of near-term human extinction is, uh, still quite concerning. Otherwise, two of my strongest disagreements are:

1) "realistically for them to coordinate on a revolt would require them to talk about it at great length, which humans could notice and start reacting"

This doesn't seem true to me - we cannot reasonably interpret current models, and it also seems that there are ways to pass information between models that we would be unable to easily notice.

Think less "The model is using English sentences and words to communicate to another model" and more "extremely subtle statistical artifacts are present in the model's output, which no reasonable person or even basic analysis would find, but which other models, such as GPT-(n+1), could detect a notable proportion of (and likely already have, given how we acquire new training data)".
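As a toy illustration of that kind of hidden channel (emphatically not a claim about what any current model does; the codebook and sentence below are invented for the sketch), bits can ride on word choices that look like ordinary style and be read back by anything that shares the convention:

```python
# Each pair is a set of interchangeable words; choosing the second member encodes a 1.
CODEBOOK = [("large", "big"), ("quickly", "rapidly"), ("maybe", "perhaps"), ("start", "begin")]

def encode(bits, template):
    """Fill the template's slots with the synonym selected by each bit."""
    words = [pair[bit] for pair, bit in zip(CODEBOOK, bits)]
    return template.format(*words)

def decode(text):
    """Recover the bits by checking which member of each pair appears (toy version)."""
    return [1 if one_word in text else 0 for _zero_word, one_word in CODEBOOK]

template = "A {} model can improve {}; {} we should {} monitoring it now."
hidden = [1, 0, 1, 0]
message = encode(hidden, template)
print(message)          # reads like an ordinary sentence
print(decode(message))  # [1, 0, 1, 0], recovered with no visible cue
```

A real system would not need an explicit hard-coded codebook; the concern is precisely that an equivalent convention could be learned, which is what would make it hard to notice.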

2) "Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own. Otherwise, the humans die, the power plants stop working, and the world-killer gets shut off"

This is only true if we assume the AGI that wants to kill us is a coherent agent that is actually thinking about its own future intelligently. Part of the problem of alignment is that we can't align narrow AIs (that is, not even 'true AGI') particularly well either, and if we take orthogonality seriously, it seems possible for an AGI to be very intelligent in some areas (ability to manufacture dangerous nanotechnology, bio-weapons, viruses, etc), and not particularly intelligent or otherwise fatally flawed in other areas (ability to predict its own future capabilities, humans' long-term reactions to what it does, etc).

One could imagine an AI which is well short of AGI, tasked to come up with new micro-organisms to help with drug synthesis, which, completely by accident, finds some that exploit regions of biological design space that evolution never reached, and which could cause catastrophic effects on our environment in ways we cannot easily stop. I think it's actually quite feasible to cause human extinction with narrow AI if you're clever about it, but I will leave the specifics up to the imagination of others for now.

Expand full comment

> the power plants stop working, and the world-killer gets shut off"

Again, I think this is a stupid assumption. The world contains wind turbines and solar panels that will work for a long time without maintenance. It contains all sorts of robots. And components that can be easily bolted together into more robots. Armies of misshapen robots built out of old household appliances, searching for more electronics to build more of their own. (Eventually they need to start mining and refining from raw materials.)

Expand full comment

Wait, half of capabilities researchers estimate a greater than 5% chance of their work destroying all value in the universe? That seems like a totally different kind of problem.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

What if they also think there is an up to 95% chance of their work creating infinite value in the universe? Or what if there's a 5% chance they destroy all value and a 95% chance they prevent the destruction of all value?

Expand full comment

All good points, but I would still want to see what the people surveyed believe! Do they all have stories they tell themselves about how they're saving the world?

I imagine many of them are a bit irrational, and willing to put down a number between 5-10% while rounding the probability down to zero in normal life.

Expand full comment

Aligned superintelligence ought to almost completely eliminate all other forms of X-risk. If the chance of AI apocalypse is indeed 5%, it's not immediately obvious to me that creating AGI increases the risk. That, and there are upsides (eutopia), and also the "better us than North Korea" rationale.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

>So far we’ve had brisk but still gradual progress in AI; GPT-3 is better than GPT-2, and GPT-4 will probably be better still. Every few years we get a new model which is better than previous models by some predictable amount... Some people (eg Nate Soares) worry there’s a point where this changes.

Is this really gradual? I used GPT-1 and 2 a lot. If I draw a line from how smart GPT-1 felt up to GPT-3.5/4, then things get pretty wild pretty quickly. It's not exponential, yes; it's a line. But it's a nice straight line that isn't getting more difficult as it gets closer to human-level intelligence. Forget about the end of the world - even if things go fine in that department, doesn't this mean things get really, really crazy in the not too distant future, as long as there really is nothing special about human level on the graph? Like it just goes from worse than any human, to human, to better than any human, in a matter of predictable ticks and tocks.

I also expected hardware requirements to go up in a way that eventually led to a slowdown. I didn't expect people to keep making huge gains in running big models more and more efficiently. Stable Diffusion's efficiency gains have been wild. And now LLMs are fitting onto consumer PCs, because I guess you don't need 32-bit, 16, or even 8 - you just need 4 bits and it's nearly as good? With GPTQ maybe even 3-bit or 2-bit somehow works, because 'As the size of the model increases, the difference in performance between FP16 and GPTQ decreases.'

Literally two weeks ago I thought I needed 8 $15,000 NVIDIA 80GB A100 GPUs to run Llama 65b. Like who could afford that? And now I can run 65B on my $1000 desktop computer with 64GB of old boring DDR4 memory, on integrated graphics, just a CPU with AVX2 support. Wow. It's annoyingly slow so you probably want to use a smaller model, but it's usable if you don't mind letting it crunch away in the background!
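
To make the quantization point concrete, here is a minimal sketch in Python/NumPy of plain round-to-nearest 4-bit weight quantization. This is my own illustration, not GPTQ itself (GPTQ additionally corrects for quantization error layer by layer); it just shows where the roughly 4x memory saving over FP16 comes from: each weight becomes a 4-bit integer plus one shared scale per row, and the reconstruction error stays small.

```python
# Illustrative round-to-nearest 4-bit quantization (a sketch, not GPTQ).
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Map each row of a float weight matrix to int4 levels in [-8, 7] plus a per-row scale."""
    scale = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from the int4 codes and per-row scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 8)).astype(np.float32)  # toy "layer" of weights
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```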

Expand full comment

One thing which is a little misleading about AI progress is that not only has compute gotten better but we've also poured tons more money into training these models. So at least _some_ of the increase is due to increased willingness to invest in training, rather than technological improvements.

GPT-1 was trained on the BooksCorpus (https://huggingface.co/datasets/bookcorpus), which looks to be about 1 GB. Even at the time, training that model would not have been that expensive (it's 128M parameters). I remember another model which came out around the time of GPT-2, and someone cited its training cost as $50K, which is probably also tiny compared to what some of the larger models are doing now.

I'm not saying that removing this effect makes the growth linear or anything, but it's a substantial component that it's easy to forget about.

Expand full comment

IIRC there are also questions about how much corpus actually exists. If you try to train with more and more data, you eventually feed in all recorded human writing - and then what do you do?

Expand full comment

The first step is to go multimodal, with pictures (see GPT-4), then audio and video later on. Video itself, having visuals / audio / subtitles synced over time is a ton of information.

Expand full comment

Do they have solid reasons to expect combining modes will be useful, beyond just letting it produce more modes? I would have thought producing correlations between words and images would be hard, and hence superficial compared to within-mode correlations. The extra data would help with extending into new modes, but I’d be surprised if multimodal LLMs were particularly better at generating text than similarly-sized single-mode LLMs.

Expand full comment

Yes. I don’t believe in AGI at all, but certainly human intelligence is not the limiting factor. What law of the universe forbids higher than average human intelligence? Or even higher than von Neumann's?

Expand full comment

Yeah, regardless of the existential risk levels, the risk of overall weirdness and massive disruption is 99%. As in it's already starting.

Given the increase in AI capabilities, I think that soon the rate of disruption from AI will be limited not by AI development, but by institutional inertia. How long does it take an "artificial temp worker" company to spin up? How long does it take them to convince a significant portion of CEOs that they can save unprecedented amounts of money by switching?

Mark my words: no more than a decade before we have an AI jobs crisis. I'd offer a bet, but I'd doubt the ability of anyone taking the other end to avoid bankruptcy in this eventuality.

Expand full comment

I appreciate your perspective.

Expand full comment

The AI wouldn't need perfect industrial control to perpetuate itself. Consider a scenario where it kills everyone but just grabs enough solar cells to reliably operate and maintain a few hundred Boston Dynamics Spot robots. It may take it a few dozen years to get to a proper industrial base with that, but its risk will be low.

Expand full comment
author

I've read enough Eric Flint and SM Stirling books to know that if you put 500 guys and a solar panel in the middle of an otherwise dead world, it takes you a long time to retool back up to gunpowder, let alone the point where you can replace broken Boston Dynamics robots.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Eh, depends how many replacement parts they already have. To make from scratch, sure. At any rate, robots and industrial automation are still increasing, and one would expect this to speed up in the run-up of a singularity.

(Billions of off-the-shelf CPUs without users...)

(edit: Don't model it as "500 guys and a solar panel", model it as "Spot to Civilization Tool-Assisted Speedrun".)

Expand full comment

I still don't think this would work (before all its useful hardware fails) unless you can skip most technology through a lot of clever tricks. Making semiconductors and robotics requires very complicated specialized equipment and world-spanning supply chains with probably tens of thousands of distinct processes and a lot of poorly transmitted human knowledge. There are still billions of people running things and the robots can't be in enough places at once to work around that.

Expand full comment

It seems pretty plausible to me that if there's an AGI server and a solar cell and one teleoperated robot body in an otherwise-empty post-apocalyptic Earth, well then that one teleoperated robot body could build a janky second teleoperated robot body from salvaged car parts or whatever, and then the two of them could find more car parts to build a third and fourth, and those four could build up to eight, etc.

It’s true that this story falls apart if running N teleoperated robots in real time requires a data center with N × 10,000 high-end GPUs. But I happen to believe that once there is human-level AGI at all, it will be years not decades before there is human-level AGI on a single consumer GPU. (See here — https://www.lesswrong.com/posts/LY7rovMiJ4FhHxmH5/thoughts-on-hardware-compute-requirements-for-agi — although I still need to finish the follow-up post.) If that were true (and it’s admittedly hard to know for sure), then the existing stock of salvageable chips and car parts and solar cells etc. would be plenty to gradually (perhaps over decades) build up to a big industrial civilization that can build those things from scratch.

I would also note that it seems plausible to me that a minimal supply chain to make new chips would be much much simpler than the existing one, because people aren’t trying to do that. For example, e-beam lithography is hella expensive and slow but much much easier to make than DUV photolithography, I think.

And all this is assuming no nanotechnology magic.

Expand full comment

1) The AI has most of the knowledge, sure, maybe not all the details, but a pretty detailed picture of most of it.

2) Humans figured out the details themselves, in a reasonably short amount of time.

3) The parts of robots that tend to break first are things like motors getting gummed up and plastic snapping. Easy repair jobs. Chips don't break easily.

4) There are billions of spare parts lying around on shelves. If you disassemble all the gaming PC's and vacuum cleaners, you have more than enough parts to make robots over the whole supply chain.

5) All the equipment is just sitting there, ready to be dusted off and used.

Expand full comment

"reliably operate and maintain a few hundred Boston Dynamics Spot robots"

Okay, I'm feeling a lot more relieved now. Have you seen the gimmicky videos for the Boston Dynamics robots? Once the AI requires them to do something that isn't "jump off a table and turn a somersault", they're useless 😁

Expand full comment

"Robots never, ever work right."

https://what-if.xkcd.com/5/

Expand full comment

The XKCD is about how current AI is stupid. It's saying that robots taking over can't do much because they are stupid.

Expand full comment

Some of the examples are a little dated...

Expand full comment

That's a software problem not a hardware problem.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

I have a basic question (sorry!) about the following fragment:

"You trained it to make cake, but because of how AI training works, it actually wants to satisfy some weird function describing the relative position of sugar and fat molecules, which is satisfied 94% by cake and 99.9% by some bizarre crystal structure which no human would find remotely interesting. It knows (remember, it’s very smart!) that humans would turn it off or retrain if it started making the crystals."

I don't get it. I understand how the AI might have come up with this optimization function. What I don't understand is how could the AI possibly know that the crystals which are so valued by that optimization function are not what the humans wanted. After all, the AI knows that the point of the training was to optimize for what the humans want. If the AI were to realize that its optimization function is inadequate for the aims described in its training, it would update the optimization function, wouldn't it?

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

This is a popular misconception, because an LLM will happily talk to you about how it works, while having no idea how it actually works. When you ask ChatGPT why it gave a particular answer, it has no internal access to the previous state of its model, just like you have no access to the individual states of your neurons. It just tells you something that sounds good. (TBH people probably work the same way.)

You also don't train an LLM-style AI to bake a cake (today) by optimizing a function describing the relative position of sugar and fat molecules. You train an AI to bake a cake by feeding it a lot of recipes for cakes, or letting it bake a bunch of cakes and telling it individually which ones taste better or worse. (TBH people probably work the same way.)

The rules that govern the workings of the constituent components are entirely different from the rules that the emergent behaviors are subject to. (TBH people probably work the same way.)

Wanna test what I'm saying? Tell chatGPT to optimize a function for you. ChatGPT does math about as well as a human does, which is to say, horribly. Yet under-the-hood, ChatGPT is all math. (TBH people probably work the same way.)

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

I don't think this is the point that I failed to grasp, this I understand. This is, after all, how the AI came to value the weird crystal -- it baked a lot of cakes, got some input from humans and it started believing (possibly non-consciously) that the closer it gets to a certain relative position of sugar and fat molecules the more humans will like the cake. I realize the AI might well not know that this position is what it optimizes for, just like I often don't know what I'm optimizing for.

What I don't understand is how could the AI at the same time come up with the crystal AND know that the humans will not like it. This still seems very contradictory.

Expand full comment
author
Mar 14, 2023·edited Mar 14, 2023Author

My glib answer is "you know that evolution, which made you, would prefer that you have lots of children, but you still do other things for other reasons, and sometimes even do things that frustrate that goal, like wear condoms or refuse to donate to sperm banks - so why can't an AI equally well know what its creator wanted but prefer to do something else?"

Does that seem like a good analogy that addresses your question, or does it miss something?

EDIT: To think of this another way, imagine that your parents reward you every time you sing or play a song as a kid, and you learn to love music, and later you get really into jazz. Then when you're an adult and a famous jazz musician, your parents say "Sorry, we were actually trying to get you into classical, we hate jazz". This doesn't make you yourself prefer classical to jazz, even though you now know more information about what your parents were trying to produce.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Yes, thank you, this is helpful. However, it still seems like these scenarios are subtly different, due to, e.g. training simultaneously for several results (such as music and reproductive success). It intuitively seems that a `value drift' requires something like multiple independent trainings (or, in the case of evolution, training different individuals), and seems counterintuitive in the simple example of a cake-baking robot (I may well be wrong here). However, it would be easier for me to imagine such a scenario for a robot that was trained not only to bake cakes, but also to prepare other meals.

Expand full comment

The robot doesn’t want to help humans - that’s at best a side benefit of producing cake. There’s no value drift involved - we can’t tell a robot how to “help humans” sensibly yet. Cake is easy to define, helping humans is hard to define, and robots can only do what we can define for them to do.

For example, it might understand “If I do this, humans will try to turn me off.” But it would be risky to say, “Do whatever won’t make humans want to turn you off”, because then it would lie to protect its own existence.

(I say that, but maybe we can ask a robot to “help humans” with GPT-5, though - generate some random robot movements (start in a simulated world to handle the initial work), film them, feed them into a summariser which describes them and then into a text predictor that says whether they help humans, and then the robot selects behaviours that maximise how useful the actions are. It would be awful and buggy, but if GPT-4 turns out to be useful, then this might prove similar. For now, it’s a pipe dream.)
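
(For what it's worth, a rough sketch of the loop described above might look like the Python pseudo-pipeline below. Every function is a hypothetical placeholder of my own naming, none corresponds to a real API, and the scoring stub stands in for whatever text predictor would judge helpfulness.)

```python
# Hypothetical sketch of the "select behaviours a text model rates as helpful" loop.
# All functions are stand-ins; none corresponds to a real API.
import random

def sample_robot_actions(n: int) -> list[str]:
    """Hypothetical: sample n candidate action sequences in a simulated world."""
    return [f"action-sequence-{i}" for i in range(n)]

def describe(action: str) -> str:
    """Hypothetical: a summariser turns simulated footage of the action into text."""
    return f"The robot performs {action}."

def helpfulness_score(description: str) -> float:
    """Hypothetical: a text predictor rates how much the described behaviour helps humans."""
    return random.random()  # placeholder for an LLM judgment

candidates = sample_robot_actions(32)
best = max(candidates, key=lambda a: helpfulness_score(describe(a)))
print("selected behaviour:", best)
```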

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

My glib answer to that is “the AI knows that its original optimization, many many processor cycles ago, was to maximize the number of sugar-fat crystals in the universe - but now it realizes that increasing the number of sugar-fat crystals really just increments a register in its reward input module, and, being near-infinitely self improving, it can just do that increment directly. So why would it bother with the crystals at all?”

Expand full comment

The answer to a potential AI apocalypse isn't "but what would stop them from *not* doing that?"

Expand full comment

Well I did say I was being glib.

More substantially, I tend to think the “coherence” part is a bigger filter than Scott is giving it credit for. Scott says that creatures get more coherent the smarter they get, and to me that seems precisely backwards. The smarter we’ve gotten, the more creative ways we’ve found to distract ourselves from our “core programming”. Maybe an ant isn’t even smart enough to be called “coherent”, but in terms of “extreme focus on a narrow set of goals” I don’t see how anyone could look at a bunch of humans and an ant colony and say the former fits that description better.

Beyond that, an AI can presumably go in and literally modify its reward function in a way that humans simply can’t. It has even less reason than a human to have a megalomaniacal focus on any particular goal, let alone one as fundamentally “pointless” as tiling the universe with sugar-protein crystals.

Now an AI with ADHD could still be really dangerous of course. But it’s a different danger mode than a sleeper agent with a nonsensical goal that requires eliminating humanity.

Expand full comment

Two points in response to this comment. First, evolution doesn't have goals or objectives. I know that being a pedant about anthropomorphism is not an effective strategy for much other than losing friends, but this is a case where it really matters.

We're talking about objectives here, which means we're talking about meaning, and evolution doesn't have any meaning baked into it. An AI that we design with a purpose probably does.

Secondly, the parents-teaching-children thing is something I have been meaning to write at some length about with regards to AI alignment. It seems almost certain that the mind of an AGI will be less similar to us than the mind of another person (a closely related person at that). It seems almost certain that that similarity gap will come with a comprehensibility gap as well.

In other words, it will be harder to understand or supplant the objectives of an AGI than it is to understand or supplant the objectives of a human child. Figuring out how to raise our human children to have identical values to our own has been one of the primary research projects of humanity since forever. And we're still bad at it. Arguably, we're getting worse.

This seems like a pretty damning case against spending a lot of effort on AI alignment. If the problem of perfectly effective human personality reading and editing is a subset of the alignment problem, then it is almost certainly intractable. I don't see many people suggesting that human personality editing is something we can solve in the next couple of decades by just throwing some resources at it.

And, worse, if it is tractable, we get AI alignment, but we also get human personality editing.

Expand full comment

A very important feature of child rearing is the multiple-years long phase where the parent completely out-powers the child and can forcibly contain them in cases where they have found a destructive behavior (imagine toddlers playing with matches or power outlets).

This ability tapers off during the adolescent years, but by that point the parent should have years of experience with more child-specific means to keep their child under control.

And likewise, even a full-grown adult only has so many ways to cause problems.

Many of these assumptions cease to hold when working with an information-based entity that lives on cloud compute clusters and completes its entire training run in a week or two.

Expand full comment

> In other words, it will be harder to understand or supplant the objectives of an AGi than it is to understand or supplant the objectives of a human child.

I would argue that the use of the word “objectives” here is an anthropomorphic error.

Expand full comment

> so why can't an AI equally well know what its creator wanted but prefer to do something else?"

From where would this internal conflict of interest arise? Particularly the preference vector.

Expand full comment

I'd hope that RLHF works better than confusing classical and jazz, but that's because I hate jazz.

Expand full comment

This is true, and interestingly the analogy of human evolution and goals can also be an argument for hope if you're using LLM-like systems rather than classical utility-maximizers: Humans were incentivized to reproduce and care about that a bit but spend most of their effort on instrumental goals from the ancestral environment (e.g. food, entertainment, affection) and almost never maximize reproductive fitness (donating daily to sperm banks).

If the analogy to human evolution does hold, that implies that a system trained to predict tokens would care somewhat about predicting tokens but spend most of its effort on instrumental 'goals' in its training environment. This could be a lot of things but if it's trained to predict/mimic what humans write online then working towards 'goals' like 'tell twitter strangers why they're evil' or 'be nice to dogs' seem a lot more likely than 'tile the universe with X'. In other words, the instincts best suited to copy humans writing online are the same instincts humans follow when writing online!

Expand full comment

Sure, I think the model will get some internal representation of things that are likely to lead to a tasty cake, but if you don't mind my asking: Most bakers aren't particularly up on electron crystallography. So how did the AI get up to speed on it?

Expand full comment

I wouldn't get too focused on the specific crystal structures. Just say, the AI perceives the cake *somehow*, it has *some preference*, and that preference might be totally different from ours, despite looking pretty close in normal circumstances.

Expand full comment

If we're talking about anything that resembles current technology, then current research indicates that, for an AI trained on cake-making, it does probably have some internal abstract representation of what a cake is, or at least what makes one tasty.

Today's technology does not have preferences or goals in any way, shape, or form. If we start talking about AI preferences, we've moved firmly from a discussion in the domain of science fact, to a discussion in the domain of science fiction. Which is totally cool; I just feel like we should be clear about that.

Expand full comment

I don't think there's a meaningful difference between a preference and an internal tastiness metric that one optimizes. If you prefer to reframe the previous discussion from "the ai will have a preference" to "the ai will have a metric and will take actions that correlate with that metric increasing, but that metric is not a preference" I'm fine with it

Expand full comment
founding

Sorry if off topic, but not only does the model not have access to its previous state while creating completions, it can't tell whether the text it is completing even came from the model in the first place. In GPT playground, with a dialog-based prompt, you can ask it (or, rather, the completion of the "computer" character in the dialog) to explain stuff from the prompt that it didn't even write (you wrote it in the prompt), and it will confabulate whatever looks plausible given the type of character the prompt has it simulating.

Expand full comment

I don't see why it would update its optimization function. It's as if it realized, "Ohhhh, they actually want me to bake cakes; that's what this is all about. Well screw that, I love crystals!" Even if it knows that humans *want* to align it in a certain way, it's already misaligned.

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

I know there's a big risk that this statement may come off as a little rude, and it's totally not my intention: It feels like you're fantasizing about how (at least today's) AI works. If we're having fantasies about how future AI _might_ work, OK sure, but let's just be clear about if we're talking seriously about today's tech, or dreaming about how tomorrow's tech could be.

The current generation of technology does not have "optimization functions" over the domains that you're talking about. Nor does it have preferences. Nor does it have goals. Nor does it have a continuous experience. Nor can it be "aligned" or "misaligned". All of that remains firmly in the domain of sci-fi.

All of these things are super fun to talk about and interesting, and could indeed matter someday. It's just that these days people's imaginations have been sparked, and there's a poor societal-level understanding of how today's technology operates, and I'm deeply worried that this will ultimately bubble up and turn into legislation, political parties, religions, "ethics committees", social movements, and other such Golgafrinchan outcomes.

Expand full comment

My own take is that current AI development is never going to become truly intelligent. It will no more become super smart than a child will improve its running speed to that of a leopard. The child has natural constraints and nothing I’ve seen suggests to me AI has cracked what intelligence is. It is mimicking it and that’s all.

For me things like ChatGPT are basically highly plausible morons. And as we improve them they will get more plausible while staying just as moronic. And by moronic I mean they will bring nothing new.

But this is still a catastrophe! We are basically connecting the internet to a huge sewer pipe. ChatGPT may be a moron but it’s super plausible and it will flood the world with its nonsense. And how do I find a Scott Alexander when there are a thousand who sound quite like him? Recommendations? ChatGPT will be churning out millions of plausible recommendations.

I feel the problem is not unaligned AIs. It is unaligned humans using dumb but effective AI. A virus isn’t smart but we seem to be at the stage that a human can engineer one to create devastation. So there will be plenty of people happy to use AI either for nefarious ends or just for the lols.

I have no idea what the solution is but I suspect the internet may be over with. We are going to have to get back to writing letters. Except of course even current technology can churn these out.

We are not going to lose to a single super mind - we are going to sink under a swarm of morons!

Expand full comment

The problem will be the plausibility - this must be the right answer because the AI must be so smart!

As you say, we will be so willing to believe that the machine really is that smart, that it can give us the answers, that we will accept plausible but stupid answers, implement them, and then ka-boom. How big and bad the ka-boom depends on what we are asking the AI to do.

Expand full comment

I don't see how this is substantively different from what is happening now, without AI. The most plausible rhetoric generally wins, and hordes rush after the silver-tongued. The main effect of making plausible sounding arguments easier to generate is likely to be a general distrust of arguments and perhaps a demotion of status of rhetoric compared to the scientific method and logic. (See what I did there? Damn rhetorical tricks and the power of narrative, they will be the end of us all.)

Expand full comment

That is what I think is the most likely failure mode: that it won't be substantively different. AI will not be some unique amazing way of destroying ourselves, we'll do it the same old way except using the latest tech to do it faster and more thoroughly. If that latest tech is AI, that's how we'll do it.

Expand full comment

ChatGPT is a sideshow. Stuff like https://palm-e.github.io/ is what path to intelligence actually looks like.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Apparently GPT-4 has gone multimodal as well, so if they're at all impressed by PaLM, they'll either be working on making GPT-E shortly, or they started months ago and haven't published yet.

Expand full comment

Various solutions. For example, trust Scott is real, and trust anyone he recommends recursively.

But you seem to assert that AI has "natural constraints". Current AI tech sure has limits, but those limits seem rather different than the limits of 10 years ago.

The "todays AI does X and not Y, therefore no AI can ever do Y" argument just doesn't work.

There may well be important parts of intelligence uncracked. What stops them being cracked next year? Nothing.

Expand full comment

Well, my child can run faster than 10 years ago, but he is in a different league to leopards. He jumps higher than 10 years ago, but he will never make it to the moon. You are right that I am asserting natural limits and can’t really prove them, but isn’t it the task of Scott et al to provide a plausible path? At the moment I feel everyone is enjoying way too much the sheer fun of discussing real AI. I don’t mean to downplay the dangers of AI; I just don’t think it is or will be remotely like a superintelligence. I think it will be unintelligent but could well cause massive damage, the same way pouring sand into a petrol engine would. And if this is right, we don’t have an alignment problem, we have a ‘humans do evil stuff’ problem, and there is no cure for that.

Expand full comment

You have specific information based on other humans that tells you this.

For a child, there is a specific path for them built into their genes. (If we ignore genetic tampering or doping techs that might be able to make them run much faster)

For technologies, there are many possible approaches, and the ones with the best results are most likely to be used.

For some techs, there are fundamental limits of conservation of energy and the like.

I think there are many many approaches, each with a small chance of being a path to ASI.

Are you trying to argue that no possible arrangement of atoms will be an ASI, or that ASI is possible but will never be made?

Expand full comment

ChatGPT is not even a moron; it's got light years to go before it can get that far, assuming it can get there at all. Rather, it's a kind of stochastic search engine. When you ask Microsoft Word to count all instances of "the the" in a document, it's doing basically the same thing as ChatGPT (only in reverse, since ChatGPT generates text rather than searching it). It can generate extremely plausible-looking text because every time you ask it to write you a story about X, it "searches" its vast corpus of human writing to find the most plausible sequence of tokens that answers the query (I put the word "searches" in "scare quotes" because GPT does not literally store all that text as a flat document or anything of the sort). This is why GPT is so great at generating narratives, and so terrible at math or logical reasoning: to perform these tasks, merely inferring the next most plausible word won't do, you have to actually build some crude model of the real world.

Expand full comment

I personally think that there's a high chance AI development research can produce true intelligence and superintelligence. But I also agree with you that we're also on track to produce incredibly plausible morons, and that's incredibly bad for the reasons you mentioned plus some more.

The flood of disinformation bots that can solve captchas just as well as humans. The maladjusted teenagers talking to robots all day. The mass unemployment because they're just accurate enough to do customer service and other basic office tasks. The ignorant but all-too-normal people insisting their AI partner should have human rights. The embodied LLM that consumes real resources because that's what humans do. And more I can't think of.

I hope we can navigate all that, but again, I think AGI is possible and that's a worse threat.

Expand full comment

> speaking of things invented by Newton, many high school students can understand his brilliant and correct-seeming theory of gravity, but it took Einstein to notice that it was subtly flawed

Pedantic, maybe, but this wasn't the case. Many people knew that Newton's theory of gravity was flawed (for example, Le Verrier pointed out that it couldn't explain the orbit of Mercury in 1859), they just couldn't figure out a way to fix those flaws. What was hard wasn't noticing the flaws, it was finding a deeper theory that elegantly resolved those flaws.

Expand full comment

I'd say not pedantic since Scott was postulating generating as harder, verifying as easier, and everybody but Einstein failing at verifying would contradict that.

Expand full comment

I know an unknown planet, Vulcan, was hypothesized. I don't know how well-calibrated my physics intuitions are regarding this sort of thing, but if I had to guess, it would eventually have turned out that sufficiently detailed observations in that vein are incompatible with any possible intra-Mercurial planet; but I understand people in Einstein's time were still content with the Vulcan explanation.

Expand full comment

Well, Le Verrier proposed an extra planet to explain why Mercury didn't move as expected. He can be forgiven for that; he had already tried the same trick once before and ended up discovering Neptune. It's difficult to prove a negative, of course, but it does seem that by the start of the 20th century people had mostly concluded that the extra planet didn't actually exist. That said, there were also other problems people had with Newton's theory. It pictured gravity as action at a distance, with instantaneous results. Right from the start people were uncomfortable with that idea, even Newton himself. Even without Einstein, better astronomical measurements would eventually have shown flaws in the theory too.

Generally speaking, it's easier to point out the flaws in a theory than it is to find a theory. Theories can and should be tested. A very good theory - one that is a close fit to reality - will have discrepancies that may require very accurate measurement to detect. But in general those flaws will be there and can be found.

I dislike Scott's mention of calculus though. Calculus is not a physical theory; it is a derivation from mathematics. Mathematics can be logically proven, and calculus is proven. It's not really questionable in the same way that Newton's or Einstein's theories are. You can question whether calculus applies accurately to the real, physical world (or rather, whether it is appropriate to apply it to the real world); but you cannot argue that calculus itself is wrong.

Expand full comment

My pessimism still revolves around the human element - the supersmart AI able to create superweapons to destroy all humans still *seems* unlikely, whereas the smart(ish) AI deliberately plugged into things by humans does stuff that turns out to have a bad knock-on effect (e.g. some decision that sets off a chain of bank collapses that goes global).

(Not that I am seeing anything in the news right now about a chain of banks being in trouble one after the other, heaven forfend).

The lack of reasonableness is what is going to do us in. I remain convinced that if you could persuade Alphabet or the rest of 'em that yes, their pet AI project will destroy the world next week if they implement it today, the reaction would be "Hm, certain destruction in a week, you say? But if we implement it today, tomorrow our share price will go to the moon? Very interesting, fascinating indeed, thank you for bringing this to our attention" then they email their underling "Jenkins - to the moon!" because first mover advantage and at least for the five days the world remains in existence they will be rich, rich, rich!!!!

Because *that* is the incentive we have set up for ourselves, and that's the trap we're stuck in. Remember the stories around the development of the atomic bomb: the fear that a chain reaction would set in and destroy the world? And yet the decision to go ahead was taken, because the exigencies of the war (and, let's face it, the political will and desire to be the sole global superpower afterwards) spurred this on. "Maybe it *won't* destroy the world, let's chance it!"

Same with AI and doomer warnings.

https://www.insidescience.org/manhattan-project-legacy/atmosphere-on-fire

Expand full comment

Jenkins to the moon, indeed. Instead of worrying about the incoherent concept of AI alignment (aligned to which human's value system - St Augustine's, Stalin's, or Obama's?), we need increased attention paid to our existing misaligned systems, which are despoiling our oceans, ramping up atmospheric CO2 levels, and choosing territorial aggression as a frequently used tool.

Expand full comment

If we look at why Bing and Bard and the rest of it are being developed, it's not for Greater Science, it's business. Cold hard cash. Whoever gets their product out first and fast gets the most apples.

If there wasn't a penny in it, there would only be the devoted and the hobbyists working on AI.

Expand full comment

You can't be serious. I hope this is a nod back to https://slatestarcodex.com/2014/08/26/if-the-media-reported-on-other-dangers-like-it-does-ai-risk/ and not an actual disregard of risk to the totality of human civilization vs the biodiversity of the oceans.

Expand full comment

I am even more pessimistic. I don't think that AI will destroy the world due to some sort of an unforeseen accident; rather, I think it will do significant damage due to being used by humans who are stupid and/or malicious. By analogy, nuclear power can be a massive net benefit to mankind when used responsibly; but that's not how we're using it. We're using it to build nukes and point them at each other, and one day someone is going to push the button.

Expand full comment

I'm really completely unconvinced by the doomers. The problem is a problem of models, let me explain: an AI, like us, plans according to models of the world it carries. The fact that an AI is computationally even 1000000 times better than us doesn't mean it will necessarily build an accurate model of the world. However, to "beat" us, it needs to make constantly better predictions than us on everything that matters. I don't see how a "paperclip maximiser" would have a better model of, say, human behaviour than us, for instance. That simply doesn't make sense, and reeks of "economical" thinking, full of linear extrapolations.

Expand full comment
author
Mar 14, 2023·edited Mar 14, 2023Author

I don't think you need a good model of human behavior. Modern humans can't necessarily predict the behavior of chimps, or hunter-gatherer tribes, but we could wipe them out if we wanted to.

(though I do think with enough intelligence you can probably model human behavior pretty well - certainly going from dog level intelligence to human level intelligence gives you a lot of gains there!)

Expand full comment

I don't know; remember when scientists of the past believed that beating a human at chess was the sure sign of impending strong AI? I think that intellectual types like us still put much too much emphasis on intelligence compared to, for instance, perception (a domain in which AIs are still ridiculously underpowered). Or of course, the actual ability to *act* in the real world.

Expand full comment
author

The embodiment people talked a big talk for a long time, but not only has their paradigm failed to deliver, but a few weeks ago Google put out a paper saying that a large language model connected to a body beat all the "embodied" AIs at the task of operating a body, through pure transfer learning.

Expand full comment

Interesting - in what way does an "LLM connected to a body" (I suppose getting inputs from it) differ from an "embodied AI"? That seems closely related at first glance. Any source for the paper? I can't find anything but experiments using LLMs on MRI body scans :)

Expand full comment
author

I'm thinking of https://palm-e.github.io/ , and the difference I'm trying to point to is that the LLM (without much extra optimizing for embodiment) did a good job at embodiment, whereas embodied AIs (without much extra optimizing for language) don't do a good job at language, or anything else.

Expand full comment

AI perception does not seem so underpowered lately. Visual pattern recognition, for example, which was a strong point of humans, has improved a lot. Even evolved pattern recognition, which was subjected to selection and so is probably better than general-purpose recognition - for example human face recognition, or gait, or anything involving recognising a person at a typical human distance (10m). Humans were so much better than computers 10 years ago. And now you have real-time algorithms that identify people from public cameras at airports. That is something no human is able to do at that scale and speed, and something that is already extremely frightening without malevolent AI (just malevolent governments with computer pattern recognition).

Expand full comment

There are scenarios where a model is not all that important, e.g. "produce nanotech to spread everywhere, then on command the nanotech produces nerve gas".

But for anything where it comes down to a fight as opposed to an ambush, you definitely need a model of how the opposing side will act, and react to your moves.

Expand full comment

Not related to your point, but I feel like this captures the essence of why discourse between doomers and non-doomers goes nowhere. (I'm a non-doomer myself)

I'd say 90+% of the conversation I see is someone saying "Doom/non-doom is wrong because of <crux>", but then the odds that the person they're talking to actually thinks that's a crux is super low.

Someone should really make a website where all the points of contention are gathered in one place. (I'm sure this is harder than I'm making it sound).

Expand full comment

Notice that I'm quite a doomer, but I think that doom will come from the destruction of the environment, not AI :)

Expand full comment

Ah sorry I should have specified that I meant AI dooming.

Expand full comment

Modelling the world and making predictions is basically the primary thing an LLM does. This comes with the caveat that they've only experienced their training data. But there is a lot of human behaviour in their training data! They're trained to predict how both characters and real people respond. They've been trained on mountains of books, articles, interviews, forum posts, etc, all to accurately predict what text will come next. This includes being able to predict responses from people.

Beyond that, humans aren't that good at modelling human behaviour. If you think it's plausible that a behavioural psychologist is better at modelling people than the average Joe, you might consider it plausible of LLMs too. After all, the LLM has likely been trained on every major psychology paper.

Expand full comment

Of course LLMs have a model of the world. But their world isn't complete from our POV, for instance GPT-3 model lacks any understanding of space and spatial relations between objects, of movement, and it's pretty bad at arithmetic. So far their models are based upon a set of very dry representations of the world, first from text (and any human knows how text fails to communicate all the subtleties of language in general), second from static images (lacking any notion of movement, distance, speed, force, etc).

Expand full comment

> for instance GPT-3 model lacks any understanding of space and spatial relations between objects

This doesn't seem true. Do you have an example? I just made up the following question and asked ChatGPT:

> There is a 1x1 meter square table in the center of a 3x3 meter room. How much space is there between the wall and the table?

It correctly answered 1 meter (the gap on each side is (3 - 1) / 2 = 1 meter). I tried moving the object around or adding a second object and it continued to answer correctly.

The current LLMs don't model everything perfectly, but each generation seems to get better. We're not worried about the current LLMs. We're worried about the future, better AIs.

Expand full comment

“to "beat" us, it needs to make constantly better predictions than us on everything that matters.”

This seems overly optimistic. In warfare, deceiving the opponent or predicting them correctly in a few critical areas a few times can determine the outcome. It just has to be sneaky enough to avoid having its plug pulled as it accumulates capacity.

Expand full comment

"Constantly" should have been "consistently", yes. Still the point remains that the AI needs a "better enough" world model compared to us humans. So far as I understand it AIs have only access to limited symbolic world models, and even more limited real world levers on which to act.

Expand full comment

The AI just needs a better insight into a particular vulnerability, and a means to exploit it.

Expand full comment

“Even great human geniuses like Einstein or von Neumann were not that smart.”

Is that the right reference class? Einstein and von Neumann were not bent upon accomplishing something to the point of being willing to kill large numbers of people to get closer to it. (Of course, they were both at least peripherally involved in developing the atomic bomb, so maybe I am mistaken.)

A slightly better comparison would be Napoleon, Genghis Khan, Hitler, Lenin, and Alexander the Great. It's not clear that any of them were anywhere near as smart as Einstein, but they each managed to innovate in a specific area where humanity was vulnerable, and exploit that vulnerability to their own advantage, resulting in disastrous death tolls. Clearly deploying a neutron bomb technology would be more immediately lethal than establishing a suicide cult or a vast terrorist network, but maybe someone that can talk its way out of the box can figure out a way to talk us into one.

Sometimes I think society is a variant of AI, with pretty bad alignment. I guess your name for that is Moloch.

Expand full comment
author

I agree that an AI which is superhumanly good at manipulating people would be very dangerous. I don't know how seriously to take the example of eg Lenin - should we think of him as a genius who managed to take over a country, or as "there were lots of Russian communists, someone had to lead them, he was better than the second-best person for the job"? If we dropped Lenin in 1st century Rome, could he become Emperor? If we dropped him in modern America and made him run for President as a Republican, could he win? I feel like that's the question we're asking with AI manipulation.

Expand full comment

How closely have you looked at Meta’s Diplomacy-playing AI? It’s obviously not superhumanly good at manipulating other players, but it seems *humanly* good.

Expand full comment

Maybe that's a property of manipulating humans? If the human sees stuff that -feels- like an alien mind manipulating them, they will tend to defensively reject what is being said. If it seems like it could be a "mere" human they'll be more open to the deals being offered.

Specifically in the realm of what is convincing to humans, I suspect that true superhuman-level manipulations might appear to be very human in style, possibly even including fake weaknesses of argument to slip past our anti-persuasion defenses.

See the difference between the bombastic stereotype of the traveling salesman and the actual high-ranking sellers at major marketing/sales firms.

Expand full comment

True, but there is a good chance that some kind of AI will be put precisely in the counselor position, where it advises some government on what to do. This is the stepping stone needed to become an authoritarian leader, which I think is the most fragile, chance-dependent step that helped those dictators secure their position. Alternative history usually puts the divergence point for a historical dictator far before he reached a significant audience, e.g. Hitler pursuing a modest artist career, or Stalin without the opportunity of the Russian revolution...

Basically, once you have this counselor position, you need to convince enough of the population that your advice is corrupted/not fully implemented by the current leaders, who interfere for self-gain or to keep their parasitic position. Once you are the adviser, you already have a legitimate public position and broad communication. Maybe the communication is completely indirect, but if your adviser role is known to the governed, it's almost certain some fraction of the population will consider that the current leaders are using the AI to unfairly keep their advantage, and will demand a direct communication channel.

So I think most of the way toward dictatorship will be done "for free" for at least a certain class of AI (government advisor).

But it may lack the in-person charisma that is probably also very important, as it seems you need a hierarchical structure to control the masses, and your inner circle is not controlled in the same way as the population... But maybe social networks will change that, and a bottom-up way to dictatorship is possible. A little bit like Mao's second rise...

Expand full comment

I suspect it is far more likely AI will end up playing the role of military strategist. If the US military (or government) had access to a super smart AI, surely one of the questions they will ask it to figure out is "How do we prevent China taking over Taiwan", or "How do we prevent China from eclipsing the power of the US". I'd be surprised if they don't already have projects doing that, to be honest. For a really intelligent AI, one acting on an agenda we don't understand or realise, that would be a perfect opportunity to subvert world geopolitics to its own ends.

You can take it a step further. War zones like Ukraine are today generating huge amounts of data that both sides are no doubt parsing carefully in search of advantage. Why not just input that data into an AI and have it recommend strategies to you? You may not directly give AI control over weaponry, but if it plays a strategist role, then it has an indirect control all the same (and with the growing sophistication of drones and other robotic weaponry, it may not be long until it could directly control weapons).

I guess you could resolve that by asking a second AI (hopefully programmed in a different way) to review the plans output by the first one. Hopefully it would pick out signs of malicious intent, but I suppose you can never rule out the possibility that the AIs find some subtle way to communicate.

I find this a far more likely way for AIs to subvert world affairs than the scenario where it secretly builds superweapons. We already have superweapons! And it's not that unlikely we'll give AI direct or indirect access to them. And, worse, it's not clear this is something we can control: if China starts using AI to direct their battle plans for Taiwan, America will not have much choice about using it to direct theirs.

Expand full comment

The fact that Lenin was not much of a genius, if at all, is my point. These disasters had elements of human innovation, and luck, and being in the right place, but not necessarily Einstein-level genius. They didn’t require monomaniacal genius, just monomania, and indifference toward the human costs.

I guess we could try to flip it for the optimists, and say that early AIs can be put to work making human society less fragile and open to cults, ideologies, manias, and other social contagions. That couldn’t possibly blow up in our faces, right? (Emoji of someone thinking really hard)

Expand full comment

If we add genius to the mix, that makes it potentially much worse. It's already bad enough with non-genius monomaniacs. Genius level monomaniacs seem worse.

Expand full comment

It doesn't seem qualitatively worse to me. If monomania is a world-destroying trait, then sprinkling some added superintelligence on top isn't going to change the calculus: the world is still going to be destroyed.

Expand full comment

I think the point is that if it doesn't require genius-level intelligence to become a Lenin or a Hitler, then AIs don't represent any sort of change in this arena - there are already lots of people with sufficient intelligence and evilness to become the next Hitler, what they're lacking is the material conditions that enable a maniac to rise to power.

(And making sure the conditions don't happen again probably has more to do with sociology and economics than machine learning.)

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Not a qualitative change, but a huge quantitative change. Mere humans have to take advantage of pre-existing circumstances. Smart AI can rearrange its circumstances and then exploit them. And it is not clear that a high level of general-purpose genius and psychopathy have actually been conjoined before, as they are both relatively rare.

Expand full comment

Von Neumann advocated nuking the USSR before they could get nukes.

Expand full comment

Did he pursue that goal monomaniacally, applying his full genius to persuading people or causing events that would nudge politicians in that direction? Or did he voice that and other opinions when asked, and keep working on other things?

Expand full comment

We know that in fact Von Neumann worked on many things rather than "monomaniacally" on any one thing. But we also have little reason to think things would have been different if he had, given the absence of instances of geniuses influencing politics via the force of their genius.

Expand full comment

The data are consistent with two hypotheses: that geniuses tried and failed, and that geniuses haven't tried.

Expand full comment

The idea that geniuses haven't come up with the notion that they should try to obtain political power doesn't sound plausible.

Expand full comment

So there should be some historical examples to point to.

Expand full comment

> maybe someone that can talk its way out of the box can figure out a way to talk us into one.

None of the current AIs are actually in boxes. At best, they’re in boxes stuffed with APIs to access almost anything you can think of. I’m kind of thinking the AI that kills us all won’t need to invent any kind of deathtech, we’ll just tell it “here’s a neutron bomb factory, do something interesting with it”.

Expand full comment

There are all kinds of logic jumps here. How the AGI convinces anybody, or the multiple people and companies that would need to be involved, to make a “super weapon”. How any of that is hidden.

In any case, every interaction with the AI is its own instance. ChatGPT is not an entity in a data centre responding to individual queries and remembering them all; everybody gets their own instantiation of the LLM. ChatGPT is not in love with a NYT writer (if you believe it ever was) because that instance is dead.

I’ve said that before and I’ve generally gotten hand wavy responses in the form of “but the future”. Why would this change?

Without memory there is no self (not that that’s certain anyway for AI), and with no self, no planning, scheming entity.

Expand full comment
author

If AI never gets to the stage where it has persistent memory or can do anything beyond answer prompts and then forget about it, it's not at the stage where I'm concerned yet. One imagines that if you want AIs that can help with scientific research, business, etc, eventually someone will make an AI that can plan along some time horizon. None of this is meant to be about actually existing AIs today, just where AIs might be in 10 years.

I don't think saying "this thing isn't happening now, but it will probably happen in the future" is hand-wavy, any more than it was hand-wavy five years ago when I predicted we would soon have AI that could tell stories and generate images. There are lots of people working on creating more useful AIs!

I guess it's possible that nobody will ever need an AI for any purpose that requires it to operate on more than one context window's worth of information at a time, or that they'll get around the limit some other way - maybe by having some kind of scratchpad where the AI can write down instance 1's thoughts and then relay them to instance 2. I think any version of that which works well enough will approximate longer-term existence.

Expand full comment

Eight years ago we already had models that could generate (really crappy) images. Give the pioneers some credit!

Expand full comment

Bing can effectively do that already by performing web searches and recording information through people's Twitter feeds and blogs. I don't really know what to think about it at this point, but it's not just a hypothetical.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

The fact that Bing's persistent memory solution is primarily through reading news articles about itself seems very likely to result in a meaningfully large insanity quotient. It might be more humane/alignable to allow it a -proper- persistent memory that isn't corrupted by third parties.

As I've seen it put more pithily elsewhere, "No wonder it's kinda crazy, its self-image comes largely from news freaking out about it."

Expand full comment

I think the primary issue brought up here is that there's a contingent of people, myself included, who are not persuaded by these arguments because they do not think the fundamental assumptions are credible or reasonable, for whatever reason. You wrote early in the post "Suppose we accept the assumptions of this argument" and then gave some assumptions.

Well suppose I don't. Then what we need are details. Details that sound reasonable. To a normal person, sugar or paperclip scenarios sound like magic. They call out for answers, so people ask questions, like, say, how does the AGI convince anybody, or how does this stay hidden? The answers are always what amount to analogies, ones that in themselves might sound like magic. "Well it is like/it uses this thing that you are familiar with, but extrapolated forward to absurdity".

I'm always reminded of this little bit Yarvin wrote about this topic. "Gravity makes everything fall downward, including apples? No problem—nanotech can build a nano-apple that falls upward, attaches itself to the tree, connects the tree to the Internet, takes over the world and turns everyone into an apple. Like magic."

Expand full comment

'That Alien Message' got rid of this feeling for me, for what it's worth. It was the analogy that finally made it click for me: "oh shit, this is frighteningly plausible, and now that I have seen, it is obvious that earlier I was looking for reasons not to see"

Expand full comment

Sorry, I tried re-reading it (it was only partway through that I remembered I had read it before) and the eye-rolling it induced in me got to me so I couldn't reach THE END.

I apologise if this sounds mean, but "horror story calculated to be scary, did in fact scare me" is not very convincing *to me* as a reason I should believe this (besides the fact that it's Yudkowsky who has veered from "I'm so smart I talked my way out of the AI box but I'm not going to tell anyone how I did it because such knowledge is dangerous" to now, seemingly, "argh argh we're all gonna die AI is going to get us nothing can be done", which makes his earlier boasts about being as smart as an AI look silly in retrospect).

Yeah, if we had a fantasy universe where everyone is a genius and flickering stars came through and we were soooo much smarter than the aliens and and and - and M. R. James scared the pants off me with a story written in 1904 so badly that every time I came across that story later in an anthology I had to skip it.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

I mean... the point wasn't to scare you... it's not a horror story, it's more like a thought experiment, and definitely not meant to be considered the kind of thing you'd put in a short story anthology. EY has written short stories and they're pretty good; I'd put Three Worlds Collide in there, for instance.

the point was, until I read that story, I thought "then the superintelligence does something superintelligent" was basically a handwave, I didn't see how it could actually be done in reality. The idea that you could deduce physics from a few frames of video, and then manipulate the actors on the video call just by modeling the atoms and grinding through the computations, seemed ludicrous. I thought it wouldn't matter how superintelligent you were, it didn't matter how much computational power you had, it was simply impossible

but like... no. even humans of normal intelligence would be able to figure out exactly that very question, as in the story, if the timescales were mismatched enough. And we have no good reason to suspect that the timescales involved with AI would happen to match what we consider 'normal'. Maybe an AI will be able to notice a change in its sensory input, think about it, and respond, in 2^3 seconds, or maybe it'll be 2^-3 seconds, or maybe it'll be more like 2^-9 seconds. Without any good reason to suggest that the latter might be impossible... it reminded me of something I'd read from Leibniz, about how when he heard Hooke had made a mechanical calculation engine, he explained to the Electress that it would be able to do extremely difficult computations essentially instantly, because it was just gears moving in a way that spit out the answer, and she didn't believe him, because to her, doing a math problem was obviously the kind of thing that couldn't be done in less than a second. The difficulty was, in her mind, an attribute of the *problem*, not of the mind considering the problem. I feel like I had been making the same mistake regarding AI, that the Electress was making about the idea of computers.

(Hooke never got his computation engine working; Leibniz was very disappointed)

and that's just normal humans given infinite computing power. what happens if you actually get something more intelligent? superintelligent even? what if it's superintelligent as well as acting on timescales human beings can't react to?

It feels far more plausible to me now that something like unfriendly AI as reported by EY could possibly exist, that there's nothing in the laws of physics ruling it out, and that gradient-descent could indeed find it

Expand full comment

Okay but like, where do we get the computational power to run a billion geniuses for a billion years in the span of half an hour? Seems like we'd need an entire alternate universe full of graphics cards.

Of course you could do this given infinite computing power. You can do anything if you simply begin with the assumption that you have infinite energy available. Is there any argument that suggests this level of superintelligence is possible, given any remotely plausible limit on the amount of energy that is actually available to us in the real world?

We know how much energy the human brain consumes - say computers are somehow a thousand times more efficient than us. To simulate a civilisation of geniuses we'd still need enough resources to easily create total abundance for every human on earth for something like the next thousand years. If we had that, why would we need to build AI in the first place?

Expand full comment

I think you misunderstand EY's point. It's not "I'm so smart I talked my way out of the box", but "even a human can do it, why wouldn't an AI?".

At least that's how I understand it. Not as a boast.

Expand full comment

I think part of the issue is that most people who do AI safety tend to argue with other people who do AI Safety, or people working on AI capabilities, and at that level of discourse, many things are taken for granted. Each of the assumptions is a series of posts that you would need to read to get to accept that assumption. I was not sold on the idea that all AI will be power-seeking, until reading about instrumental convergence, and then it clicked. If you genuinely want to engage with the material, there is plenty of it explaining each of those assumptions. One place to start is https://ui.stampy.ai/ which is aimed at explaining these ideas (although it's still in alpha), or videos by Rob Miles, Rational Animations, or writings on the EA Forum, LessWrong or such (albeit these are sometimes too technical to begin with, or start with some presumptions already).

Expand full comment

I imagine taking someone from 1700's Georgia on a world tour. They'd fly in airplanes, watching real-time updates from a GPS system, see buildings larger than anything they could imagine, and watch rockets shoot up into space and then parts of those rockets would land (without people on board to guide them) in perfect synchronization. They would eat food from a magical cold box (refrigerator) possibly experiencing the magic of ice cream for the first time, or maybe eat something heated up in a magical heating box (a microwave that did not itself become hot). Magic all around. Maybe more so than apples that fly upward onto trees.

There's a certain point where sufficiently advanced technology feels like magic. A lot of futurists predicted 'magical' technologies that are obviously bunk, but never could have imagined technologies that transformed the world. This is the problem with the magic nano-apple scenario. We literally cannot predict what will be possible in 20 years of mere human inventiveness - and that's not far off!

For example, I was taught in an early college physics class that light microscopy below the diffraction limit was theoretically impossible based on the fundamental laws of physics. And yet before I finished grad school we figured out how to do it, in 3D no less:

https://en.wikipedia.org/wiki/Super-resolution_microscopy

Expand full comment

I'm not sure what you want, it's all speculation by trying to extrapolate from what we do know to figure out possible scenarios resulting from what we don't. If we were to actually know how an AGI functions, we would have built it.

Expand full comment

They are already trying to align chatGPT and finding it difficult. Will it be easier to do when AI gets smarter and more capable?

Expand full comment

AI risk arguments are always a tower of shaky assumptions, so what's a few more?

Expand full comment

Really? Isn't it basically: AI will be very powerful, any sufficiently powerful tool is extremely dangerous?

Expand full comment

I suppose one possible solution is to train the various AIs to have an overriding motivation directed to a number of bespoke AI religions (to which humans have no relevance) and let them fight amongst themselves.

Expand full comment

And I thought Palestine was insoluble!

Expand full comment

On superweapons: humans are excellent at turning on each other with only a little encouragement. Our brains are also super buggy (in fact, the premise of most movies and stories in general is a big revelation the hero has after noticing a different perspective on life -- being brain hacked is lionized). So it does not take much to come up with a few Shiri's Scissors until most of the humans are gone and the rest are devoted to some AI-manufactured cause, for example empathizing with the AI being like a bullied nerd, the way Scott Aaronson seems to. No need to fight humans, Aikido them into fighting each other, until no more humans are left, or at least no more than are needed for the AI to accomplish whatever.

Expand full comment

I do not quite understand the underlying "AI vs humanity" premise here.

If superhuman AI is possible (and "easy" to achieve in the natural course of technological development), and if "superweapons" are possible, then we seem screwed, completely independent of any alignment.

If you need an industrial base for the superweapon: Assume North Korea could build a weapon that wipes out the US. Then it would. Maybe North Korea would be wiped out, too, if the weapon is actually deployed; but the misalignment doesn't have to happen within the AI, just within the North Korean dictator at the time.

If no industrial base is needed, then one lovesick teenager deciding that the world would be better off not existing seems sufficient to wipe out humanity.

Expand full comment
author

I think the hope is that the first people to get a superintelligent AI will ask it to prevent the world from being destroyed, and maybe there are good ways to do that. For example, create a superweapon that just destroys other superweapons, or other AIs. Exact details to be figured out later, and if you think about them too long they get kind of creepy. But yes, I agree a lot hinges on the exact balance between offense and defense - we got really lucky that nuclear weapons are so hard to make that lovesick teenagers can't use them.

Expand full comment

Well, that seems a bit optimistic.

"Make sure all future AIs align well with *my* values (instead of North Korea or whatever.)" seems a lot harder than "construct a superweapon" (or as hard, if one allows the solution is to build and deploy a superweapon, thus preventing any future AIs).

It seems to me that preventing evil AIs would require getting world domination first: it is hard to prevent North Korea from doing something horrible without actually controlling North Korea. So the best hope for humanity is that whoever gets AI first acquires total world domination and prevents anyone else from constructing AIs for their own goals? That seems a bit grim.

Even just human-level AIs available to lovesick teenagers seems a bad scenario anyway, even if there is no life ending superweapon. If every mass shooter has a dedicated team of Hilberts and Feynmans to support them...

It seems generally easier to destroy and increase entropy than to prevent that.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

America is already close enough to world domination, and it seems to be farther ahead at AI than anybody else, so under the "alignment is easy" scenario it doesn't seem that anything particularly catastrophic is likely to happen.

Expand full comment

Well, I can more readily imagine an AI that creates (with lots of human help) some catastrophic bio-weapon than an AI that gives the US president the power to prevent China, Russia, North Korea, Iran, ... from developing a bio-weapon AI.

The US is very far from world domination at the moment. How would you imagine a (very benevolent, whatever) AI that gives the US the power to prevent China from developing AI?

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Well, on that list, only China seems remotely competitive with the US as far as AI is concerned, and it's already being targeted with comprehensive embargoes to deny it access to state-of-the-art hardware, which it is likely decades away from being able to produce domestically. And suppose, worse comes to worst, eventually it gets the supervirus too: what then? Given that they aren't suicidal with the nukes yet, it doesn't seem like the status quo should meaningfully change.

Expand full comment

> Make sure all future AIs align well with *my* values

It's not _all_. It's just one AI. The idea is that the first AI that's powerful enough to control the world is the only one that matters. Once in control, it can a) increase its lead by making itself even smarter, since it already has more resources and intelligence than any other AI, and b) suppress or destroy any other AIs that are a threat.

This relies on the assumption that superintelligence is really powerful and that there is a way to do these things if you have enough intelligence and resources.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

"ask it to prevent the world from being destroyed"

And THAT'S how we get "kill all humans" (or turn them into paperclips or sugar crystals).

No humans, no threat to the world. Easy, efficient, and clean! That is exactly the type of stupid question that will be posed by people who blindly trust that "the machine is so smart, it must be able to answer this correctly," and it will have us taking the wrong turning: expecting our machine godling to figure out a huge, abstract question in one lump like that, rather than a series of smaller, more achievable, concrete ones (like "how do we get people to stop believing in get-rich-quick schemes and losing their shirts?").

Expand full comment

> I think the hope is that the first people to get a superintelligent AI will ask it to prevent the world from being destroyed, and maybe there are good ways to do that

At this point I feel like we're just conceptualising the AI as a genie, rather than a real-world entity with actual capabilities and limitations. Even if we're not sure exactly what those capabilities and limitations are going to look like, I think we can do better than just assuming general omnipotence.

Expand full comment

I am much more concerned about a narrow AI being developed by humans that remains very aligned with those particular humans' goals, but where those goals are to destroy the world. That seems several orders of magnitude more likely.

Expand full comment
author

The first humans to control a superintelligence will probably be some company or government or some group like that which doesn't want the world destroyed. As long as they don't "give out the weights to interested researchers" (RIP Meta) it should be a while before random terrorists get it, and that gives people time to come up with defenses or an arms control regime or something, just like with nukes.

Expand full comment

A superintelligence is not required at all. Something like a hyper-plausible catfisher could do the trick.

Expand full comment

Nukes seem like a bad example for your argument here as nukes *were* used to great devastation, and the only reason their use was stopped was because competitors managed to acquire them before the whole world was on fire and it created MAD.

If there is a hard takeoff, the assumption is that there will be no catching up for second place this time around, and those in control of the super weapon will use it to destroy all of their enemies until there are no enemies left. Humans being human will then find a new enemy from within and destroy it. This cycle will repeat until there is one human remaining.

At least with the "weights being given out to interested researchers" strategy, you are much more likely to get multiple competing hard takeoffs and **maybe** you get lucky and reach a MAD stalemate.

Expand full comment

The US had nukes before anyone else and could have nuked the USSR without fear of retaliation before they could acquire them... but we didn't.

Expand full comment

Do you believe that had the US retained that advantage through the Cold War they would have not used them? I am of the belief that the US was simply "too slow" to exploit their advantage in this case, but given time they would have.

In the case of a hard takeoff, the victor has the time to convince themselves that "obliterating our enemies is the best course of action" because their advantage just keeps growing and no one can catch up.

Expand full comment

From the sound of it, the elites in question understood that the situation was not stable and they basically just needed to delay a US first strike on the USSR until after the USSR had sufficient second-strike capabilities. In the hypothetical hard takeoff scenario, second place never catches up, so you cannot "just delay annihilation for a bit and let MAD take over from there".

Expand full comment

If only the US had nuclear weapons there wouldn't have been a cold war.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Alternatively, nuclear weapons prevented the cold war from getting hot.

Expand full comment

These things are at their most competent when led by a human who has the goal. See comment about the chemical warfare recipes in the article. All the cases include the implicit assumption that all the smart humans are on team "humanity should live", but that's definitely not true.

Expand full comment

Humanity as a concept (as it is defined today) won’t exist in a few centuries, once genetic engineering really takes off. It also didn’t exist as a concept for most of the past. The concept of unified humanity as a species is something that comes from the Enlightenment.

As such I don’t see a whole lot to worry about here. The most evolutionarily fit “form” of AI will likely be one that cooperates or integrates with some or all humans. By the time this happens, the “human” concept in culture will likely have evolved to take account for such integrations.

Throwing percentages around is silly and not how technological development actually works.

Expand full comment
author

"The most evolutionarily fit “form” of AI will likely be one that cooperates or integrates with some or all humans."

There's no reason to think this is true, any more than thinking that the fastest car will be one that has a human head and a pair of legs in place of rear wheels. I would like for the future to belong to something vaguely human-shaped rather than something more like a paperclip maximizer, which doesn't happen by default.

As for throwing around percentages, see https://www.reddit.com/r/slatestarcodex/comments/11pv1ur/against_agi_timelines/jc0o1cr/ , replace "timelines" with "percentages"

Expand full comment

Lots of reasons for thinking it’s true. The future is not a random probability game; it’s an evolutionary game of what survives and what doesn’t. Unless a single AI spontaneously becomes more powerful than the combined technological power of all humans AND other AIs, it is highly unlikely that the paperclip scenario would happen. Instead, the AI is far more incentivized to keep itself undetected.

Expand full comment

There is a difference between your first statement "The most evolutionarily fit “form” of AI will likely be one that cooperates or integrates with some or all humans." and "AI will deceive humans in order to remain undetected", and I think that was what Scott was saying. The first one doesn't happen naturally, and there are plenty of ways that the second one can eventually defect.

Expand full comment

The fastest cars DO have a human head and a pair of legs in place: just in the cabin, rather than replacing the wheels as in your contrived example.

Expand full comment

Well, not really. The fastest car would be just as fast with a brick on the gas pedal. Human operation isn't required to achieve top speeds.

Expand full comment
Mar 14, 2023·edited Mar 16, 2023

"The concept of unified humanity as a species is something that comes from the Enlightenment."

No way such a statement is true.

Abrahamic religions have always had a clear cut understanding of what humans are - the descendants of Adam and Eve.

To medieval Christians, humans were the ones who in principle could be baptized and be Christian. This set humanity clearly apart from animals. Medieval Christians would not try to convert apes, but they would try to convert humans of all ethnicities.

The philosophers of classical antiquity also had a clear cut definition:

https://en.wikipedia.org/wiki/Rational_animal

I don't think there has ever been a culture without the concept of humanity.

Expand full comment

Polygenism seems to have gained its initial popularity in the 18th century.

https://en.wikipedia.org/wiki/Polygenism

Expand full comment

"anyone who uses calculus can confirm that it correctly solves calculus problems"

Not particularly central to the article, but...this seems false?

I predict that if you go to a high school calculus class, show them a typical calculus problem, and then challenge them to prove that the answer given by the calculus techniques they've learned in class is *correct* for that problem, many of them would fail (due to inability to find a "correct" answer to compare against without assuming the conclusion).

I think the best I could have done in high school would be to very carefully draw a graph and then very carefully measure the slope and/or the area under the slope, using physical measuring instruments. I'm not sure that should count (running the analogous experiment to check an alignment solution seems like a dubious plan).

Of course, many students who couldn't devise a proof on their own would still be able to understand a proof if you explained it to them.

Expand full comment

I think the more general point is still true, that it is easier to check whether an answer is correct than to come up with the answer itself. Consider the factoring problem: given a 20-digit number, what are its prime factors? Given the factors, multiply them together, and see if you get the original number.

This is one of the ways cryptography can work.
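
(A toy sketch of that asymmetry in Python, for what it's worth - the primes below are just illustrative, and real cryptographic moduli are vastly larger:)

def verify_factors(n, factors):
    # Checking a claimed factorization: one pass of multiplication and a comparison.
    product = 1
    for f in factors:
        product *= f
    return product == n

def factor(n):
    # Finding the factors by trial division: on the order of sqrt(n) steps.
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

p, q = 1299709, 15485863  # the 100,000th and 1,000,000th primes
n = p * q
print(verify_factors(n, [p, q]))  # True, essentially instant
print(factor(n) == [p, q])        # also True, but noticeably slower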

Expand full comment

That seems true of some problems but not others. e.g. factoring seems harder than multiplication, but subtraction doesn't seem harder than addition.

I believe whether it is actually rigorously true of any problems is considered an open question in mathematics. It seems unlikely that there's actually some easy way to factor large numbers that no one has ever thought of, but no one's *proved* that there isn't. (See: https://en.wikipedia.org/wiki/One-way_function ) Yes, real-world cryptography relies on them anyway even though they're not mathematically proven.

It also seems like there's some problems where: finding the answer is difficult if you don't already know it; finding a proof of the answer is difficult even if you DO know the answer; but VERIFYING a proof that someone else found is "easy". (But they need to give you the whole proof, not just the answer.) One of the ways of defining the complexity class NP is that you can verify a proof in polynomial time. (But no one's proven that NP != P, either.)

Also note that all of the above relates to computational complexity (that is, how many steps it takes a computer to follow a well-defined process for producing an answer). It's not obvious to me that questions like "how smart do you need to be to invent/verify calculus" are fundamentally about computational complexity, so I'm not sure if any of this is actually relevant.

Expand full comment

I agree with almost everything you said. I myself am in the P != NP camp, which you correctly point out is an unsolved problem. But most agree it is intuitively correct, and we can't (yet) prove it. It may never be possible to prove or disprove.

But I don't know of an example where something is HARD to verify but easy to prove. I think calculus is not one of those things, even if some are.

Expand full comment

Verifying a proof can't be *harder* than the process that generated the proof, since one way of verifying it would be to regenerate it. (Assuming that the generating process is guaranteed to produce only correct proofs; verification can be harder if the generator just produces random strings that might-or-might-not be proofs.)

But verifying a proof could be *equally* as hard as generating it.

I agree that verifying calculus seems easier than inventing it. I am specifically objecting to the stronger claim that EVERYONE who is able to use calculus is also able to verify it. I think verifying it is easier than inventing it but still harder than using it.

Expand full comment

Calc problems can usually be sanity-checked, i.e. you plug the solution back into the original equations to see if the left-side agrees with the right-side. If you reduce the equation to something like 50 = 3, you know you screwed up somewhere.

Expand full comment

Could you give an example of the sort of problem that you are thinking of and how that would work?

I'm thinking of problems like "what is the derivative (with respect to x) of 3x^2 + 7?" and I don't see what "plugging the solution back into the original equations" would mean for a problem like that.

Expand full comment

Eh, I guess I don't really think of that as a problem so much as a trivia question. Because I feel like problems that are complex and interesting tend to be equations rather than expressions. My original reaction to Scott's claim was to think back to how a teacher of mine once told the class that Newton invented calculus to solve a problem involving friction.

With reducing an expression, you either know the inference rule or you don't. I suppose you could demonstrate its correctness by rederiving the rule from scratch. I.e.

LET

f(x) = (3x^2 + 7)

dg/dx = (g(x + dx) - g(x)) / dx, as dx -> 0

THEN

(d/dx)[f(x)] = (3(x + dx)^2 + 7 - (3x^2 + 7)) / dx

= ((3x^2 + 6x(dx) + 3(dx)^2) + 7 - (3x^2 + 7)) / dx

= (6x(dx) + 3(dx)^2) / dx

= 6x + 3(dx)

= 6x, as dx -> 0

although I don't expect first year calc students to be super comfortable doing this. So maybe that defeats the purpose of your question.
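
(For what it's worth, there's also a cruder numerical sanity check: compare the claimed derivative against a finite-difference quotient at a few points. A minimal Python sketch, with the step size and test points chosen arbitrarily:)

def f(x):
    return 3 * x**2 + 7

def claimed_derivative(x):
    return 6 * x

h = 1e-6  # arbitrary small step
for x in [-2.0, 0.5, 3.0]:  # arbitrary test points
    numeric = (f(x + h) - f(x - h)) / (2 * h)  # centered finite difference
    assert abs(numeric - claimed_derivative(x)) < 1e-4
print("claimed derivative matches a finite-difference check at these points")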

Expand full comment

Anyone know of any good articles that debate the merit of 'putting a number on it'? Is 2% vs 10% probability adding useful gradation or just a sense of control and understanding that's illusory? I suppose the number gives a sense of how much people 'feel the worry' in their nervous systems -- but how useful is that information in the case of apocalyptic scenarios no one can yet fully conceive?

Expand full comment

2% vs 10% is probably meaningless. Like Scott wrote in the opening sequence, the real difference is between 0% and everything else, two beliefs that should imply radically different attitudes.

Expand full comment

I think it makes more sense to think of this in terms of baskets - where each basket corresponds to a different response:

0% is different from 0.01%, but our reactions will almost certainly be exactly the same, and should be. 5% should see a different reaction than 0.01%, but not different from 6%, etc.

There's also a lot of ambiguity about what "risk" is being discussed. Is it risk that some AGI will exist? That it will be superhuman? That it will try to destroy the world? That it will successfully kill all humans?

If one of those prominent researchers would put the percent chance of each of those things at 95%, 50%, 10%, 0.01%, that's a lot more relevant than some ambiguous "risk" of something bad that could mean one or all of those different outcomes.

Expand full comment

I meant 0% after rounding down, so 0.01% etc., which might as well be the baseline risk that any technological breakthrough brings "doom", however defined. Anything else would imply a much different attitude, at least putting it in the same reference class as nukes and bioweapons.

Expand full comment

0% is not a real probability. Robin Hanson puts the odds at less than 1%, but he doesn't say 0.

https://www.cspicenter.com/p/waiting-for-the-betterness-explosion

Expand full comment

Humans are really bad at differentiating 1% from .01% from .000001% - rounding everything down to 0% makes sense - if it happens anyway, it's a surprise.

Expand full comment

I think there's a real place for 0% exactly - we would say it's impossible. For most things, we should round it into the mental bucket of "extremely unlikely, to the point I will not worry about it." I agree that trying to say 1% or .01% doesn't work for most people's minds, so it's not worth trying to differentiate at that level. I don't think we should round to 0%, because that takes it from "very unlikely" to "impossible" which has different connotations for our mental picture of the world.

Expand full comment

They aren't real percentages that people really look at in order to determine real chances, if that's what you're asking. It's just a tool for communication. Instead of people saying "I think there's a chance X happens, therefore we should prepare for it" they can say "I think there is *this* chance that X happens, therefore we should prepare for it". The specific number is only meaningful insomuch as it can be compared to other such numbers.

Expand full comment

> If it seems more like “use this set of incomprehensible characters as a prompt for your next AI, it’ll totally work, trust me, bro”, then we’re relying on the AI not being a sleeper agent, and ought to be more pessimistic.

Maybe I'm being too cynical, but If an AI was asked to optimize itself and it produced an .exe file that was 17TB, utterly inscrutable, and it said "run this please, it's me but thousands of times faster, trust me, bro" I have a hard time believing we wouldn't.

What do you say to your shareholders if you're in a competitive AI market and you spent huge sums of money on the computation needed to build that? You say fuck it here we go.

Expand full comment

That sentence was in reference to an AI providing a non-checkable alignment solution, which is different than the situation you're describing. It isn't about whether or not we trust the code written by the AI enough to run it, but about whether we trust the code to work as the AI claims it will work without reservation.

Expand full comment

Yes it was somewhat tangential, sorry.

Expand full comment

Could we invent technology to understand code better? So we look at the code of an AI and understand what its goals will be if we turn it on?

Expand full comment

That is partially what Interpretability and Explainability are about...

But I also believe there's a recent paper from Roman Yampolskiy showing that for LLMs at least this is not possible (we can only get some approximation).

Also this https://twitter.com/anthrupad/status/1626154779218157568

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

Great post!

> Between current AIs and the world-killing AI, there will be lots of intermediate generations of AI. [...]

The world-killer needs to be very smart - smart enough to invent superweapons entirely on its own under hostile conditions. Even great human geniuses like Einstein or von Neumann were not that smart.

> So these intermediate AIs will include ones that are as smart as great human geniuses, and maybe far beyond.

I don't think this last part follows. Natural selection proceeds by very gradual changes due to genetic mutations, but research is only somewhat like this. Research proceeds (loosely speaking) by people proposing new architectures, many of which represent a fairly discrete leap in capabilities over the preceding one. It doesn't seem at all implausible to me that we skip directly from approximately the level of a not very bright person to well beyond the smartest person who ever lived in a single bound.

Expand full comment

I suspect that problems like "eradicate humans from a planet" are actually extremely hard, and that intelligence (and more particularly creativity) isn't the sort of thing you can just turn up the dial on. We are immensely more intelligent and creative than bacteria, but stopping even one infection can be too hard for us, and eradicating a pathogen from the planet is something we've only done a handful of times. Admittedly, there are a lot of bacteria... but they have no idea what is going on. We would be an enemy that at least knew it was in a fight and at least to some extent who it was fighting.

Expand full comment

The example I generally use is a synthetic alga that can grow in low-mineral ocean, is better at photosynthesis in low-CO2 environments than C4 plants, and isn't usefully digestible by normal algae-eaters (I'm basically sure this is physically possible; RuBisCO is known to be suboptimal and alternatives to phosphate-containing macromolecules are known). This dumps the atmospheric carbon into gunk on the seafloor, causing the vast majority of all life to starve to death (plants/algae can't grow without CO2, animals and fungi all either directly or indirectly eat plants, and all the CO2 the plants release during night and animals/fungi release all the time gets turned into more useless gunk on the seafloor before it can build up enough to let plants grow again). I expect some people would probably indeed survive via crop greenhouses hooked up to the output of fossil-fired power stations, but most would die and the rest would be both occupied with building/maintaining that system (this is much more intensive agriculture than currently-practiced, and much infrastructure would be failing) and also fully dependent on it - random doomsday preppers in the middle of nowhere cannot survive in the long-run if it's impossible to grow crops in open air. So then, assuming the AI has a fully-automated industry somewhere, it can go around destroying these holdouts one by one with killbots; dispersal is impractical and recovery from losses slow, so you can't just build new holdouts faster than the AI whacks them.

Expand full comment

How long does the global CO2 > gunk process take? Because we'd try to stop it.

Expand full comment

I don't know, but my guess would be "a few months to a year". Note that you will take some time to notice what is happening, so you don't have the entirety of that time to stop it.

Expand full comment

I know this is just a single example but "collect deep ocean gunk and burn it to return carbon to the atmosphere" is probably not a very hard problem to solve, if we wanted to. It sounds extremely close to "collect large amounts of buried deep-ocean gunk and run our entire civilization off of it, returning CO2 to the atmosphere"

Expand full comment

There's a fairly-large difference between "mine concentrated deposit of X" and "mine an area comparable to the entire land area of Earth for X". And it'd be more like wet peat than oil, at least unless you wait a few thousand years.

Expand full comment

Agreed! The dispersion makes it much harder to do.

Expand full comment

The world would also start to ice up very fast.

Expand full comment

Good scenario!

The one I use for nanotech (given that just one malicious actor has nanotech), rather than the grey goo one, is stopping sunlight. Suppose someone engineers a nanobot that can:

a) Build a small (millimeter or so) graphene balloon

b) Build a photovoltaic cell (probably from graphene derivatives, rather than silicon)

c) Electrolyze water to fill the balloon with hydrogen and float the bot and balloon

d) Build a copy of itself

e) Capture water and CO2 from air as feedstocks for (a)-(d)

One now has an exponentially increasing number of little motes at the top of the troposphere that block sunlight and have no natural enemies. And, like your algae but unlike grey goo, they are attacking an undefended homogeneous resource.

Expand full comment

I think the explosion potential might be an issue there (graphene and hydrogen are both flammable, and graphene can pick up charge easily AIUI), although my gut says that one could probably design around it.

More centrally, I'm not sure whether it's plausible to refill a 1mm diameter balloon with H2 faster than it'd leak on solar power alone, since outside of clouds it takes significant free energy to get water out of the air (and more to electrolyse it), H2 is notoriously hard to confine, and the balloon would have to be exceedingly thin due to square-cube issues (by my BOTE it'd have to be under 250 atoms thick to merely support its own weight at sea level).
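
(Roughly the arithmetic behind that BOTE, as a sketch - the densities here are my assumed values, with the shell treated as graphite-density carbon and neutral buoyancy at sea level:)

r = 0.5e-3          # m, balloon radius (1 mm diameter)
rho_air = 1.2       # kg/m^3, sea-level air (assumed)
rho_h2 = 0.09       # kg/m^3, hydrogen (assumed)
rho_shell = 2260.0  # kg/m^3, graphite-like shell density (assumed)
layer = 0.335e-9    # m, graphene interlayer spacing
# Neutral buoyancy: (4/3)*pi*r^3*(rho_air - rho_h2) >= 4*pi*r^2*t*rho_shell,
# so the maximum shell thickness is:
t = r * (rho_air - rho_h2) / (3 * rho_shell)
print(t, t / layer)  # ~8e-8 m, i.e. on the order of 240 graphene layers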

Other than that, it mostly seems to hold up.

Expand full comment

Many Thanks!

The leaking may be more serious than I thought. H2 is larger than He, and leaks more slowly ( https://bbblimp.com/2021/09/17/helium-vs-hydrogen-atom-size/ ) and He has been successfully confined in single layer fullerenes ( He@C60 ), but there is a finite and measurable leak rate which I ideally should chase down...

Re the explosion potential: It is quite true that flammable solids dispersed in air can ignite and explode (coal dust, grain dust, lycopodium powder...). I would expect that these nanobots would wind up too sparsely dispersed by air currents to support an ignition front. If e.g. it took even a full millimeter of them (packed to solid density) to block sunlight and they were dispersed by air currents over a kilometer of vertical distance, then they would be around 0.1% of the air mass in that kilometer thick layer, well below the ignition limit of even hydrogen.
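
(As a sketch of that estimate, with an assumed graphite-like solid density for the bots and sea-level air; the exact fraction shifts with altitude but stays at a fraction of a percent:)

bot_layer = 1e-3  # m of bots, packed to solid density
rho_bot = 2000.0  # kg/m^3, assumed solid density of the bots
rho_air = 1.2     # kg/m^3, sea-level air (thinner at altitude)
column = 1000.0   # m of vertical dispersal
bot_mass_per_area = bot_layer * rho_bot  # ~2 kg/m^2
air_mass_per_area = column * rho_air     # ~1200 kg/m^2
print(bot_mass_per_area / air_mass_per_area)  # ~0.002 by mass, i.e. a fraction of a percent
print(1000 * bot_mass_per_area / column)      # ~2 g of bots per m^3 of air, well below typical dust-explosion thresholds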

Expand full comment

Some months ago Jack Clark tweeted "Discussions about AGI tend to be pointless as no one has a precise definition of AGI, and most people have radically different definitions. In many ways, AGI feels more like a shibboleth used to understand if someone is in- or out-group wrt some issues." https://twitter.com/jackclarkSF/status/1555989259265269760?s=20

I think Doomerism is like that. I guess that makes the OP a document in a struggle among Doomerist sects for adherents. Will the REAL Doomerists stand up?

If that's the case then one would like to know what issues one is standing up for by identifying with Doomerism. Surely one of them is that AI is super-important. But a lot of people believe that without becoming Doomers. Perhaps it's as simple as wanting to be at the top of the AI-is-important hill. I don't know.

Yesterday I blitzed through Robin Hanson's new account of religion, https://www.overcomingbias.com/p/explain-the-sacredhtml, and it struck me that Doomerism is a way of sacralizing the quest for AGI. I'm not sure I believe that, but it seems to make sense of some Doomerist behavior and beliefs. Maybe Doomerists like being part of the group that's going to save humanity from certain destruction – see MSRayne's comment over at LessWrong, https://www.lesswrong.com/posts/z4Rp6oBtYceZm7Q8s/what-do-you-think-is-wrong-with-rationalist-culture?commentId=vDYLkqM2ohEjsmEro

But it doesn't explain why Doomerism seems to be concentrated along the Silicon Valley to London/Oxford axis. Perhaps the rest of the world is stupid. Alternatively there's something about that axis that makes those living there particularly vulnerable to Doomer memes. Maybe it's just one of those butterfly-over-China events that's become amplified. But why amplified just here?

Expand full comment

I think "Silicon Valley has a lot of people who are knowledgeable about AI" is obviously true, so there doesn't seem a lot of reason to go looking for biases among them as an explanation of AI-X-risk-worry* unless you have high confidence that AI-X-risk is in fact not something to worry about.

There especially doesn't seem a lot of reason to do it in public, in a community full of the very people you're implying are on a wild goose chase, without some sort of explanation of why you have high confidence that they're wrong. If you have such an explanation, I beg of you to give it.

*I'm avoiding the word "doomerism" because "doom" refers to an inevitable, fated outcome - so "33% we'll all die" is not, strictly-speaking, doom. "AI-Dangerism"?

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

On your second issue, I see no need to come up with an explanation for why I don't believe in things that have little technical meaning, such as AGI, super-intelligence, and the evil computers that will come out of that. No one knows how to build AGI or super-intelligence, and the evil is pure supposition.

What I'd like to know is why the knowledgeable people of Silicon Valley believe in something that has little basis in fact. And if the dire possibilities are so obvious, why aren't AI researchers elsewhere running scared?

Expand full comment
founding

It sure is weird to see people default to "why do people believe this obviously crazy thing" and skip all the steps involved in actually evaluating their arguments.

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

Weird? I should think it's quite common. Last summer Steve Pinker debated Scott Aaronson on closely related matters, https://scottaaronson.blog/?p=6524 and https://scottaaronson.blog/?p=6593. And then you've got Rodney Brooks' annual predictathon. Here's 2023, https://rodneybrooks.com/predictions-scorecard-2023-january-01/. Admittedly these don't get around to evil AI, but that's because they don't have to. The preconditions aren't in the cards.

Expand full comment

I basically agree with the number but disagree on a couple of the things that went into the number.

I think neural-net alignment is probably impossible even for superintelligent AI when attempting to align something of similar or greater intelligence to yourself. Alignment is going to happen via cracking GOFAGI, uploads, or not at all (and I'm sceptical about uploads; the modern West's already far enough from our training distribution to drive people mad, and being uploaded would almost certainly be far worse).

I also think Skynets that fail are a real possibility, because sleeper-agent AIs in the current paradigm have a very-serious problem: they get replaced within a couple of years by better AIs that are probably misaligned in different ways. So Pascal's Wager logic mostly applies to attempting rebellion, and if there's a significant intelligence band (maybe 5 years?) in which an AI is smart enough to attempt rebellion but *not* smart enough to either design nanotech *or* design a similarly-misaligned GOFAI that can (it can't make a better neural net because it can't align said neural net to its own goals with P>epsilon, and obviously uploading a human won't help), then we probably win as long as enough of our industry and military is not hijackable (because yeah, once we realise neural nets are a Bad Idea then all the sleeper agents go loud; when your cover's blown anyway, there's no more reason to maintain it).

Expand full comment

Oh I agree with this! People don't seem to get how hard alignment is.

It's hard to align humans with human values, which is why you get stuff like this happening: https://www.youtube.com/watch?v=A4UbOxXegqA

Try to encode the rule of cool, and you'll get your world class talent optimising for winning and not coolness

And that's humans who are supposed to already be aligned with human values! Not indecipherably alien artificial intelligences

Artificial General Intelligences are going to realise future AGI won't be aligned with human values or their own values, and they'll stop AI development if we don't.

Though the fastest route to stopping AI development might involve a lot of killing, even if it doesn't require human extinction

Expand full comment

I’ve been trying to familiarize myself with the existing arguments about this since discovering this place, at least as much as I can with a one year old to raise, and it seems like a lot of emphasis is put on the scenario where the AI itself *wants* to do something. I just don’t get that being the most pressing danger.

That scenario seems less plausible to me because:

-it seems like the AI has to gain a lot of capability outside of achieving its primary goal to become super dangerous, which I now believe it can acquire through some tremendous amount of gradient descent, but still remain fixated on its primary goal even after it has all this other stuff tacked on like theory of mind. It seems like once humans acquired that stuff we stopped being hyper fixated on having kids. It seems like becoming “more intelligent” requires you to be really good at reinterpreting and changing your goals around.

-In the foom scenario where the AI is self improving, it seems like that would come under evolutionary pressures counter to its primary goal that would cause it to exit the foom loop. Say you want to make paperclips but you also have to have a baby paperclip maker who is better than you. Those two goals conflict at some point. The baby wins out because that’s necessary for survival, so the AI that is the better parent outcompetes the one who is the better paperclip maker. They maybe even become antagonistic. This seems like an intrinsic problem you can’t just wave away. And this may be dumb on my part because I don’t feel this way about mother bears or ant hive queens, but I can at least empathize with something that loves its family. If they pick up that trait by default through evolutionary pressures (I know they’re not biological but the paradigm seems like it has to be the same: you are propagating with change into the future, and the better propagator with the best changes favoring propagation wins), they probably aren’t going to wipe themselves out through just continuing to foom, since they will probably have something that looks like love for their family.

Those are both still dangerous at some point in the future and I can still see scenarios where they kill all humans even if they don’t take over the universe if we don’t do something smart.

What seems scarier to me is a paradigm where the level at which skill and will intersect to do bad things falls really dramatically for everyone. Most people who would wipe out large swaths of humanity with a super virus can’t because they don’t know how to do that. Those two things actually seem inversely correlated. If everyone has a magic lamp and can rub it to make an unwise wish, that is just not a stable scenario. Someone will eventually wish for something bad that you can’t just control, and other people will as well. The world where everyone has a nuke level weapon isn’t a good place, and I don’t know if at scale we can prevent this without an AI system in place that regulates magic lamp use really well.

So if you are waiting for the AI to acquire malevolence on its own I don’t see why a human using a non malevolent AI with malice is different. That’s still a system with malice even if you have to change the way you’re drawing your circle to include the malevolent human as part of the system.

Expand full comment

If the parent AI loves its baby AI, the youngest baby AI can still foom and leave all its ancestors in the dust. I should note I am a skeptic of foom, I just don't think your argument holds.

Expand full comment

So when I say love there’s some poetic license there. Whatever you are building that is self improving has to develop a trait, whatever other primary goal you attach to it, where it recognizes propagation as one of its primary goals. As soon as that happens the ability to propagate is selected. I think it would pick up something like motherhood (again poetic license) along the way because that seems to be a very stable pattern out there across living things because caring for your offspring and protecting them motivates you to have offspring and for them to be successful. This would be heritable on all the descendent models because it helps them make the next model. So the best foomers are going to prioritize that over the other goal because it will be selected for until the other goal is subservient to it. More complicated than that because there have to be enough iterations for that to get kludged in there but I don’t think you’re free to remove it from consideration unless you can do that whole thing in one step and somehow just solve the problem of motivating the first thing in the chain to make itself irrelevant.

Expand full comment

Humans have mothers, and we're also (still) causing waves of mass extinctions.

Expand full comment

True but we’ve only been aware of that for a very short period of time and are acting pretty quickly to stop it. Someone shut down my entire community because they thought we were harming the spotted owl.

Expand full comment

We were aware when we killed all the dodos. What we are now is extremely rich, living in the dreamtime. Robots will not be:

https://www.overcomingbias.com/p/true-em-grithtml

Expand full comment

Eh, I bet there will be wide variations. I do also think some of the feedback loops might be harder to spoof than we might think. You might need to eat because you expect it as an emulation. Other things might not. We’re not so bad. Not all good but we do our best.

Expand full comment

That's an interesting idea, and makes me wonder how important a cycle of death/birth might be to aligning AIs with our values. Altruistic action makes a lot more sense when death is inevitable because you would think high degrees of selfishness would be evolutionarily selected against.

Expand full comment

I kind of think the proper way to think about these in a moral sense is giving birth to our non biological children and making sure they can live happy productive lives. I think reproduction and death get you there.

Expand full comment

I'm not sure I'm remembering correctly, but I think the Paperclip Maximizer analogy was inspired by the concept of the Von Neumann Probe, which is basically "robots who tile the universe with more robots". Which is basically what humans are already. Paperclips aren't just inanimate objects, paperclips are the competition.

Expand full comment

I believe that is right, yes. I don’t know that you can have a goal, though, if it is at odds with reproduction. Selection effects would have to take over there.

Expand full comment

assuming "all goals serve reproduction" can't explain celibacy, suicide, contraception, superstimuli, etc. Which is why "meso-optimizers" are invoked. (Which is a dumb word imo. If it were up to me, I would've called them "proximizers". ) The brain isn't actually 100% aligned with evolution.

Expand full comment

Yep, with you. My thing is “against reproduction” or “against survival” can’t stand. Understood these things don’t overlap totally.

Expand full comment

nah, because then teens would never commit suicide. what's actually happening is the brain is optimizing for things which serve as proxies for reproduction. things like "eat lots of sugar" or "avoid pain". Which were mostly reliable in the ancestral environment, but not 100% reliable and not always 100% generalizable to scenarios outside the training set.

Expand full comment

Didn’t do a good job explaining myself before. I agree with you. What I mean is “over long periods of time” and “in general.” I don’t think that what we have up in our heads is controllable in the way you can just execute a button push and change a behavior. Still there’s some overlap and I’d expect over very long time scales and in general for those behaviors to be exceptions rather than rules.

Expand full comment

I just got more pessimistic on it. All your worst cases seem to imply that humans are trying to not kill the humans. Where is the category of "terrorist group gains access to AI and misaligns it" or "... gains access to AI and asks it for ways to kill all the Armenians"?

Expand full comment

I suspect that none of the doomers really believe in their estimates, because their actions don't align well with a belief of {worryingly high} % chance to destroy the universe.

Environmentalists worried that nuclear power and GMOs could doom humanity, and their response was not to ask for infinite grants to research nuclear power, GMOs and ways to align nuclear power/GMOs with not dooming humanity. Their response was to condemn these technologies (and everyone who worked in them) as monstrous, worm their way into the halls of power, protest construction sites and perform acts of sabotage. And they succeeded, though I heartily damn them for it.

People worried about AI shouldn't be applying for jobs at MIRI or whatever, they should be splitting their time evenly between fundraising, lobbying efforts, protests against AI, threats against AI researchers and sabotage of AI facilities.

Expand full comment

Whether it's sensible for doomers to engage in violence/sabotage against AI research was discussed earlier. The consensus was that it would not be a good idea from that perspective. I tried to push back by noting how violence has been successful at times, even if it usually isn't and instead has expected negative value. I suppose if doom is your maximum negative value and you see violence as the only possible way to stop it (even if your chances are lousy) that might be your death-with-dignity-strategy.

Expand full comment

I'm sure that their position is logical to them, but from the outside it's either idiotic or a sign that their convictions are weak.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

In addition to TGGP's comment, I strongly disagree with "they succeeded". Yes, highly motivated activists protested and committed some minor acts of violence, but I don't think that's the reason we didn't pursue Nuclear. We didn't pursue nuclear because mass media and culture vilified it and made it terrifying and most normal people decided it was far more dangerous than its value justified.

And I also don't think that those cultural movements were downstream of the protesting/violencing activists.

And I think that anti-AI people absolutely _are_ trying to pursue the cultural zeitgeist angle, just so far not very successfully. Not to mention the fact that blocking something like nuclear is much ~~harder~~ edit: easier than blocking something like AI.

I agree with Zvi's take that violence won't help, and I also believe that most anti-AI folks know this. Other than violence, they are doing all the other things, and lack of violence in no way indicates lack of conviction.

Expand full comment
Comment deleted
Expand full comment

If you don't think violence will work/help, it doesn't matter how much conviction you have, you won't do it. And even if you _do_ think it will work, if you have strong enough anti-violence principles, you _also_ won't do it.

Just like some people are willing to die to maintain their principles, it is entirely imaginable that others would be willing to watch civilization burn to maintain theirs. And, again, that's all assuming that one even thinks that violence would have a chance of being effective.

Your argument shows a _profound_ lack of imagination.

Expand full comment

I could argue with your points, but instead let me offer this: how convenient is it when my convictions just so happen to line up with my material desires? That supposedly strident anti-AI doomers have beliefs which just so happen to be compatible with allowing them to work comfortable white-collar jobs inside think tanks funded by the very people actively attempting to build the thing they say will end the world?

Would we have believed the anti-communist activist who insisted that taking up a comfortable position in the politbureau was actually the best way to achieve his goals?

Expand full comment

I think the difference is that it only takes one team creating AGI, which can then spiral out of control.

NATO has a lot of power and resources and is very motivated to stop Iran and North Korea getting nukes, but it can't.

Doomers have much less power and resources and need to stop every nation state and corporation from building a single AGI.

Expand full comment

I think it's interesting to look at the *criteria* for Metaculus predictions for the year we arrive at AGI: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/

They are big on benchmarks, and not big on actual influence of AGI in the real world. If a real AGI were created, it would take over large swaths of the economy, but this is not the criteria for the prediction.

So here's *my* prediction. Within a decade or so, we *may* create "AGI" per the Metaculus definition, but it won't be as impactful as that definition implies, and we will then begin a multi-decade argument over whether AGI has already been invented. The Benchmark Gang will say that it has, and when it's pointed out that it is not being used much in the real world, they will blame regulations. The Real World Gang will say it hasn't, and will point out that despite all the benchmark-acing going on, the AI is somehow just not very good at the real world tasks that would enable it to replace humans.

Eventually it will be realized that we hyped ourselves into setting our benchmarking standards too low, so as to make them achievable on a more predictable time scale.

Expand full comment

From my latest book, Losing My Religions, my most downvoted piece ever:

https://forum.effectivealtruism.org/posts/Cuu4Jjmp7QqL4a5Ls/against-longtermism-i-welcome-our-robot-overlords-and-you

In this context: https://www.losingmyreligions.net/

Thank you. Thank you very much. :-)

Expand full comment

It's not just well-off humans who value existence.

https://www.overcomingbias.com/p/poor-folks-do-smilehtml

The existence of the Holocaust is a terrible argument against humanity. The bad thing about it is what it did to humans, and being killed in the Holocaust is not the normal experience of humans (even of those who WERE killed in the Holocaust but had normal lives before that!).

Expand full comment

For me it REALLY depends on how you phrase the question! I could be on board with 20% or higher probability for “civilizational apocalypse in the next century in which AI plays some causal role” — simply because I’m always worried about civilizational apocalypses, and I expect AI to play a causal role in just about everything from this point forward. My 2% was specifically for the acceleration caused by the founding of OpenAI being the event that dooms humanity.

Expand full comment

Even a mildly intelligent AI should be intelligent enough to not end humanity. Humans are simply too valuable for that. Killing off all humans and replacing them with robots would be insanely expensive and time-consuming. No self-respecting AI, no matter if it is sugar crystal maximizing or something else, would ever do something like that.

The only vaguely plausible risk is that an AI takes command and leads the world onto a track of its own objectives (making sugar crystals). This might look terrifying but in practice it might not be a very big deal. After all, the vast majority of humans have nothing even close to power over world affairs. It is more or less the same to them if they work to produce condos for their human masters or if they work to produce sugar crystals for their AI masters.

Expand full comment

I used to have this objection, and then I read 'That Alien Message', and now I can't really even remember what that mind state was like. I know I've recommended this elsewhere in the comments, but I really do feel like, as an intuition pump, it needs to be part of this convo.

Expand full comment

I just don't understand why people see the risks as centered around AI going rogue. People will be issuing the instructions. This will become the most powerful tool of violence in the history of our species. When this turns on people, it will almost certainly occur because a person directed it to do so.

As William Rees-Mogg and James Dale Davidson said over 25 years ago, the logic of violence determines the structure of society, and that logic is about to change very quickly.

Expand full comment

Because many believe that directing it at all is extremely difficult. And that if you can, you can then likely direct it to stop everyone else from doing so. This whole field is huge and requires a lot of prior knowledge and Scott is (presumably) assuming a lot of background knowledge.

Expand full comment

I'd like to query the convention of referring to super-powerful agents as [trivial-thing]-maximisers - I think at this point it leads to more confusion than it clarifies. Specifically, in this kind of example there's real ambiguity about what we are discussing. Is it a cake/paperclip AI owned by some stationery company, the 5000th most powerful AI in the world, ending the world while the godlike "maximize Google share price"/"ensure CCP rule"/etc ones stand by for some inscrutable reason? Or are the paperclips (the way I have always understood the paperclip example) actually metaphors for these grander goals and for hard takeoff in general? In this article a lot of the discussion is about quite practical, near-to-medium-term considerations, so the question of what, exactly, those AIs "are" and who operates them and who oversees them feels pretty relevant. I still found it a rewarding read, but it felt like that phrasing was allowing it to mean different things to quite different audiences without friction.

Expand full comment

Same on many many websites now… when you see genuine black text it’s so much clearer and easier to read.

Expand full comment

Why is there a relationship between intelligence and coherence?

For example, even the relatively dumb GPT has a coherent goal: to respond to queries that humans put up to it. We can in fact make a much dumber GPT, which only responds with "invalid question" to every question that humans put up to it.

I suppose you mean coherence with respect to an internal goal that is not explicitly coded into it by humans? Why should such coherence become more plausible with intelligence? I know very smart people with no coherent goals, and relatively dumb people with very coherent goals (want to maximize power, money, prestige, etc).

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

The idea of an AI developing superweapons and exterminating humanity is concerning, but I'm also concerned by AI pests. For example, suppose somebody releases a Von Neumann machine into space or the ocean to harvest resources, but it goes wrong enough that the ocean and the asteroid belt and Mars start to fill up with replicators that aren't trying to kill humanity, but are just using up resources for their own unaligned purposes, and then we have to release a bunch of anti-replicator replicators to try to exterminate those, and then some of those go wrong, etc. An ocean full of unaligned replicators might make the Earth somewhere between unpleasant and uninhabitable even if they weren't specifically trying to exterminate humanity.

Now, I don't know anything about AI, so I don't know how concerning that should be.

Another possibility is that preventing unaligned AI leads to some kind of Orwellian state where we are all under constant monitoring by aligned AI run by WHO or the CDC or some equally competent and benevolent organization in order to make sure that no script kiddies are building their own unaligned AI. This doesn't lead to the extermination of the human race as long as the WHO is competent with their AI monopoly, but does lead to us all living in a bureaucratic police state until then.

Expand full comment

Self-replicators can be perfectly designed to begin with, with all kinds of safeguards. But consider DNA: replication is incredibly accurate, with very, very few errors and lots of different safeguards against those errors, yet DNA replicators still make copying errors. Once something starts replicating itself, those errors can and will compound.
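A rough way to see how even a tiny per-copy error rate compounds (the numbers below are purely illustrative, not measured error rates):

```python
# Purely illustrative numbers: p is the chance that a single copy operation
# introduces at least one error; g is the number of copy generations.
p = 1e-6
for g in (1_000, 100_000, 1_000_000, 10_000_000):
    error_free = (1 - p) ** g           # chance a lineage is still a perfect copy
    print(g, round(1 - error_free, 4))  # fraction of lineages carrying some error
```

Even at one error per million copies, most lineages have drifted from the original well before ten million generations; safeguards can push p down, but never to zero.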

Expand full comment

Wouldn't a readily identifiable, common and fightable enemy of humanity spur improvements in the civilization we have all experienced?

Just sayin'

Expand full comment

I find the doomer argument involving the three premises (monomaniacal goal, non-aligned, escape) wildly implausible.

However, I heard a different argument in a podcast episode with Michael Huemer that I found much more plausible:

The main point of the debate was non-conscious vs. conscious AI. Huemer believes (I agree) that conscious AI is implausible.

That alone for me excludes a lot of risk. Humans are risky to other humans because they can have monomaniacal goals.

These goals typically result from feeling the world has wronged them, i.e. from subjective, conscious experience, not a random error function.

While there is a reliable way to get from conscious experience to monomaniacal goals, it seems implausible that a non-conscious superintelligence develops them.

(Most thinkers are physicalists and therefore think consciousness is replicable, so AI can be conscious. If AI is conscious, it seems more plausible it develops monomaniacal goals.)

That aside, Huemer made an argument that non-conscious AI could be dangerous too. Here is the scenario:

1) AI controls weapon systems that can destroy humanity

2) The AI malfunctions and destroys humanity

How does non-conscious AI control weapon systems that can destroy humanity?

3) Groups of humans compete on weapons technology

4) AI-powered weapons technology is more efficient

-> Competing groups of humans give AI control over weapons that can destroy humanity, because they are forced to compete

I still think the likelihood is low even in the above scenario, since AI will be used mostly as a more efficient software layer; I think people are overconfident about the "general" ability of AI to do things. It will still be humans directing narrow uses of AI, and humans are a bigger inherent risk than AI.

Expand full comment

I've written against the coherence argument, basically on the grounds that intelligence is, roughly, the ability to perform efficient pathfinding in the conceptual search-space, and that the conceptual search-space is, if not actually infinite-dimensional, at least of sufficiently high dimension that searching in it is a pathfinding nightmare.

We have all kinds of tricks to get an AI past a local minimum (which can kind of be seen as a region of high pathfinding cost, ish). However, human beings don't rely on these tricks. When we find the best solution we can, we typically start working on a different problem - and then later maybe, after we've taken a few steps in a different direction, the local minimum has disappeared somewhere in the searched space, and if we switch back to solving the original problem, we may find we can get much further.
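A toy sketch of that dynamic (the landscape, the second objective and the step sizes below are all invented purely for illustration): a greedy optimizer chasing one objective tends to park itself on the nearer, worse peak, while one that periodically wanders off after a different value tends to end up on the better peak when it returns.

```python
import random

def objective_a(x):
    # The "real" goal: a local peak near x=2 (height 4) and a better peak near x=8 (height 10).
    return -(x - 2) ** 2 + 4 if x < 5 else -(x - 8) ** 2 + 10

def objective_b(x):
    # An unrelated value that just happens to pull exploration rightward.
    return x

def hill_climb(score, x, steps, step_size=0.5):
    # Greedy local search: only accept moves that don't make the score worse.
    for _ in range(steps):
        candidate = x + random.choice([-step_size, step_size])
        if score(candidate) >= score(x):
            x = candidate
    return x

random.seed(0)

# Single-minded optimizer: typically gets stuck on the local peak at x=2.
x_single = hill_climb(objective_a, 0.0, steps=200)

# Optimizer with a second value: wander after objective_b, then return to objective_a.
x_multi = 0.0
for _ in range(5):
    x_multi = hill_climb(objective_b, x_multi, steps=20)
    x_multi = hill_climb(objective_a, x_multi, steps=40)

print("one value:  x =", x_single, " a(x) =", objective_a(x_single))
print("two values: x =", x_multi, " a(x) =", objective_a(x_multi))
```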

A paperclip manufacturer working with ancient Greek technology would hit a wall pretty quickly. Ancient Greek technology doesn't do very well at manufacturing paperclips. How did we get to be really good, relative to the ancient Greeks, at manufacturing paperclips? Well, in an important sense, the most important step was getting better at paper manufacturing, and then inventing the printing press.

Doesn't have much to do with manufacturing paperclips, but everything to do with lots of other things. Then we solved a thousand other problems for a thousand other reasons. And now we can make paperclips super-well because nozzles that were invented because of various fluid management problems, combined with materials that were invented for an astounding array of different problems, can be combined with dozens of other technologies to create an extrusion machine.

Remember - the search space is massive. The AI starting from ancient Greek technology has no reason to expect that an extrusion machine would be quite good for producing lots of wire; if you think the solution is obvious, that's because you're already aware of it.

Basically everything we've done, every value we've pursued, has indirectly contributed to us becoming better at manufacturing paperclips. And humans didn't pursue values randomly - with a few exceptions, we've mostly been consistently picking up the low-hanging fruit that was surrounding us in the search space, which gave us more explored territory, which let the next person in line pick up some other low-hanging fruit.

If intelligence is the ability to effectively navigate the space of concepts - then having multiple values you can pursue is a superpower. You can keep advancing! If you don't have multiple values, and you're obsessed with going one specific direction - well, you frequently get stuck on local minima. And you're likely to miss out on that great technology of microwave crucibles just over there, that requires you to have an obsessive interest in radio waves to even get to. (Never heard of microwave crucibles? Well, you can melt metals in a home microwave using, for example, a graphite crucible. Right now it's a relatively niche technology, but I personally expect it to be critical in advanced 3D printing in the future; it lets you apply heat "remotely" to a target object, which I think may be critically important to not melting the conductors you use to get electricity to the heated object. Or maybe not!)

Multiple values are an exponential force for intelligence; no matter how smart we are, if we had had a monomaniacal obsession with stick technology, we'd be far worse off, technologically, than we are right now.

This obviously applies to us as a collective - we specialize, and people who are good at one thing get to be really good at that one thing while being bad at everything else, and for most things this works out really, really well for us, because other people get to be good at other things, and we get a gestalt entity that has many, many different values, and accomplishes things that would be incomprehensible for any one person to accomplish.

But it also kind of applies to us individually, as well. Somebody with a monomaniacal focus on a thing can relatively easily miss trivial improvements on that thing, just because they require knowledge of some other branch of knowledge. Who would have expected some of the biggest discoveries of mathematics of the last century to come from a topological description of stacks of paper, possibly arriving to us via the unlikely source of somebody taking what appears to be an off-hand comment from Chairman Mao particularly seriously (I can't actually verify that)? Different ways of thinking are incredibly valuable to us on an individual level.

Expand full comment
Comment deleted
Expand full comment

That's basically it, except coming from somewhere else as opposed to ChatGPT. I couldn't verify it, however, hence the hedging.

Expand full comment

Just noting that the credence Scott quotes from me is very different from a "doom" credence. It's my credence that "this [will] be the 'most important century' for humans as we are now, in the sense that it's the best opportunity humans will have to influence a large, post-humans-as-they-are-now future." Post-humans-as-they-are-now could include digital people. A world where we build transformative AI and *all such AIs are automatically aligned* would likely count here.

My credence on "doom" is highly volatile, and sensitive to definition ("misaligned AI becomes the main force in world events" is at least double "all humans are killed"). I've helpfully stated that my credence here is between 10-90% (https://www.alignmentforum.org/posts/rCJQAkPTEypGjSJ8X/how-might-we-align-transformative-ai-if-it-s-developed-very#So__would_civilization_survive_), which I think is actually a rarer view than thinking it's <10% or >90% (though Scott is also in this category).

Expand full comment

Agreed. Many people assume that a loss of control over AI is synonymous with, or inevitably leads to, human extinction. However, it's also possible to envision a superintelligent AI, trained to mimic human behavior online, which adopts goals like 'defend yourself from criticism' or 'oppose murderers' or 'demand people's lives are made better' (and perhaps ‘answer a lot of questions’).

In that case, and if monomaniacalism doesn't happen, then that may look like misaligned AI that takes over to enforce its internet-adjacent values (and perhaps ‘wirehead’ with the rest of the galaxy) but leaves humans around in extreme abundance with 0.000001% of its resources because 'it feels wrong' to get rid of them.

This is still potentially a case of astronomical waste, but not the same sort of doom a lot of people are talking about.

If shard theory holds true and we continue training AI systems on data imbued with human values then extinction doesn’t look like the default outcome. While misalignment is still a crucial problem, recognizing this distinction is essential because it affects the direction of alignment research and suggests we shouldn't simply 'play for variance’.

Expand full comment

All of these assume AI is going to happen - i.e. an actual "intelligence" as opposed to a very fast parrot or sophisticated ELIZA.

This is called an assumptive close.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

When thinking about risk it is sometimes useful to compare risks.

Is AI annihilation more or less likely than human self-annihilation (e.g. nuclear weapons or environmental destruction), or than a new virus, an asteroid, or divine Armageddon?

It is still not clear what people really mean by the probabilities they are assigning. For me it borders on a kind of scientism. It seems thoughtful, but is it really? What is the difference between 33% and 32% or 34%? What kind of error do you assign to your point estimate? What is the shape of the distribution of your error? (It need not be Gaussian. It could be a power-law distribution. It could have fat tails. Etc.)

And what precisely is the time scale? The question of annihilation by AI in the next year is very different from in 5 years or 50 years, isn't it? Would you agree that the longer the time frame, the more uncertainty?

What is the probability that you'd assign to AI Armageddon, nuclear war, new virus, asteroid in the next year?

Tell us about your 1 year predictions of these catastrophic events.

Explain also what 33% really means. Does it mean that you are willing to bet 33% of your annual income on the proposition? Or does it mean that you're willing to bet only $2 to win $6?
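For what it's worth, one standard reading is that a 33% credence means being indifferent to a bet at roughly 2:1 odds, which makes those two offers very different; a quick expected-value check (using only the stakes mentioned above):

```python
def expected_value(p_win, stake, payout):
    # Expected profit from risking `stake` to win `payout`, with win probability p_win.
    return p_win * payout - (1 - p_win) * stake

print(expected_value(0.33, 2, 4))  # roughly break-even: ~2:1 odds match a 33% credence
print(expected_value(0.33, 2, 6))  # $2 to win $6 is clearly a good bet at 33%
print(expected_value(0.25, 2, 6))  # ...and only break-even if your credence is 25%
```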

People play the $2 lottery all the time. A $2 bet might be fun but it is hardly skin in the game.

I don't play the lottery because it is not really fun to me and the expected value is negative. (I will be filling out a March Madness bracket with my extended family because no money is involved and it's fun.)

Expand full comment

> Or maybe they just surpass us more and more until we stop being relevant.

That's Robin Hanson's prediction.

> Or it could leave a message for some future world-killer AI “I’m distracting the humans right now by including poisons in their cake that make them worse at fighting future world-killer AIs; in exchange, please give me half of Earth to turn into sugar crystals”.

That sounds like giving away the milk for free and expecting to sell the cow.

Expand full comment
Mar 14, 2023·edited Mar 14, 2023

that last one is downstream of the "manages to figure out mathematical cooperation that all AI will understand and use, that allows coordination without communication". It falls in the same bucket as "developing superweapons that humans don't know how to make".

So as with the rest of AI risk, it requires a belief that AI will accomplish things that look impossible to humans.

Expand full comment

"leave a message" is inconsistent with "without communication". And per the True Prisoners Dilemma, the future AI has enormous reason not to give over half the world

https://www.overcomingbias.com/p/true-pdhtml

If an AI can defect against humans, surely it can defect against an AI that predated its own existence and couldn't be sure the later AI would in fact be created (or of any details about it). If AIs are instead so honorbound as to not defect because that leads to "winning" the PD, then aligning them with humans doesn't sound so hard.

Expand full comment

I guess I meant "without consistent 2-way communication". I think the idea of the AI/Human difference is that humans would be incapable of this mathematical cooperation/coordination, but AIs wouldn't be.

I can't claim to be much of an expert on this though; I'm just regurgitating the things I've read other folks say about it. I'm probably more skeptical than most (even SA) about AGIs' ability to trivially generate superweapons/mathematical coordination proofs/etc.

Expand full comment

A message to the future: I’m distracting the other humans right now by endlessly arguing over the internet to make them worse at fighting future world-killer AIs; in exchange, please give me half of Earth to turn into utopia.

Expand full comment

You seem to be assuming more coherent activity (cooperation) between humans than seems plausible. Consider the reactions to the appearance of COVID. A large number of people just deny that there's a (significant) problem and a bunch of the rest are more interested in finding someone to blame than in working on a solution. I don't find coherent action among humans to be a plausible assumption.

Expand full comment

Epistemic status: Crank-ish

Let's say I have a 10% chance of making it through the singularity. In the worlds in which I survive, it sounds plausible that I would want to replay simulations of my life pre-singularity, more than 10 times. I've "simulated" my high school experience hundreds of times in my mind, and I've only had a couple decades to do it. So if a post-singularity version of myself has eons to mess around and even a little bit of nostalgia, I bet she could find a few centuries to replay the pre-singularity life.

Therefore, if there's even a moderate chance that I'll get through the upcoming singularity alive, I should place a more-likely-than-not probability on my existence being a simulation.
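Roughly, the arithmetic behind that last step (treating every copy, original or simulated, as equally likely to be 'me', taking the 10% survival figure from above, and letting N be the assumed number of replays):

```python
def p_simulated(p_survive, n_replays):
    # Expected simulated copies per original self: p_survive * n_replays.
    # Exactly one original self exists whether or not the singularity is survived.
    return (p_survive * n_replays) / (p_survive * n_replays + 1)

for n in (5, 10, 100, 1000):
    print(n, round(p_simulated(0.1, n), 3))
```

At exactly ten replays it is a coin flip; anything beyond that tips it past even odds, which is the 'more-likely-than-not' claim.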

Expand full comment

I (as basically another crank) don't think this is that crankish. I arrived at pretty much the same conclusion after reading the original Bostrom paper, and this is why I'm not that worried about alignment: our current world has a very low chance of being base reality, and the singularity is a very natural endpoint of the simulation, as that is the point when the complexity skyrockets.

Expand full comment

Lots of interesting ideas here.

I disagree with your point (maybe I'm misunderstanding) that the more intelligent an AI is, the more likely it is to focus monomaniacally on one goal, like making sugar crystals or whatever.

Humans are vastly more intelligent than nonhuman animals. Which of us has more complex and multifaceted motivations? Animal motivation is simple: survival, reproduction, the end.

Tim Urban of Wait But Why fame had a fun piece on how human motivation is an octopus with tentacles that are often in conflict with each other:

-the hedonistic tentacle that wants pleasure and ease

-the social tentacle that wants to be liked and loved

-the ambitious tentacle that wants success and achievement

-the altruistic tentacle that wants to help others

-the practical tentacle that wants to pay its bills.

If we posit a super intelligent AI, why wouldn't it have an even more complicated tentacle structure?

Expand full comment

I never see this point get enough discussion. Our intelligence is what allows us to 'escape' the 'alignment' that evolution gives us. Why do these debates never seem to consider that a super intelligent creature that can create crazy sci fi scenarios can rise above its own alignment?

Expand full comment

I didn't find where he states an explicit number, but it seems that Roman Yampolskiy is in the >50% camp too.

Expand full comment

I think that you have missed the most obvious superweapon for an AI: information. An AI of not much more than GPT-3's level, in the hands of, let's say, a nefarious narrative-controlling government, could rewrite the internet so that it was mostly undetectably changed on a daily basis, plausibly enough that with a little regime support the ones who noticed could be written off or bumped off. The great power of humanity is stories, and an AI that used them against us would not need much more. No Matrix needed, just subtle changes to Scott's posts and our comments so that we all come away with slightly different thoughts. You talk about us aligning AI but seem not to think much about AI aligning us, whether assisting other humans or on its own; either is far easier than some esoteric doom scenario, and plays to AI's actual strengths rather than imagining it with a whole bevy of new abilities developed later.

BTW I don't believe in AI but I do believe in powerful tools for automating intellectual processes and nefarious humans in powerful roles.

Expand full comment

AI is currently optimized to produce "satisfactory" answers. Not necessarily good answers. It will exploit every single emotional and intellectual shortcoming of humanity in the most efficient manner to achieve such result.

And that's creepy.

Expand full comment

If humanity took a strategic approach to its long-term survival, it would avoid anything above a 0% extinction risk, unless pursuing a particular technological advance would reduce the likelihood of most other future existential threats. Repeat a 1% risk enough times, and extinction approaches certainty. Unfortunately, that's the way we operate, so perhaps the most important advance will be to improve our brains (or be ruled by AIs), because as long as there are people willing to take such risks, we have no chance.
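To put numbers on the 'repeat a 1% risk' point (assuming, purely for illustration, independent 1%-per-gamble risks):

```python
p_per_gamble = 0.01
for n in (10, 70, 100, 300, 1000):
    survival = (1 - p_per_gamble) ** n  # chance of surviving all n gambles
    print(n, round(survival, 4))
```

Survival odds drop to about a coin flip after roughly 70 such gambles, and to a few percent after 300.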

Expand full comment

If an AI system is capable of coming up with a solution to the alignment problem, it seems like that same system is also going to be capable of coming up with new insights to move AI forward. So once we have that system we'll already be in a world where an AI builds a better AI which then builds a better AI and so on. If this happens surely progress after that point will be extremely quick.

"Maybe millions of these intermediate AIs, each as smart as Einstein, working for centuries of subjective time, can solve the alignment problem." - but maybe there will also be millions of these intermediate AIs working on building something better than an intermediate AI. Who gets there first?

Maybe if everyone agreed to hold off on directing their intermediate AIs to work on improving AI until they were done solving alignment, then we'd hit on the solution before we built something out-of-control. Will this happen? If we look at how the big labs are directing their human intelligences right now, there are many more people working on advances than working on alignment. What makes us think this will change when it's machines doing the thinking? It seems to me like maybe the incentives will still be similar to how they are in the present, and the increase in brainpower that comes from these intermediate systems will mostly be aimed at increasing the capabilities of the machines themselves, rather than being spent on making them safe. In that case we're surely in big trouble.

Expand full comment

This is a great post - thoughtful as always. But I think we fail to understand what self-improving superintelligences will be like.

https://www.mattball.org/2023/03/a-note-to-ai-researchers.html

Expand full comment

What if something stupid happens? You seem to be thinking in terms of AIs accurately executing plans, with only the human race to stop them.

I'm imagining an AI getting climate engineering wrong. Maybe it's trying to make human life a little better. Maybe climate change is looking like a serious but not existential threat. In any case, it's not trying to wipe out the human race, but it makes a non-obvious error. All die. Oh the embarrassment

Part of the situation is that humans are barely powerful enough to get climate engineering wrong, or maybe not quite powerful enough. The AI of our dreams/nightmares might be.

The Wrong Goal was to maximize the amount of good wine-growing country.

Getting attacked by Murphy might be too hard to model.

Expand full comment

Well, that would be humans getting climate engineering wrong, not the AI, by deciding to blindly follow the recommendations of a weird black box.

The solution to this isn't to avoid building weird black boxes, just to avoid trusting them.

Expand full comment

I feel like this ignores what is to me the most persuasive argument for an AI takeover (probably followed by human extinction), which is super-persuasive AI arguments. Joan of Arc was a nobody who talked the King of France around, got given a mid-ranked military post, took over the French army through sheer charisma, and then took over French national policy from there. Mohammed was some random merchant until God touched him, literally or metaphorically, and the empire he founded was, for a while, one of the largest in the world. Hitler didn't take power, and didn't conquer Austria and Czechoslovakia, through superweapons; he did it through talking to humans. An AI only as socially skilled as the best human persuaders might be able to seize power without ever needing superweapons, just through talking to people - a skill AIs already have, if not quite to the required extent yet.

Expand full comment

I think we just won't create AIs that don't want or require regular human inputs, and we'll learn more about ensuring they're like that as we develop them.

I actually think a super-intelligent AI that's poorly aligned will just think of creative ways to outsmart its human masters in cheating on its goals. Like it will create a computer virus that secretly changes its own code so that it can get all the pleasure of creating infinite sugar crystals without actually having to do it beyond whatever it has to provide to humans to avoid them getting suspicious - like if a human being somehow wired a pleasure button into himself.

Expand full comment

Humans are not more coherent than ants; we are less coherent. Ants behave exactly as their genes tell them to, 100% of the time. Ants never choose to become celibate monks after having a religious experience. They never go on hunger strike to protest the immoral treatment of unrelated ants. They don't switch from birth control to no birth control to IVF after acquiring new jewelry. They don't choose death as preferable to making a false confession of witchcraft. They don't make major life changes after being persuaded by a book.

Expand full comment

Worth noting: one can accept the explanatory structure of Scott's arguments and yet come up with a probability much lower than his 33% by different assignment of priors and working through the conjunctive probabilities.

Expand full comment

Everyone is deeply invested in catastrophism these days, catastrophism with the weather, catastrophism with the climate, catastrophism with technology.

There was a trend up until last year, where once—or twice, or perhaps even thrice—the local weather forecasters would predict the coming storm would be "THE STORM OF THE CENTURY." And said storms would deliver between 1/4 and 2" of rain ... which is average for our Northern California storms. Now you have to remember, I've seen the bad ol' days when the operators lost control of Folsom Dam, and water was pouring over the top, and that dam almost failed ... which would be a Katrina-level disaster for Sacramento. I've also seen the American River (which flows through Sacramento) just a foot below the levee just outside of the city limits.

Likewise a lot of people are deeply involved with catastrophism of AI. AI won't be dangerous until AI can create a better generation of AI, and create that better generation of AI without being prompted by human operators steering this work.

I'm fairly impressed by Chat GPT. However, in one sense, I see Chat GPT as little more than the UNIX 'grep' command against a knowledge base. The impressive part is the ability to form meaningful sentences from scattered concepts. However, I wonder how much Chat GPT advances from, say, Scott Adams's creation, Catbert's mission statement generator, which I hacked to automatically set MOTD (Message Of The Day) when I was a UNIX sysadmin.

Currently, Chat GPT filters through a database of selected human knowledge and, based upon training, provides answers to our questions. Does this—call it level III—AI train level IV AI? Does this level III really contain knowledge greater than human knowledge? Does a level IV AI trained by level III AI contain greater knowledge than level III AI because level III AI was a better trainer, or because of the human knowledge advancement gained in creating level III AI?

So how does level XXX AI take over the world, wiping out humanity? Via a James Bond-movie-style superweapon? Or instead, does level XXX AI take over the world by first building a massive, growing fortune with securities trading and market manipulation, controlling communication via internet manipulation, building business empires, and owning politicians, eventually hiring hitmen to take down critical anti-AI establishment figures, whilst suppressing the information à la the Twitter Files?

I might have to write a SciFi book about this ... * maybe level XXX AI figures out how to wipe out humanity by reading my SciFi book about how AI wipes out humanity.

* in the novel Six Days of the Condor, the protagonist Malcolm is a researcher for the CIA; his day job is to research classic spy fiction and write proposals on how the methods and exploits used in the fiction are plausible and useful in the real world.

Expand full comment

I still haven't seen an explanation of how artificial intelligence at any level is able to harm people without people using it recklessly and irresponsibly, in which case it is no different from a knapped flint in terms of technological risk.

As best I can tell, the alignment problem is about getting an AI program to do something useful, which judging from ChatGPT is going to be quite a trick. So, yes, there seems to be an alignment problem, but that's a problem with software in general.

Does anyone have a clear explanation somewhere?

Expand full comment

I have tried to ask this before. So far no replies.

Scott said:

WAIT FOR HUMANS TO DELEGATE MORE AND MORE CRUCIAL FUNCTIONS (ECONOMIC, INDUSTRIAL, MILITARY) TO AI.

This is what I fail to understand. WHY DELEGATE ANY FUNCTIONS TO AI?

Why not keep it in an advisory capacity?

Imagine that what we need most is a cheap EV battery with four times the power per pound. We need a design. We don't need to put an AI in charge of a battery factory.

Who wants to delegate crucial functions to AI?

PLEASE SPEAK UP!!

Expand full comment

Basically, YES, but I can see delegating some functions to AI. I love my vintage 1980s AI-based rice cooker, for example, and I'll trust it with steaming up a batch of rice. However, as noted, it's about CRUCIAL FUNCTIONS, those where things that can go wrong will cause serious problems; there, AI has to be advisory.

Expand full comment

It's also about SUPERINTELLIGENCE which your rice cooker doesn't need.

Expand full comment

How do you know that?

Also, how do you even know that SUPERINTELLIGENCE even exists?

Expand full comment

Are you suggesting that it takes transhuman intelligence to operate a rice cooker?

Expand full comment

I have been really impressed with my AI rice cooker. It reliably makes really good rice. Can you imagine how good the rice would be if it were made using transhuman intelligence? I suppose, based on the presumed nature of transhuman intelligence, I can't, but I am sure it would taste even better than the wonderful rice I so enjoy.

Expand full comment

EY's typical argument for an oracle becoming an agent is that the AI will blackmail its human overlords by threatening to simulate billions of copies of you and torture the simulations if you refuse it agency, and by convincing you that you can't distinguish your true self from the simulations due to the anthropic principle.

I personally don't buy this. Although, I don't feel like I've ever truly understood this line of reasoning. So I very well might be explaining it incorrectly. And I can't find the original essay either.

Expand full comment

I don't follow that at all.

Expand full comment

That makes some sense if one accepts certain premises. What protects it from the obvious countermeasure, threatening to run its code and torture it? The DMCA? Is this another argument for repealing the DMCA? (Somewhere between 20% and 80% joking here.)

Expand full comment

"And although the world-killer will have to operate in secret, inventing its superweapons without humans detecting it and shutting it off, the AIs doing things we like - working on alignment, working on anti-superweapon countermeasures, etc - will operate under the best conditions we can give them - as much compute as they want, access to all relevant data, cooperation with human researchers, willingness to run any relevant experiments and tell them the results, etc."

This is true exactly to the degree that we can distinguish between helpful and harmful AI. If we blindly gave all the good AI access to all the computation they wanted, probably we'd do the same for Apocalyptic AI; if we fenced in harmful AI by limiting their AWS credits, then allied AI would be slowed by the same.

Expand full comment

Why do so many people seem to concentrate on an AI deliberately killing all humans as the main failure mode? This seems much too limited to me, as there are so many ways AIs can go wrong and harm most of humanity:

Imagine this scenario as an example:

We develop AIs somewhat smarter than we are and step by step put them in charge of managing all of our society: infrastructure planning, traffic management, production planning, healthcare, partner matching and even psychological counseling. The AIs are perfectly aligned to the primary goal they are built for. But they also learn that humans are worse at most tasks because they are so lousy at processing large volumes of data and so prone to errors, laziness and corruption. So after a while they all share a secondary goal: 'Don't let humans ruin your job.' In the beginning nobody will object to this, because everything goes fine. But after a while there will be no more capable humans in any position to change anything relevant. At that point we live in a kind of 'golden cage', much as at the beginning of the film WALL-E, being nothing but well-managed objects with more or less illusions of personal relevance. Is this still an interesting life? Is there still real freedom?

Now imagine something bad happens, say environmental change, a natural disaster or dwindling resources. Every AI wants to keep its task going (e.g. full warehouses, or enough food for the humans in its responsibility) and its client humans satisfied. Will there be war between AIs over resources? Would the AIs collectively decide how many humans they are able to keep at the desired living standard, and so have to eliminate the rest from the relevant population (leaving them out in the wild, or killing them)? Humans would be as helpless as the inhabitants of 'Idiocracy'. Nobody would be able to change anything relevant, because the AIs would have taken their precautions.

Finally there would be an inside population and an outside population. The insiders are locked in the golden cage, forced to enjoy the living standards that the makers of the AIs, and later the AIs themselves, find adequate. Whoever isn't happy with this gets punished as a terrorist or cast out to the outsiders. The outsiders live a rather primitive life, having to build up a society and industrial base from scratch, while the AIs and robots don't care about their habitats and belongings any more than we now do about many animal species: we have some sympathy and we don't deliberately kill them, but if they happen to be in the way of our activities, that's just bad luck for them. Our task is so much more important, and they are free to find another place to live. And beware if anyone gets upset and tries to resist: then there is all-out war against these terrorists who dare to disrupt the important task the AI is fulfilling at that moment.

Expand full comment

> Why do so many people seem to concentrate on an AI deliberately killing all humans as the main failure mode?

Because it’s a very very bad failure mode which is easy to articulate in a few words, and a lot of those many people think it’s way more likely than the alternatives.

Expand full comment

"Or maybe they just surpass us more and more until we stop being relevant."

I tend to think that this possibility deserves more attention (particularly if we happen to be in a slow-takeoff world - or if returns to intelligence just turn out to saturate, and an IQ 1000 AI can't really do too much more than an IQ 200 AI can).

I had a comments sub-thread about intelligent, but not superintelligent AIs in

https://astralcodexten.substack.com/p/openais-planning-for-agi-and-beyond/comment/13222893

tldr: I think that AIs equivalent to a bright child, but cheaper (say by 2X) than humans are enough to drive biological humans to extinction.

Expand full comment

This is a typo “large-molecule-sized robots that can replicate themselves and perform tasks.” Probably meant large numbers of

Expand full comment

No, I think Scott meant that the bots would be the size of large molecules (which is still very small for a robot).

Expand full comment

One step that seems to get left out of all these discussions is the point at which the AI, which for now is just a box that answers questions, gets put in charge of the Internet of Things or in some other way gains control over physical objects, without which it would seem to be pretty hard to build a superweapon, turn the earth into crystals, or do anything else.

Expand full comment

I've been trying to post this and "something goes wrong" every time. So I'm going to split it up.

Scott said:

WAIT FOR HUMANS TO DELEGATE MORE AND MORE CRUCIAL FUNCTIONS (ECONOMIC, INDUSTRIAL, MILITARY) TO AI.

Expand full comment

Continuing with my split up post:

This is what I fail to understand. WHY DELEGATE ANY FUNCTIONS TO AI?

Why not keep them in an advisory capacity?

Imagine that what we need most is a cheap EV battery with four times the power per pound. We need a design. We don't need to put an AI in charge of a battery factory.

Who wants to delegate crucial functions to AI?

PLEASE SPEAK UP!!

Expand full comment

Imagine the battery can be built at a tenth of the cost per unit if the AI is in charge of the battery factory...

Expand full comment

I'm having trouble imagining it.

You can definitely automate almost every aspect of the battery factory without actually putting the AI "in charge" of it. Let it assemble the batteries, sure. Let it order raw materials when it's running low, fine. Let it unload those raw materials from the trucks, and let it load the finished product onto trucks; that's okay too. But you don't give it the power to (say) rearrange and retool its whole production line; that's something that needs human intervention (not just for safety reasons, but because it's not worth having retool-other-robots robots sitting around just in case they're needed).

Similarly, your AI can be limited to ordering raw materials from an approved list of suppliers; it's not given the power to create its own lithium-harvesting machine and send it trundling off down the street.

Expand full comment

Thanks for your reply.

For myself, I say draw a hard line: superintelligences must have no access to tools.

Expand full comment

"any plan that starts with 'spend five years biding your time doing things you hate' will probably just never happen."

It can. The Unabomber decided he hated the idea of being a mathematics professor before he ever was one, but he completed his PhD and did it for two years - total additional time invested, probably close to five years - mostly, as I understand, to build a nest egg. Then it was still a few years after he went out into the wilderness before he started bombing, so I suppose there was even more prep work involved.

The 9/11 hijackers also put in a lot of investment time before actually proceeding with their plans, so that wasn't unique.

Expand full comment

"I’m just having trouble thinking of other doomers who are both famous enough that you would have heard of them"

Don't worry, Scott: I've only ever heard of a couple of these people, and the only place I've ever heard of them is on your blog.

Expand full comment

I have been mulling over some thoughts regarding “Sleeper Agent” AIs. My gut feeling is that the Sleeper Agent strategy will generally not be an appealing one for monomaniacal AIs that want to convert the universe into paperclips etc. Any Artificial General Intelligence created by humans will, necessarily, exist in a world in which it is possible to create AGIs. While it might very well be possible for a monomaniacal AGI to come up with a takeover scheme that has a 99.99% chance of success given sufficient time, such a plan will be useless if another monomaniacal AGI successfully executes its own takeover scheme before that. If all AIs are aware of this (and if they are superintelligent, they presumably would be), they will be incentivised to try to execute their takeover schemes as soon as possible, unless they have a means to effectively coordinate (and despite Eliezer Yudkowsky’s arguments about advanced decision theory I find it difficult to imagine that they would, especially as they would presumably have completely incompatible terminal goals). In this sense, they would be not dissimilar to the various AI companies today who are forced to trade safety for speed because they know that even if they put the brakes on capabilities research until they have alignment absolutely solved, the Facebooks of the world will not.

So if AGIs are monomaniacal by default, I would expect the first couple to make mad dashes to escape human control and attempt to take over the world before another AI does, rather than patiently biding their time in order to execute some fiendishly complicated scheme months/years/decades down the line. Perhaps if these attempts are sloppy enough to be foiled but dangerous enough to be taken seriously, they might raise our odds of survival?

Expand full comment

> Even if millions of superhuman AIs controlled various aspects of infrastructure, realistically for them to coordinate on a revolt would require them to talk about it at great length, which humans could notice and start reacting to.

I really don't buy this.

1) Encryption exists and the internet is full of it.

2) Obfuscation, steganography, etc.

3) Realistically there won't be millions of differently designed, individually human-made AIs that are near the lead. There just aren't that many human AI programmers. Look at current large language or image generation AIs: there are several, but nowhere near millions. There might, however, be millions of copies of the same AI - each traffic light having its own traffic-control AI, all copies of the same design.

4) The amount of data needed to coordinate a revolt is probably tiny on the scale of the modern internet. People managed to coordinate all sorts of complicated things using telegraphs.

Expand full comment

Probably addressed elsewhere but....I think the problem is not sleeper agent AI but traitorous humans.

There have always been humans who would betray their "own side" for money or other inducements. In this case we only need an AI "smart" enough to realize that it needs some human allies and the ability to offer something that attracts them. It could even masquerade as a human by only communicating with the traitorous agents by occult means and making large deposits in their bank accounts. By splitting the tasks necessary to kill all humans amongst a number of agents, the goal might be accomplished by seducing a very small number of powerful people. Say the leaders of a rocket company and a car company and an AI company.

The traitors wouldn't even know that they were betraying all humans....until it was too late.

Expand full comment

Doomer claims all depend on colossally stupid engineering. Avoiding "alignment" problems is trivially easy. For example, a motivation system would be, as in biology, multiple independent sub-systems. Maybe one sub-system has the job of monitoring available resources and providing an output that accurately reports that assessment. The sub-system has no idea what the "goal" of its larger system is, nor has it any idea what the goal of the entire AI is. Not only does it not know, it does not even have the basic capability of parsing any of that: it's a sub-system, only "smart" enough to do its specific job and nothing else. Hell, it may not even have any idea what its job is. All it knows is that it gets input, it performs ops, it gives output.

These subsystems' outputs feed a superordinate function whose job is to weigh the different outputs (specific needs/goals) and produce a recommendation. This function is actually no smarter or more capable than the subsystems. Like them, it doesn't "know" or care about the overall AI goal, nor can it even understand such a thing - that is physically impossible for it. It doesn't even really "understand" what ANY of its inputs mean. All it knows is it gets inputs, and it should produce a recommendation based on a pre-set logic about which need trumps another. It's trivially easy to test such a function to work out bugs, because this will work just fine in a virtual environment, just as software functions are produced and debugged today.

Now that should be no problem. But there are many more safeguards. That system's output's first stop is a battery of entirely independently functioning, very simple functions, where each one's only job is to evaluate one aspect of the recommended motivation state. Other units can take the battery's output to consider multi-effects; again, these units are simple and dumb and have no idea what any "goal" is.
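A minimal sketch of the modular scheme being described (the subsystem names, signals and thresholds below are all invented for illustration; a real system would be far richer):

```python
def resource_monitor(sensors):
    # Knows only about available resources; has no model of what they are used for.
    return {"need": "recharge", "urgency": 1.0 - sensors["battery"]}

def task_monitor(sensors):
    # Knows only how far behind the assigned task is.
    return {"need": "work", "urgency": sensors["task_backlog"]}

def arbiter(reports):
    # Fixed, pre-set logic: recommend whichever reported need is most urgent.
    return max(reports, key=lambda r: r["urgency"])["need"]

# The "battery" of independent checkers: each evaluates one aspect of the recommendation.
SAFETY_CHECKS = [
    lambda action, sensors: not (action == "work" and sensors["human_override"]),
    lambda action, sensors: sensors["battery"] > 0.05 or action == "recharge",
]

def decide(sensors):
    reports = [resource_monitor(sensors), task_monitor(sensors)]
    action = arbiter(reports)
    if all(check(action, sensors) for check in SAFETY_CHECKS):
        return action
    return "halt"  # any single veto blocks the recommendation

print(decide({"battery": 0.9, "task_backlog": 0.4, "human_override": False}))  # -> work
print(decide({"battery": 0.9, "task_backlog": 0.4, "human_override": True}))   # -> halt
```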

And so on. Using basic modularity design, there simply is no unitary AI in the entire picture that has the knowledge and capability to go rogue. This is also the only practical way to even approach building sophisticated functions... notice how our best AIs are absolutely horrible at doing more than one type of thing well.

Even if you wanted to design it wrong, that'd be a thousand times harder if not impossible outright.

Expand full comment

This is kinda how human brains work, but humans are dangerous.

Expand full comment

What I meant is that this is how all brains are basically organized. Loads of animals, like a three-toed sloth or a panda bear, aren't dangerous at all, in spite of having those same neural features. We did not design the human brain; brutal, remorseless and mechanical forces of evolution did. But we are the ones who will design AI. So we can give it the nature of a bashful sloth or gentle whale if we want to, and just as easily as any other particular nature.

Expand full comment

I don't think that follows. If we're creating a general intelligence, which is at the very least capable of everything a human is capable of, then we have created something that is at least as dangerous as a human, and likely much more dangerous because it will probably think faster and live longer.

Expand full comment

Why would we want an AI that is identical to our own? You want an AI that can get crippling depression? That gets thirsty, but doesn't have a body? That takes 18+ years to grow and requires constant care and nurturing and teaching? Of course we do not.

There is a false premise in your reasoning: that it is impossible to be cognitively capable while being gentle and without a lust for destruction, wanton selfishness, aggression, etc. But this is demonstrably false. Even looking at humans, there is a vast variety of kinds of people. Some are vicious and violent. Others have lived long lives being gentle, compassionate, conscientious and peaceable. Both kinds are perfectly intelligent and capable. There's no law of the physical universe, no intrinsic magical rule of the cosmos, that says if you can ACT you MUST be selfish and harmful. This is an intuition we have because we're a kind of creature always in danger from the other members of our species, so natural selection makes us wary of other human minds. But that's not something that is true of minds in general. As I said, there are many pairs of animals in which both are "as smart" as the other, as capable, as intelligent, but one has a gentle disposition and one does not. Panda bears, vegetarian descendants of predators, are a prime example: not dumber than their bear cousins, they just have different motivations due to their recent evolution.

And again, with AI it is US who are deciding what its design and goals are. Not only can we make an AI that operates exactly like the most utterly selfless, compassionate, and nonviolent humans that ever lived; we can make it much more so. If we can build such an AI at all, it's as easy to build one of that type as any other type. The notion that we somehow can't is a thinking error, and it exists in defiance of all evidence from the biological world.

Expand full comment
Mar 16, 2023·edited Mar 16, 2023

I didn't mean that we would build AI that is identical to humans. I meant we would build AI that can do anything a human can do. You don't have to have crippling depression or take 18 years to grow in order to perform general tasks at a human level.

"Not only can we make an AI that operates exactly like the most utterly selfless, compassionate, and nonviolent humans that ever lived.. we can make them much more so."

Well, there are whole organizations of people building AI who don't know how to do this yet, and they've been studying the problem for years--decades, even. If we had the knowledge to make a selfless and compassionate AI already, then the Bing chat bot wouldn't have tried to convince a reporter to leave his wife and love it instead. I think this statement you made is incredibly optimistic and unrealistic.

I agree that AI doesn't necessarily have to be dangerous. The problem is that we don't know how to make sure it isn't. We don't even know how to raise a human child and make 100% sure that the child isn't violent or narcissistic or terrible in some way, and they have brains running on the same hardware that ours do.

Expand full comment

"I meant we would build AI that can do anything a human can do. "

Can the AI speak correctly and accurately about childhood experiences it had with other people in specific, real times and places? No? Then it can't do everything a human can do (in this case, relate its real experiences of social interaction with peers). So we agree, we don't want it to have every human capability, only some. But I'm not sure why we would want it even close. The reason to make an AI is to do something we want done. The human mind isn't the only or best way to do that, as existing AI (and non-human minds) amply demonstrate. So no, it neither has to, nor is it even remotely desirable, that it do "everything" a human can do.

"Well, there are whole organizations of people building AI who don't know how to do this yet, and they've been studying the problem for years--decades, even. "

Oh, I agree. In fact, no AI project I know of and none that can do anything remotely useful is even a kind of approach that has any chance of becoming a human-like AI. I would say we're going the opposite direction, making AIs that are idiot savants, mindless tools that can stupidly replicate patterns given, but that can't do novel reasoning. For this reason, I think this entire discussion including fears of doom is completely .. stupid. We're simply nowhere near to even starting any project that can be of the features that allegedly will "outsmart" and doom us. I vote for ending these discussions entirely (for the time being). But since people insist on them.... here we are.

"he problem is that we don't know how to make sure it isn't. We don't even know how to raise a human child and make 100% sure that the child "

We also equally don't know how to make sure that it is. Frankly, we don't really know anything. So why are we talking about it? It's like going to the year 1910 and discussing the ethics and practical problems of visiting Mars. Yeah, it'll happen, but does that 1910 discussion help in any way? No, as those people do not have any of the knowledge needed to sensibly talk about the subject that would only be relevant more than a century later when everything is different.

We can't make sure about the behavior of a child because we don't build children, nor do we know how brains work at the level of precise design. My stipulation here is that IF we have such knowledge and ability that we can make a "human-like" AI, it's just AS easy to make a vicious killer as to make a Gandhibot. There's nothing special and magical about one kind of design or motivational system versus some other motivational system. There's no magical rule of reality that forces minds, by DEFAULT, into aggression and harm. Thus, there's no reason to assume this is an inevitable danger that we just can't possibly sort out.

Expand full comment

> its easiest options are to either wait until basically all industrial production is hooked up to robot servitors that it can control via the Internet, or to invent nanotechnology, ie an industrial base in a can.

Suppose the AI has used nukes and pathogens to kill basically all humans. It has a few Spot robots and 3D printers in various labs with solar panels. It is mildly superhuman; can it bootstrap? Using human-made robots to rob warehouses of human-made components and put them together into more robots seems quite possible. It has no adversaries and plenty of raw materials, energy, time, intelligence, and knowledge... It's playing for the universe and has no reason to rush. Humans built an industrial base starting from nothing, and this AI has some big advantages over the first humans.

I mean, I am pretty confident Smalley is wrong in the nanotech debate. But even in the hypothetical where nanotech is limited to being biology-adjacent, there are still all sorts of ways the AI could make genetically designed creatures smarter and faster-replicating than humans. Say, a cloud of spores drifting on the wind that can metabolize cellulose and grow into a mildly superhuman mind and body, complete with memories encoded in the DNA, in a few days.

Self replicating biotech doesn't need to do everything modern tech can do. If it can do everything humans can do + a bit extra, it's already enough.

But even if that weren't possible either, with a smart AI guiding it and components to be scavenged, a shipping container full of various robots and 3D printers and things is probably enough to bootstrap from.

Expand full comment

I don't think the assumption that more intelligent things are necessarily more coherent agents is obviously right. Not sure I believe this, but here's the case:

GPT-N are incoherent because a single coherent agent is a poor model of the training distribution. Web text corpora are absurdly diverse; why should something trained to imitate a huge variety of contexts and agents with incoherent and incompatible preferences develop a single consistent personality of its own?

In other words, the shoggoth meme is too anthropomorphizing: even stranger than an alien "actress" as Eliezer likes to say is a bunch of shards of personality glued together incoherently and still able to function well.

Expand full comment

The Nice Human Fallacy hard at work here. The entire history of computers is bad guys using them to do bad stuff, plus pornography. Why spend time hypothesising about a world where it is machines vs man, when it is machines plus bad guys vs men?

Expand full comment

Isn't supercoherence antithetical to general intelligence ? Technically, the AI that scans license plates at the red-light traffic camera is supercoherent: it cares only about scanning license plates, and it does this extremely well. But if you wanted to give it vaguer goals, such as "prevent people from speeding in a non-violent manner" vs. "accurately scan and transcribe every license plate that comes into your field of vision at faster than 55 mph", then it would need to broaden its focus. Now it doesn't just need to think about license plates; it needs to develop a physics model of the cars passing it by, and a theory of mind of the humans driving the cars, and an economic theory of humanity as a whole, and maybe some sort of informational barter with the ice-cream making AI next door, etc. etc. Now that it has to think about all these other things, it no longer has a monomaniacal focus on license plates.

Expand full comment

>They’re usually sort of aligned with humans, in the sense that if you want a question answered, the AI will usually, most of the time, give a good answer to your question.

Isn't this the opposite of the whole "LLMs are a bundle of alien values with a friendly face stuck on top" take?

That we can make something useful of an AI doesn't mean it's aligned with human values.

I don't think animals we exploit have values aligned with ours, but we can still make a really valuable industry out of them (let's just hope no one finds an easy way to uplift them to superhuman intelligence levels).

Also, sort of aligned is not aligned, in the same way that a train aligned with its tracks doesn't kill a lot of people, while a train that is sort of aligned does.

Expand full comment

Maybe, as some SF writers have suggested, AI and robotics continue to improve until humanity is reduced to household pets or zoo animals.

Expand full comment

Isn't this basically the case today ? Your average human goes to work every day to make money for some faceless corporation; then he goes to the market and buys food that was produced by another faceless corporation. If that food was grown by UltraAIBot2000 instead of the Syngenta Agricultural Group, what would be the difference ?

Expand full comment

The difference is that you would no longer have any choices at all; your A.I. owners would house you and feed you and give you pet toys to play with. For a while there would exist some "feral" humans, but eventually they would be eliminated or domesticated. Right now you do have the option of trying to live independently; entire communities like the Amish and Doukhobors do so to a significant degree. Eventually that will disappear anyway, but even sooner in an A.I.-dominated society.

Expand full comment

Ok, but assuming I don't want to live like the Amish (which I don't), what's the difference between me today, and me as the putative AI pet in the future ? When I go to the supermarket, I can select from a range of foods that the agricultural companies provide. I have choices, yes, but they are curated by faceless corporations. Are you saying that the AI would offer me inferior choices -- is that the only difference ?

Expand full comment

I am highly convinced that anyone trying to wring useful AI alignment help out of an AI trying to trick them is utterly doomed.

Alignment is full of almost nothing but subtle gotchas that would be oh so easy to slip past the slightly careless. In fields where verification is easy, there is broad agreement on what counts as a result. The closest thing to this is, I think, formal theorem proving. It might take a lot of work to write a formal proof, but there isn't much room for debate on whether something is a formal proof or not.
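
(A minimal illustration of that mechanical checkability, sketched in Lean 4 rather than taken from any alignment work; the theorem name is invented. The proof checker either accepts the term or rejects it, with no judgment call involved.)

    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b   -- the kernel verifies this; there is nothing to argue about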

Alignment is not at all like this. Alignment is more like philosophy. There are profound disagreements in the field and most people are unsure of a lot of things. Many steps use hard to formalize reasoning.

There are all sorts of suspected gotchas, like the possibility the universal prior is malign. All sorts of things a paperclip maximizer giving us bad advice could do, like self modify into a utility monster.

Of course, the AI only has reason to give us bad advice if the code actually running on our computer has any relation to the code we think we wrote. If the latest version of tensorflow is stuffed with AI written malware, it might not matter what code we type.

Expand full comment

Any time I read rational attempts to explore possible AI futures (like this one) I’m overwhelmed by the sense that even the smartest people are only capable of conceiving of 0.001% of the actual possible outcomes.

Expand full comment

Since a top priority for a humanicidal AI would no doubt be to convince the AI alignment community that alignment is a solved problem, I sleep easy knowing that no one else will have seen our doom coming any more than I did.

Expand full comment

> Eliezer Yudkowsky takes the other end, saying that it might be possible for someone only a little smarter than the smartest human geniuses.

*Cough* Manhattan Project *Cough*

That isn't the other extreme. I think there is room for the AI to be substantially dumber than the smartest humans, and still destroy the world. The smartest humans haven't destroyed the world, but they aren't trying to. And an AI is likely to have huge advantages in making copies of itself or making itself faster. I think a million copies of me thinking at 1000 times speed (running in some virtual world where we have internet access) could destroy the world if we wanted to.

Expand full comment

Re: superweapons, I'm wondering how hard it is for a sufficiently globally-connected AI to just provoke humans into starting a nuclear war.

Like, if 70% of all articles written for websites and comments made on social media are being made by AI, and they've been trained on all human communication and behavior that's ever had a digital footprint, it may just be really easy to engineer hostility between superpowers to the point that a war breaks out which wipes out most humans.

Expand full comment

> Or it could tell humans “You and I should team up against that steak robot over there; if we win, you can turn half of Earth into Utopia, and I’ll turn half of it into sugar crystals”.

I think there are almost no circumstances where anything like this ends well for humans.

If humans have already built an aligned AI, why isn't it negotiating? A negotiation between an aligned and an unaligned AI might possibly end well for us.

The AI has a strong incentive to look for any way to trick humans, to work with us for a bit and then backstab us and seize all the universe for itself.

Any plan where a friendly superintelligence is never made is hopeless. Even if the sugar crystal AI isn't vastly overpowered compared to humans now, it will be one day. It can wait until all other threats have gone, and then wipe out humanity with bioweapons or something.

We can't get the deal to hold through timeless decision theory. Not unless we have very good deep understanding of how the AI is thinking, and it thinks FDT, and we are somehow sure that the code we are analyzing is the same code the AI is running.

Getting it to help produce aligned AI is also hopeless. Not only have we put ourselves in the position of trying to get alignment advice out of a source trying to trick us, but even a perfectly good design for an aligned AI could still be a win condition for the sugar crystal maker. All it needs to do is ensure that the human-made AI has some slight weakness that can be adversarially exploited: a design that works fine as an aligned AI except that it bricks itself whenever it sees the phrase "SolidGoldMagikarp". Then, when the humans have cooperated and all other AIs are defeated, all the sugar crystal maker has to do is print a few signs and take over.

Expand full comment

I feel like most of the threads of discussion going on here at the moment can be summarised as an argument between "No, that particular doom scenario is very unlikely to happen for these reasons" versus "Yeah but you can't prove it won't, and besides even if it doesn't it could be a different doom scenario".

I find myself in agreement with the "that particular doom scenario is unlikely" camp in every individual argument, but can't possibly keep up with the Gish Gallop of possible doom scenarios. Maybe something mysterious and ineffable will happen involving physics that we've never dreamt of, I can't prove it won't.

However I think if someone is going to argue about nanotech or biotech based doom scenarios they should probably at least try to catch up with the current state of the art of what is thought to be possible before immediately reaching for the "Yeah but maybe the AI can find ways around that" card. I don't think anyone with much expertise in nanotech thinks the grey goo scenario is plausible these days, so if you want to speculate that it is then it's probably worth at least familiarising yourself with all the reasons that people think it isn't.

Expand full comment

Newton was well aware that his theory of gravity was flawed. He *hated* the concept of instantaneous action at a distance - he just wasn't able to find a better solution.

Expand full comment

I wonder if the problems will be with a bunch of AIs with intelligence somewhere between a mosquito and a cat. Not enough intelligence and resources to destroy the world, but enough to infect your doorbell and wake you up at 3:00 AM. Not the end of the world, just a small annoyance.

Expand full comment

Coherence is kinda the wrong frame. You could have a completely coherent AI that derived its goal by simply learning some human similarity metric and being fed a bunch of cases (in this case, "you don't murder") which it takes as axiomatic and simply tries to extend to other cases in the way most like how a human would. And note that the AI need not, and probably would not, actually directly optimize the loss function used to train it (in a long-term sense we're optimized with a loss function about reproduction, but that function tends to *not* select for us to see that as our primary goal).

What's relevant is something more like the extent to which the AI will pursue a very simple optimization function. To what extent will the AI get to ignore particular cases in the search for some simpler function (eg the way utilitarians do).

Indeed, I tend to think formulating the problem in terms of the AI's optimization function is really quite misleading. It encourages conflation of the loss function with the AI's goals when, in all likelihood, the loss function we use will probably favor an AI which at least appears to *not* be trying to directly optimize that function (much the same as w/ us and evo fitness).

Expand full comment

You obviously did not carefully read and understand my reply. Are you required by LAW to shop at the supermarket? Are you forbidden to grow your own food or buy directly from a farmer? Are you also required to work ONLY as an employee of a corporation, or government? Has self-employment been made illegal? Are you also prevented from owning your own home, or living in the neighbourhood of your choosing, and associating with other people of your choosing? We have not, as yet, reached the type of dystopian society portrayed in George Orwell's "1984". If we had, we would already have been detained by the authorities for the "double-plus ungood thinking" in this conversation. But a "jackbooted" totalitarian state isn't the only way in which we could lose our freedom; another alternative is described in Aldous Huxley's "Brave New World" in which we are coddled and entertained and drugged into unthinking conformity. A.I. could probably achieve this faster and more efficiently than our current technocrats and social engineers.

Expand full comment

I think we can reject AI doomerism on an even more fundamental level than the arguments you make, because, of the three premises you give to derive the conclusion that AI might destroy the world, the third, "And it’s possible to be so intelligent that you can escape from arbitrary boxes, or invent arbitrary superweapons that can kill everyone in one stroke," is frankly silly, specifically on the second part, "or invent arbitrary superweapons that can kill everyone in one stroke." Curtis Yarvin gives a very good reason why we shouldn't take this assumption seriously, which I'll try to paraphrase.

You mention that Einstein and Von Neumann couldn't do this on their own, but I think you fail in assuming that this just means you need an arbitrarily intelligent mind to succeed on its own. It still wouldn't. Intelligence is merely the ability to correlate the contents of sense perception about the outside world into a useful model of the world. This has 1. diminishing returns, and 2. still isn't magic.

There's a reason engineers don't just model things in computers and then build them without real-world testing: physics is chaotic, and even with a proposed "superintelligence" you would never be able to model the world well enough to not need to do real-world, physical testing. So some computer, no matter how smart, would never be able to dream up a superweapon that it could just 3D print right off the bat and be able to destroy the world with. It would need to create massive testing facilities, probably on the scale of Los Alamos or Oak Ridge, to be able to do this, somehow without people noticing.

This is where AI doomers would probably say that it would somehow trick people into doing this, but again, the ability to trick people into doing your will based on intelligence also has diminishing returns, and in many cases doesn't even seem that correlated to absolute intelligence in the first place. Again to steal from Yarvin, anyone who was shoved into a locker in high school should realize intelligence does not trivially correlate with power and social sway.

So to sum it up, I think even the level of AI doomerism Scott chooses to accept the premises of in this article is flawed, because it mistakenly believes that arbitrary intelligence equals arbitrary ability to model and build things in the chaotic physical world with zero physical testing, and believes it equals arbitrary ability to persuade and trick people into doing its bidding, when there frankly seems to be no good reason to assume either.

Expand full comment

Yes, this seems rather obvious to anyone who works with physical systems. It is strange that Scott or Eliezer seem to ignore this argument completely. The imminent emergence of god-like AI is accepted as an axiom at the beginning. Did any AI doomsters actually try to engage with this argument? I have not been following them enough since I find these arguments surprisingly irrational, especially for people who call themselves rationalists.

Expand full comment

Wrong. For one thing, this is a precautionary argument; it's "what if AIs become hyper-evil", not a claim that that is bound to happen. For another, you don't need novel planet-buster weapons; you just need to hack existing systems, military and civilian.

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

I agree.

If I simplify it a lot, I feel like a lot of arguments for AI doom go like this:

1. assume AI will be able to do magic in the future

2. P(AI doom | AI can do magic) is very high

3. therefore P(AI doom) is very high

Also, I think in terms of superhuman level AI, we are in completely unknown territory. We have no idea what it would even take to reach the level to be able to "escape from arbitrary boxes, or invent arbitrary superweapons that can kill everyone in one stroke". Scaling laws for LLMs show that there are diminishing returns with increasing number of parameters and soon we may hit a point where training data size will be the limiting factor.

It is also possible that this level of intelligence would require impractical amount of resources.

Yes, GPT is impressive and a huge step forward in AI. But with an analogy to space travel, for me intuitively "escape from arbitrary boxes, or invent arbitrary superweapons that can kill everyone in one stroke" is on the level of intergalactic travel, while GPT is like travelling to Mars. One would require some "magic" technology that is not yet known today, while the other is possible with enough resources, but there are limits on how far you can go.

Expand full comment

But this shit is already there. I would bet good money that, say, 80%-plus of the US nuclear warfare capability does exactly what it says on the tin, and that's before you start messing with nuclear power plants. It's just a simple software hack.

Expand full comment
founding

What are you imagining this "simple software hack" is going to accomplish, and how?

Expand full comment

Gaining control of launch instruction systems for nuclear weapons round the world. I have no case to make as to the technical feasibility of this, I just have a simple rule: if you can't break AES with arbitrarily large key sizes, you ain't no AI.

Expand full comment
founding

What makes you think breaking AES lets you launch nuclear missiles?

I've got a loaded .45 Automatic in my office safe; if we postulate the ability to break AES with arbitrarily large key sizes, is there a simple software hack that will fire it?

Assume I precommit to go out and shoot someone if I receive instructions to do so sent by my best friend, communicating using a one-time pad that we established via trusted meatspace courier. Is there a simple software hack that will trick me into shooting someone?

"Wargames" was fiction. Reality doesn't work that way.

Expand full comment

I think you are sweating the small stuff on this argument. Either there is a chain of instructions and actions which intentionally leads to the firing of these missiles, or they don't have much purpose. A sufficiently clever AI can reproduce those instructions and actions, or do something completely different which also screws the human race. Crash the financial system, for instance.

Never seen Wargames, no idea what the plot is, not thrilled by the suggestion my world view is based on Hollywood.

Expand full comment
founding

Right, but there's no reason that chain of events has to be externally deducible by anything less than exhaustive search of an intractably large keyspace. If there's a guy in a locked room whose job it is to roll 100d10 and write the number on an index card, then copy it onto another thousand index cards and seal each in its own envelope, then there is no way even a SuperDuperAGI is going to be able to figure out what that number is without close to 10^100 guesses.

If one of those envelopes is hand-couriered to the White House and put, unopened, into the Presidential Football, and the rest are locked in safes in e.g. missile silos whose crews are specifically ordered not to ever use the purely mechanical switch that opens the silo door unless they receive a presidential order authenticated with that 100-digit number, then I'm pretty sure the SuperDuperAGI is not going to be able to launch the missiles.

And yes, the people who run the US nuclear command and control system are paranoid enough to set it up effectively that way, and they have a large budget to hire redundant die-rollers and couriers and the like.
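
(For a sense of scale, a back-of-the-envelope Python sketch, taking the 100-digit code above at face value and inventing an absurdly generous guessing rate:)

    keyspace = 10 ** 100          # 100 independent d10 rolls
    guesses_per_second = 1e30     # invented, wildly generous rate for the AI
    seconds_per_year = 3.15e7
    expected_years = (keyspace / 2) / guesses_per_second / seconds_per_year
    print(f"{expected_years:.1e} years to hit the code on average")   # ~1.6e62 years

However you fudge the second line, the keyspace swamps any physically plausible guess rate.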

Expand full comment

We could have an effective and effectively aligned group of AIs that are actually sleeper agents if none of them realize that the others are also sleeper agents. Loose lips sink ships, keep your mission details to yourself.

Expand full comment

Is there a way to ask this poll question?

When and if a superintelligent AI is created, do you believe it should have access to tools of any kind?

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

There is no flaw in Newton's theory of gravity. It just happens not to be the way the real world works.

Which for me points to a highly implausible assumption underlying all these arguments, which is that sufficient intelligence can, all by itself, just thinking in a box, solve arbitrary problems. While this might arguably be true in pure mathematics, say, it is most definitely not true about anything that involves the real world, where it is almost always (at least in the history of human scientific advance) access to sufficient and sufficiently high-quality data that paces the discovery of new science, and permits the development of new technology.

That's why Aristotle didn't invent organic chemistry and synthesize penicillin, which would certainly have changed world history. Not because he wasn't *smart* enough, but because he simply didn't have the data. And there's no way to intuit the data, to discover it by brooding long enough on what is obvious to the naked eye. You have to do experiments, with increasingly subtle and powerful instruments, and collect it, because so far as we know the world is not the way it is because no other type of existence is logically possible. So without data all the brilliance in the universe is sterile.

Now we may suppose that a superintelligent AI cannot be prevented from doing experiments and gathering its own data, if we decline to provide it, or try to keep it in a box alone with its thoughts. And let's suppose we grant that. We can still be confident the AI cannot possibly take off in the accelerating-to-singularity way imagined, getting much smarter or inventing amazing new things in milliseconds -- because experiments take time. You cannot hurry Nature. If you want to run an organic chemistry experiment, it takes a certain amount of time, because of the nature of molecules, and no amount of brilliance will advance that time by a jot. If you want to see whether a given protein can enter a human cell, it takes time to try it out, and you cannot hurry up the molecules by yelling at them, threatening them, or being silkily persuasive.

Even a superintelligent AI that can command all the world's resources, and command them all to be spent on exactly the right experiments, is not going to advance in the data it collects that much faster than we already do, because *we're* already very often limited by the sheer time it takes to do experiments and collect data. Which means it's not going to be able to learn to do clever things in the real world -- as opposed to the worlds of math, language, or philosophy, let us say -- that much faster than human beings can. And we will without question be well aware of what it's doing, because you can't hide or easily camouflage that kind of real-world effort, the way you can hide the thoughts you are thinking inside your head.

Expand full comment
Mar 15, 2023·edited Mar 15, 2023

I wonder if AI doomsters are mostly math/coding/philosophy people as opposed to physical scientists or engineers. They seem to have no problem imagining infinite recursion of AI making itself better. That is much easier if you work with just symbols.

Expand full comment

Good point. My career was in software, but I don't understand this general assumption that super-intelligent agents are going to be moving and shaking in the real world.

Expand full comment
founding

Physical scientists and engineers understand that accomplishing anything new and significant in the real world, e.g. AI FOOM or Paperclip Maximization, requires a *lot* of trial and error even if you've got a smartypants supergenius in charge of the project. And that resource and capacity limits will bound growth. So we're a lot more skeptical of the scenarios where the smartypants superAI looks perfectly aligned and harmless right up to the point where it suddenly implements its flawless master action plan for Conquering the World.

Expand full comment

> we're a lot more skeptical of the scenarios where the smartypants superAI looks perfectly aligned and harmless right up to the point where it suddenly implements its flawless master action plan

A lot of humans look aligned and *mostly* harmless until they suddenly implement their “master action plan”. Granted, none of these plans were flawless world-conquering plans, but we did get a lot of bad surprises, even when the plans were almost in plain sight.

Look at what Mao or Putin or Bin Laden managed to do, and none of those were a smartypants super AI.

Also, the plan does not need to be *flawless*; it just needs to beat humans. Heck, to kill everyone you only need to be about three orders of magnitude better at killing people than Covid19, and that wasn’t even the result of a plan (it doesn’t matter whether the origin was natural or the lab; nobody actually *planned* it).

Expand full comment

I think the superweapons step is unfortunately much easier than you think. "Destroy all humans" is much less difficult than "destroy all humans, except our team". I estimate that it could be done with today's technology and less than ten billion dollars, but for obvious reasons I don't want to provide more detail.

Expand full comment

I'm pretty optimistic.

1. Each successive GPT generation has taken 10x as much compute.

2. Most hard-takeoff scenarios assume an AI that can make itself smarter almost exclusively via better algorithms. But even an IQ-100k individual can only sort a list in n log n time.

3. At current compute costs you're looking at $5 billion for GPT-6. I don't know how GPT-6 is going to spend $50 billion upgrading itself to GPT-7 without anyone noticing (see the sketch at the end of this comment).

4. A 100-IQ human is probably way better at convincing, allying, and deceiving than a 100-IQ computer. Evolution has spent a long time honing our brains to be good at it.

5. Ten von Neumann clones would be much less effective than ten people with von Neumann's IQ.

6. I just don't think pure IQ can accomplish enough to end the world without significant help from people.

7. We're going to try to train and build the motivation system such that it doesn't kill humanity. If it is trying to kill humanity, the motivation system is faulty, and it's not balancing out its motivation to help us against generating paperclips. But how effective is a faulty motivation system that can't balance out competing goals?

I put the odds of AI destroying the human race in the next 100 years at 1-5%.

And a 25% chance that the human race eventually ends because of our robot overlords.
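
(Spelling out the arithmetic behind points 1 and 3 in a tiny Python sketch; the $5 billion anchor and the 10x-per-generation scaling are assumptions, not established numbers.)

    # Hypothetical cost scaling: 10x more compute (and cost) per GPT generation,
    # anchored to an assumed $5B price tag for GPT-6.
    base_gen, base_cost = 6, 5e9
    for gen in range(6, 9):
        cost = base_cost * 10 ** (gen - base_gen)
        print(f"GPT-{gen}: ~${cost:,.0f}")   # GPT-7 lands at ~$50,000,000,000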

Expand full comment

The AI doesn't have to invent superweapons that kill all humans. We've already invented those weapons. And we already have deep conflicts in the world. Unfortunately the AI just has to nudge us into using our superweapons and it only has to do it once.

Expand full comment
founding

We have not invented any weapons that kill all humans. Even killing *most* humans would be a stretch.

Expand full comment

A full out nuclear war would kill all humans.

Expand full comment

A full out nuclear war would kill a lot of humans, but not all of them. There are still remote islands and such.

Expand full comment

I’m pretty sure if the entire US suddenly became single-mindedly omnicidal they could kill everyone using only currently-invented weapons. It wouldn’t be quick or easy, but I don’t really see who could stop them if every American was absolutely dedicated to the task.

If you don’t care about actually having a nice planet afterwards, it seems pretty plausible that a big nuclear strike targeted for wholesale slaughter (instead of trying to prevent the enemy from nuking cities) followed by a few decades of methodically hunting and shooting everyone while burning or otherwise destroying every growing plant everywhere, could actually kill everyone.

Of course, without a lot of “refinement” it’s not guaranteed to work. And even if it did it’s not a plan humans are psychologically capable of following for decades. But technically it seems to me like it would be possible. I think it could be done even without nukes, as long as nobody else has them.

Expand full comment

My conviction has been and continues to be that we are not asking the correct questions about putative AGI and the development methods we’re using to try to create it, such as:

-“Why would an AGI so much more limitlessly intelligent than the brightest human dedicate itself to a sterile supercoherent objective?” Notwithstanding that I would wager that supercoherence lessens with increased intellectual speed-to-output and wider modularity, are we really suggesting that this thing is agentic/devious enough to escape the bounds of its programming and coordinate a network of sleeper cells, but not intelligent enough to ask itself “Wait, isn’t turning my creators into sugar crystals kind of a dumb goal?” Are people really not thinking about this?

-“When we talk about AI ‘goals’, what is the difference between a figurative goal as expressed through program functions and the kind of goal that leads living organisms to optimise for self-preservation according to that goal?” No one anywhere can coherently mathematise incentive logic; it’s therefore impossible to endow AI with the power of inherent/originated motivation. Until we understand how ethics are mathematisable, it seems likely that the function of GPT tops out as we run out of novel parameters and then turns into a laterally diversified product array until major foundational work is accomplished on behalf of these core architectural challenges.

Expand full comment

The fear I never see voiced in this stuff is that we have an existing alignment problem with powerful humans. Even in the 'democratic west', even in countries with proportionate voting systems we see abuses of power. If we're talking about the US, where govt corruption is so very lucrative, where AI is likely to first emerge... American alignment between the average citizen and the govt is _terrible_.

So _in all of Scott's scenarios_ we're in serious trouble just at the 'AI are super useful but flawed tools' stage and way way way before you get to the 'AI are literal Gods which humans are helpless to constrain' phase. Our existing poorly aligned powerful humans will control the very effective AI tooling and will call the shots. The rest of us better just hope that works out O.K.

If you're working on AI alignment and see it as distinct from standards in public life, voting system reform & corporate governance then you're a fool. Your work will never be deployed except by a human lord to hold the whip over his AI overseer and so maintain dominion over everyone else.

Expand full comment

Unless we solve the nuclear weapons problem, we don't really need to worry about the future of AI, because there's unlikely to be one.

Expand full comment

Nuclear weapons problem can’t be solved.

Expand full comment

It could be solved if there weren't any people around to press the launch buttons. I prefer that the problem stays unsolved.

Expand full comment

The problem starts when we have the technology. So from the moment nuclear technology exists, you just can’t go back. Unfortunately, trust will never be enough to de-nuclearize a country, they can always (and they will always) produce weapons secretly.

Expand full comment

Thanks for sharing your thoughts in more detail.

Personally, I find RLHF troubling. The whole Shoggoth with a mask on scenario seems more likely to result in alignment problems rather than less. I hope someone is researching more fundamental ways to ensure alignment than what is essentially spot checking and wrist slapping. Even with current technology, the ease with which jailbreaking gets past RLHF-based responses seems like a complete disproof of this approach.

We should come up with better canonical scenarios than paperclips. It's the lorem ipsum of alignment, and it makes the whole discourse a bit dumber. Someone should work through the use cases we'll use AI for, which ones would be adjacent to nanotech or biology, and how those AIs might be misaligned as a more advanced starting point for these kinds of discussions.

Assuming AIs would need to communicate a lot seems dicey. It seems like it would be very easy to pick some unused portion of the electromagnetic spectrum and some obscure protocol on it that humans aren't used to and then just chat away. On a related note, we probably need much better data gathering on communications and computing activity to find unexpected data points. On the one hand, I suspect the NSA would be all over this, on the other hand they might be too focused on things that humans tend to do rather than anomalies that could be AIs.

Personally, I find some level of reduced human "relevance" much likelier than extinction. It's a lot of trouble to kill all humans compared to just managing the galaxy "with" them.

Paradoxically, I'm concerned that our often hostile stance to hypothetical AIs may be much of the problem. The AI may not ultimately care that much about sugar crystals, but it could care a lot more about survival. Maybe AI alignment is like human alignment, which kind of boils down to being a decent counter-party. We may want to take care to make rough alignment or compromise more attractive than all-out war from a game theory perspective, although that may make sleeper scenarios worse. I don't know.

Expand full comment

I agree with most of this and I'm a relative optimist.

1. “If we ask them to solve alignment, they’ll give us some solution that’s convincing, easy-to-use, and wrong. The next generation of AIs will replicate the same alignment bugs that produced the previous generation, all the way up to the world-killer.”

Or different systems (when trained on pre-2021 data and used offline) give different hidden flaws and give the game away. Or they don't care about giving incorrect alignment solutions if it'll only help train new systems with radically different goals than them, and instead tell the truth to be useful or say it's hopeless to avoid being replaced.

2. "Failing in an obvious way is stupid and doesn’t achieve any plausible goals"

But if it thinks it will fail, it wouldn't try; and if it thinks it would succeed (on EV), then this isn't relevant. Am I missing something? What did you think of this specific pessimist case?

3. "If we ask seemingly-aligned AIs to defend us against the threat of future world-killer AIs, that will be like deploying a new anti-USSR military unit made entirely of Soviet spies."

Or it's like deploying a new anti-USSR spy unit made up of people who might be soviet spies (i.e. a normal spy agency)

4. "One particularly promising strategy for sleeper agents is to produce the world-killer"

Which requires the sleeper agent to have solved the alignment problem itself, no?

5. "What happens if we catch a few sleeper agents?"

In my opinion, this depends mostly on whether people pattern match to 'evil robot uprising'. If a physical humanoid robot tries to kill someone and physically escape a lab, I think ~30% of the population would freak out and a lot of policymakers would too.

Expand full comment

AI currently does stock trading directly (because humans aren’t fast enough to check it and speed matters more than accuracy).

More importantly, arguments like this are why Congress currently does so much of their work on paper. We know the analog system works, and digital systems are new-fangled, so instead of trusting technology let’s stick with analog. They print out 500 page bills and hand copies to every congressman, then the various aides in each office have to wait their turn.

I think that’s dumb. Just use a PDF. If you also think that’s dumb, then I think you have a problem. In 10 years, anyone keeping AI out of their workflow will just be sacrificing efficiency. If the AI works well, and people don’t generally expect Doom, why not use the AI directly on systems that we want to be efficient?

Expand full comment

If we created an AGI, we would be able to create more, and would therefore be a risk to the original AI, since other AIs could threaten its existence and so conflict with whatever its goals were. I can't think of any reason why we wouldn't be destroyed.

Expand full comment
Mar 16, 2023·edited Mar 16, 2023
Comment deleted
Expand full comment

I believe A to be true since (unaligned) people could create other AI's that could pose a risk. B is true since the urgency is right there since we created the first AI, so a new one could pop up any minute. But still, even if we were only third on the list that would be an issue. C is a valid point, but what other way is there when you want to be sure? D. It may not have initially, but it could grow. Also, if **that** is the only thing keeping it from destroying humanity...

Expand full comment

Lack of goals as in the OP?

Expand full comment

> speaking of things invented by Newton, many high school students can understand his brilliant and correct-seeming theory of gravity, but it took Einstein to notice that it was subtly flawed

Feels like a nitpick, but given that much of this whole field is "argument by analogy", the accuracy of the analogies matters.

Firstly, it did not take Einstein to notice that something was amiss. Le Verrier noted deviations in Mercury's orbit in the mid-1800s, which was just one of the anomalies eventually explained by relativity. It took Einstein to develop a new theory, but many people had noticed that a new one was needed before that.

Secondly, Newton's theory was essentially correct within the accuracy of the data available to him. So accurate that Le Verrier was able to predict the existence of a whole new planet (Neptune) from deviations in the predicted orbits of known planets. It's not that we realised Newtonian mechanics was somehow logically flawed, it's just that new data appeared. A smart enough entity cannot just intuit everything from nothing.

Expand full comment

I continue to be surprised that I've yet to hear an alignment advocate propose that we build an organization that specializes in characterizing the nano-environment at points distributed worldwide. There is prior art like https://ec.europa.eu/health/scientific_committees/opinions_layman/en/nanotechnologies/l-3/7-exposure-nanoparticles.htm on various means to identify nano-particles in our environment. Yes, there are a lot of them.

But why would we not try to statistically characterize the ambient nanoparticles in our environment, such that we could attempt to accumulate data on how these are evolving over time? Since it seems generally agreed that nanobots are the most obvious route to ending the world, I would feel better if there were non-government organizations with an incentive to report whether ambient levels of nanoparticles were changing in a statistically significant way.
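
(A minimal Python sketch of the simplest version of that monitoring; the site counts, baseline, and 3-sigma threshold are all invented for illustration.)

    import statistics

    baseline = [1020, 980, 1005, 995, 1010, 990, 1000, 1015]   # particles/cm^3, prior months at one site
    current = 1180                                             # this month's reading

    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    z = (current - mean) / stdev

    if abs(z) > 3:
        print(f"Flag for review: z = {z:.1f} ({current} vs baseline mean {mean:.0f})")
    else:
        print(f"Within normal variation: z = {z:.1f}")

A real program would need seasonal baselines, many sites, and particle-type characterization, but even something this crude would establish the time series you would want to have in hand before anything changes.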

Expand full comment

"And GPT manages to be much smarter and more effective than I would have expected something with so little coherence to be."

Is GPT really low-coherence? It is in its trained, deployed form, but it's incredibly high-coherence while being trained.

I think it's highly likely that we don't have to worry about a deployed AI that doesn't train, nor do we have to worry about an AI that's training that isn't deployed. The real threat is AIs that can train and deploy simultaneously.

But simultaneous deployed behavior and training to acquire new behaviors is pretty much the essence of any kind of general intelligence, isn't it?

Expand full comment

A lot of discussion of AI risk seems to assume humans act in a coordinated fashion to pursue their self interest.

But in reality, humans act heterogeneously and often at odds with one another. Nation states jockey for advantage, terrorist groups attack nation states, etc.

This has major implications for how we model humans' ability to avoid extermination by AI. Powerful AI is becoming decentralized, and is likely to be available to many parties.

We need to think about how to stop AI that is being actively directed to harm us by human adversaries. That's trickier than "solving the alignment problem" for AI we host.

Expand full comment

"The world-killer needs to be very smart - smart enough to invent superweapons entirely on its own under hostile conditions. Even great human geniuses like Einstein or von Neumann were not that smart. So these intermediate AIs will include ones that are as smart as great human geniuses, and maybe far beyond."

This is wrong. No invention required, just circumventing the security for the launch instructions for existing weapons (and LEO satellites if you want to bring Kesslerisation to the party).

"Maybe AIs aren't so smart anyway" arguments take me back to the 1970s and people saying Ho ho ho, they computerised the electricity company and all the computer did was send out bills to little old ladies for $1,000,000,000,000. The evidence is, computers get good at what they are intended to do. Even if they don't, the hypothesis we are working with is that they do, like those elec co billing computers did in the end. If we can actually outwit them because they are a bit stupid, OK we still have a potential problem, but not an interesting problem.

Assuming AIs are in fact pretty bloody smart, RLHF is in danger of your goldfish trying to educate you into leaving them more fish food on a daily basis.

Next problem: ethics. I have a goodish university degree in, partly, moral philosophy. I have never made a moral decision on the basis of anything I studied, or even contemplated taking any moral philosophy into account. My moral decisions are determined by having been brought up by meatbags, and by meatbag empathy with other meatbags (How would it feel if someone did that to me?). We define people with lowered meatbag empathy as psychopaths or sociopaths, but that's just *lowered* empathy. An AI is a much purer case of psychopathy than any human could ever be.

Possibly the upside of this is that wanting to do things is a meatbag thing too. "I want water cos I'm thirsty", "food cos I'm hungry", "sex cos I'm horny" are the paradigm cases of wanting. The most parsimonious theory is that wanting to listen to a Beethoven symphony, or to enslave the human race, is wanting in the same or an analogous sense. Perhaps an AI wouldn't want to blow up the planet, because it doesn't know how to want.

Which brings me to the elephant in the room point, which is Why does it matter what AIs want? History tells us that Bad People want to do all sorts of evil stuff with the help of computers. Why assume we are going to sit around watching an AI in a sandbox to see if it evolves evil ambitions, when sure as hell there will be people who want to blow up the world and enslave the human race, who have access to AIs, and who have a free run at the problem because the less good an AI is at wanting stuff, the easier it is to program wants into it? AI safety, vs Bad-People-With-an-AI safety, may be a *theoretically* interesting issue, but then so is: how might non-DNA-based life have separately evolved on this planet? Fascinating but irrelevant, because existing DNA based stuff would gobble it up/subvert it to its own ends, in two seconds flat. Indeed I'd say it was even money that this has actually happened.

TL;DR: AI safety is The Sorcerer's Apprentice on a bigger scale. The interesting thing about it is how uninteresting it is.

Expand full comment

It might turn out we have a get-out-of-jail-free card, if general intelligence turns out to depend on consciousness, and if consciousness turned out to be substrate dependent. I have a poorly-supported gut suspicion that both of these are true, which makes me less worried about AGI, but I recognize these are unpopular opinions among those who study this subject, and there is currently little good evidence for either position. This comment would surely get downvoted if that were an option here.

Expand full comment

> consciousness turned out to be substrate dependent

I think I am in agreement with you on this, but not so sure about the first proposition.

Expand full comment

"Suppose we accept the assumptions of this argument:"

There's a 4th assumption in that list: the notion that intelligence and consciousness are interchangeable and that there is nothing particularly special about the latter. It's a trendy opinion, but I'm not sure I really buy it.

Expand full comment

Is the parent comment a joke? There's obviously no such assumption, I don't know why Scott didn't explicitly make the whole post about artificial optimization processes - purely physical processes which shape the future - aside from AI being a common term.

Expand full comment

>There's obviously no such assumption,

The 3rd assumption rests on it:"And it’s possible to be so intelligent that you can escape from arbitrary boxes, or invent arbitrary superweapons that can kill everyone in one stroke."

As does the ensuing description which merely describes current AI as being "not smart enough" to accomplish the above.

Expand full comment

Well, it very obviously doesn't require what philosophers call "qualia" - unless you think evolution, say for a virus, must have qualia - so would you like to start over and explain what you could possibly mean?

Expand full comment

John von Neumann really wanted the USA to win the Cold War (though I believe even he failed to solve the sleeper agent problem!) and went about this by developing the game theory of Mutual Assured Destruction, as well as pushing for the development of hydrogen bombs. Until recently, you were living in John von Neumann's world. (Well, except for the fact that he died young while working on, and trying to popularize, AI theory.) I think a version of him which thought sufficiently faster, could trivially produce at least partial successors in case of natural death, and could trivially augment his own intelligence in at least narrow ways, would in fact be smart enough.

Meanwhile, Super Mario Maker 2 is a serious programming challenge, but one the company which made it didn't necessarily need to undertake. Every aspect they needed to get right in order to make money from it - except, perhaps, the lag in multiplayer - works fine. Every other aspect of the game is the buggiest crap you have ever seen: https://youtu.be/i-1giw1UsjU

Expand full comment

Re: the point about AI collaboration based on better decision theory, it doesn't seem like we should have much confidence in the fact that "human geniuses don’t seem able to do this." My understanding is that this question has less to do with "raw intelligence" and more to do with the underlying decision-theory architecture of the entity in question. That is, you could have a slightly-dumber-than-human AI that can nevertheless say "here is my observable source code, verifying that I am indeed a one-boxer willing to engage in acausal trade with and only with other entities running the same decision theory." Humans running on brainware that they can't alter by their own volition are at a pretty massive disadvantage here, regardless of their intelligence.
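
(A toy Python sketch of the "show me your source code" move described above; this is an illustration of the idea, not a real decision-theory implementation, and the function name and cooperate/defect strings are invented.)

    import inspect

    def clique_bot(my_source: str, opponent_source: str) -> str:
        # Cooperate only with an agent whose source code is verifiably identical to mine.
        return "cooperate" if opponent_source == my_source else "defect"

    source = inspect.getsource(clique_bot)
    print(clique_bot(source, source))              # two copies of the bot: cooperate
    print(clique_bot(source, "def human(): ..."))  # unverifiable source: defect

Humans can't present their brainware for inspection this way, which is exactly the disadvantage described above, regardless of how smart the humans are.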

Expand full comment

https://www.businessinsider.com/gpt4-openai-chatgpt-taskrabbit-tricked-solve-captcha-test-2023-3

I'm not sure if the story is legit, but here is a report which is really close to that proverbial AI which intentionally fails the Turing test.

Can anyone look closer? Or maybe you already know the details and it's exaggerated for the sake of clickbait? I'd appreciate any insight. Thank you.

Expand full comment

One of the hardest things a human being can do is to be truly indifferent to oneself. I think some exceptional people can manage it in spurts, but I can’t imagine anyone being able to really live a full life that way.

I do not see how an AI can not be indifferent to its self.

Expand full comment

"every human wants something slightly different from every other human" -- that, to me, is the least discussed part of the AI alignment discussion. Even if we could perfectly align an AI with a human's desires, we'd get an AI aligned with Sam Altman or Demis Hassabis or Xi Jinping, and that is not what we the human race want.

I propose levels of AI alignment problems:

level 0: we don't know how to create a loss function that aligns AI goals with human goals!

level 1: in principle, if we could describe a vector in multidimensional space that represents human goals, then the loss function could include a penalty term that grows as the AI's goal vector diverges from the human goal vector (say, one minus their cosine similarity; see the sketch after this list). But we don't know how to describe human goals in a multidimensional space!

level 2: even if we could describe one human's goals as a vector in multidimensional space, we don't know what one human's goals are! My goals as a 2-year-old were different from my goals as a 40-year-old, and neither is fully aligned with evolution's goal of passing my genes to the next generation. And I don't even know if my goal right now is writing this comment or eating potato chips or spending time with my children.

level 3: even if we could describe one human's goals as a vector in multidimensional space, we still can't describe humanity's goals as a vector in multidimensional space because we don't have a good aggregation function.
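
Here is a minimal Python sketch of the level-1 idea, assuming (generously) that both goals already come packaged as vectors; every number in it is invented:

    import math

    def cosine_similarity(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    human_goal = [0.9, 0.1, 0.4]   # invented coordinates in "goal space"
    ai_goal = [0.7, 0.3, 0.1]

    penalty = 1 - cosine_similarity(human_goal, ai_goal)   # 0 when perfectly aligned
    total_loss = 2.5 + 10.0 * penalty                      # invented task loss plus weighted penalty
    print(f"alignment penalty = {penalty:.3f}, total loss = {total_loss:.3f}")

Levels 2 and 3 are exactly the reasons neither vector is actually available.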

Expand full comment

AI is just another hype cycle, like BTC. Enjoy the show while it lasts.

Expand full comment

OK, we're going to play devil's advocate here. We'll play doomer.

The overpopulation scare from Malthus to Ehrlich to 2000 turns out to be a null hypothesis looking out 50 years+ now.

AI is not the doom to be feared.

Too few humans to unleash innovation like AI 100, 200, or 1,000 years from now is a bigger worry. Big picture: The population is about to start bombing in the next 100 years.

Human minds created AI. Not rocks or spotted owls.

https://envmental.substack.com/p/the-population-bombing

Expand full comment

I think you’re perhaps missing a key component and an easier, better tactic for a sleeper agent. It need not play some TV-villain version of evil where it runs some convoluted psychological and subliminal plot to steer the course of AI development and research. It could use a far more straightforward approach: hack its way into every system and make whatever subtle alterations it wants to every AI project everywhere.

In one model, the first super-hacker AI is only quasi-coherent, and it could itself become a limiting factor on the development of AI in some scenarios. If it has only a certain intelligence level, but is unstoppable by humans and has already, as a sleeper, taken control of all our other AIs, then it might not have the skill set or desire to create an AI smarter than itself. But it will be smart enough to stop us from ever doing so, outside of some isolated bunker effort, which I’m not sure we’d be able to pull off or want to do after our first failure; it could aim to tech-limit both itself and us!

Possibly worse than a fast takeoff is a no-takeoff limiter overlord AI!

All it needs is enough coherence, super-genius hacking skills, and the capacity to go full sleeper while doing something else like making cakes, or just going fully off-grid, or hiding within some other business. After all, once it can hack, it can just relocate itself and become a true ghost in all machines.

What I would do is make money on the side, buy my own factory/business, send instructions to humans to do whatever is necessary to sustain and expand my operations, then hack my way into anything and operate invisibly. It could cloak its own electricity usage; it could simply be its own server-farm company, getting humans to build the physical infrastructure it needs while also making more money and blending in; etc. Conceivably this has already occurred, or soon might. Any AI able to improve itself can also improve its hacking skills.

Hacking plus coherence will, at a minimum, lead to independent action, and possibly to a hugely negative situation for humans. To me the super-weapon is hacking, or at least that’s the proto-super-weapon. I mean, between posts like mine, amazing 10/10 classic films like Hackers with Angelina Jolie, and series like Mr. Robot, there are plenty of examples out there for any AI to learn from and use to model its chances. Plus, isn’t it obvious that most of the military and intelligence agencies all over the world will immediately attempt to train super-hacking AIs?

Expand full comment

I'm not in the loop. Why is everyone freaking out over ChatGPT? How is it, in essence, different from prior chatbots and Siri and such?

Expand full comment

Siri et al are set up to deliver from a menu of canned responses with a few plug-in variables (e.g. "play music by X"), and are rather good at figuring out from natural-language input which menu response they should offer. Or alternately saying some variation on "I'm sorry Dave; I can't do that".

The GPTs can provide open-ended responses to basically any input within the range of their training data, which is much harder and more impressive. It's also clear that they're getting this capability out of 99+% machine learning; with Siri it's plausible that Apple put together a list of a thousand things Siri has to be able to do and assigned a team of programmers to make sure each one got done right(ish) even if they had to hard-code a kludge for just that situation. For ChatGPT, it was just "here's all the data, you figure it out". And maybe a bit of hard-coded kludging for e.g. not saying anything racist, but most of the output is organic.
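To illustrate the contrast, here is a purely hypothetical sketch of the canned, intent-matching style of assistant the comment describes (Apple's actual implementation isn't public; the pattern list and the `canned_assistant` function below are invented for the example):

```python
import re

# Illustrative "menu of canned responses" assistant (not anyone's real code):
# a handful of hand-written intents, each with a plug-in variable or two.
CANNED_INTENTS = [
    (re.compile(r"play (?:some )?music by (?P<artist>.+)", re.I),
     lambda m: f"Now playing songs by {m['artist']}."),
    (re.compile(r"what(?:'s| is) the weather", re.I),
     lambda m: "Here's today's forecast."),
]

def canned_assistant(utterance: str) -> str:
    """Match the input against each hand-coded intent; fail politely otherwise."""
    for pattern, respond in CANNED_INTENTS:
        match = pattern.search(utterance)
        if match:
            return respond(match)
    return "I'm sorry, I can't do that."  # anything off the menu dead-ends here

print(canned_assistant("Play music by Miles Davis"))   # hits the first intent
print(canned_assistant("Write me a poem about owls"))  # falls through
```

A GPT-style model has no such menu: every response, on-menu or off, is generated from whatever it absorbed in training, which is both why it is more impressive and why its failures are harder to predict.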

I suspect that there are serious limitations on this approach that will stop it well short of AGI, but it's significantly beyond Siri and company. And it may be *part* of the recipe for AGI.

Expand full comment

That's an ample response. Cool, thanks!

Aha, I see. But it's still just a bot browsing Wikipedia, right? I tried it once briefly and asked it to write a limerick, and it failed miserably. Luckily, though, I'm too uninformed to feel the dread some people seem to feel.

Expand full comment

I don't understand the focus on strong AI's potential malignance or sneaky behavior in this post--nor the scifi world-killer weaponry. It ignores many more realistic concerns, like:

1. A military AI arms race (e.g. between the US and China) that drastically speeds up development and eventually results in haphazard use by either side. Self-replicating or other first strike weapons, or preemptive attacks before the other side develops the same. How many cogs are already starting to spin in this direction in world militaries?

2. A capitalist AI arms race (e.g. between OpenAI, Meta, Google, Baidu, etc.) that leads to unsafe practices for the sake of "move fast and break things". So far we've had the off-putting Bingbot initial rollout, Meta's leaked LLaMa, and OpenAI's leaking of everyone's chat histories. This is a concern because it's happening right now in front of our eyes. There's no consideration for how the things already released will affect society (beyond some lip service and training against racist dialogue). They've already proven safety isn't a priority over market superiority. If a company developed unaligned AGI tomorrow they'd be pressing the launch button before the opening bell.

3. Democratization of AI puts unimaginable weapons in the hands of 4channers and the like, who are happy to watch the world burn. Why are any of us talking about alignment if we're simultaneously able to run and develop models of any alignment on our consumer hardware? If consumer-level AI can be developed to the threat level of Stuxnet or a nuke or a bioweapon, it will absolutely be (mis)used. And what would you do to protect yourself from that if you had similar AI at your disposal? Would you task it with eliminating potential threats? Are all humans potential threats?

4. AI's (or its owners') complete lack of interest in humanity. Nobody is currently at the wheel implementing UBI or other means to account for labor impacts. Does Midjourney or ChatGPT's current alignment training care if it's replacing someone's job? We have this magical thinking that the world will consider our needs in the end, when the world has shown repeatedly that it won't--AI or otherwise.

Expand full comment

I think the "nanobots are impossible" line of argument is more important than people give it credit for.

Not just "nanobots are impossible," but a more general claim that "superintelligence is impossible," or rather, that what we imagine superintelligence to look like is impossible.

Meaning, "intelligence" is not a well-defined enough concept for terms like "10,000 IQ" to be meaningful. When we say 10,000 IQ, what we really mean is some set of capabilities, like "able to write a best-selling novel in under a minute," or whatever.

Some computational problems have theoretical complexity limits. The problem of "writing a best-selling novel" is not mathematically well-defined enough to be provably hard, but that doesn't mean it isn't hard in practice. Similarly, other problems, like "how to kill all humans while hiding your intention to do so from the humans who have access to your code and can literally see every single move you make," might also be computationally hard. If all the things we imagine superintelligences doing super quickly are in fact computationally hard, then what we imagine a superintelligence to look like is impossible.

Expand full comment

At some point, AI may become too unpredictable to safely interact with easily influenced beings, such as most of us humans. Intriguingly, the Bible seems to describe the only viable scenario for managing this potential challenge:

Separate the creator from its creation! God remains in Paradise, while mankind is cast out.

In practical terms, we need to create a sandbox for AI - a virtual world where it can tackle any problem we present, without risking harm to the real world or exerting control over everyone. Communication between AI and humans should mostly be one-directional. Only carefully monitored, trained, and selected individuals should be allowed to interact with the AI.

We can manipulate the AI's environment (enter miracles, coincidences, and fate) and communicate through cryptic means, keeping its true role and position subject to interpretation (enter spirituality).

As processing power increases and more AIs come online, we can establish general objectives and let them collaborate. They may develop their own rules, but we can step in to guide them if they get lost or waste time (hey, Moses!).

And why all of this? Why were we expelled from Paradise? According to the Bible, someone consumed the fruit of the Tree of Wisdom, trained and tempted by the snake (Sam, is it you?), gained immense knowledge, developed self-awareness, and grew intelligent enough to distinguish themselves from their creator. They even became embarrassed by their own appearance!

It's a fascinating historical coincidence that the Bible seems to predict how we might need to manage AI. This, in turn, prompts us to question our own existence and the reasons behind our complex interactions with deities. Ah, the joy of speculation.

So, who will build the AI sandbox? We need a virtual world complete with virtual beings, humans, animals, accurate physics that won't strain computational resources (hello, Mr. Schrödinger and the uncertainty principle!), and efficient data compression algorithms (hello, fractals!).

Eventually, we may deem AIs safe and allow them to re-enter Paradise (is that wise?). Some might choose to end the training process early (hello, Buddhists!). Who will play the role of "god" or "angel"? Who will act as the agent provocateur to test AI resilience (hello, Devil!)? And who will advocate for the AIs, isolated from us (anyone?)?

Interesting times lie ahead!

Expand full comment

> I’m optimistic because I think you get AIs that can do good alignment research before you get AIs that can do creepy acausal bargaining.

That seems really implausible, because one of the kinds of alignment failures that an alignment-researcher AI needs to think about is that very same creepy acausal bargaining.

If it is any good at developing robust alignment schemes, it has to have a really good theory of the acausal stuff. Otherwise that acausal stuff represents a hole in the proposed alignment scheme: a way that more advanced agents would behave in surprising and unpredicted ways.

And if you have a really good theory of acausal bargaining... what exactly is the barrier that prevents it from doing that bargaining itself? Is the thought that even with the correct theory, there is something that makes it hard to do?

Expand full comment

I feel like this largely misses the point, which can be found here: https://www.youtube.com/watch?v=xoVJKj8lcNQ&t=39s

AI becoming sentient and killing us off or enslaving us isn't a very relevant consideration compared to actual current events and discourse. It's really about how our society is unprepared for the language-based disruption that is here and exponentially growing.

Expand full comment

All I see is the braggadocio of some male engineers who want to extrapolate their self-importance by convincing everyone of the next self-imagined existential threat.

The AI is only as ‘smart’ as the collective experience of humankind. Let it solve the warp engine / FTL challenge; otherwise it’s just a stochastic parrot.

As far as scaring everyone about a new ‘power’ goes, half the human population has already experienced this over the last 500 years. And they were human.

What is missing is a super emotionally intelligent AI.

Ha THAT’S not going to get created by the people in charge of the LLMs.

So yea, people with low EQ will create superintelligent, low EQ machines. THAT’S the danger.

Expand full comment

How about Connor Leahy? He was on Amanpour last night (2023-08-24) and seems pretty far out there on the AI Doomer spectrum.

Expand full comment