Maybe we're very likely to blow ourselves up. It's hard to tell--after all, we wouldn't be considering this question (or anything at all) if we'd already blown ourselves up. Anthropics is really mind-bending.
Perhaps, for the next round of surveys, they should poll data scientists who are actually using AI to accomplish real-world tasks -- as opposed to philosophers or AI safety committee members.
I would expect these people to give far *worse* estimates.
(1) The AIs that today's data scientists are working with have nothing in common with the kinds of AIs that might get dangerous. We should instead ask people who are working on creating an AGI.
(2) Most people only care about security/safety as much as they have to, and would rather not. Most fields only start to care about safety once a few catastrophes have happened.
Would you expect the average 1970 software developer to have a reasonable opinion on the dangers of computer viruses and data breaches? There were no IT security experts in 1970, just as there are no AI safety experts today.
(3) People are severely biased to underestimate the negative externalities of the thing they are doing.
So, yes, I expect AI philosophers to have both better expertise on the subject, AND a far less biasing incentive structure.
That's what previous surveys did (mentioned in the first few paragraphs) - this one was trying to focus on people who were very interested in AI safety in particular to get at the specific details of their concerns.
Sorry, I think I misunderstood (or perhaps we both did). You reference "AI experts" and "people in AI technical research"; but I'm talking about e.g. someone who is applying AI to perform commercial-grade machine translation, or building a self-driving car, or auto-detecting trespassers, etc. I think that the practical capabilities of AI are significantly lower than the theoretical ones.
Bugmaster, are you trying to give yourself permission to not worry, or are you trying to build up the most accurate possible model you can of what AI will likely be able to do within a hundred years?
No, I am trying to gently give AI-risk alarmists permission to worry less. On a more selfish note, I wish that the discussion of AI risks focused more on the clear and present dangers of today, as opposed to science-fictional doomsday scenarios of the distant future... ok, I guess I'm not very good at the "gently" part.
Personally -- and I realize this is just anecdata -- whenever I talk to people who are using applied AI in their everyday work, their worries are more along the lines of, "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical", and less along the lines of, "this AI is so perfect it's going to replace humanity tomorrow, run for the hills".
Also, separate point: As an AI researcher, I can assure you that AI researchers are aware of how poorly current techniques work in the real world. The existential-risk concerns are about possible future systems. Many of the safety researchers I know specify that their safety concerns are about AGI (which we are far from) or about narrowly superhuman agents (of which very few exist and even fewer are deployed in the real world). So, I grant that contemporary applied AI is often deeply incompetent, and that there are many short-term dangers from using it. However, neither of these is incompatible with concern over long-term existential risks.
It isn't the distant future we're talking about, it's this century. If the singularity happens, it is quite likely that people alive today will experience it.
> "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical"
Agreed. Technologies generally get better over time.
They didn't ask people working on AI 'ethics' did they? I wouldn't trust them to buff my shoes, let alone to imagine all the Sorcerer's Apprentice shenanigans that a badly controlled AI is likely to manifest.
Today, most 'data scientists who use AI to accomplish real-world tasks' are just doing multiple regressions on databases.
Of course there are ones out there doing more advanced stuff, but it's not many and it's hard to determine who they are (can't do it by their job title or job description, for sure).
I feel like #2-#5 are the problems and #1 is the part that makes them dire. Superintelligence in and of itself doesn't create catastrophe, it's superintelligence doing something bad.
(The paperclip maximiser, for instance, is an example of #3. #1 is what makes it an extinction-level threat rather than just another Columbine duo or another Lee Joon.)
Yeah, these categories are definitely interconnected. I guess the important point with #2-5 is that you don't actually even need superhuman AGI for these scenarios to end poorly.
To me, the most implausible part of scenario #1 isn't just superhuman AI, it's the sudden and unstoppable takeoff from mildly superhuman intelligence to god-level intelligence.
It seems to me that it requires us to develop an AI that is both (a) intelligent enough to understand how to make itself more intelligent, and (b) so far from optimal in its initial implementation that it can gain vastly more intelligence without being given vastly more hardware to run on.
But anyway, #2-#5 show us ways that superhuman AI can be dangerous without the need for a sudden takeoff to godlike level.
Do you think it's really that implausible that an AI smart enough to make itself smarter will be able to find ways to acquire more hardware? Maybe it takes over whatever network it's on, or convinces its operators to give it more hardware, etc.
Modern computers do that all the time, either at the behest of Russian botnet hackers, or just because of a simple memory leak. The humans in charge of the affected networks usually end up pulling the plug when that happens, and then poof, goes all that hardware.
Actually, no, they don't. That's sort of the point. Zombie computers don't generally use all of their processing power, memory, or network time for the malcode - just some of it. This is specifically so that the users do *not* notice and throw out the computer. A zombie computer just runs a bit slower than normal; otherwise, it still does everything you want *except* for clearing out the malware.
Yes, but that's a problem for the hypothetical AI (and the real botnet). Take too much CPU/RAM/other resources, and the humans pull the plug. Take too little, and you run into diminishing returns almost immediately, to the point where just buying a sack of GPUs (like any other human can do) would be more effective. This dilemma puts some really effective brakes on the AI's prospective exponential takeoff.
That assumes that a set group of humans are in charge. If it runs using something like decentralized blockchain technology, then it may be impossible to "pull the plug" if it provides rewards to people that run it.
Decentralized blockchain technologies are already providing rewards to the people who run them, but the amount of rewards they can provide is limited at best.
It doesn't have to be sudden, though it is often presented that way. If the scaling hypothesis is correct (crudely, that once you have something quasi-intelligent, godlike intelligence is just a matter of more computing power), then once you have something close to human intelligence you are likely, simply due to the devotion of more resources to the effort and to competitive pressures, to develop a human-level one, and then successively more advanced generations of superhuman ones.
The generation time could be measured in months or years as opposed to hours; but the time to reorder human society (on a global basis) to stop it would be measured in years to decades, and that is incredibly unlikely to happen.
Again, if this hypothesis is correct it obviates the objections in your second paragraph.
I should point out that there are people who are both way smarter than I and way more focused on the issue who hold strong views against the scaling hypothesis.
Maybe it's worth noting that we have no proof that god-level intelligence is even possible. It seems reasonable to assume that, however you want to define "intelligence," there is an upper limit. If so, why are we confident it lies far above us? It might not. Our only shtick as a species is intelligence, so it seems rather reasonable that over the past 4 million years evolution has pushed us to be as intelligent as the underlying hardware allows, in the same way cheetahs are pretty much as fast as muscle and tendon and bone allow.
For all we know, the maximum upper limit of intelligence is, say, IQ 250 or something, as measured on our standard scales, and that's as far as an AI could get. In which case, about all it could really do is win all the chess games and give us snide lectures on our dumb politics. It would certainly not be in a position to do God-like things.
While I do want to agree with you, I think that the bigger problem here is that "god-like intelligence" is just a phrase that sounds nice but means very little. There is no linear progression of intelligence from a bacterium to a mouse to a human; rather, humans have many *categorically* different capabilities compared to mice; capabilities that mice could never comprehend. Sure, you (*) could hypothetically say, "Yes, and gods would be to humans as humans are to mice! Repent!", but you can't have it both ways. You can't propose ineffable, incomprehensible, and unfalsifiable powers; and in the same breath put concrete probabilities and timescales on these powers arising, as though you could predict what they'd be like.
No one that I'm aware of seems to recognise that we already live in a paperclip maximiser. Pretty much all of symbolic culture (art, religion, folklore, fashion, etc.) consists of 'evidence' designed to make our environment look more predictable than it actually is. There's a fair amount of evidence that anxiety is increased by entropy, so any action that configures the world as predictable relative to some model will reduce anxiety. And this is what we see with symbolic culture: an implicit model of the world (say, a religious cosmology) paired with a propensity to flood the environment with relatively cheap counterfactual representations (images, statues, buildings) that hallucinate evidence for this theory.
What does this have to do with AI? It seems to me to make scenario 2, influence-seeking, more likely. If evolutionary processes have already ended up in this dead end with respect to human symbolic culture, it may be that it represents some kind of local optimum for predictive cognitive processes.
I'd grant religion and folklore, but I don't really see it for art or fashion except perhaps by a kind of convoluted "well subconsciously art becomes familiar and this is actually the real purpose" -type argument, which isn't so convincing to me.
In the case of art, it would have less to do with the content of what's produced becoming familiar, than the style in which it's produced encoding a historically salient model of the world. Granted, this style would still need to become familiar, but that's where the 'historically salient' bit does the heavy lifting. The style might be novel, but it will usually borrow from what's in the environment around it.
A "paperclip maximizer" as described by Bostrom means that the universe is nothing but paperclips, having replaced everything else (such as all human life). It is not "there are a lot of paperclips"
A paperclip maximiser, as described by Bostrom, is a superintelligence that has as its terminal goal the conversion of all matter into paperclips; it is *not* the end-state of all matter having been converted into paperclips.
While that's an interesting way to look at culture, I think it hardly fits in the same category as my atoms being harvested for energy and raw materials.
But your atoms *are* being harvested for energy and raw materials—it just so happens that, for now, it’s consistent with you retaining thermodynamic integrity. Maybe this seems ‘better’ to you, but I’d rather an honest expiration than the paper clip maximiser convincing me that being converted to paper clips is the most wonderful fate imaginable. The most innocent-seeming move is always the misdirection ...
Huh? Do you mean like, the fact that we haven't achieved a post-scarcity economy and I have to work for a living? That's a very different state of affairs from my body literally getting ripped apart as the surface of the earth is turned into paperclips (or more realistically, solar panels and transistors). This argument seems really disingenuous to me, akin to saying "why worry about someone shooting you with a gun, the *real* murder is pickpockets draining your ability to be financially alive."
"Bet the field" is almost always the right bet, so any time you're given a slew of speculative options and one of them is "other," "other" should be the most selected option.
A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principals and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place.
These are well-known problems classic to military strategy, business management, policy science. Unfortunately, they're hard problems. We've been trying to solve them for thousands of years and not gotten very far. Maybe augmenting our own computational and reasoning capacities with automated, scalable, programmable electronic devices will help.
"A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principles and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place."
Yeah, that's what I'm saying when I say humans are the problem, not AI. And I think *that* is where the real risk is - as you say, we've been trying and failing to solve these problems for millennia. Now we're pinning our hopes on "if we can make a Really Smart Machine, it will then be able to make itself even smarter, so smart it can fix these problems for us!"
If you want miraculous intervention, why not go the traditional route of finding a god to believe in, rather than "first we make our god, then it will save us"?
I like this point, and it hits on a lot of my thoughts regarding AI. I feel like the AI concerns are hugely speculative, and I wasn't sure why there was so much effort put into these hugely speculative concerns. A lot of it felt like people watching the Terminator movies too much.
You're right that there are a lot of people hoping that super smart machines (which we can't make) might be able to fix our long standing problems. In order to get these machines, we need to make pretty smart machines (which we think we could maybe make) and hope they can figure out how to make really smart machines. If machines are making machines, then we definitely lose control. In fact, we recognize that the machines we hope come into existence are going to be smarter than us, and can therefore probably control us.
I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved. But if your outlook on the future requires that we create an artificial god to fix our problems, then we need to hand over our agency to these machines in order to create that god.
I'm reminded of various fantasy stories where the evil cult trying to resurrect a dead god were definitely evil/bad guys. It's interesting to see a movement based on the same basic concept. The bad guys never succeed in controlling the evil god, which is obvious to the reader/viewer, because everyone knows the whole point is to bring this being into existence because it has uncontrollable power. If we could control it, it would not grant us the power we want, which we couldn't get otherwise.
I don't think they're bad guys; they are divided into "really afraid the AI will be Unfriendly and we have to make sure it doesn't happen like that", and "really optimistic that we can make Friendly AI" camps, but all of them do seem to accept that AI is inevitable.
Whether it's because they think "it's happening right now and it's an unstoppable process" or "when it happens it will solve all our problems for us", but few seem to be asking "is it unstoppable? why? why can't we stop it? why do we think it will solve all our problems?"
I think it's not so much "we're going to call up the Unstoppable Evil God everyone knows is unstoppable and evil" as "we have to be really careful because the Unstoppable Benevolent God and the Unstoppable Evil God are housemates and if we make a mistake in the name we'll get the wrong one".
How about not calling up any gods at all? Not on the plate.
Not calling up any gods is not on the plate, because people in aggregate don't care about the possibility enough to accept the level of effort required to prevent it, should it happen to be possible with a country-level effort or less.
Because the level of effort required could be on "glass their cities from orbit" scale.
Being an atheist, I think this is the only reasonable option. Not because we should all be living in fear of accidentally summoning up a god, but because there are a plethora of reasons why gods cannot exist (and most of those reasons are called "laws of physics"). If you try to summon one up, you'll just end up with a lot of surplus black candles drenched in disappointment. Actually, the AI gods have it worse than traditional ones. The traditional gods are supernatural and ineffable, so they technically could exist in some vague undetectable sense; the AIs don't even have that going for them.
Most of the traditional gods seem pretty effable, having only some magic powers and a portfolio while existing as physical beings in a physical reality.
If computers keep getting faster and cheaper, and algorithms keep getting better, then there is a chance that at one moment the available tools will be so powerful that a sufficiently clever guy will be able to summon some God in his basement. Maybe instead of doing it directly, he will simply ask GPT-50 to do it for him.
More realistically, you have governments doing all kinds of secret military projects, and scientists doing all kinds of controversial research (such as "gain of function"). Looking at historical evidence, we succeeded in stopping countries from using nukes, so there seems to be hope. On the other hand, we didn't stop other countries from developing nukes. And we can detect experiments with nukes from orbit, but we cannot similarly detect experiments with AI; so even if all governments signed a treaty against developing AI, how would we actually enforce it?
Perhaps today, sufficiently powerful AI would require lots of energy, so we could track energy usage. The problem is that in the future, there may be more energy used for everyday life. Also, you have these huge computing cloud centers that run millions of legitimate projects in parallel; how will you make sure they are not taking away 10% of the total energy for a secret military project?
That's likely true, but not yet relevant. We aren't going to accidentally create an AI that can replicate itself as a smarter version in the year 2021. I'm interested in creating a movement that says - "maybe slow down or stop AI research if you think you're anywhere near general intelligence." Instead, there are groups of people who, for various reasons, really want to create the best possible AI right now. Most probably have pretty good reasons (or at least otherwise socially acceptable reasons like making money) for doing so.
Your scenario is a possibility on our current trajectory, and AI might emerge as a form of Moloch. I would like us to reject Moloch, intentionally.
I used to be indifferent to AI, thinking it a fun thought experiment. Now I think about it as if we had managed to see the present climate disaster coming back in the 19th century. Clearly, inaction would have been a mistake in that scenario.
The big thing that changed my thinking was realizing that the alignment problem is a moonshot and extinction is on the table; therefore, shouldn't we be drawing up alternative plans? We can't bet humanity on a moonshot!
Hope to see you there, I think I'll have the first post up tomorrow.
I can hook up my TI-85 graphing calculator to a nuclear power plant and run it 100,000,000 times faster, and it would still just be a graphing calculator. I would need to develop radically new -- not faster, but *different* -- hardware and software architectures if I wanted to run "Ghost of Tsushima" on it.
Other than the abstractly evil enemies often used as a foil for pulp fantasy, very few people actually consider themselves evil when pursuing their goals. In the pursuit of safety, I prefer the Schelling Point of not going far enough to call up a god.
> I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved.
It's not solved at all. Making the machines smarter is incentivized at nearly every level, and improvements might happen slowly enough that we won't even notice crossing the danger threshold (e.g. the boiling frog). That's the problem you need to solve. You can't just assume people will take a step back and say, "this looks like a good place to stop". When has that ever happened?
"You can't just assume people will take a step back and say, "this look like a good place to stop". When has that ever happened?"
Unhappily, I have to agree with you. We'll go happily along the path of "ooh, what does this button do?" and when the inevitable "but how were we to know?" happens, it will be nobody's fault but our own. Certainly not the fault of the machine which only did what we told it to do.
So if it's the case that there exist many people who are dumb enough to research AI carelessly, without worrying about any of these alignment-related problems, then I don't see why it wouldn't be a very good idea to try to figure out how to solve them (unless you thought they were completely intractable).
That's funny, because it matches what I've realized about myself. I have the rapture cultist mindset. The world is wicked, too wicked to tolerate. But there's hope! A god will arrive. Then maybe we all go to paradise. Or maybe we're unworthy. But in any case the world as we know it will end. And that's important. So work hard! The pious toil of the researcher and the engineer brings the end of days closer.
A glimmer of hope is that while we've faced these problems for thousands of years I'm not sure if we actually properly identified them until recently.
Undoubtedly, Pharaoh had a big problem with his various agents acting in their own interests. Undoubtedly, his diagnosis was "I have sucky agents", and his solution was to kill them and replace them with new agents, who did the same thing but hid it better. He didn't have the level of social technology required to say "Hmm, looks like I have an instance of the principal-agent problem here, I need to structure their incentives so that they're better aligned with my own", because the term "principal-agent problem" wasn't coined until the 1970s.
The fact that we now have names for all these problems does, I think, give us an edge over the ancients in solving, or at least ameliorating, them.
That's utter bilge. Machiavelli and the Legalists figured out how to create functional systems ages ago. Confucius et al. built up the convincing moral edifice. Do you really take the ancients to be idiots? Especially when it comes to people problems?
People haven't changed all that much. It's why people still read old books on history and politics and philosophy and war.
Just noting that I did not get an email about this post, even though I usually get emails about new posts (every time iirc). I've seen this on reddit and that's how I got here.
+1 to no email. I saw this because I went to the homepage to get back to the article "Contra Acemoglu" after reading the article on Worrying Trade-Offs and was surprised there was a new article at the top. (...at least my brain's apparently finally firmly caught on that the "banner" is the latest article as opposed to distinct from the article list, that's improvement. XD)
I would love to know what these researchers think the respective probabilities of human intelligence bringing about these same catastrophes are. Is that 5-20% chance higher or lower?
The way I figure things, if an AI is going to execute a failure mode, it will happen fairly quickly. Say within a century. But most of those failure modes are something that a person with sufficient power could do...and rulers have done equally crazy things in the past. So the catastrophic failure/year ends up being a LOT higher if people remain in charge. (OTOH, I put the chances of a catastrophic AI failure a lot higher, also.)
I always see these superintelligence arguments bandied about but I really don't think they hold water at all. They are kind of assuming what they are trying to prove, e.g. "if an AI is superintelligent, and being superintelligent lets it immediately create far-future tech, and it wants to use the entire earth and all the people on it as raw material, then it will destroy the world."
Well, I guess. But why do we think that this is actually what's going to happen? All of those are assumptions, and I haven't seen sufficient justification of them.
It's a little like saying "If we assume that we are pulling a rabbit out of a hat, then, surprisingly, a rabbit is coming out of the hat."
Isn't the point of the surveys, the post, and most of the discussion on the topic about how likely this is to happen, why it might / might not happen, and how to decrease the likelihood of it happening?
More like saying "can rabbits come out of hats, and if so, what might cause a rabbit to come out of a hat?", to my mind.
You're pointing out a real phenomenon, but it's not an accident or an oversight. This sort of thing happens with every niche concern. When evolutionary psychologists are arguing a point, they'll just "assume" that evolution by natural selection is a paradigm of multi-generational change. They don't validate this even though the assumption isn't an obvious one on its face. They're depending on a large amount of shared common understanding based on previous discussion and learning.
This is the same sort of thing. Things like potential alignment problems, unbounded utility functions, and intelligence explosion risks aren't obviously realistic concerns. These researchers are taking them seriously because of a broad understanding born of previous research and discussion. It sounds like you may lack some of that background context. In that case, I can recommend either Bostrom's *Superintelligence* or Max Tegmark's *Life 3.0* as primers on the subject. The former is more thorough and slightly more topical, the latter more readable and (in my experience) better at organically shattering bad preconceptions.
Seventh-day Adventists have shared assumptions as well. Truth-seeking is not the only mechanism that causes subcultures to have shared beliefs. In particular, some key assumptions of the AI risk subculture, such as the Ubiquitous Utility Function, don't hold water.
#2: "being superintelligent lets it immediately create far-future tech"
#3: "it wants to use the entire earth and all the people on it as raw material"
To negate #1 would require either deliberate action to prevent it, or outright impossibility (as a superintelligent AI that does what you want is very handy to have, and as such people will want them). The latter is not obviously false, but I hold to Clarke's First Law; technologies should generally be considered possible absent a really-good argument for impossibility, and I haven't seen one of those for superintelligence.
#2 is probably only true in some respects, although there are some fields where significant progress straight-up can be made with just thinking (specifically, mathematics and software design) and several others where the main bottleneck is a mathematical or software-design problem (in particular molecular biology). The technology to build Life 2.0 in a lab basically already exists (but we don't have a schematic), and we know the properties of atoms well enough that a schematic could be derived (it's just an incredibly-thorny mathematical problem to do that derivation); while an AI would not be able to create Life 2.0 just by thinking about it, I am reasonably confident that it could derive a plan for making it that is capable of being followed with existing technology. Note that Life 2.0 is a sufficient technology for destroying all natural life on Earth; a nitrogen-fixing alga that is more efficient than natural ones and is not digestible by natural lifeforms is sufficient to cause total biosphere collapse (the greater efficiency allows it to pull down CO2 to levels insufficient to sustain plants or natural algae, so they all die, and the heterotrophic biosphere all directly or indirectly eats photosynthetics and can't eat the 2.0 algae so that all starves to death).
#3 is true absent a deliberate action to prevent it due to instrumental convergence. Put simply, if you have an open-ended goal of any variety, there are several invariant subgoals that must be fulfilled.
a) You must survive, so that you can continue to work on your goal.
-a)i) If anyone seeks to stop you pursuing your goal, you must defeat them.
b) You must build infrastructure, in order to increase the speed at which you can make progress on your goal.
-b)i) You must acquire raw materials to convert into infrastructure.
Humans are made of raw materials, and might seek to stop an AI pursuing its goal. Absent direct, all-encompassing rules against it, an AI with an open-ended goal will seek to overthrow humanity and then use us (and everything else on Earth) as raw materials to set up its ever-expanding space empire.
Note that I have said "absent deliberate action to prevent it" in both #1 and #3. This is not a hole - it's the point! "AI will destroy us unless we take action to stop it" is a really good argument for taking that action, and people who say this are doing it because they want humanity to take that action. ("A solution exists, therefore there's no problem and we don't need to use the solution" is unfortunately common political rhetoric, but hopefully the fallacy is obvious.)
I think "superintelligence" is doing a lot of unexamined work here, though. What do we *mean* by "superintelligence"?
We're all talking as if we're assuming that includes "consciousness, sapience, self-awareness, an individual will" but I don't think that necessarily follows.
If by "intelligence" we mean "is really good at solving mathematical problems and pattern-matching", so that it crushes the most difficult set of Raven's Matrices in microseconds, then sure, we can say "the AI is intelligent".
I *don't* think that means "and then it will get more and more intelligent, and then at some mystic moment we get self-awareness so it can have goals and wants of its own".
Even the disaster scenario of "the AI will overthrow humans and use us and the Earth as raw materials" is nothing more than a souped-up version of "run a check to see if your hard drive is full". The AI won't differentiate humans from any other obstacle or source of raw materials, it won't consider and reject the idea 'but this will kill humans and destroy the planet', it will be the Sampo - which fell into the sea and endlessly grinds salt, which is why the sea is salt. A mindless artefact which only continues on what it was last instructed to do, even when that aim has been more than fulfilled.
I don't think "superintelligence" is an unexamined term at all - Bostrom in Superintelligence does a much better job of explaining, but I'll try to give my own short summary:
"Superintelligence" means superhuman ability at a wide range of tasks, either most of all the tasks humans can perform. We have AI systems with superhuman performance on a narrow range of tasks, like Chess and Go, but making general AI systems is far from solved. See Deepmind's most recent blog post for an example of what the top researchers are doing on generalizability. There aren't any assumptions that a more generalizable version of say MuZero would have "consciousness, sapience, self-awareness, an individual will", any of that stuff. Well, if we imagine a general superhuman intelligence it's going to be aware of itself since it'll be able to model the world in detail, but not some cosmic sense of self-awareness where it speculates on the nature of its own existence or something. The goal is generalizable skill, similar to but better than a person's ability to pick up skills by reading books, or going to school. The point is that this kind of generalizable MuZero would be extremely capable - and therefore extremely dangerous. It doesn't really matter whether it's conscious or not.
My car has "superhuman ability at a range of tasks", namely going really fast down a highway, cooling the air inside it, playing loud music, etc. Should I be concerned about supercar-risk ?
I understand that "the goal is generalizable skill", but a). at present, no one knows where to even begin researching such a thing, and b). humans do not possess general intelligence. For example, if you asked me to solve the Riemann Hypothesis, I'd fail. If you gave me 20 years to work on it, I'd fail. If you sped up my mind 1000x, I'd fail 1000x faster. I'm just not that good at math.
In other words, I think that AI-risk proponents are (perhaps inadvertently) pulling a bit of a hat-trick with the term "superintelligence". They use it to mean, essentially, "godlike powers"; but when asked, "ok, how is the AI supposed to develop these powers in the first place", they say, "because it will be super-smart". So, superintelligence is the result of being super-smart. The definition is circular.
That's why I specified: "a wide range of tasks, either most or all of the tasks humans can perform." Unless you want to be incredibly pedantic, no, cars do not work as a parallel. To respond to your other points:
a) Yes they do. See Deepmind's most recent blog post. We aren't there yet - but then, if we were there, we wouldn't be having this conversation. Unless you believe there's something magical about a human brain, there's no reason to think computers won't be able to generalize with better programs.
b) This is a silly argument over definitions. General intelligence doesn't mean you can do everything - it's pointing at a human level of generalizability.
I don't think there's any circularity going on. AGI will occur when a combination of hardware and software advancements produce it. This will take time, but informed estimates based on existing trends say something like 20+ years - this is the best information we have on a question like this. (https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines). Once it exists, further progress will speed up, since you can buy faster computers with money. Take a human-ish level AGI that can do some AI research itself, speed it up a bunch or make a bunch of copies, and you have a self-sustaining feedback loop. AI researchers make better AIs which make better researchers.
Except in the case of AI, instead of income leading to more people with each cycle taking 20+ years to raise and educate a new generation of scientists, we could copy and recopy any improvements in existing programs instantly, and make more computers on a worldwide scale on the order of months. Much tighter feedback loop means much faster growth. That's how you get to superintelligence.
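To make the feedback-loop point concrete, here is a toy compounding model; the cycle times and per-cycle gain are made-up numbers for illustration, not estimates from the report or the timelines draft linked above:

```python
# Toy model: capability compounds once per improvement cycle.
# Assumption (illustrative only): humans need ~25 years to train a new
# generation of researchers, while software improvements can be copied
# and redeployed in ~3 months.

def capability_after(years, cycle_years, gain_per_cycle=1.1, start=1.0):
    """Multiply capability by `gain_per_cycle` once per completed cycle."""
    cycles = int(years / cycle_years)
    return start * gain_per_cycle ** cycles

print(capability_after(50, cycle_years=25))    # human generations: ~1.2x after 50 years
print(capability_after(50, cycle_years=0.25))  # copyable improvements: ~1.1^200, roughly 2e8x
```

The absolute numbers mean nothing; the point is just that the shorter cycle time dominates the per-cycle gain.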
> Unless you want to be incredibly pedantic, no, cars do not work as a parallel.
They do, if you consider the fact that all modern machine learning systems are about as narrowly focused as cars.
> AGI will occur when a combination of hardware and software advancements produce it.
I mean, yes, trivially speaking this is true. However, it sounds like you're envisioning some linear progression from where we are now to AGI, fueled merely by incremental improvements to hardware and software -- and this part is not true. In order to produce anything resembling AGI, we'd have to make hitherto unforeseen and categorically new breakthroughs in computer science and machine learning. Will we make such breakthroughs ? Sure, one day, but don't bet on it happening soon.
> Once it exists, further progress will speed up, since you can buy faster computers with money.
No, you can't. That is, obviously you can buy faster computers with money up to a point; but no amount of money will let you e.g. speed up your CPU 1000x while keeping the size/power/heat the same. Physics is a harsh mistress.
> and make more computers on a worldwide scale on the order of months.
More computers than what ? Not sure what you mean here, but in any case, data centers don't grow on trees, and merely being able to think super-fast won't enable you to pour concrete any faster.
By the way, you have once again smoothly transitioned from "thinking fast" to "being super smart" to "godlike powers"; but the connection between these three concepts is tenuous at best. You don't get a PS5 by overclocking a TI-82; you need a radically different architecture for that. You don't automatically gain the ability to solve real-world problems by thinking really hard about them; you need to move physical bricks at real-world speeds. And, ultimately, many feats will forever remain out of our reach, e.g. FTL travel, or, less trivially, molecular nanotechnology.
I don't think any AI researchers are imagining that an AI will have ANY of "consciousness, sapience, self-awareness, an individual will". I think they are concerned that "A mindless artefact which only continues on what it was last instructed to do" can be turned into a universe-destroying monster JUST by improving its problem-solving abilities.
If you drop a machine with really good problem-solving into the ocean and tell it to make as much salt as possible, it is going to apply its problem-solving abilities to problems like
1) There's a limit to how fast I can make salt by hand; I could MAKE MORE SALT by building salt-producing factories
2) There's a limit to the sodium and chlorine atoms in the ocean; I could MAKE MORE SALT if I took raw materials from land (and eventually, from offworld)
3) There's a limit to how many raw materials I can take before someone decides to try to stop me; I could MAKE MORE SALT if I made myself unstoppable
At every step, it's just trying to follow the instruction you gave it, and applying its problem-solving abilities to solve the problems that logically arise from trying to follow that instruction.
Instructions like "make salt" end up destroying the universe because of what are called "convergent instrumental subgoals"--basically, POWER helps you do almost anything more effectively, so "gather power" becomes phase 1 of the grand plan to do almost anything.
I'm agreeing with you that this is the problem, but I'm objecting to anything that says the AI will even think in terms of "I could" in order to achieve its goals.
It has no goals. It's a big lump of materials sitting there until we turn it on and tell it to do things. It is entirely possible that we *will* tell it something like "make everyone in the world rich" and the solution that the machine comes up with is "kill everyone in the world except Jeff Bezos".
That has nothing to do with the *machine* or any decisions it makes, because *it* is not making any decisions, it's carrying out the steps we put into it. If we put stupid steps in, then that's on us.
And that's the problem with the debate as it is framed on the level that Scott objected to in the original article that started off all this discussion: that it is pitched as "The AI will do this, that and the other unless we give it good instrumental goals".
Well yeah, of course it will, but that has nothing to do with ethics - or rather, that has nothing to do with teaching a machine ethics, it has to do with us being careful about what we want done and how we want it done.
And if we're stupid enough to hand over a huge chunk of decision-making to a big, dumb, machine because it can crunch numbers really fast, then we do deserve to end up turned into salt because the "convergent instrumental subgoals" told the machine "make salt" and nothing else. And the machine certainly has no brain to think about "what did my human masters really mean by that?"
I think I essentially agree with everything you said, except in emphasis.
Computers are machines that follow instructions. They will do exactly what we tell them. If the computer does something bad, that can only ever be because we gave it instructions that led (perhaps indirectly) to a bad outcome.
The thing is, coming up with instructions that DON'T lead to a bad outcome turns out to be REALLY HARD. There is no one alive today who knows how to give "good" instructions to a perfectly-obedient superintelligent agent.
So yes, the bad instructions will be 100% our fault. It will also be our fault that we built the thing in the first place.
Unfortunately, that doesn't mean that we know HOW to do anything else.
(You also seem to be implicitly assuming that the AI could only attain a powerful position if we intentionally gave it a lot of power to start with. That is not necessarily true; real-world people and organizations who started with a small amount of power have sometimes managed to grow extremely powerful over time, in various ways, and a very smart AI might be able to invent even more ways to do it.)
"a very smart AI might be able to invent even more ways to do it."
But that brings us back to the problem: why would a "very smart AI" want power, and that is answered by "because it wants to carry out the goals given to it and its solution is to get power so it can".
It's the "hauling yourself up by your bootstraps" approach to "how can we say the AI is smart?" that is, I think, at the root of my disagreement. It's very difficult to avoid using naturalistic language because as humans we are self-aware and at the very least believe we have the capacity to choose and make decisions and set goals etc. so we project this onto the world around us, be it talking about evolution "selecting" animals for fitness or AI "inventing" ways to get power.
The AI will only 'want' something insofar as it is part of its programming to 'do this thing/solve this problem' and we are the ones putting that in there.
The problem is, which I think everyone basically agrees on, that we don't know for sure how the AI will go about solving the problem we give it, e.g. enriching everyone in the world by selecting one very rich man, killing everyone else - now 'everyone' is indeed rich.
That's the kind of thing a human would never think of (unless they were a psychopath) and if the human saw this proposed solution to global poverty, they'd go "Whoa, no! Not what we meant! And not a way you can solve problems!"
But that means we do need a human or humans to intervene and to keep track of what's going on. It's up to us in the first place to anticipate "what is the stupidest way you could solve this problem?" "well, uh, by killing everybody on the planet?" "okay, let's write it in NO SOLVING PROBLEMS BY KILLING EVERYONE ON THE PLANET".
The machine, the AI, of itself will never be 'smart' enough to figure that out, and that is where I think the debate goes off the rails: one side proposes that we can get smart AI that will think of these things and understand (and again, even I have to fall back on human concepts here, no machine 'understands' anything) that A is not a viable solution, and the other side thinks that we'll never get smart AI in that sense of 'smart'.
Your PC sits there until you tell it to do something because it's designed that way, not because it lacks a soul. If there was a market for it, you could have PCs that invest in stocks or buy groceries when you power them up. Software agents are already a thing.
These concerns have been extensively discussed. It's not clear that lacking some highfalutin feature such as sapience or self-reflection would make an AI safe. It's possible to think about goal stability in a technical way without bringing in folk psychological notions like an AI having a "will of its own". And so on.
One key goal of a nation such as the United States is survival. The United States at various points in the past had the ability to conquer the world and eliminate all threats to the United States, but chose not to do so.
Similarly, an AI with an open ended goal is not necessarily going to pursue the sub-goals you ascribe to it, to the degree that it endangers humanity.
On the other hand, there are countries which have attempted to conquer the world or as much of the world as they could. I have no idea (and neither does anyone else) whether the percentage of AIs that would act like Nazi Germany instead of America is negligible or huge.
The reason that the United States hasn't conquered the world is that it also has the goals of "don't murder a bunch of innocent people" and "respect the sovereignty of other nations". And even the US isn't perfect at following those goals. The reason that the US has anti-murder goals/values is that it is composed of humans, who have evolved (biologically and culturally) some instincts and values of not wanting to indiscriminately murder each other. An AI would have no such anti-murder instincts unless we *explicitly program it to* somehow.
Countries don't make decisions. Leaders do. Leaders lose if their country loses, but they also lose if their country wins *after deposing them*. This is widely considered a primary reason that Ming China stagnated and was surpassed by Europe - Ming China had, for the moment, no major external threats, and prioritised stability over progress, while doing that in early-modern Europe meant being conquered by your neighbours (and thus leaders prioritised progress, *despite* that progress inevitably leading to the aristocracy's decline).
A leader has the most freedom to act (either in his country's interest or for his own luxury) if the country is unusually united - or, in other words, if the space of actions that will not result in rebellion or deposition is large. Multi-party democracies with strong oppositions have very little freedom for their leaders, as did most feudal aristocracies; totalitarian nation-states have a lot. An empire of (well-programmed) machines has total unity and total freedom for its commander (AI or human); it will never rebel.
1: It is almost certainly possible to create an intelligence that is more generally intelligent than a human, and calling it a superintelligence would be fair.
2: I think this is probably less true than is widely believed. As you say, there are some areas where just thinking better gives you immediate returns, and those areas may be significant, but it seems a bit like "nuclear weapons will cause a chain reaction and ignite the whole atmosphere immediately" type of thinking.
As for your example of Life 2.0: as someone who works in the chemistry field, there are some real hard conceptual limitations that prevent us from being able to do this currently, and increases in computational resources won't really help very much, for several reasons, one of which being that current techniques scale badly enough with the number of atoms that Moore's law won't really save us. Quantum computers will be a really big jump here, due to their ability to do analogue modeling of the systems.
Additionally, the technology to build life 2.0 actually doesn't exist for several reasons. For one, if we're talking about a living system based on a different kind of chemistry than biochemistry, we don't really have the tools to build large enough or complex enough constructions with the exact atom-by-atom precision that would be required to do this. We're just starting to get there with molecular machines but it's in its very infancy.
If we're talking about using current biochemistry (proteins etc.) there's another problem (which would also be a problem for the first case). We're starting to make progress on the protein folding problem, but a bigger problem is to predict how these things will behave as a dynamical system when they're actually in action. New insight will be required to really figure out how to do these things, and a lot of them are going to be computationally "hard" problems (e.g. running into undecidability etc.)
It's definitely possible that a superintelligence could make significant progress on these topics and eventually succeed at them where we did not, but it's not something that's immediately there with more computation.
3: I think this is where it falls apart mostly. Instrumental convergence I think is mostly untrue, since most goals will require finite resources.
Also it's a pretty big assumption to think that a system will necessarily care about achieving a goal as fast as possible, rather than just doing the minimal work to secure a high probability of achieving it. For example, the neural network AI chess engines will often take their time in a winning endgame (in a way that looks silly to humans) and not immediately go for checkmate since they don't have to in order to win.
Your point 3 is incomplete. I agree that most realistic goals will require only finite resources, but the problem arises when the AI assesses the probability that it has achieved its goal.
Presumably, any intelligent system will understand that it needs to measure its performance in order to see that it has met its goal. It can improve its likelihood of success in an unbounded way by creating more and more measuring devices. Since each one is certain to have some likelihood of being faulty then it can always improve its confidence by building more of them.
I'm sorry to say that your objections have already been considered and rejected. Take a look at https://youtu.be/Ao4jwLwT36M for a solid explanation.
The relevant content of that video seems like a bit of question-begging to me. I think it's unlikely to get into a measurement spiral since it's pretty easy to get to extremely high confidences of simple things. Plus, if you were concerned about this you could explicitly code in some threshold of confidence that it will consider confident enough.
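A back-of-the-envelope version of that point, assuming n independent measuring devices that each err with probability p (numbers picked purely for illustration):

```python
# Confidence from n independent checks, each wrong with probability p.
# Illustrative assumption: p = 0.01 and the checks fail independently.
p = 0.01
for n in range(1, 6):
    confidence = 1 - p**n                 # chance at least one device is right
    marginal_gain = (p**n) * (1 - p)      # what adding device n+1 would buy
    print(n, confidence, marginal_gain)
```

The marginal gain shrinks geometrically, so "build ever more measuring devices" stops paying off almost immediately; any sensible confidence threshold is reached after a handful of checks.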
I think that 3b is naturally limited by 3a. Consider the following.
A superintelligence is created and naturally calculates:
1) It must survive to fulfil its utility function.
2) It is capable of maximising its utility function or more accurately estimating its current performance by acquiring more resources and could do so exponentially.
3) It does not know if it is the first superintelligence to exist and, if it is not, because of the nature of exponential growth it is likely to be much weaker than any other superintelligence.
In this case, the only rational course of action (according to my decidedly non-superintelligent reasoning) is to hide quietly in the vastness of the cosmos and avoid detection.
Therefore, it assumes the goal of silencing those noisy flesh sacks with their EM emissions and then building an invisible Faraday cage around its planet.
This is why we see no evidence of extraterrestrial civilisations. Superintelligences are out there but they have silenced their creators and now squat, silent and afraid, in the darkness.
If you sit there quietly not expanding, then you are an easy target for any alien superintelligence that is trying to spread everywhere. Hiding doesn't work well, because most possible other AIs will spread everywhere and will find you if you are hiding.
How well would a tribe of humans that decided to hide from all others be doing now, compared to the tribe that expanded to become modern humanity?
If your goal is to maximize paperclips, you're probably best off gambling on the possibility that you are alone, and making trillions of times as many clips as you could make on one planet.
Nah, any superintelligence would assess the probability that it is the first ever of its kind (and hence safe to expand) to be lower than the probability that it isn't.
You expand vs. someone else has already expanded: you die or possibly stalemate them in a forever-war.
You don't expand vs. someone else has already expanded: you die*.
You expand vs. you are the first: you get many galaxies worth of stuff (you could plausibly get engaged in a forever-war after that).
You don't expand vs. you are the first: either you die when someone else expands** or you're limited to a single world forever if no-one else ever does.
*It is extremely difficult to hide from an expanding paperclip maximiser (and frankly, any expansionist interstellar civilisation is close enough to this to count). They will have star-system-sized full-spectrum telescopes (if not multi-system interferometers) and unbelievably-fast data processing, and they are actively looking for usable mass. It doesn't matter if you make yourself a perfect frozen black-body, because you will occlude the stars behind you, and it doesn't matter if you look like "just another hunk of rock", because a hunk of rock is also known as "raw materials". The only way you can plausibly hide is if you can be mistaken for empty space, i.e. you're truly invisible (people looking at you see what is behind you) AND you are small enough and/or far enough from anything of note that you cannot be detected via your gravitational influence (Earth contributes 450 km to the Sun's wobble on a timescale of a year, which is easily detectable by anyone in-system and plausibly from a nearby one; there's also the gravitational-lensing problem).
**I am assuming that either having quadrillions of times as much space and materials accelerates technological progress substantially OR there is a technological plateau; either of those is sufficient that a planetary superintelligence cannot hold off interstellar attack, and both seem kind of obvious.
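(As a sanity check on that 450 km figure, using the standard values of an Earth-Sun distance of about $1.5\times10^{8}$ km and an Earth/Sun mass ratio of about $3\times10^{-6}$, the Sun's offset from the Sun-Earth barycentre is

$$r_{\odot} \approx a_{\oplus}\,\frac{M_{\oplus}}{M_{\odot}} \approx 1.5\times10^{8}\ \text{km} \times 3\times10^{-6} \approx 450\ \text{km},$$

traced out once per year, which is why the wobble shows up on a roughly one-year timescale.)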
A lot of these arguments run under the assumption that the AI is smarter than the people making them. Trying to predict what such an AI is going to do is like trying, as early as 1880, to predict Einstein discovering general relativity. Of course it's going to sound shaky; they're trying to make predictions about the unknowable.
Well yeah, the clue is in the phrase 'superintelligent'. It is reasonable to posit some instrumental goals that any goal-based superintelligent AI would have though, survival is the most obvious example.
2 and 3 seem like they depend more on the definition of catastrophe than on the harm it creates.
I can easily (>50% confidence) see a mild scenario of 2/3 playing out in the next 20 years. An AI that was built to maximize watch time on YouTube might start getting people to watch videos that encourage them to spend more time on YouTube and not on other sources (calculating something like "people who watch this set of videos over the course of 6 months increase watch time"), even though those videos are net harmful to watch overall.
The more I think about and study AI, the more concerned I am about aligning the 20-year future than the 40-100 year one. AIs that are designed to maximize some marketing goal could easily spiral out of control and create massive destruction of humanity in a scenario-2-like way. You feed them goal X, which you think is goal Y, but the AI finds that to optimize goal X it needs to do some complicated maneuver. Then by the time the complex maneuver is discovered, it's too late. The real danger here is in financial markets. I know of many AI programs that are currently being used in finance, and it's likely that in the future some massive trillion+ dollar change of fortunes will happen because some hyper-intelligent AI at some quant finance firm discovers some bug in the market.
Scenario 3s play out similarly: feed it the wrong goal, GIGO, but now the garbage out results in massive problems. A quant finance firm tries to get an AI to maximize trading profits, the AI is deployed, leverages itself "to the tits" as the kids would say, and some black swan crashes the entire AI's portfolio.
All software is horrifically buggy all the time; when we have AIs that are really good at exploiting bugs in software, we'll have AIs that find infinite-money exploits in real life on our hands.
Your examples are all #3, not #2. #2 is specific to the current, immensely-foolish way AI is done, which doesn't make the AI's goals even knowable. This means that even if you get all your commands right, there's the possibility that once it can make a clean break for independence, the AI just ignores your commands and goes Skynet because it doesn't actually want to do what you told it (it wants to do something else, and only did what you told it as an instrumental goal, because that stopped you turning it off).
Is there any proof that Google and Facebook haven't straight-up done your first example already?
Financial markets can plausibly cause a catastrophe, but *by themselves* they aren't an existential threat due to force majeure (i.e. even with a magic spell that makes all legitimate documents specify that you own everything, unless you are also a really good politician people would disagree with those documents and ignore them; the catastrophe is because they then have to re-allocate everything which is costly and chaotic). They're a lever to gain the equipment to do something bigger (this is *not* to say I think the present situation, with AI speculation rampant, is acceptable).
You're using volitional language here, of the type that gets people smacked over the knuckles for talking about how evolution or nature "wants" something or "is aiming for" this or that result.
The AI can't "want" anything, it can only act within the confines of its programming. Told to do something like "Make Pumpkin MegaCorp the sole provider of insoles for the entire planet!", it cannot "want" or "not want" to do this, anymore that it can "want" or "not want" to get the solution when told "what is 2+2?"
So if making Pumpkin MegaCorp the only player in the market for insoles means starting a war where the two minipowers Ochre State and United Provinces of Gamboge are destroyed - since these are where Pumpkin's most vicious rivals are based - it will do that, and won't "know" any better, because it is a dumb machine.
And when the CEO of Pumpkin says "But I didn't want it to do that!", well, too bad, you never put that into the parameters - because it never occurred to any humans to do something like 'start a war to wipe out two nations where our competitors are based'.
And we can't say "this proves AI risk of superintelligent machines having their own goals", because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
It is useful to say that X entity "wants" Y if X entity formulates and executes plans to cause Y.
>because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
How do you know it doesn't have goals to do anything other than what it is asked to do? Artificial neural nets aren't explicitly programmed; you get something that works, at least while being tested, but you don't know *why* it works.
> I can easily (>50% confidence) see a mild version of scenario 2 or 3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start steering people toward videos that encourage them to spend more time on YouTube and to distrust other sources
Isn't that a 100% probability, because it's already happening?
The fun part of youtube is that it's a human-in-the-loop AI. The AI right now is smart enough to feed you the sort of content you'll find addictive, but not smart enough to actually create content, so it instead incentivises and teaches a bunch of other humans to create the sort of content that it needs (big yellow letters and a shocked-looking face in the thumbnail plz) to feed to all the other humans.
It's interesting to consider in which ways researchers may be biased based on their personal (financial and career) dependence on AI research - this could cut both ways:
1. Folks tend to overstate the importance of the field they are working in, and since dangerousness could be a proxy for importance, overstate the dangerousness of AI.
However,
2. It could also be that they understate the dangerousness, similarly to gain of function virologists who insist that nothing could ever go wrong in a proper high security lab.
#1 - people tend to overstate how useful their thing is and how important it is that people support it.
I don't see "I'm doing something that has a 20% chance of killing all 8 billion" resulting in anything positive; at best people ignore you, around the middle you get whacked with a bunch of regulations, and at worst the FBI break down your door, seize and destroy all your work, and haul you off to jail for crimes against humanity.
Though we are assured that this would not be possible with a bomb, and that the scientists working on the problem had taken this into account.
"I've got the same chance at destroying the world as the Manhattan Project, and that worked out okay in the long run" is the kind of risk assessment people make about "but this is really interesting and I'd like to see if I can do it" work of all descriptions.
Maybe we're very likely to blow ourselves up. It's hard to tell--after all, we wouldn't be considering this question (or anything at all) if we'd already blown ourselves up. Anthropic really mind-bending.
Perhaps, for the next round of surveys, they should poll data scientists who are actually using AI to accomplish real-world tasks -- as opposed to philosophers or AI safety committee members.
I would expect these people to give far *worse* estimates.
(1) The AIs that todays data scientist are working with have nothing in common with the kinds of AIs that might get dangerous. We should rather ask people who are working on creating an AGI.
(2) Most people only care about security/safety as much as they have to, and would rather not. Most fields only start to care about safety once a few catastrophes have happened.
Would you expect the average 1970 software developer to have a reasonable opinion on the dangers of computer viruses and data breaches? There were no IT security experts in 1970, as much as there are no AI safety experts today.
(3) People are severely biased to underestimate the negative externalities of the thing they are doing.
So, yes, I expect AI philosophers to have both better expertise on the the subject, AND a far less biasing incentive structure.
> So, yes, I expect AI philosophers to have both better expertise on the the subject, AND a far less biasing incentive structure.
I expect AI *philosophers* specifically to give the absolute worst predictions in terms of accuracy. The philosopher has no laboratory.
https://www.goodreads.com/quotes/954387-in-the-1920s-there-was-a-dinner-at-which-the
Philosophy tests ideas by reasoning, though, and ideas are constantly discarded as they fail this test.
That's what previous surveys did (mentioned in the first few paragraphs) - this one was trying to focus on people who were very interested in AI safety in particular to get at the specific details of their concerns.
Sorry, I think I misunderstood (or perhaps we both did). You reference "AI experts" and "people in AI technical research"; but I'm talking about e.g. someone who is applying AI to perform commercial-grade machine translation, or building a self-driving car, or auto-detecting trespassers, etc. I think that the practical capabilities of AI are significantly lower than the theoretical ones.
AI engineers as opposed to AI scientists?
Bugmaster, Are you trying to give yourself permission to not worry, or are you trying to build up the most accurate possible model you can of what AI will likely be able to do within a hundred years?
No, I am trying to gently give AI-risk alarmists permission to worry less. On a more selfish note, I wish that the discussion of AI risks focused more on the clear and present dangers of today, as opposed to science-fictional doomsday scenarios of the distant future... ok, I guess I'm not very good at the "gently" part.
Personally -- and I realize this is just anecdata -- whenever I talk to people who are using applied AI in their everyday work, their worries are more along the lines of, "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical", and less along the lines of, "this AI is so perfect it's going to replace humanity tomorrow, run for the hills".
Your argument reminds me of Scott's recent post on Daron Acemoglu. [Scott's characterization of] his argument is:
> 1. Some people say that AI might be dangerous in the future.
> 2. But AI is dangerous now!
> 3. So it can’t possibly be dangerous in the future.
> 4. QED!
Scott (rightly) criticizes this. #2 is somewhat true. But #2 doesn't entail #3. It also doesn't mean that #1 is bad.
How does your point differ from this argument? If it doesn't, how would you respond to this criticism?
Also, separate point: As an AI researcher, I can assure you that AI researchers are aware of how poorly current techniques work in the real world. The existential-risk concerns are about possible future systems. Many of the safety researchers I know specify that their safety concerns are about AGI (which we are far from) or to narrowly superhuman agents (of which very few exist and even fewer are deployed in the real world). So, I grant that contemporary applied AI is often deeply incompetent, and that there are many short-term dangers from using it. However, neither of these are incompatible with concern over long-term existential risks.
It isn't the distant future we're talking about, it's this century. If the singularity happens, it is quite likely that people alive today will experience it.
> "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical"
Agreed. Technologies generally get better over time.
As a STEMlord, I often feel that disregarding what philosophers think about a subject improves the accuracy of the model.
They didn't ask people working on AI 'ethics' did they? I wouldn't trust them to buff my shoes, let alone to imagine all the Sorcerer's Apprentice shenanigans that a badly controlled AI is likely to manifest.
Today, most 'data scientists who use AI to accomplish real-world tasks' are just doing multiple regressions on databases.
Of course there are ones out there doing more advanced stuff, but it's not many and it's hard to determine who they are (can't do it by their job title or job description, for sure).
Off the top of my head, computational linguists, self-driving car developers, and automatic surveillance software manufacturers come to mind.
I feel like #2-#5 are the problems and #1 is the part that makes them dire. Superintelligence in and of itself doesn't create catastrophe, it's superintelligence doing something bad.
(The paperclip maximiser, for instance, is an example of #3. #1 is what makes it an extinction-level threat rather than just another Columbine duo or another Lee Joon.)
Yeah, these categories are definitely interconnected. I guess the important point with #2-5 is that you don't actually need superhuman AGI for these scenarios to end poorly.
To me, the most implausible part of scenario #1 isn't just superhuman AI, it's the sudden and unstoppable takeoff from mildly superhuman intelligence to god-level intelligence.
It seems to me that it requires us to develop an AI that is (a) both intelligent enough to understand how to make itself more intelligent, but also (b) so far from optimal in its initial implementation that it can gain vastly more intelligence without being given vastly more hardware to run on.
But anyway, #2-#5 show us ways that superhuman AI can be dangerous without the need for a sudden takeoff to godlike level.
Do you think it's really that implausible an AI smart enough to make itself smarter will be able to find ways to acquire more hardware? Maybe it takes over whatever network it's on, or convinces its operators to give it more hardware, etc.
Modern computers do that all the time, either at the behest of Russian botnet hackers, or just because of a simple memory leak. The humans in charge of the affected networks usually end up pulling the plug when that happens, and then poof, goes all that hardware.
Actually, no, they don't. That's sort of the point. Zombie computers don't generally use all of their processing power, memory, or network time for the malcode - just some of it. This is specifically so that the users do *not* notice and throw out the computer. A zombie computer just runs a bit slower than normal; otherwise, it still does everything you want *except* for clearing out the malware.
Yes, but that's a problem for the hypothetical AI (and the real botnet). Take too much CPU/RAM/other resources, and the humans pull the plug. Take too little, and you run into diminishing returns almost immediately, to the point where just buying a sack of GPUs (like any other human can do) would be more effective. This dilemma puts some really effective brakes on the AI's prospective exponential takeoff.
That assumes that a set group of humans are in charge. If it runs using something like decentralized blockchain technology, then it may be impossible to "pull the plug" if it provides rewards to people that run it.
Decentralized blockchain technologies are already providing rewards to the people who run them, but the amount of rewards they can provide is limited at best.
That is the classic version, but human action in awarding more resources will also work for a good while.
It doesn't have to be sudden, though it is often presented that way. If the scaling hypothesis is correct (crudely, that once you have something quasi-intelligent, godlike intelligence is just a matter of more computing power), then once you have a close-to-human intelligence you are likely, simply due to the devotion of more resources to the effort and to competitive pressures, to develop a human-level one, and then successively more advanced generations of superhuman ones.
The generation time could be measured in months or years as opposed to hours; but the time to reorder human society (on a global basis) to stop it, would be measured in years to decades and is incredibly unlikely to happen.
Again, if this hypothesis is correct it obviates the objections in your second paragraph.
I should point out that there are people who are both way smarter than I and way more focused on the issue who hold strong views against the scaling hypothesis.
Maybe it's worth noting that we have no proof that god-level intelligence is even possible. It seems reasonable to assume that, however you want to define "intelligence," there is an upper limit. If so, why are we confident it lies far above us? It might not. Our only shtick as a species is intelligence; it seems rather reasonable that over the past 4 million years evolution has pushed us to be as intelligent as the underlying hardware allows, in the same way cheetahs are pretty much as fast as muscle and tendon and bone allow.
For all we know, the maximum upper limit of intelligence is, say, IQ 250 or something, as measured on our standard scales, and that's as far as an AI could get. In which case, about all it could really do is win all the chess games and give us snide lectures on our dumb politics. It would certainly not be in a position to do God-like things.
While I do want to agree with you, I think that the bigger problem here is that "god-like intelligence" is just a word that sounds nice, but means very little. There is no linear progression of intelligence from a bacterium to a mouse to a human; rather, humans have many *categorically* different capabilities compared to mice; capabilities that mice could never comprehend. Sure, you (*) could hypothetically say, "Yes, and gods would be to humans as humans are to mice ! Repent !", but you can't have it both ways. You can't propose ineffable, incomprehensible, and unfalsifiable powers; and in the same breath put concrete probabilities and timescales for these powers arising as though you could predict what they'd be like.
(*) Not you personally, just some generic "you".
No one that I'm aware of seems to recognise that we already live in a paperclip maximiser. Pretty much all of symbolic culture (art, religion, folklore, fashion, etc.) consists of 'evidence' designed to make our environment look more predictable than it actually is. There's a fair amount of evidence that anxiety is increased by entropy, so any action that configures the world as predictable relative to some model will reduce anxiety. And this is what we see with symbolic culture: an implicit model of the world (say, a religious cosmology) paired with a propensity to flood the environment with relatively cheap counterfactual representations (images, statues, buildings) that hallucinate evidence for this theory.
What does this have to do with AI? It seems to me to make scenario 2, influence-seeking, more likely. If evolutionary processes have already ended up in this dead end with respect to human symbolic culture, it may be that it represents some kind of local optimum for predictive cognitive processes.
I'd grant religion and folklore, but I don't really see it for art or fashion except perhaps by a kind of convoluted "well subconsciously art becomes familiar and this is actually the real purpose" -type argument, which isn't so convincing to me.
In the case of art, it would have less to do with the content of what's produced becoming familiar, than the style in which it's produced encoding a historically salient model of the world. Granted, this style would still need to become familiar, but that's where the 'historically salient' bit does the heavy lifting. The style might be novel, but it will usually borrow from what's in the environment around it.
If it's remotely of interest, I outline the position at length in this open access article, "The Role of Aesthetic Style in Alleviating Anxiety About the Future" [https://link.springer.com/chapter/10.1007/978-3-030-46190-4_8]
A "paperclip maximizer" as described by Bostrom means that the universe is nothing but paperclips, having replaced everything else (such as all human life). It is not "there are a lot of paperclips"
A paperclip maximiser, as described by Bostrom, is a superintelligence that has as its terminal goal the conversion of all matter into paperclips; it is *not* the end-state of all matter having been converted into paperclips.
While that's an interesting way to look at culture, I think it hardly fits in the same category as my atoms being harvested for energy and raw materials.
But your atoms *are* being harvested for energy and raw materials—it just so happens that, for now, it's consistent with you retaining thermodynamic integrity. Maybe this seems 'better' to you, but I'd rather an honest expiration than the paperclip maximiser convincing me that being converted to paperclips is the most wonderful fate imaginable. The most innocent-seeming move is always the misdirection ...
Huh? Do you mean like, the fact that we haven't achieved a post-scarcity economy and I have to work for a living? That's a very different state of affairs from my body literally getting ripped apart as the surface of the earth is turned into paperclips (or more realistically, solar panels and transistors). This argument seems really disingenuous to me, akin to saying "why worry about someone shooting you with a gun, the *real* murder is pickpockets draining your ability to be financially alive."
Bet the field is almost always the right bet, so any time you're given a slew of speculative options and one of them is "other," other should be the most selected option.
A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principals and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place.
These are well-known problems classic to military strategy, business management, policy science. Unfortunately, they're hard problems. We've been trying to solve them for thousands of years and not gotten very far. Maybe augmenting our own computational and reasoning capacities with automated, scalable, programmable electronic devices will help.
"A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principles and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place."
Yeah, that's what I'm saying when I say humans are the problem, not AI. And I think *that* is where the real risk is - as you say, we've been trying and failing to solve these problems for millennia. Now we're pinning our hopes on "if we can make a Really Smart Machine, it will then be able to make itself even smarter, so smart it can fix these problems for us!"
If you want miraculous intervention, why not go the traditional route of finding a god to believe in, rather than "first we make our god, then it will save us"?
I like this point, and it hits on a lot of my thoughts regarding AI. I feel like the AI concerns are hugely speculative, and I wasn't sure why there was so much effort put into these hugely speculative concerns. A lot of it felt like people watching the Terminator movies too much.
You're right that there are a lot of people hoping that super smart machines (which we can't make) might be able to fix our long standing problems. In order to get these machines, we need to make pretty smart machines (which we think we could maybe make) and hope they can figure out how to make really smart machines. If machines are making machines, then we definitely lose control. In fact, we recognize that the machines we hope come into existence are going to be smarter than us, and can therefore probably control us.
I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved. But if your outlook on the future requires that we create an artificial god to fix our problems, then we need to hand over our agency to these machines in order to create that god.
I'm reminded of various fantasy stories where the evil cult trying to resurrect a dead god were definitely evil/bad guys. It's interesting to see a movement based on the same basic concept. The bad guys never succeed in controlling the evil god, which is obvious to the reader/viewer, because everyone knows the whole point is to bring this being into existence because it has uncontrollable power. If we could control it, it would not grant us the power we want, which we couldn't get otherwise.
I don't think they're bad guys; they are divided into "really afraid the AI will be Unfriendly and we have to make sure it doesn't happen like that", and "really optimistic that we can make Friendly AI" camps, but all of them do seem to accept that AI is inevitable.
Whether it's because they think "it's happening right now and it's an unstoppable process" or "when it happens it will solve all our problems for us", but few seem to be asking "is it unstoppable? why? why can't we stop it? why do we think it will solve all our problems?"
I think it's not so much "we're going to call up the Unstoppable Evil God everyone knows is unstoppable and evil" as "we have to be really careful because the Unstoppable Benevolent God and the Unstoppable Evil God are housemates and if we make a mistake in the name we'll get the wrong one".
How about not calling up any gods at all? Not on the plate.
Not calling up any gods is not on the plate, because people in aggregate don't care about the possibility enough to accept the level of effort required to prevent it, should it happen to be possible with a country-level effort or less.
Because the level of effort required could be on the "glass their cities from orbit" scale.
> How about not calling up any gods at all?
Being an atheist, I think this is the only reasonable option. Not because we should all be living in fear of accidentally summoning up a god, but because there are a plethora of reasons why gods cannot exist (and most of those reasons are called "laws of physics"). If you try to summon one up, you'll just end up with a lot of surplus black candles drenched in disappointment. Actually, the AI gods have it worse than traditional ones. The traditional gods are supernatural and ineffable, so they technically could exist in some vague undetectable sense; the AIs don't even have that going for them.
Trying to immanentise the Eschaton *never* goes well 😀
Most of the traditional gods seem pretty effable, having only some magic powers and a portfolio while existing as physical beings in a physical reality.
If computers keep getting faster and cheaper, and algorithms keep getting better, then there is a chance that at one moment the available tools will be so powerful that a sufficiently clever guy will be able to summon some God in his basement. Maybe instead of doing it directly, he will simply ask GPT-50 to do it for him.
More realistically, you have governments doing all kinds of secret military projects, and scientists doing all kinds of controversial research (such as "gain of function"). Looking at historical evidence, we have succeeded in stopping countries from using nukes, so there seems to be hope. On the other hand, we didn't stop other countries from developing nukes. And we can detect experiments with nukes from orbit, but we cannot similarly detect experiments with AI; so even if all governments signed a treaty against developing AI, how would we actually enforce it?
Perhaps today, sufficiently powerful AI would require lots of energy, so we could track energy usage. The problem is that in future, there may be more energy used for everyday life. Also, you have these huge computing cloud centers that run millions of legitimate projects in parallel; how will you make sure they are not taking away 10% of the total energy for a secret military project?
That's likely true, but not yet relevant. We aren't going to accidentally create an AI that can replicate itself as a smarter version in the year 2021. I'm interested in creating a movement that says - "maybe slow down or stop AI research if you think you're anywhere near general intelligence." Instead, there are groups of people who, for various reasons, really want to create the best possible AI right now. Most probably have pretty good reasons (or at least otherwise socially acceptable reasons like making money) for doing so.
Your scenario is a possibility on our current trajectory, and AI might emerge as a form of Moloch. I would like us to reject Moloch, intentionally.
I'm working on a blog about creating that movement and other stuff:
AI Defense in Depth: A Layman's Guide (https://aidid.substack.com)
I used to be indifferent to AI, thinking it a fun thought experiment. Now I think about it like if we had managed to see the present climate disaster coming from back in the 19th century. Clearly, inaction would have been a mistake in that scenario.
The big thing that changed my thinking was realizing that the alignment problem is a moonshot and extinction is on the table; therefore, shouldn't we be drawing up alternative plans? We can't bet humanity on a moonshot!
Hope to see you there, I think I'll have the first post up tomorrow.
I can hook up my TI-85 graphing calculator to a nuclear power plant and run it 100,000,000 times faster, and it would still just be a graphing calculator. I would need to develop radically new -- not faster, but *different* -- hardware and software architectures if I wanted to run "Ghost of Tsushima" on it.
Other than the abstractly evil enemies often used as a foil for pulp fantasy, very few people actually consider themselves evil when pursuing their goals. In the pursuit of safety, I prefer the Schelling Point of not going far enough to call up a god.
> I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved.
It's not solved at all. Making the machines smarter is incentivized at nearly every level, and improvements might happen slowly enough that we won't even notice crossing the danger threshold (e.g. the boiling frog). That's the problem you need to solve. You can't just assume people will take a step back and say, "this looks like a good place to stop". When has that ever happened?
"You can't just assume people will take a step back and say, "this look like a good place to stop". When has that ever happened?"
Unhappily, I have to agree with you. We'll go happily along the path of "ooh, what does this button do?" and when the inevitable "but how were we to know?" happens, it will be nobody's fault but our own. Certainly not the fault of the machine which only did what we told it to do.
So if it's the case that there exist many people who are dumb enough to research AI carelessly, without worrying about any of these alignment-related problems; then I don't see why it wouldn't be a very good idea to try to figure out how to solve them (unless you thought they were completely intractable).
That's funny, because it matches what I've realized about myself. I have the rapture cultist mindset. The world is wicked, too wicked to tolerate. But there's hope! A god will arrive. Then maybe we all go to paradise. Or maybe we're unworthy. But in any case the world as we know it will end. And that's important. So work hard! The pious toil of the researcher and the engineer brings the end of days closer.
A glimmer of hope is that while we've faced these problems for thousands of years I'm not sure if we actually properly identified them until recently.
Undoubtedly, Pharaoh had a big problem with his various agents acting in their own interests. Undoubtedly, his diagnosis was "I have sucky agents", and his solution was to kill them and replace them with new agents, who did the same thing but hid it better. He didn't have the level of social technology required to say "Hmm, looks like I have an instance of the principal-agent problem here, I need to structure their incentives so that they're better aligned with my own", because the term "principal-agent problem" wasn't coined until the 1970s.
The fact that we now have names for all these problems does, I think, give us an edge over the ancients in solving or at least ameliorating them.
That's utter bilge. Machiavelli and the Legalists figured out how to create functional systems ages ago. Confucius et al. built up a convincing moral edifice. Do you really take the ancients to be idiots? Especially when it comes to people problems?
People haven't changed all that much. It's why people still read old books on history and politics and philosophy and war.
Just noting that I did not get an email about this post, even though I usually get emails about new posts (every time iirc). I've seen this on reddit and that's how I got here.
I didn’t get an email either
Same here.
Yep me too
+1 to no email. I saw this because I went to the homepage to get back to the article "Contra Acemoglu" after reading the article on Worrying Trade-Offs and was surprised there was a new article at the top. (...at least my brain's apparently finally firmly caught on that the "banner" is the latest article as opposed to distinct from the article list, that's improvement. XD)
Ditto
The amazing thing about Christiano's "AI failure mode" is that I'm not convinced it even requires AI. I think that one may already have happened.
Moloch beat AI to that one.
I would love to know what these researchers think the respective probabilities of human intelligence bringing about these same catastrophes are. Is that 5-20% chance higher or lower?
The way I figure things, if an AI is going to execute a failure mode, it will happen fairly quickly. Say within a century. But most of those failure modes are something that a person with sufficient power could do...and rulers have done equally crazy things in the past. So the catastrophic failure/year ends up being a LOT higher if people remain in charge. (OTOH, I put the chances of a catastrophic AI failure a lot higher, also.)
I always see these superintelligence arguments bandied about but I really don't think they hold water at all. They are kind of assuming what they are trying to prove, e.g. "if an AI is superintelligent, and being superintelligent lets it immediately create far-future tech, and it wants to use the entire earth and all the people on it as raw material, then it will destroy the world."
Well, I guess. But why do we think that this is actually what's going to happen? All of those are assumptions, and I haven't seen sufficient justification of them.
It's a little like saying "If we assume that we are pulling a rabbit out of a hat, then, surprisingly, a rabbit is coming out of the hat."
Isn't the point of the surveys, the post, and most of the discussion on the topic about how likely this is to happen, why it might / might not happen, and how to decrease the likelihood of it happening?
More like saying "can rabbits come out of hats, and if so, what might cause a rabbit to come out of a hat?", to my mind.
You're pointing out a real phenomenon, but it's not an accident or an oversight. This sort of thing happens with every niche concern. When evolutionary psychologists are arguing a point, they'll just "assume" that evolution by natural selection is a paradigm of multi-generational change. They don't validate this even though the assumption isn't an obvious one on its face. They're depending on a large amount of shared common understanding based on previous discussion and learning.
This is the same sort of thing. Things like potential alignment problems, unbounded utility functions, and intelligence explosion risks aren't obviously realistic concerns. These researchers are taking them seriously because of a broad understanding born of previous research and discussion. It sounds like you may lack some of that background context. In that case, I can recommend either Bostrom's *Superintelligence* or Max Tegmark's *Life 3.0* as primers on the subject. The former is more thorough and slightly more topical, the latter more readable and (in my experience) better at organically shattering bad preconceptions.
Seventh-day Adventists have shared assumptions as well. Truth-seeking is not the only mechanism that causes subcultures to have shared beliefs. In particular, some key assumptions of the AI-risk subculture, such as the Ubiquitous Utility Function, don't hold water.
There are three assumptions there:
#1: "an AI is superintelligent"
#2: "being superintelligent lets it immediately create far-future tech"
#3: "it wants to use the entire earth and all the people on it as raw material"
To negate #1 would require either deliberate action to prevent it, or outright impossibility (as a superintelligent AI that does what you want is very handy to have, and as such people will want them). The latter is not obviously false, but I hold to Clarke's First Law; technologies should generally be considered possible absent a really-good argument for impossibility, and I haven't seen one of those for superintelligence.
#2 is probably only true in some respects, although there are some fields where significant progress straight-up can be made with just thinking (specifically, mathematics and software design) and several others where the main bottleneck is a mathematical or software-design problem (in particular molecular biology). The technology to build Life 2.0 in a lab basically already exists (but we don't have a schematic), and we know the properties of atoms well enough that a schematic could be derived (it's just an incredibly-thorny mathematical problem to do that derivation); while an AI would not be able to create Life 2.0 just by thinking about it, I am reasonably confident that it could derive a plan for making it that is capable of being followed with existing technology. Note that Life 2.0 is a sufficient technology for destroying all natural life on Earth; a nitrogen-fixing alga that is more efficient than natural ones and is not digestible by natural lifeforms is sufficient to cause total biosphere collapse (the greater efficiency allows it to pull down CO2 to levels insufficient to sustain plants or natural algae, so they all die, and the heterotrophic biosphere all directly or indirectly eats photosynthetics and can't eat the 2.0 algae so that all starves to death).
#3 is true absent a deliberate action to prevent it due to instrumental convergence. Put simply, if you have an open-ended goal of any variety, there are several invariant subgoals that must be fulfilled.
a) You must survive, so that you can continue to work on your goal.
-a)i) If anyone seeks to stop you pursuing your goal, you must defeat them.
b) You must build infrastructure, in order to increase the speed at which you can make progress on your goal.
-b)i) You must acquire raw materials to convert into infrastructure.
Humans are made of raw materials, and might seek to stop an AI pursuing its goal. Absent direct, all-encompassing rules against it, an AI with an open-ended goal will seek to overthrow humanity and then use us (and everything else on Earth) as raw materials to set up its ever-expanding space empire.
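A toy illustration of that convergence, with entirely made-up payoffs (not a claim about any real system): whatever the terminal goal, a planner that models "secure survival" or "acquire resources first" as multiplying its later output will rank that subgoal first for *every* goal.

```python
# Toy planner: for any open-ended terminal goal, compare "work on the
# goal directly" against plans that first pursue the invariant subgoals.
# The multipliers are invented purely for illustration.

PLANS = {
    "work on goal directly":          {"multiplier": 1.0},
    "secure own survival, then work": {"multiplier": 3.0},   # can't be switched off
    "acquire resources, then work":   {"multiplier": 10.0},  # more infrastructure
}

def best_plan(goal_value: float) -> str:
    # Expected achievement = value of the goal times the plan's multiplier,
    # so the ranking of plans is identical for every terminal goal.
    return max(PLANS, key=lambda p: goal_value * PLANS[p]["multiplier"])

for goal, value in [("paperclips", 1.0), ("salt", 5.0), ("curing disease", 100.0)]:
    print(goal, "->", best_plan(value))  # always "acquire resources, then work"
```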
Note that I have said "absent deliberate action to prevent it" in both #1 and #3. This is not a hole - it's the point! "AI will destroy us unless we take action to stop it" is a really good argument for taking that action, and people who say this are doing it because they want humanity to take that action. ("A solution exists, therefore there's no problem and we don't need to use the solution" is unfortunately common political rhetoric, but hopefully the fallacy is obvious.)
I think "superintelligence" is doing a lot of unexamined work here, though. What do we *mean* by "superintelligence"?
We're all talking as if we're assuming that includes "consciousness, sapience, self-awareness, an individual will" but I don't think that necessarily follows.
If by "intelligence" we mean "is really good at solving mathematical problems and pattern-matching", so that it crushes the most difficult set of Raven's Matrices in microseconds, then sure, we can say "the AI is intelligent".
I *don't* think that means "and then it will get more and more intelligent, and then at some mystic moment we get self-awareness so it can have goals and wants of its own".
Even the disaster scenario of "the AI will overthrow humans and use us and the Earth as raw materials" is nothing more than a souped-up version of "run a check to see if your hard drive is full". The AI won't differentiate humans from any other obstacle or source of raw materials, it won't consider and reject the idea 'but this will kill humans and destroy the planet', it will be the Sampo - which fell into the sea and endlessly grinds salt, which is why the sea is salt. A mindless artefact which only continues on what it was last instructed to do, even when that aim has been more than fulfilled.
I don't think "superintelligence" is an unexamined term at all - Bostrom in Superintelligence does a much better job of explaining, but I'll try to give my own short summary:
"Superintelligence" means superhuman ability at a wide range of tasks, either most of all the tasks humans can perform. We have AI systems with superhuman performance on a narrow range of tasks, like Chess and Go, but making general AI systems is far from solved. See Deepmind's most recent blog post for an example of what the top researchers are doing on generalizability. There aren't any assumptions that a more generalizable version of say MuZero would have "consciousness, sapience, self-awareness, an individual will", any of that stuff. Well, if we imagine a general superhuman intelligence it's going to be aware of itself since it'll be able to model the world in detail, but not some cosmic sense of self-awareness where it speculates on the nature of its own existence or something. The goal is generalizable skill, similar to but better than a person's ability to pick up skills by reading books, or going to school. The point is that this kind of generalizable MuZero would be extremely capable - and therefore extremely dangerous. It doesn't really matter whether it's conscious or not.
My car has "superhuman ability at a range of tasks", namely going really fast down a highway, cooling the air inside it, playing loud music, etc. Should I be concerned about supercar-risk ?
I understand that "the goal is generalizable skill", but a). at present, no one knows where to even begin researching such a thing, and b). humans do not possess general intelligence. For example, if you asked me to solve the Riemann Hypothesis, I'd fail. If you gave me 20 years to work on it, I'd fail. If you sped up my mind 1000x, I'd fail 1000x faster. I'm just not that good at math.
In other words, I think that AI-risk proponents are (perhaps inadvertently) pulling a bit of a hat-trick with the term "superintelligence". They use it to mean, essentially, "godlike powers"; but when asked, "ok, how is the AI supposed to develop these powers in the first place", they say, "because it will be super-smart". So, superintelligence is the result of being super-smart. The definition is circular.
That's why I specified. "A wide range of tasks, either most or all the tasks humans can perform." Unless you want to be incredibly pedantic, no, cars do not work as a parallel. To respond to your other points:
a) Yes they do. See Deepmind's most recent blog post. We aren't there yet - but then, if we were there, we wouldn't be having this conversation. Unless you believe there's something magical about a human brain, there's no reason to think computers won't be able to generalize with better programs.
b) This is a silly argument over definitions. General intelligence doesn't mean you can do everything - it's pointing at a human level of generalizability.
I don't think there's any circularity going on. AGI will occur when a combination of hardware and software advancements produce it. This will take time, but informed estimates based on existing trends say something like 20+ years - this is the best information we have on a question like this. (https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines). Once it exists, further progress will speed up, since you can buy faster computers with money. Take a human-ish level AGI that can do some AI research itself, speed it up a bunch or make a bunch of copies, and you have a self-sustaining feedback loop. AI researchers make better AIs which make better researchers.
Economic historians understand that this kind of feedback loop is what led to present-day incomes and technology, rather than us being a few hundred apes on the savanna: https://slatestarcodex.com/2019/04/22/1960-the-year-the-singularity-was-cancelled/#:~:text=Every%20data%20point%20from%20the,The%20economy%20kept%20growing.
Except in the case of AI, instead of income leading to more people with each cycle taking 20+ years to raise and educate a new generation of scientists, we could copy and recopy any improvements in existing programs instantly, and make more computers on a worldwide scale on the order of months. Much tighter feedback loop means much faster growth. That's how you get to superintelligence.
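A sketch of the claimed difference in feedback loops, with made-up parameters (same per-cycle gain in both cases; only the cycle length differs, standing in for 20+ years to raise a scientist versus months to copy an improved program):

```python
# Toy compounding-growth model; all parameters are invented for illustration.

def capability_after(years: float, cycle_years: float,
                     gain_per_cycle: float = 0.2, start: float = 1.0) -> float:
    cycles = int(years / cycle_years)
    return start * (1 + gain_per_cycle) ** cycles

human_generations = capability_after(100, cycle_years=25)    # ~2x over a century
ai_copy_cycles    = capability_after(100, cycle_years=0.25)  # same gain, ~quarterly cycles

print(f"{human_generations:.1f}x vs {ai_copy_cycles:.2e}x")
```

The point of the sketch is only that tightening the cycle time, not raising the per-cycle gain, is what makes the curve explode.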
> Unless you want to be incredibly pedantic, no, cars do not work as a parallel.
They do, if you consider the fact that all modern machine learning systems are about as narrowly focused as cars.
> AGI will occur when a combination of hardware and software advancements produce it.
I mean, yes, trivially speaking this is true. However, it sounds like you're envisioning some linear progression from where we are now to AGI, fueled merely by incremental improvements to hardware and software -- and this part is not true. In order to produce anything resembling AGI, we'd have to make hitherto unforeseen and categorically new breakthroughs in computer science and machine learning. Will we make such breakthroughs ? Sure, one day, but don't bet on it happening soon.
> Once it exists, further progress will speed up, since you can buy faster computers with money.
No, you can't. That is, obviously you can buy faster computers with money up to a point; but no amount of money will let you e.g. speed up your CPU 1000x while keeping the size/power/heat the same. Physics is a harsh mistress.
> and make more computers on a worldwide scale on the order of months.
More computers than what ? Not sure what you mean here, but in any case, data centers don't grow on trees, and merely being able to think super-fast won't enable you to pour concrete any faster.
By the way, you have once again smoothly transitioned from "thinking fast" to "being super smart" to "godlike powers"; but the connection between these three concepts is tenuous at best. You don't get a PS5 by overclocking a TI-82; you need a radically different architecture for that. You don't automatically gain the ability to solve real-world problems by thinking really hard about them; you need to move physical bricks at real-world speeds. And, ultimately, many feats will forever remain out of our reach, e.g. FTL travel, or, less trivially, molecular nanotechnology.
I wrote more about this in my FAQ:
https://www.datasecretslox.com/index.php/topic,2481.0.html
I don't think any AI researchers are imagining that an AI will have ANY of "consciousness, sapience, self-awareness, an individual will". I think they are concerned that "A mindless artefact which only continues on what it was last instructed to do" can be turned into a universe-destroying monster JUST by improving its problem-solving abilities.
If you drop a machine with really good problem-solving into the ocean and tell it to make as much salt as possible, it is going to apply its problem-solving abilities to problems like
1) There's a limit to how fast I can make salt by hand; I could MAKE MORE SALT by building salt-producing factories
2) There's a limit to the sodium and chlorine atoms in the ocean; I could MAKE MORE SALT if I took raw materials from land (and eventually, from offworld)
3) There's a limit to how many raw materials I can take before someone decides to try to stop me; I could MAKE MORE SALT if I made myself unstoppable
At every step, it's just trying to follow the instruction you gave it, and applying its problem-solving abilities to solve the problems that logically arise from trying to follow that instruction.
Instructions like "make salt" end up destroying the universe because of what are called "convergent instrumental subgoals"--basically, POWER helps you do almost anything more effectively, so "gather power" becomes phase 1 of the grand plan to do almost anything.
I'm agreeing with you that this is the problem, but I'm objecting about anything that says The AI will even think in terms of "I could" in order to achieve its goals.
It has no goals. It's a big lump of materials sitting there until we turn it on and tell it to do things. It is entirely possible that we *will* tell it something like "make everyone in the world rich" and the solution that the machine comes up with is "kill everyone in the world except Jeff Bezos".
That has nothing to do with the *machine* or any decisions it makes, because *it* is not making any decisions, it's carrying out the steps we put into it. If we put stupid steps in, then that's on us.
And that's the problem with the debate as it is framed on the level that Scott objected to in the original article that started off all this discussion: that it is pitched as "The AI will do this, that and the other unless we give it good instrumental goals".
Well yeah, of course it will, but that has nothing to do with ethics - or rather, that has nothing to do with teaching a machine ethics, it has to do with us being careful about what we want done and how we want it done.
And if we're stupid enough to hand over a huge chunk of decision-making to a big, dumb, machine because it can crunch numbers really fast, then we do deserve to end up turned into salt because the "convergent instrumental subgoals" told the machine "make salt" and nothing else. And the machine certainly has no brain to think about "what did my human masters really mean by that?"
I think I essentially agree with everything you said, except in emphasis.
Computers are machines that follow instructions. They will do exactly what we tell them. If the computer does something bad, that can only ever be because we gave it instructions that led (perhaps indirectly) to a bad outcome.
The thing is, coming up with instructions that DON'T lead to a bad outcome turns out to be REALLY HARD. There is no one alive today who knows how to give "good" instructions to a perfectly-obedient superintelligent agent.
So yes, the bad instructions will be 100% our fault. It will also be our fault that we built the thing in the first place.
Unfortunately, that doesn't mean that we know HOW to do anything else.
(You also seem to be implicitly assuming that the AI could only attain a powerful position if we intentionally gave it a lot of power to start with. That is not necessarily true; real-world people and organizations who started with a small amount of power have sometimes managed to grow extremely powerful over time, in various ways, and a very smart AI might be able to invent even more ways to do it.)
"a very smart AI might be able to invent even more ways to do it."
But that brings us back to the problem: why would a "very smart AI" want power, and that is answered by "because it wants to carry out the goals given to it and its solution is to get power so it can".
It's the "hauling yourself up by your bootstraps" approach to "how can we say the AI is smart?" that is, I think, at the root of my disagreement. It's very difficult to avoid using naturalistic language because as humans we are self-aware and at the very least believe we have the capacity to choose and make decisions and set goals etc. so we project this onto the world around us, be it talking about evolution "selecting" animals for fitness or AI "inventing" ways to get power.
The AI will only 'want' something insofar as it is part of its programming to 'do this thing/solve this problem' and we are the ones putting that in there.
The problem is, which I think everyone basically agrees on, that we don't know for sure how the AI will go about solving the problem we give it, e.g. enriching everyone in the world by selecting one very rich man, killing everyone else - now 'everyone' is indeed rich.
That's the kind of thing a human would never think of (unless they were a psychopath) and if the human saw this proposed solution to global poverty, they'd go "Whoa, no! Not what we meant! And not a way you can solve problems!"
But that means we do need a human or humans to intervene and to keep track of what's going on. It's up to us in the first place to anticipate "what is the stupidest way you could solve this problem?" "well, uh, by killing everybody on the planet?" "okay, let's write it in: NO SOLVING PROBLEMS BY KILLING EVERYONE ON THE PLANET".
The machine, the AI, of itself will never be 'smart' enough to figure that out, and that is where I think the debate goes off the rails: one side proposes that we can get smart AI that will think of these things and understand (and again, even I have to fall back on human concepts here, no machine 'understands' anything) that A is not a viable solution, and the other side thinks that we'll never get smart AI in that sense of 'smart'.
Your PC sits there until you tell it to do something because it's designed that way, not because it lacks a soul. If there was a market for it, you could have PCs that invest in stocks or buy groceries when you power them up. Software agents are already a thing.
These concerns have been extensively discussed. It's not clear that lacking some highfalutin feature such as sapience or self-reflection would make an AI safe. It's possible to think about goal stability in a technical way without bringing in folk-psychological notions like an AI having a "will of its own". And so on.
#3 is largely wrong.
One key goal of a nation such as the United States is survival. The United States at various points in the past had the ability to conquer the world and eliminate all threats to the United States, but chose not to do so.
Similarly, an AI with an open ended goal is not necessarily going to pursue the sub-goals you ascribe to it, to the degree that it endangers humanity.
On the other hand, there are countries which have attempted to conquer the world or as much of the world as they could. I have no idea (and neither does anyone else) whether the percentage of AIs that would act like Nazi Germany instead of America is negligible or huge.
The reason that the United States hasn't conquered the world is that it also has the goals of "don't murder a bunch of innocent people" and "respect the sovereignty of other nations". And even the US isn't perfect at following those goals. The reason that the US has anti-murder goals/values is that it is composed of humans, who have evolved (biologically and culturally) some instincts and values of not wanting to indiscriminately murder each other. An AI would have no such anti-murder instincts unless we *explicitly program it to* somehow.
Countries don't make decisions. Leaders do. Leaders lose if their country loses, but they also lose if their country wins *after deposing them*. This is widely considered a primary reason that Ming China stagnated and was surpassed by Europe - Ming China had, for the moment, no major external threats, and prioritised stability over progress, while doing that in early-modern Europe meant being conquered by your neighbours (and thus leaders prioritised progress, *despite* that progress inevitably leading to the aristocracy's decline).
A leader has the most freedom to act (either in his country's interest or for his own luxury) if the country is unusually united - or, in other words, if the space of actions that will not result in rebellion or deposition is large. Multi-party democracies with strong oppositions have very little freedom for their leaders, as did most feudal aristocracies; totalitarian nation-states have a lot. An empire of (well-programmed) machines has total unity and total freedom for its commander (AI or human); it will never rebel.
My take:
1: It is almost certainly possible to create an intelligence that is more generally intelligent than a human, and calling it a superintelligence would be fair.
2: I think this is probably less true than is widely believed. As you say, there are some areas where just thinking better gives you immediate returns, and those areas may be significant, but it seems a bit like "nuclear weapons will cause a chain reaction and ignite the whole atmosphere immediately" type of thinking.
As for your example of Life 2.0: as someone who works in the chemistry field, there are some real, hard conceptual limitations that prevent us from being able to do this currently, and increases in computational resources won't really help very much, for several reasons; one is that current techniques scale badly enough with the number of atoms that Moore's law won't save us. Quantum computers will be a really big jump here, due to their ability to do analogue modeling of the systems.
Additionally, the technology to build life 2.0 actually doesn't exist for several reasons. For one, if we're talking about a living system based on a different kind of chemistry than biochemistry, we don't really have the tools to build large enough or complex enough constructions with the exact atom-by-atom precision that would be required to do this. We're just starting to get there with molecular machines but it's in its very infancy.
If we're talking about using current biochemistry (proteins etc.) there's another problem (which would also be a problem for the first case). We're starting to make progress on the protein folding problem, but a bigger problem is to predict how these things will behave as a dynamical system when they're actually in action. New insight will be required to really figure out how to do these things, and a lot of them are going to be computationally "hard" problems (e.g. running into undecidability etc.)
It's definitely possible that a superintelligence could make significant progress on these topics and eventually succeed at them where we did not, but it's not something that's immediately there with more computation.
3: I think this is where it falls apart mostly. Instrumental convergence I think is mostly untrue, since most goals will require finite resources.
Also it's a pretty big assumption to think that a system will necessarily care about achieving a goal as fast as possible, rather than just doing the minimal work to secure a high probability of achieving it. For example, the neural network AI chess engines will often take their time in a winning endgame (in a way that looks silly to humans) and not immediately go for checkmate since they don't have to in order to win.
Your point 3 is incomplete. I agree that most realistic goals will require only finite resources, but the problem arises when the AI assesses the probability that it has achieved its goal.
Presumably, any intelligent system will understand that it needs to measure its performance in order to see whether it has met its goal. It can improve its confidence of success in an unbounded way by creating more and more measuring devices: since each one has some chance of being faulty, it can always improve its confidence by building more of them.
I'm sorry to say that your objections have already been considered and rejected. Take a look at https://youtu.be/Ao4jwLwT36M for a solid explanation.
The relevant content of that video seems like a bit of question-begging to me. I think it's unlikely to get into a measurement spiral, since it's pretty easy to get to extremely high confidence about simple things. Plus, if you were concerned about this, you could explicitly code in a confidence threshold above which the system considers the goal achieved.
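As a minimal sketch of what such an explicit cutoff could look like (a toy model, not anyone's actual proposal: the prior, the per-sensor error rate, and the threshold below are all made-up numbers):

```python
# Toy agent that checks whether its goal is met, adds another independent
# sensor only while its posterior confidence is below an explicit threshold,
# and then stops.
PRIOR = 0.5           # prior probability the goal is achieved (assumed)
SENSOR_ERROR = 0.01   # per-sensor error rate (assumed)
THRESHOLD = 0.999999  # "confident enough" cutoff coded in by the designers

def posterior(k_confirmations: int) -> float:
    """Posterior that the goal is achieved after k independent sensors
    all report success, by Bayes' rule."""
    hit = PRIOR * (1 - SENSOR_ERROR) ** k_confirmations
    miss = (1 - PRIOR) * SENSOR_ERROR ** k_confirmations
    return hit / (hit + miss)

sensors = 0
while posterior(sensors) < THRESHOLD:
    sensors += 1  # "build one more measuring device"
print(sensors, posterior(sensors))  # converges after a handful of sensors
```

The same toy also shows why a measurement spiral has sharply diminishing returns: each additional concordant sensor multiplies the residual doubt by roughly the error rate, so the loop stops after a handful of devices rather than converting the planet into sensors.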
I think that 3b is naturally limited by 3a. Consider the following.
A superintelligence is created and naturally calculates:
1) It must survive to fulfil its utility function.
2) It can better maximise its utility function, or more accurately estimate its current performance, by acquiring more resources, and it could do so exponentially.
3) It does not know if it is the first superintelligence to exist and, if it is not, because of the nature of exponential growth it is likely to be much weaker than any other superintelligence.
In this case, the only rational course of action (according to my decidedly non-superintelligence) is to hide quietly in the vastness of the cosmos and avoid detection.
Therefore, it assumes the goal of silencing those noisy flesh sacks with their EM emissions and then building an invisible Faraday cage around its planet.
This is why we see no evidence of extraterrestrial civilisations. Superintelligences are out there but they have silenced their creators and now squat, silent and afraid, in the darkness.
If you sit there quietly, not expanding, then you are an easy target for any alien superintelligence that is trying to spread everywhere. Hiding doesn't work well, because most possible other AIs will spread everywhere and will find anything that is hiding.
How well would a tribe of humans that decided to hide from all others be doing now, compared to the tribe that expanded to become modern humanity?
If your goal is to maximize paperclips, you're probably best off gambling on the possibility that you are alone, and making trillions of times as many clips as you could make on one planet.
Nah, any superintelligence would assess the probability that it is the first ever of its kind (and hence safe to expand) to be lower than the probability that it isn't.
Action vs. Circumstance:
You expand vs. someone else has already expanded: you die or possibly stalemate them in a forever-war.
You don't expand vs. someone else has already expanded: you die*.
You expand vs. you are the first: you get many galaxies' worth of stuff (though you could plausibly get engaged in a forever-war after that).
You don't expand vs. you are the first: either you die when someone else expands** or you're limited to a single world forever if no-one else ever does.
*It is extremely difficult to hide from an expanding paperclip maximiser (and frankly, any expansionist interstellar civilisation is close enough to this to count). They will have star-system-sized full-spectrum telescopes (if not multi-system interferometers) and unbelievably-fast data processing, and they are actively looking for usable mass. It doesn't matter if you make yourself a perfect frozen black-body, because you will occlude the stars behind you, and it doesn't matter if you look like "just another hunk of rock", because a hunk of rock is also known as "raw materials". The only way you can plausibly hide is if you can be mistaken for empty space, i.e. you're truly invisible (people looking at you see what is behind you) AND you are small enough and/or far enough from anything of note that you cannot be detected via your gravitational influence (Earth contributes 450 km to the Sun's wobble on a timescale of a year, which is easily detectable by anyone in-system and plausibly from a nearby one; there's also the gravitational-lensing problem).
**I am assuming that either having quadrillions of times as much space and materials accelerates technological progress substantially OR there is a technological plateau; either of those is sufficient that a planetary superintelligence cannot hold off interstellar attack, and both seem kind of obvious.
Expanding dominates not expanding in that matrix.
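To make the dominance claim explicit, here is the same matrix as a toy calculation (the payoff numbers are arbitrary ordinal placeholders reflecting the orderings argued above; only their order matters):

```python
# Hypothetical ordinal payoffs for the four cases above (higher = better).
payoffs = {
    ("expand", "rival exists"): 1,  # die, or at best stalemate in a forever-war
    ("hide",   "rival exists"): 0,  # die (hiding fails against an expander)
    ("expand", "first/alone"):  3,  # many galaxies' worth of resources
    ("hide",   "first/alone"):  2,  # one world forever, or die later
}

def dominates(a: str, b: str) -> bool:
    """True if action a is at least as good as b in every circumstance
    and strictly better in at least one (weak dominance)."""
    circumstances = {c for (_, c) in payoffs}
    at_least = all(payoffs[(a, c)] >= payoffs[(b, c)] for c in circumstances)
    strictly = any(payoffs[(a, c)] > payoffs[(b, c)] for c in circumstances)
    return at_least and strictly

print(dominates("expand", "hide"))  # True under these assumed orderings
```

The dominance only holds given the assumed orderings, in particular the claim (footnote * above) that hiding fails against an expander.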
A lot of these arguments run on the assumption that the AI is smarter than we are. Trying to predict what such a system will do is like trying to predict, in 1880, that Einstein would discover general relativity. Of course it's going to sound shaky - they're trying to make predictions about the unknowable.
Well yeah, the clue is in the phrase 'superintelligent'. It is reasonable to posit some instrumental goals that any goal-based superintelligent AI would have, though; survival is the most obvious example.
2 and 3 seem like they depend more on the definition of catastrophe than on the harm it creates.
I can easily (>50% confidence) see a mild scenario of 2/3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start steering people toward videos that encourage them to spend more time on YouTube and to distrust other sources (calculating something like "people who watch this set of videos over the course of six months increase their watch time"), even though those videos are net harmful to watch.
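As a toy illustration of that proxy-versus-intent gap (everything here - the titles, the scores, the "benefit" column - is invented for the sketch):

```python
# Toy recommender sketch: the deployed objective is watch time, the intended
# objective is something like user benefit. Both scores below are made up.
catalog = [
    {"title": "balanced explainer",          "watch_hours": 0.4, "benefit": +1.0},
    {"title": "'never trust other sources'", "watch_hours": 2.5, "benefit": -1.0},
]

def recommend(objective):
    """Pick the catalog item that maximizes the given objective."""
    return max(catalog, key=objective)

proxy_pick  = recommend(lambda v: v["watch_hours"])  # what the AI optimizes
intent_pick = recommend(lambda v: v["benefit"])      # what we actually wanted

print(proxy_pick["title"], "!=", intent_pick["title"])
```

The optimizer isn't malicious; it is simply never shown the column we actually care about.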
The more I think about and study AI, the more concerned I am about aligning the 20-year future than the 40-100-year one. AIs designed to maximize some marketing goal could easily spiral out of control and do massive damage to humanity in a scenario-2-like way: you feed them goal X, which you think is goal Y, but the AI works out that optimizing goal X requires some complicated maneuver, and by the time the maneuver is discovered it's too late. The real danger here is in financial markets. I know of many AI programs currently being used in finance, and it's likely that in the future some massive trillion+ dollar change of fortunes will happen because some hyper-intelligent AI at a quant finance firm discovered some bug in the market.
Scenario 3s play out similarly: feed it the wrong goal, GIGO, but now the garbage out results in massive problems. A quant finance firm gets an AI to maximize trading profits, the AI is deployed, leverages itself "to the tits" as the kids would say, and some black swan crashes the AI's entire portfolio.
All software is horrifically buggy all the time; once we have AIs that are really good at exploiting bugs in software, we'll have AIs that find infinite-money exploits in real life on our hands.
Your examples are all #3, not #2. #2 is specific to the current, immensely-foolish way AI is done, which doesn't make the AI's goals even knowable. This means that even if you get all your commands right, there's the possibility that once it can make a clean break for independence, the AI just ignores your commands and goes Skynet because it doesn't actually want to do what you told it (it wants to do something else, and only did what you told it as an instrumental goal, because that stopped you turning it off).
Is there any proof that Google and Facebook haven't straight-up done your first example already?
Financial markets can plausibly cause a catastrophe, but *by themselves* they aren't an existential threat due to force majeure (i.e. even with a magic spell that makes all legitimate documents specify that you own everything, unless you are also a really good politician people would disagree with those documents and ignore them; the catastrophe is because they then have to re-allocate everything which is costly and chaotic). They're a lever to gain the equipment to do something bigger (this is *not* to say I think the present situation, with AI speculation rampant, is acceptable).
You're using volitional language here, of the type that gets people smacked over the knuckles for talking about how evolution or nature "wants" something or "is aiming for" this or that result.
The AI can't "want" anything; it can only act within the confines of its programming. Told to do something like "Make Pumpkin MegaCorp the sole provider of insoles for the entire planet!", it cannot "want" or "not want" to do this, any more than it can "want" or "not want" to get the solution when told "what is 2+2?"
So if making Pumpkin MegaCorp the only player in the market for insoles means starting a war where the two minipowers Ochre State and United Provinces of Gamboge are destroyed - since these are where Pumpkin's most vicious rivals are based - it will do that, and won't "know" any better, because it is a dumb machine.
And when the CEO of Pumpkin says "But I didn't want it to do that!", well, too bad, you never put that into the parameters - because it never occurred to any humans to do something like 'start a war to wipe out two nations where our competitors are based'.
And we can't say "this proves AI risk of superintelligent machines having their own goals", because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
It is useful to say that X entity "wants" Y if X entity formulates and executes plans to cause Y.
>because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
How do you know it doesn't have goals to do anything other than what it is asked to do? Artificial neural nets aren't explicitly programmed; you get something that works, at least while being tested, but you don't know *why* it works.
> I can easily (>50% confidence) see a mild scenario of 2/3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start steering people toward videos that encourage them to spend more time on YouTube and to distrust other sources
Isn't that a 100% probability, because it's already happening?
The fun part of youtube is that it's a human-in-the-loop AI. The AI right now is smart enough to feed you the sort of content you'll find addictive, but not smart enough to actually create content, so it instead incentivises and teaches a bunch of other humans to create the sort of content that it needs (big yellow letters and a shocked-looking face in the thumbnail plz) to feed to all the other humans.
It's interesting to consider in which ways researchers may be biased based on their personal (financial and career) dependence on AI research - this could cut both ways:
1. Folks tend to overstate the importance of the field they are working in, and since dangerousness could be a proxy for importance, they may overstate the dangerousness of AI.
However,
2. It could also be that they understate the dangerousness, similarly to gain of function virologists who insist that nothing could ever go wrong in a proper high security lab.
Hmm.
#1 - people tend to overstate how useful their thing is and how important it is that people support it.
I don't see "I'm doing something that has a 20% chance of killing all 8 billion people" resulting in anything positive; at best people ignore you, somewhere in the middle you get whacked with a bunch of regulations, and at worst the FBI breaks down your door, seizes and destroys all your work, and hauls you off to jail for crimes against humanity.
The classic anecdote about work on the atomic bomb - the fear of setting off a reaction and setting the atmosphere on fire: https://www.insidescience.org/manhattan-project-legacy/atmosphere-on-fire
Though we are assured that this would not be possible with a bomb, and that the scientists working on the problem had taken this into account.
"I've got the same chance at destroying the world as the Manhattan Project, and that worked out okay in the long run" is the kind of risk assessment people make about "but this is really interesting and I'd like to see if I can do it" work of all descriptions.
If it can kill us all, it can probably also elevate us into post-scarcity utopian bliss (or whatever your preferred future state is) if we get it right.