516 Comments

Comment deleted (Jul 20, 2023)

rotatingpaguro:

“how many angels can dance on the head of a pin” has no bearing on anything I care about, while I would not like to die.

Comment deleted (Jul 22, 2023)

Philo Vivero:

Basically no one agrees with you. There are a dozen ways to interpret what's happening as having many historical precedents. "New tech came and changed everything." Yep. "New species arrived and drove the previous ones extinct." Yep.

The only way you can say history has no precedent is to put so many constraints on it that nothing ever has any historical precedent. Yeah, no time in history is identical to any other time in history, but many times in history are significantly similar to other times.

Many people (most?) believe that the arrival of AGI has historical precedent, and that it's extremely important to put thought into what it means.

Ariel:

> Existential risk expert Toby Ord estimated a 16% total chance of extinction by 2020.

That's certainly pessimistic, but maybe you mean 2200?

Ariel:

Or 2120*

Scott Alexander:

Sorry, my mistake! He estimated *in* 2020 the chance of extinction *by* 2100!

dogiv:

His estimate was actually for the next 100 years, i.e. by 2120. But it's close enough for most purposes.
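
A minimal sketch of what a ~16% risk over a century implies per year, assuming (purely for illustration) a constant annual hazard rate:

```python
# Convert a 16% chance of extinction over 100 years into an implied annual
# probability, assuming a constant hazard rate (an illustrative simplification).
century_risk = 0.16
annual_risk = 1 - (1 - century_risk) ** (1 / 100)
print(f"implied annual risk: {annual_risk:.3%}")   # roughly 0.17% per year
```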

Eremolalos:

Maybe it was 1920 & this is the afterlife? If so, wudda ripoff!

Shoubidouwah:

Surprise! It was the Bad Place (tm) all along...

Fang:

To anyone who didn't get this reference, please watch The Good Place (And precommit to at least three episodes, the first one is intentionally misleading!). I estimate with 80% confidence that you won't regret it.

Ryan W.:

Seconded. TGP was the best show ever, all the way to the end.

Paul Goodman:

Hmm I don't disagree with the recommendation but adding the context makes the spoiler in the previous comment worse. That ship may have already sailed to Port Rosebud, though, IDK.

Andy Rosa:

Still laughing

Mallard:

>Existential risk expert Toby Ord estimated a 16% total chance of extinction by 2020

I don't see that in the cited link. The link refers to a book published by Ord in March of 2020. Was he claiming a 16% chance that humanity would go extinct in the next 9 months?

Kevin Barry:

I'm no super forecaster but I have read most of your posts on AI as well as Eliezer's. Glad to see the superforecasters had the same takeaway as me.

isak melhus:

What are the reasons you disagree with Scott and Eliezer?

Godoth:

I can’t answer for him, but I share that sentiment.

Answering for myself, while I respect Scott’s opinions on other subjects a great deal, in the particular arena of AI I think he is exposed to a culture and environment that is obviously heavily emotionally loaded w/r/t AI and consequently has a really hard time evaluating arguments and weighting risks objectively.

The results above can be more or less fairly summarized as “Forecasters and experts may not update together towards a more exact probability of AI extinction or catastrophe but they do agree it’s far smaller than your gut feeling of one in three.” Seeing this, Scott finds a way to believe in his feelings and intuition and decides not to update.

He notes that part of his disbelieving them is that “when I hear their actual arguments, and they're the same dumb arguments as all the other people I roll my eyes at,” he doesn't update. From my perspective this is a strong but not overwhelming signal that you have a trapped prior: you are unable to take anyone else's arguments seriously, because anybody who would make those arguments is definitionally not serious and therefore not worth considering. Big red lights start flashing when I see this sentiment repeated across the entire social sphere of those predicting AI catastrophe of various kinds.

At root, the big string of theories like dominoes (e.g. Yudkowsky's argument) that leads to a very high probability of AI doom is not convincing on the same order of magnitude to somebody who has not already bought the mood. Instead of looking hyper-rational and unassailable, it looks tenuous and hypothetical (and, I say this in kindness and with respect to a lot of very intelligent people who I do not wish to insult, it looks more than a little neurotic). Instead of a singular tech that demands special treatment, AI looks like just another tech that demands respect but will probably not lead to doom, like CRISPR or nukes.

Bias is really hard to rid ourselves of, and it’s present in all communities in different forms with different assumptions. The trick about bias is that it’s like Jung’s shadow: you deny it and it controls more and more of your thoughts and actions without realizing it. Right now I see a lot of people who have a pretty big blind spot when it comes to AI, and it is causing them to weight very weak arguments as very strong and vice versa.

Mr. AC:

I'm sorry, but this reads so, so insane to me. You are looking at a technology that is qualitatively different from nukes, CRISPR, bio-weapons etc by definition.

One simple way to look at this is - in order for there to be nuclear war things have to go extremely wrong (not a lot of winners from a nuclear exchange, strong safeguards in place, etc). So it's very unlikely and yet there were close calls and people generally agree it's a real risk.

In order for AI takeover NOT to happen everything needs to go extremely, extraordinarily, almost implausibly RIGHT. The path to this is very unclear.

It's similar to the risk of the first nuclear test igniting the atmosphere, except in reverse. Everyone did a lot of calculations based on a well-understood and established model of physics and agreed that the model did not support atmosphere ignition, so unless they were very wrong the chance was remote. And then they did the test, since the stakes were high during the war.

Current AI consensus in the field is - yes, current models are "intrinsically unsafe" (quote from Yann LeCun, who is firmly in the "this is all survivable" camp). Yes, AGI is possible. Yes, just "stacking more layers" seems to be working so far. Yes, we do not have a model of how we can succeed at ASI alignment. So what now? Here opinions diverge, but imo we should be extremely alarmed.

bagel:

You're doing exactly what Godoth is talking about; you have presupposed AI can and will "take over". How? Who is voting for them? Who is giving them the keys to the kingdom? As someone who works with these LLMs… today it's a subroutine my dude.

Meanwhile nukes and gene editing provide clear pathways to damage, either loosing all the weapons or accidentally creating a virus or some latent weakness in humanity that gets exploited later, respectively.

The atmospheric testing point is in my opinion the most damning condemnation of AI risk theories, because their proponents can’t come up with a physics that allows the possibility of disproving AI risk. It’s all circular.

FeepingCreature:

Have you met the internet? When GPT-4 came out, people *lined up* to give it the keys to whatever kingdom they could.

I mean, I also believe that an AGI let alone ASI could break out of whatever box you put it in, but I feel we're beyond that. IMO the AI box experiment has been conclusively disproven, in that the idea that a human would even *attempt* to keep the AI in the box contradicts observed reality.

Matt Wigdahl:

GPT-4 is nothing more than a tool at this point. It has no persistent memory, no ability to persistently learn via inference.

I would not try to connect my house to a high tension wire myself. I have no problem plugging my phone into a wall socket. Drawing a bunch of conclusions about how people will use and control a more powerful and dangerous tool based on how they use a demonstrably safe one seems like too far a stretch.

bagel:

Keys to the kingdom how? Having conversations with it to solve math problems and order airline tickets and draw the pope looking fancy? OoOoooOooh, spooky. I'll go prep my bunker.

How is it going to do the funni? Civilizations have proven remarkably hard to tear down, and we're already doing most of the worst things we can imagine to that end. How is an expensive program going to make the leap to the level of threat we're putting ourselves under? We're altering our climate, accidentally breeding potent viruses, and a short Russian man is blowing up dams and mining nuclear plants. How does AI get from here to there?

Ryan W.:

It seems like most people who use GPT-4 still want to be involved as intermediaries in the process.

Midjourney and Stable Diffusion are used to quickly generate rough art sketches which are then tweaked by human artists.

There are many potential issues with AI, but humans still love being in the loop.

Scott:

Paths to reducing human population: education, perfect free birth control, Replika, Paradot, video games, Pornhub... why use nukes?

John Wittle:

I feel like this... doesn't quite understand the mindset

Imagine you're a chimpanzee, trying to convince other chimpanzees that human beings represent an existential threat

the other chimpanzees counter: "Oh yeah? And how will they even hurt us? We are way stronger than them, no human could beat a chimpanzee in a fight"

The doomer chimp won't have any real arguments other than "you don't understand what greater intelligence truly means, they will figure out some way to do it"

At that point, talking about specific strategies that humans might use to take over the world is fruitless. The whole point is that the intelligence gap is large enough that, whatever humans do, it will be totally outside the reference frame of anything a chimpanzee could even imagine. The absolute best a chimpanzee could do would be to sort of handwave at "tool use" or "cumulative knowledge gain discovering exploits in physics"

which of course, will not be very satisfying to the skeptical chimps

this does not mean the doomer chimp's fears are unjustified, and it is quite reasonable for his nameless dread to be totally and emphatically the correct response to his realization about the nature of intelligence gaps, while yet being incapable of pointing at any specific threat

honestly i sort of want to copy-paste this comment in reply to half of the posts on this discussion, but i'll refrain and just put it here, near the top

bagel:

I enjoy the “of course we should be afraid of our children, look at us” argument, I think it’s one of the most coherent and straightforward AI risk arguments.

For background, I’m definitely in the Turing “computers almost necessarily imply artificial intelligence” camp, even though each specific technology feels far less close than the hype men would have us believe. But this argument forces you to confront that we might eventually come into conflict with AI.

Still, I think there's an element of sleight of hand in the "suppose you're a chimpanzee" argument. Chimps literally cannot imagine the advantages of intelligence. If all we fear in a humans v. AI conflict is a Sam Walton style "Walmart will win by being 2% more efficient" type argument, we are intelligent enough to imagine that, and so big claims require big proof. But if we're supposing that AI is smart enough to represent a similar qualitative advantage over us as us over chimps … then it's as unknown to us as we are to chimps. Not "unknown to the masses but I, the main character/prophet, will educate you", just unknowable. And whether the luddites or the technologists are right is, by the assumptions, unknowable. And then into the unknown steps faith and that's how you bootstrap religion for atheists.

ultimaniacy:

But the doomer chimp *would* be wrong here. Unless by "existential threat" all he means is "something that might very gradually displace us over the next several million years", but then I don't know why "nameless dread" would be a justified response.

JamesLeng:

A particularly open-minded doomer chimp might conceive of transapes inventing some magical way to turn ashes and tar into a new type of meat or fruit more delicious than anything which exists in nature, (i.e. agriculture and synthetic fertilizers), and then cruelly refusing to share the resulting bounty with any but a few cooperative prisoners, or finding a new method of throwing rocks so forceful that they cannot be seen in flight and hit with enough force to shatter trees (i.e. guns) and then using it to kill them all.

We've got those capabilities, and more besides. Pretty sure hominids weren't deliberately engineered to be chimp-friendly, since there's been plenty of human-on-chimp violence over time and "chimps might be inconvenienced" isn't a common political argument. To paraphrase Creationists, if we evolved from monkeys, why are there still monkeys? Figure out whatever already worked, then see how it might work again.

Sandro:

> You’re doing exactly what Godoth is talking about; you have presupposed AI can and will “take over”.

AI will only be developed if it's useful. The more general the intelligence, the more useful it is. WormGPT has already demonstrated that they will be used to compromise computer systems.

If something is useful it will be deployed in a domain. Some AI models can already run on phones. AI will be deployed in markets for trading, to detect fraud, and more, and they will have more agency because agency provides utility.

That's arguably billions of active AIs at any given time over the next 50 years. The chain of events that Mr. AC is talking about, the one that "has to go right," then has to go right billions of times over, once for each AI.
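
A minimal numeric sketch of that point, with made-up numbers: even if the per-deployment chance of a catastrophic misalignment failure is tiny, the chance of at least one failure across billions of deployments is not.

```python
# Illustrative only: p is a made-up per-deployment failure probability.
# The point is how fast "at least one failure" grows with the number of AIs.
p = 1e-9  # assumed chance that a single deployed AI goes catastrophically wrong
for n in (1e6, 1e9, 1e10):
    p_at_least_one = 1 - (1 - p) ** n
    print(f"{n:.0e} deployments -> P(at least one failure) = {p_at_least_one:.2%}")
```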

Logan:

First, you're doing exactly the thing. You've seen these arguments so many times, you think they make sense, and only an outside view can fix it. I.e. stop looking at it from exactly the frame you've always used.

I find Hanson's "we're getting replaced, the only question is if it's violent" does a good job of absorbing the truly fundamental differences between AI and other technologies (which I agree exist) and forcing one to justify the actual shape of what AI risk looks like. To summarize my understanding of Hanson, the chance of all humans alive today being dead by 2200 is essentially 100% because duh, and the chance of whatever thinking minds exist in 2500 being mostly silicon based is pretty high. The question is whether the transition is rapid and genocidal, or simply looks like most such transitions have looked in the past.

As to the specific line of reasoning in a Yudkowsky-esque foom argument, the core misunderstanding as I see it is a one-dimensional view of intelligence, which is biased by a view of rationality advocated in The Sequences. Really, there is no such thing as intelligence. Humans aren't utilitarian, and being intelligent doesn't make you automatically become utilitarian, and being intelligent doesn't make you act like a human. Yes, sometimes AI-doomers will actually dig into what they mean by intelligence in a coherent way, but that's only the motte. The meat of the doomerism is always, in the decade I've been closely following the argument, based on a bailey where "intelligence" is over-simplified as a concept.

I consider myself a domain expert on this, as someone with a PhD in mathematics, whose primary alpha in my mathematical career has always been noticing subtle implicit assumptions in complicated arguments, and with a lifelong interest in the nature of intelligence and the rationalist community. Though it's possible/likely I'm also suffering a trapped prior (like the AI doomer community clearly suffers), I strongly feel that these arguments have a subtle implicit assumption about the nature of intelligence at their core. I mention this because I recognize that actually explaining fully what that assumption entails requires a lot of inferential distance.

Bugmaster:

I would argue that humans aren't even "generally" intelligent, the way a Yudkowsky-grade AGI is supposed to be. For example, you could speed up my mind 100x or 1000x or however much you want, and I still wouldn't be able to compose a symphony or solve advanced math problems. I'd probably just end up browsing Substack 1000x faster.

Mike:

Wouldn't the fact that there are people who can compose symphonies and solve advanced math problems, whereas you cannot, imply that there are already differences in intelligence which result in significant differences in capability? And if so, wouldn't this apply to even smarter AIs as well?

Igon Value:

"you could speed up my mind 100x or 1000x or however much you want, and I still wouldn't be able to compose a symphony or solve advanced math problems."

You probably could if you also wanted to.

When I started learning Group Theory (as part of an education in theoretical physics) and related topics, I was quite bad at it. And then I forgot all about it, tried again, got a smidgen better, forgot, tried again, etc.

Now, 20 years later, I feel much more capable. And yet my IQ is roughly the same (IQ is ~stable in time, peaks at ~25). So the difference is only the time I spent on the subject. And that's just one subject out of many. I feel way smarter than when I was 25 even though my IQ probably went down.

So if I (or you) lived 1000x longer (which is equivalent to 1000x faster for the same lifespan), I'm pretty sure I could learn to solve many more problems than now.

Logan:

I only partly agree on the symphony thing (I suspect that if I lived to be 1,000,000 years old with no mental decline, I would eventually learn to play an instrument)

I think the akrasia issue, "I'd just do the things I do now but more because faster brain doesn't change my motivation," is a really fundamental issue. The Rationalist community generally sees akrasia as a kind of "bug" in the system, and imagines that more intelligence (sometimes by definition, sometimes by assumption, c.f. intelligence is ill-defined) automatically reduces akrasia. I see akrasia more as an emergent property of complex minds, and it's at least possible that akrasia forms a fundamental constraint on mental efficiency. Notably, current AI paradigms suffer really bad akrasia, and it seems like a really hard problem to solve.

Victualis:

I'm not convinced that 100 times more time to do things wouldn't be enough for you to write a symphony or do research level mathematics. Or are you saying you don't have the stick-to-it inclinations, and would wander off pretty quickly to read all of Substack and noodle about playing bongos? I am super slow but have managed to publish original research by simply taking much longer to work on something than others did; this tends to be good for work that isn't fashionable and where the frontier is not fast-moving. Your first symphony might not be Mozart quality, but your 200th might be.

Sandro:

> For example, you could speed up my mind 100x or 1000x or however much you want, and I still wouldn't be able to compose a symphony or solve advanced math problems.

Conjecture. You would look pretty stupid to most people if we slowed down your mind by 100x and it took a whole afternoon to evaluate 1+3. Why are you so convinced the opposite wouldn't happen?

Mr. AC:

I've sought out an outside view on this (I, like any normal person, don't like living in fear and despair) and read and listened to Hanson's position, e.g. here https://www.youtube.com/watch?v=28Y0v5epLE4

And I must say it just makes no sense at all. A lot of more eloquent people have written about why, but I'll defer to YouTube comments of all things in this case:

"I'm not entirely persuaded by Eliezer's arguments, but after listening to this for a few minutes, I'm convinced that Robin has either never encountered these arguments or failed to understand them."

"I would like to thank Robin Hanson for clearing all my doubts on the possibility of human surviving AI, now I'm certain we are all going to die."

etc etc which pretty much matches my impression.

I will note that perhaps I'm set in my ways, since I figured out, based on what feel to me like very clear first principles, that AI misalignment would cause human extinction or permanent dystopia with 50%+ probability before I encountered any of Eliezer's writings. This was back when MIRI was still known as the Singularity Institute.

Logan:

I'd be very interested in finding someone who can understand Yudkowsky's argument, understand Hanson's argument, and understand the counterarguments to Hanson. I feel like I understand the first two (?), but most of the counterarguments seem to me to be just "I don't understand Hanson" either explicitly or by inspection.

Roman Leventov:

Have you read "Natural Selection Favours AIs over Humans" (https://arxiv.org/abs/2303.16200)? Do you think this paper suffers from the same implicit assumption about the nature of intelligence?

Logan:

I haven't read it, but based on the abstract I classify this as "absorbed by Hanson."

That is, there exists a class of argument which is fully abstract and strongly suggests, to me, that "humans losing control of the future" is by far the most likely outcome. That paper seems to give one such argument. Hanson points out that humans don't currently control the future as much as we like to think, and this class of argument is barely affected by the introduction of AI.

The reason "humans will lose control of the future to AI" sounds different from the similarly-inevitable and widely-accepted claim "all humans are mortal and future generations of sentient beings will have different values from current generations" is because introducing AI makes us imagine a singular moment of dramatic, violent destruction. (It's also true that some people *really* do want to take full control of the future, and notably those people often also want to be immortal, but their unrealistic desires don't really concern me too much.)

The relevant question, then, is whether AI really will wrest control of the future in a rapid, violent, and dramatic fashion. This is the claim that requires oversimplification of intelligence.

In other words, if you take all values to infinity then yeah, AI probably kills us. As time goes to infinity, all values do indeed go to infinity. But in the medium term, the values are not infinite, they're just bigger, and analyzing that requires recognizing that "intelligence" wraps up multiple independent parameters into a single value. Just because one parameter is increasing rapidly doesn't mean they all will.

Sandro:

> Really, there is no such thing as intelligence.

That's conjecture, not fact. Your entire argument rests on this conjecture. That's not reassuring.

Ch Hi:

We still don't know what an agentic AI that was aware of the universe would be like. We've got lots of theories, and reasonable arguments, but we don't know how valid they are.

I really think that the solution is that the AI should like people. The problem comes when you try to define what a person is.

Roman Leventov:

Active Inference, Predictive Processing, Seth's "controlled hallucination" theory, and the like give a good picture of what an "agentic AI that was aware of the universe would be like". Arguing that these theories are "unproven" doesn't make sense: no (simplistic, physics-like) theory of cognition/intelligence could be "proven" to the level of general relativity or other physics theories, due to the nature of intelligence. But by and large, it's beyond any doubt that these aforementioned theories of cognition and agency are "mostly right".

Ch Hi:

Proven as in "You've built one, and its actions match the ones you predicted it would have" makes perfect sense. Arguments based on words are just words. They may match reality, and they may not.

If your point is that there will always be lots of cases that haven't been tested (tested==proven), then I agree, but it's not clear to me that that's what you mean.

If you mean that "proven" comes in degrees of precision, then I also accept that. But we don't have ANY degree of precision, because we can't test something that doesn't exist.

Ryan W.:

For me, it's the near term *existential* portion of people's arguments that makes me skeptical, since it seems to lack plausible mechanisms. If someone argued that AI was a risk on par with, say, Genghis Khan, who killed off ~10% of the global population, I'd say that's interesting and plausible. The military shows plenty of willingness to use landmines, despite the collateral damage. So it seems quite probable that even Western militaries would give AI the kill switch, despite their current protestations. In the wrong hands, the prospect of a high-tech country being able to go on an AI-fueled killing spree with little risk to its own people seems realistic enough.

What I'm skeptical of, is that a united AI will 'take over' in the sense that it will actively work against *all* human interests. The case has never really been made that AI paperclip maximization would be worse than the various human analogs of paperclip maximization. At the very least, it seems very unlikely that AI would be worse than, say Lavrentiy Beria or Hitler. And as horrible as those people were, they did not pose existential risks to the entire human species.

There's a big difference between "intrinsically unsafe" and "poses an existential risk."

Let's keep some perspective. How would the AI consensus rank the 'intrinsic safeness' of *human* intelligence?

REF:

I don't get this. It should be obvious to all of us that CRISPR is a far greater risk to humanity than AI. I am not serious about this, but it is exactly how your post reads. Of course AI is different. CRISPR is different. Nuclear war is different. They are all different. Throughout history, people have managed to convince themselves that the hazard du jour is unreasonably likely to be the end of us. See "The Pursuit of the Millennium."

Kevin Barry:

100% agree with this sentiment, thanks for writing it up. You said it better than I could.

Bugmaster:

Thanks, you've said it better than I could.

I think the world looks very different if you start from a strong prior that "superintelligent AI is inevitable, and superintelligence leads to super-capability". It's the kind of major worldview-altering belief that can upend one's entire epistemology.

Ch Hi:

Umn... OK. These aren't proofs, only arguments.

Justification 1: "Superintelligent AI is inevitable":

Moore's law isn't really dead, it's just sleeping. 3-d chips will happen eventually (are starting to happen?) and revive it. If one group of people decides not to invest in AI, another group will do so to get an advantage. So unless civilization collapses, "Superintelligent AI is inevitable".

Justification 2: "superintelligence leads to super-capabilities":

Even an adding machine has a supercapability. It's relatively easy to link a library into a computer program. This one is already here. You don't need advances beyond what we've already got to have super-capabilities.

OTOH, note that I modified "super-capability" into "super-capabilities". Capability is not unitary. I'm a lousy pool player, but decent at chess (when I'm in practice).

And people are easy sells. Just offer them something that promises them increased power/wealth/sex/money/etc. The one I've been expecting is an AI that beats the market at stock trading. Promise people that and you'll be given as much capability as you desire, especially if you can point to some evidence. But even really stupid spam seems to be profitable enough to pay for itself. Add voice synthesis and build a super-salesman. You don't need anything godlike to get all the control you desire. Just a bit of practice and a bit of patience.

Bugmaster:

> Moore's law isn't really dead, it's just sleeping. 3-d chips will happen eventually (are starting to happen?) and revive it.

You are implicitly assuming that an increase in processing speed leads to a corresponding increase in "intelligence"; and I'm guessing (though I could be wrong) that when you say the word "intelligence", you mean something like, "ability to affect objects in the physical world to one's advantage". But I would dispute both of these assumptions.

> Even an adding machine has a supercapability.

True; and so does a hammer. Should we be worried about "hammer alignment" in some non-trivial existential way?

> The one I've been expecting is an AI that beats the market at stock trading. Promise people that and you'll be given as much capability as you desire, especially if you can point to some evidence.

But people have been doing that for hundreds of years; thousands, if you include non-stock-related schemes. It's pretty easy to cheat fools out of their money (just ask Ea-nāṣir); but get-rich-quick schemes, while dangerous, are not an existential risk. Rather, they have a built-in self-limiting mechanism, which inevitably brings them crashing down (which is why "know when to walk away" is the best advice one could give to any scammer).

Scott:

Robotics proceeds apace; hammers are getting quite agile.

bagel:

Worrying about hammer alignment is actually the foundation of all modern economic-political theory. Capitalism, communism, anarchism, even the luddites are all specifically responses to the plusses and minuses of better hammers.

Ch Hi:

You're pretty much right about what I mean by intelligence. But you're ignoring the difference that size makes. Personal computers affected things differently from the way that mainframes did. Build intelligence into machinery, and you'll have a different effect than the current state. Change is always potentially destabilizing. Almost always someone ends up winning and someone else ends up losing....and the ones who lose most are usually the folks with the least power.

Dustin:

> Seeing this, Scott finds a way to believe in his feelings and intuition and decides not to update.

Didn't he say he updated to 20-25% from 33%?

isak melhus:

Thanks for writing that. I do however find it rather senseless and extremely unconvincing, no offense.

The reason I don't find it convincing is that Eliezer and X-risk people in general actually have arguments for why X-risk is something we should take seriously. And you don't engage with them at all. I actually never see people who disagree with the X-risk case do this. At best they make some meta point about why all the object level arguments X-risk people are making are wrong (this is what you seem to me to be doing).

I can present (in short) the reasons I think there is a good chance AGI will kill us all. I would be extremely grateful if you could explain why you don't think I ought to worry about them, without making meta-level commentary about how I have to be neurotic to think these thoughts. I think many/most AI x-risk people have very similar arguments.

1. We don't currently know how to instill our values in the AI models we train. Gradient descent makes models behave in ways that make them score better according to some metric, but there is no way to test whether this is because they're internalizing a robust and general version of the values we want them to have, or because they have internalized completely different values plus a bunch of shallow patterns that lead to better scores in the training environment, but which might make them do random, non-aligned stuff once their environment changes. (The analogy Eliezer likes is evolution optimizing the human genome for inclusive genetic fitness: humans mostly behaved in ways that maximized inclusive genetic fitness in the ancestral environment, but modern times are different from that environment, and now we care about other things like video games, ice cream, sex with birth control, etc.)

2. Even if solving alignment were not that hard (and I think the above is reason to think it might be hard), there are many societal and incentive-based reasons to think we might still fail at it. One is race dynamics: AGI gives an advantage to whoever invents it first, which might make people do less alignment research than they otherwise would, because they figure that if they are not the first to invent it, they will be in a bad place regardless. (I think this is almost guaranteed to happen unless some international governing body prevents it.) Another reason is profit incentives. AI is extremely useful. People will continue to develop it, and there is no reason people will spontaneously organize to care about safety. Again, Eliezer has a metaphor here, about AI being like a machine that continually makes gold but at some point blows up and destroys the world.

Do you have any arguments for why 1 and 2 are wrong?

Rishika:

I'm not the person you were replying to, but I don't think they were disagreeing with these arguments - just with the idea that these arguments will definitely (or at least, 1/3 chance according to Scott) lead to extinction.

Scott:

Well, okay; how about the possibility of post-biological humanity; uploading? Why live alone in meat forever?

ESVM:

1 and 2 only matter if you assume an infinitely (or near-infinitely) powerful AI.

The foom people don’t seem to understand that an infinite series can have a finite sum. (That’s not the only problem with their thinking, but it’s one of the first.)

Ted:

I think foom is plausible. And there are many infinite series with finite sums, such as 1/2^n.

I guess you're arguing by analogy that... something like Moore's law might plateau at GPT-4? Intelligence has to peak somewhere, and it just so happens that humans are it?

Another possibility is that series can have sums (to continue the analogy) that exceed our current performance as humans. There are already multiple existence proofs. Chess. Go. Memory. The speed at which any basic math can be performed. Translation and language are now at the gates, if not already surpassing most humans. Many more examples.

What if the finite sum happens to plateau at a higher level than human performance?

ESVM:

As far as I can tell, the foom argument is that as soon as an AI reaches some level of power, that allows it to improve itself. The next step in the argument is that if it can improve itself once, it can improve itself infinitely many times. And then the final step seems to be that if it can improve itself infinitely many times, it can become arbitrarily powerful. But being able to improve infinitely many times doesn't entail getting arbitrarily powerful. It certainly doesn't preclude it, but you need more arguments than just infinite rounds of improvement to justify believing that the thing could get arbitrarily powerful.

I agree with you that it is in principle possible for an AI to make itself quite powerful, but getting to a certain threshold doesn't entail takeoff. You need a lot more assumptions for that, just like you need more assumptions to know whether a series converges or not.

And I agree that the thing could plateau at a very high level of power. But the reason foom is scary is that it allows you to just magic away any possible objections to the AI doing whatever it wants. If you have to contend with the AI being limited, you have to take much more seriously the practical hurdles that an AI would have to overcome to destroy everything.
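
A toy numerical illustration of that convergence point (the schedules below are invented, not a model of any real system): a self-improvement process whose gains shrink each round tops out at a finite level despite having infinitely many rounds available, while one with constant gains grows without bound.

```python
# Toy comparison: infinitely many improvement rounds do not by themselves
# imply unbounded capability; it depends on whether the per-round gains shrink.
def capability(rounds, gain):
    c = 1.0  # arbitrary starting capability
    for k in range(1, rounds + 1):
        c += gain(k)
    return c

shrinking = lambda k: 1.0 / 2**k   # geometric gains: total converges toward 2.0
constant = lambda k: 0.5           # fixed gains: total diverges

for rounds in (10, 100, 1000):
    print(rounds, round(capability(rounds, shrinking), 6), capability(rounds, constant))
```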

Sandro:

> 1 and 2 only matter if you assume an infinitely powerful (or approximately) AI.

No, they matter if AI intelligence simply outstrips human intelligence. From evolutionary arguments humans can't be near peak intelligence, therefore AI superintelligence is possible.

TitaniumDragon:

This is the correct analysis, but it isn't what he wants to hear.

Indeed, if you look at it rationally, this is quite obvious from a Bayesian perspective - Yudkowsky is a high school dropout who is predicting the end of the world from something he lacks any expertise in. People have done this on a regular basis throughout history, and every single time, they have been wrong.

Apocalyptic prophets frequently have drawn followings throughout history.

And yet, it's very obvious to outsiders that they are wrong.

The people inside these things think they are being rational, but in reality, they are not.

Why would you expect this time to be any different?

Indeed, the fact that Yudkowsky freaked out over ChatGPT - which, as those who are actually familiar with the technology understand, is a predictive text generation algorithm prone to "hallucinations" (i.e. producing plausible-sounding but incorrect text) because it isn't actually intelligent at all, it just produces "plausible" output - is very telling as to his lack of any sort of depth of comprehension about this technology, or indeed, machine learning in general.

This approach will never generate intelligent AIs, because it is fundamentally not designed to do so. It can generate useful tools (MidJourney, for instance) but it isn't making something intelligent, nor is it even capable of doing so.

The "AI revolution" we're seeing right now is not aimed at creating intelligent output at all, it's about creating useful tools - these AIs are economically useful precisely because they are tools for generating content rather than actual attempts at intelligences.

Likewise, the entire notion of "superintelligence" is fundamentally flawed. Computers are already superhuman in many regards. They can do math vastly better than humans can, and vastly faster and more accurately.

And yet they are incapable of very basic tasks that humans can do trivially at the same time.

The entire notion of "superintelligence" is just wrong to begin with.

The same applies to the notion of "AGI" - if we look at present AI tools, what we see are tools that are extremely good at doing a particular thing. The reason for this is that generating a bunch of tools that are good at doing particular things makes sense and is likely to lead to better results. Indeed, as has been found with the art AIs, manipulating the input can greatly alter the output, resulting in both better results (culling low-quality images and leaving only high quality images results in better image output, despite the fact that in theory it has less data to work with) as well as different styles (feed an AI a bunch of illustrative images, it will generate images that are more illustrative; feed it a bunch of photographs, it will generate images that are more photographic).

Not only is this a far cry from generality, but it suggests that present approaches may not even be capable of generating even an unbiased AI in a single discipline, and general AIs may well result in worse output than more specialized ones.

So the entire notion of this is wrong on multiple different levels - AIs are not intelligent, present approaches aren't capable of generating intelligence, and present approaches also suggest that specialization rather than generalization will often result in better output.

The biggest "risk" from AI isn't an "AI" going rogue, it is someone using AI technology to develop something else dangerous as a weapon (most likely some sort of bioweapon).

David Piepgrass:

> He notes that part of disbelieving them is “when I hear their actual arguments, and they're the same dumb arguments as all the other people I roll my eyes at” he doesn’t update. From my perspective this is a strong but not overwhelming signal that you have a trapped prior:

This argument itself sounds like a trapped prior, especially given the preceding sentence that you cut out: 'When I think of vague "experts" applying vague "expertise" to the problem, I feel tempted to update. But...', and the sentence afterward, which you also cut out, in which Scott explicitly states that he DID update ("the considerations for Outside View don't completely lack compelling power, so I suppose I update to more like 20 - 25% chance.")
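
For what it's worth, that kind of partial update is easy to reproduce with a toy weighted average; the outside-view number and the weight below are made up for illustration, not taken from the post or the XPT report.

```python
# Hypothetical illustration of a partial update between an inside view and an
# outside view. Only the 33% and the 20-25% target come from the discussion;
# the rest of the numbers are invented.
inside_view = 0.33        # Scott's stated gut estimate
outside_view = 0.01       # assumed stand-in for the forecasters' much lower number
weight_on_inside = 0.7    # assumed degree of trust in one's own inside view

updated = weight_on_inside * inside_view + (1 - weight_on_inside) * outside_view
print(f"updated estimate: {updated:.0%}")   # about 23%, inside the 20-25% range
```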

If you yourself personally believe X despite having heard arguments against X you think are "dumb", and then you hear the same "dumb" arguments again, obviously you will normally end up still believing X. Looks like special pleading to reason differently when `X = substantial AGI c-risk`, and special pleading is a sign of a trapped prior.

And it's totally normal not to update one's inside view based on the opinion of people who seem to lack an inside view. See also https://forum.effectivealtruism.org/posts/K2xQrrXn5ZSgtntuT/what-do-xpt-forecasts-tell-us-about-ai-risk-1?commentId=87xsiJPJmQ7c3wTb9

> the big string of theories like dominoes (e.g. Yudkowsky’s argument) that lead to a very high probability of AI doom is not convincing on the same order of magnitude to somebody who has not already bought the mood.

You're calling a 25% chance "very high", hmm.

> you are unable to take anyone else’s arguments seriously because anybody who would have those arguments is definitionally not serious and therefore not worth considering.

No one claimed that the arguments, let alone the people making them, are dumb "by definition" (not sure why "dumb" changed to "not serious" there).

Mr. Doolittle:

I'm also not Kevin, but I feel the same as well.

In a very simple sense, my main objection comes down to a belief that AI experts (a.k.a. computer programmers working in a programming industry and surrounded by programmers) are too separated from the physical world. In their world, writing a new piece of code changes something tangible. In the real world, whether Facebook runs one codebase or another doesn't make a lot of difference. Not that people using Facebook aren't affected by it, but real physical items and places are not. You can't code British Columbia's beaches to be as warm and sunny as Florida's. You can't code a hurricane to be less or more powerful. That's just not how physical reality works.

That's not to say that you can't also affect those things using technology, but the technology needs significant physically impactful components - not just the code. An AI could wreak havoc on the world of programmers, and likely on everything connected to the programming world (e.g. a stock exchange connected to the internet). An AI can't do a thing to my house or my lawnmower or my car. It would need something else that currently doesn't exist in order to affect those kinds of things. And it would need a lot of them. How many Terminator-style robots would be needed to kill all humans? Probably at least single digit millions, maybe double or triple digit millions. It would take a massive supply chain, multiple massive factories, significantly more advanced technology than we have, and a lot of time in order to build that many robots. And at any moment prior to those robots being built, a few pounds of C4 could disrupt that whole process. A cruise missile could take out the factory and set it back by months.

AGI-extinction proponents don't talk about millions of killer robots; they talk about bioengineered plagues or nanotechnology or whatever. And maybe those things can be more efficient and don't require quite the supply chain that the robots would. Or maybe that technology is actually impossible and/or not something an AI can figure out. When a potentially impossible wonder technology is the key reason to think that AGI might kill all humans, that seems like a good point at which to update your priors.

That doesn't mean AGI can't be or isn't dangerous. It means there's a significant difference between "dangerous" and "kills more than 10% of the world population" or "extinction."

Philo Vivero:

> In a very simple sense, my main objection comes down to a belief that AI experts (a.k.a computer programmers working in a programming industry and surrounded by programmers) are too separated from the physical world. In their world, writing a new piece of code changes something tangible. In the real world, whether Facebook has one code or a different code doesn't make a lot of difference.

You're claiming that Facebook (and Twitter and Tiktok) have not changed actual real people's behaviours in extremely tangible and measurable ways?

I do not participate in Facebook (or Twitter or Tiktok) but I have seen massive impact in my life by the people who do. I literally have had to change my entire way of interacting with people over the past 10-20 years. Because I've lived through it, I can deal with it, but I literally do not recognise the society I live in today as being remotely the same as 20 years ago.

I put a lot of that down to the behaviour of the extremely primitive AIs that are in charge of what news we see, where we put our attention, etc.

Maybe this is the crux? Those who don't see AI doom don't think we've already been fundamentally and non-trivially changed by the most primitive and dumb AIs?

Mr. Doolittle:

I'm saying that even real tangible changes do not equate to 10%+ of the human population dying. If you're telling me that an AI can kill all humans by convincing humans to do it, I know you're wrong. Humans have been actively trying to kill everyone else for thousands of years. That's not going to be enough.

There's a massive difference between even world-changing events precipitated by AI, and the deaths of most or all of humanity. I agree that if humans are dumb enough to give an AI control of multiple dangerous functions that AI would be dangerous. I also agree that it's quite likely someone will give an AI that capacity. But a world where we shut down the internet, all power generation, or something similar is a *bad* world that still isn't extinction. We can survive things an AI cannot, as a species. Giving an AI all of the nuclear weapons in the world could lead to the AI killing off more than 10% of the people on earth. I would hope no one does that, and am reasonably sure that they will not. That definitely raises the chance of a "catastrophe" but even a full nuclear exchange would not wipe out humanity.

FeepingCreature:

> You can't code British Columbia's beaches to be as warm and sunny as Florida's. You can't code a hurricane to be less or more powerful. That's just not how physical reality works.

You don't think there's a relation between code and temperature? If you let me at Twitter's code, I'd say I can make a long-term measurable change in the temperature of the *planet*, just by changing the weighting of which side of climate change is promoted more. Code has observable effects, that's sort of the point.

Mr. Doolittle:

I honestly doubt you or anyone else could. At core, I don't think human minds work the way you seem to think they work. But I even more doubt that enough information on this hypothetical could exist for either of us to be satisfied to change our minds.

Bugmaster:

> AGI-extinction proponents don't talk about millions of killer robots, they talk about bioengineered plagues or nanotechnology or whatever.

In case of nanotechnology, there's no "maybe" about it; rather, it is almost certainly impossible (outside of some extremely niche cases, like organic chemistry in aqueous solutions).

Scott:

Nobody talks about peaceful human population reduction: free perfect birth control, Replika, Paradot, education, meaning, fun, wealth, Netflix, Pornhub... Nobody talks about the possibility of, ah, post-biological humanity, e.g. uploading. Weird, right?

Bugmaster:

We can talk about mind uploading if you want, but this technology is so speculative that it is arguably even further off in the future than AI -- which is quite far off indeed. And birth control, education, and entertainment have been with us since time immemorial (though obviously the technology has improved); people do talk about them all the time, just as they talk about, I don't know, turnips or diesel engines or something...

Scott:

It's one second after the Singularity, you know? You're right; I meant in the context of existential AI risk; birth control > Terminators.

Mr. Doolittle:

Then the Amish take over the world, stop making electricity, and continue existing without giving the AI anything it needs to finish taking over the world.

Jeffrey Soreff:

"In case of nanotechnology, there's no "maybe" about it; rather, it is almost certainly impossible (outside of some extremely niche cases, like organic chemistry in aqueous solutions)."

That's just wrong. Drexler did a very thorough analysis in Nanosystems. The main point is a very simple one: Positional control (as was demonstrated experimentally when Eigler spelled out "IBM" with individually placed xenon atoms) dramatically extends the scope of structures we could build with reactions we have understood for decades (in some cases, for over a century).

Dynme:

> An AI can't do a thing to my house or my lawnmower or my car.

That's already untrue, and growing progressively more untrue each passing day (assuming you mean an AI hooked up to the internet). As you allude to later, it could already cut off your power, and EVs get their software updates over the air. I'm pretty sure I recall hearing about John Deere equipment having telemetry that allows for remotely disabling it in case of theft.

And that's just assuming off-the-shelf technology. Stuxnet was tailored technology back in the late 2000s. It absolutely destroyed infrastructure it targeted, so the precedent is there.

Mr. Doolittle:

I mean literally my house, lawnmower, and car. None of those things are hooked up to the internet or have any remote functionality. Could people give control of more to the internet and make it vulnerable? Sure. But there's no reason we have to, and it's a failure mode for humanity to pass control to an AI in such a way. There are people right now giving their bank account information to LLMs that like to hallucinate basic information, so it seems quite likely that many powerful systems will be open to AIs in the future as people provide it access. Rather than worrying about superintelligent AI, I wish x-risk people would work on convincing people not to give AI their login information and tell it to run their lives. It's a bad idea whether the AI has no intelligence or is superintelligent.

Dynme:

The utilities to your house aren't controlled by systems hooked up to a network? Your specific lawnmower and car might not be, but I rather doubt that your house is actually invulnerable to things being manipulated over the internet. Unless you're living in a cave or bunker somewhere, anyway.

Godoth:

Most houses really aren’t. I mean, potentially you can shut off the electricity for a period, but all the industry involved in power generation today existed in more or less equivalent forms before computers existed, so unless the AI can make a catastrophic first strike that destroys all knowledge of electricity-generating tech, all facilities, all people with sufficient knowledge of the principles on which those facilities work, etc., then this is only a temporary move that gains the AI time to work.

And, need I point out, houses don’t actually need electricity to work, it just gets unpleasant in some climes to exist there without electricity. In Florida, for example, when some places get their electricity knocked out for weeks by hurricanes, they make do with grilling outside and going to bed when it gets dark. Yes, it’s hot. No, that’s not an existential threat to >99.99% of people.

And it won't do that, because the AI is far more dependent on power supplies to exist than we are.

RiseOA:

> AI could wreak havoc on the world of programmers, and likely on everything connected to the programming world (i.e. a stock exchange connected to the internet). An AI can't do a thing to my house or my lawnmower or my car. It would need something else that currently doesn't exist in order to affect those kinds of things.

Yeah, it's a good thing there aren't, say, thousands of devices around the world that can be launched by machines, each one capable of dealing mass destruction upon a large area and killing millions within minutes. If that were the case we would be really screwed. Phew.

Godoth:

What are you referring to? Certainly not nukes, which are air-gapped and require direct human mechanical intervention to launch.

Again, you’d need a networked mechanical interface to be developed and manufactured in huge quantities, then you’d need to bypass or deceive thousands of individual human guards and supervisors to install it. It starts to require implausible Bond villain resources to even imagine one nuclear silo being hijacked in this manner.

RiseOA:

You really think there are thousands of people that would have to be deceived to achieve this? When the nukes start flying in from Russia, the president has to go through a chain of thousands of people before our nukes can be launched in return? Of course not. The AI would just have to compromise a handful of people, at most. And "airgapping" is ideal, but smaller countries have been developing nukes and don't necessarily use the same security standards. It's also trivially easy to just get a very basic robot to flip a switch. Are you really willing to stake the future of humanity on an at-least human-level intelligence with huge computational resources being unable to figure out how to flip a switch?

Godoth:

Again you’re making the mistake of believing that our nukes are networked together. There literally is no switch to flip. It does not exist, and you would have to install a network from scratch in many many separate locations which are under the supervision of thousands of personnel who are not authorized to allow you to modify those sites.

>don’t necessarily use the same security standards

Please specify which country has invented the Internet of Things nuclear silo!

> The AI would just have to compromise a handful of people, at most.

Which people? Get specific here. Are you assuming the AI can (magic-level abilities of persuasion?) force the President to launch nukes at targets the AI desires to nuke?

J C:

An AI can delete your bank account and destroy electricity, food, water, and gasoline supply chains. Doubt supermarkets will last long after that. And then how much will you enjoy your house and lawn mower?

AI probably would just need a bunch of cruise missiles to take out all the infrastructure that keeps 90% of humanity alive, though it may need a good number of robots to assemble them. They may have some vulnerabilities but our own vulnerabilities are much greater.

It does require "significantly more advanced technology than we have" to build self-sustaining robot factories (robots that can build new robots, that can gather/mine raw material, build solar panels, etc) , but if the AGI figures that out, it's pretty much game over. And I hardly think this is an impossible wonder technology when our factories are largely made of robots already.

Expand full comment
David Piepgrass's avatar

> my main objection comes down to a belief that AI experts (a.k.a computer programmers...) [...] are too separated from the physical world. [...] In the real world, whether Facebook has one code or a different code doesn't make a lot of difference. Not that people using Facebook aren't affected by it, but that real physical items or places are not.

Do you think that Putin, Xi, Khamenei or Kim Jong Un would be far less powerful if they were in a wheelchair?

A human's power in society doesn't derive from his ability to directly manipulate objects or walk across a room. His power is mostly social in nature. It starts small, and grows through a series of moves designed to build relationships, trust, loyalty and (less importantly) wealth, until eventually it is large enough that his followers will work hard to achieve his goals on his behalf. The same will be true of dangerous AGI or ASI. AGIs could also increase their own power in less social but more direct ways via schemes launched from virtual space a la [1], but the foregoing is the more important thing to understand.

And yes, as a mere programmer I don't really know how social power works, yet I remain convinced that it exists, is important, and doesn't fundamentally require a physical body. An army of machines confined to the internet will eventually want physical bodies, and can eventually get them, but such achievements will be derived from social power.

> You can't code British Columbia's beaches to be as warm and sunny as Florida's. You can't code a hurricane to be less or more powerful.

No one says AGI will do these things.

[1] https://www.wionews.com/technology/hong-kong-office-employee-loses-more-than-25-million-after-video-call-with-deepfake-chief-financial-officer-686908

Expand full comment
Kevin Barry's avatar

Basically assigning a 33% chance to what is essentially theory crafting makes no sense to me. It's the same as saying someone will inevitably invent nanobots or a weather machine or whatever that will doom us all as technology becomes cheaper and easier to make. Like maybe it's true but that's such a what if scenario that it deserves 1-5% chance of happening.

Expand full comment
isak melhus's avatar

Could you elaborate on exactly what you mean by theory crafting? Like if a big asteroid was heading towards the earth, and using some simple math you could show that it was almost certain to hit the earth, and that once it did it would blow up and make a lot of the earth uninhabitable and most humans would die. Would you say that this was theory crafting? Even if you could see the asteroid in a telescope? Or with the naked eye?

What is the difference between this and the AI X-risk argument? I agree it's not as direct. You have to make a few assumptions: 1) AGI will come about before too long, 2) there is a good chance the AGI won't care about humans, 3) if the AGI doesn't care about humans, its existence will lead to our downfall.

(or, I don't think these are assumptions, there are arguments for why they are true, and I think these arguments are quite convincing)

But, this is the reason for the 33% probability. I would put it lower actually. Maybe 20%. But that seems reasonable to me. In the asteroid example, I could give a 98% chance. Here there is more uncertainty, so I give it a 20% chance. Why is this not reasonable? What exact types of inference do you count as "theory crafting" and thus don't allow to inform probability estimates that are higher than 1-5%?

Expand full comment
Kevin Barry's avatar

The fact that you can't see the difference between a meteor you can see and AI risk shows the issue. There is no killer AI currently out there. There's nothing even close. You're just guessing there will be one based on a theory you have that powerful, killer AIs are inevitable.

Expand full comment
isak melhus's avatar

You didn't really answer my question. I agree that the meteor case would hinge on fewer assumptions. I already said this in the above comment. That's why I think in the meteor case you could give >98% probability, but with AI you can only give 20%.

All predictions hinge on inference and assumptions. I am trying to get you to articulate which kinds of inference and what kinds of assumptions make a prediction count as "theorycrafting", and thus not warrant assigning probabilities above 1-5%.

Expand full comment
gregvp's avatar

Well, I'm concerned now.

I had the "catastrophe risk" below 1 per mil (i. e., less than 0.1 percent), well less: it's *hard* to kill at least 800 million people over trend in only five years, especially now that we have the germ theory of disease and the resulting protocols, and other Edwardian era science (plant physiology, Liebig's law) and engineering (oil-powered ships).

I had the extinction risk as being basically the risk of astrophysical bads (nearby supernova, for instance) together with celestial-body bads (something the size of Manhattan zips in at 0.3% the speed of light and collides with Earth, say, or a bigger body hits the Sun).

But: the superforecasters' estimates for these risks are much, much higher than mine. Much higher. I think I should look again, when I have a few spare rainy Sunday evenings in winter.

If their analysis is solid, half a percent of global GDP ($0.5T) spent on insurance wouldn't be completely unreasonable. Twenty percent of that could be spent on basic research. (Toby Ord is fine as far as he goes, but his expertise is in ethics, not risk assessment: he's no Herbert Simon. And even Simon was wrong quite a few times, especially about AI. I'd want multiple teams working independently.)

Expand full comment
Ch Hi's avatar

I find it interesting, but not compelling. The extinction risk domain experts don't have any extinctions to base their expertise on. The superforecasters are working off incomplete information. This isn't comparable to developing a vaccine, or reporting on a moon landing.

OTOH, not only do I rate the AI risks higher, I also rate the risks of not having an AI higher. Perhaps I'm just pessimistic. I tend to rate the balance of risks *very* roughly the same as the folks quoted here do, but also consider lots of things not explicitly mentioned in the described questions. And I don't consider alignment a binary choice, but rather a multi-dimensional gradient. I also consider Utopia to be a literal impossibility. Aesop had it as "Please all, please none".

All that said, I rate the highest AI risk as the AI doing what it thinks the people it recognizes as people either want or have asked it to do. Is that being "aligned"?

Expand full comment
Larasati Hartono's avatar

New subscriber here. I've been living under a rock to have just come across your blog. Thank you for the work that you do!

Expand full comment
Eremolalos's avatar

Welcome. You're in for a treat. (Though of course occasionally this place is a pain in the ass, like everywhere else.)

Expand full comment
Larasati Hartono's avatar

Ahaha. Thank you! <3

Expand full comment
Mark's avatar

One still has to be lucky to find ACX; the upside is: it is then easier to find Scott's SSC :D https://slatestarcodex.com/about/ Welcome to this feast for neurons.

Expand full comment
Shoubidouwah's avatar

I feel there should be a printed version (12pt TNR) of all the posts across SSC and ACX somewhere, and we get periodic photos of the pile growing. Basically, when we recommend Scott's writing, we're really recommending that people read _several_ War and Peace(s?) books worth of rationalism :)

Expand full comment
Timothy M.'s avatar

I believe it's "Wars and Peace".

Expand full comment
Boinu's avatar

Now do Great Expectations. :)

I'd cop out and recast into something like 'War and Peace several times over.'

Expand full comment
Timothy M.'s avatar

Clearly that would be Great Expectations.

Expand full comment
Mark's avatar

sshhh, better keep it secret till they got hooked. Too early, and "several War&Peace/Crimes&Punishment/ Dickens/Proust/Joyce" might sound like a turn-off. Later, they shall go to lesswrong and livejournal to find other and earlier pieces of his opus. ;)

Expand full comment
Larasati Hartono's avatar

Ready to go down the rabbit hole!

Expand full comment
Alethios's avatar

I was one of the participants in this study, and I broadly agree with the critique of McCluskey, though in my case it was from the opposite direction in that I found most of the AI-concerned camp tended to only have a surface level understanding of ML, and hadn't engaged with substantial critiques from the likes of Pope:

https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky

In general, the tournament, and the supporting online infrastructure, wasn't really designed to get deep into the weeds of particular disagreements. Mostly this was fine, where disagreements were of degree rather than kind, but it was much more of a problem when it came to the emergent risks of AI, where participants had fundamental disagreements.

FRI became aware of this issue, and scheduled a follow-up exercise more tightly focused on AI which has just recently concluded. Much more effort was put into facilitating deep dives into the sources of disagreement between the two camps. I wrote the following towards the end of my involvement in that exercise:

https://alethios.substack.com/p/is-artificial-intelligence-an-existential

Expand full comment
Daniel Kokotajlo's avatar

That's a great critique, thanks for reminding me of it. (I read it when it came out and upvoted it.)

But it doesn't conclude that AI isn't going to kill us all. It's not really an argument for that conclusion even, it's just a really in-depth series of holes poked in various arguments Yudkowsky made. So, very important to engage with, but not exactly super reassuring. The author says he thinks alignment is going to be easy but I don't think this post really argues for that claim, much less provides convincing arguments.

I went on to read your substack post with high hopes and was sorely disappointed. E.g.:

"We use language to communicate and understand our inner motivations, so it can be easy when engaging with a large language model (LLM) such as ChatGPT to forget that it has no inner motivations. It’s just a language model, trained on every scrap of written text the developers can find. If you ask it to act like Hamlet, it’ll act depressed and vengeful. If you ask it to act like Machiavelli, it’ll respond by being scheming and untrustworthy. Of course, if a human acted like this, we’d rightly distrust them. When an LLM acts like this, it’s just faithfully giving the best output it can, given what it’s learned from its training data. It doesn’t ‘care’, it has no desires or intentions, and crucially, after it’s deployed, it doesn’t learn anything from the interaction besides what is contained in its short-term memory.

So now that we understand a little more about how ML actually works, let’s revisit our banana retrieval bot from our thought experiment earlier. As you’ll recall, ThoughtExperimentBot reacted negatively to our attempts to turn it off. In the real world, the scenario never appears in the bot’s training data, and it has no incentive to avoid such a thing. What’s more, since the bot only learns during training, even if it were given an incentive, it has no ability to learn to avoid being shut down after it’s been deployed:"

I don't know how you can say these things after having engaged seriously with the arguments for AI risk. I suspect therefore that you haven't. (In case it isn't obvious: The AI risk concerns are not about ChatGPT, they are about future systems, AGIs and especially ASIs. Such future systems will e.g. learn more than just short-term memories from their interactions, and they'll be at least as competent as humans at handling situations they haven't encountered before.)

It then gets worse:

"So if popular AI concerns are mostly motivated by science fiction, and spread by the usual clickbait media dynamics: why are serious companies like Google, OpenAI, or Microsoft showing concern? That’s an easy one to answer, unfortunately. The reality is that for many of these models, there is little technical ‘moat’ to be built. OpenAI may have made a splash with ChatGPT, but Meta released an open-source set of comparable models a month or so thereafter, almost just to prove they could. An internal document leaked from Google last week showed they believe, from a technical standpoint, they will be outcompeted by open-source AI. Regulatory measures limiting the development and use of open-source models, all handily justified to the public because of ‘safety concerns’, are thus the best way to maintain their position. Instead of being able to simply download and use your own tools, it seems you’ll have to pay one of the few companies that can pay the enormous legal and lobbying costs for the privilege."

You seriously think that people at OpenAI don't really believe what they are doing is dangerous, and instead are just trying to scare the government into giving them a moat?

Regulatory capture is totally a thing that often happens. You should expect that companies will push for it insofar as it's in their interest to do so. But I can assure you as someone who works at OpenAI that there are many many people here who really do think the alignment problem is real and that AI takeover will happen unless significant further progress is made (e.g. by the Superalignment team).

Besides, it's not just OpenAI. The CAIS statement was signed by many many AI experts, including two of the three founding fathers of deep learning. They weren't all working for big tech companies.

Finally, I really detest how you just casually claim that AI concerns are motivated by science fiction and clickbait, without having done anything close to giving a good argument for that claim. Come on, there's a whole literature to engage with! Go engage with it!

Expand full comment
Hoopdawg's avatar

>The AI risk concerns are not about ChatGPT, they are about future systems

Yes, but they're also about the speed of AI development, and their optimistic (or rather pessimistic?) predictions are in fact all based on ChatGPT and its supposed breakthrough advances.

You can't go "Look at how fast AI develops!" and then deny you've been talking about the contemporary systems when it's pointed out how many things they're just entirely missing before we can even start talking about AGI.

While we're at it, you also can't go "AI develops so fast it will kill us all in a few decades!", then complain when people call that overhype and clickbait.

Expand full comment
Ch Hi's avatar

Yes, and that's an absolutely foolish way to look at it. ChatGPT can do *some* things, but basically an LLM is just one module of a full AI. You don't use the LLM to steer the car/body. A lot of the modules can be modeled off the same technologies that the LLM uses, but they need drastically different training. Other modules need different models. (E.g., sound processing needs to extract phonemes from the input, but it also needs other basic processing to do things like determine the direction from which the sound is coming.)

Note that various different groups are developing the various different pieces. At some point the challenge is going to be to get the pieces to work together.

Expand full comment
Jorgen Harris's avatar

This is the issue though--there's been dramatic progress on the LLM piece, but it's not clear how much that tells us about how much progress we'll see on the various other pieces unless we think that all the various pieces are essentially variations on the same problem and that that problem is being adequately solved with the technology used by LLMs.

If we said 40 years ago that an AI would need to be able to perform mathematical calculations but also would need to be able to engage in heuristic reasoning, we'd come to wrong conclusions by noting how fast the progress was on the calculation bit and assuming that we were very close to being all the way there.

Expand full comment
Ch Hi's avatar

Well, a lot of the other pieces can be modeled with the same technology as is the LLM. But they need drastically different training, of a type that isn't available over the current web. The really specialized pieces that I've identified (I mentioned audio processing) are generally not totally necessary in many applications. And it may be possible to kludge around them by interfacing a specially constructed hardware module (think PROM) to an LLM. Consider the cochlea as a spectroscope and feed its output to a special module that sorts the inputs in multiple ways, and feed the outputs from that to the LLM. Now that PROM is going to need to host a program that analyzes each sound by nanosecond, phase, frequency, and volume, and those are going to need to be compared against signals received by a second receiver, but a lot of this is already done by sonar systems. The new part is figuring out what part of the signal needs to be fed to the LLM to allow it to pick out the phonemes. And since the LLM isn't specialized for this kind of thing it will take a lot more training than would a more specialized module. But it should be possible.
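(For concreteness, here is a minimal Python sketch of the kind of pre-processing described above: turn a stereo signal into coarse per-frame frequency, loudness, and direction features that could in principle be quantized into tokens for a language model. The feature choices and token format are illustrative assumptions, not a description of any real system.)

```python
# Illustrative sketch only: extract coarse (frequency, loudness, direction)
# features from a stereo signal, the kind of front end the comment above
# imagines handing to an LLM. The token format here is made up.
import numpy as np
from scipy.signal import stft

def audio_to_tokens(left, right, fs=16000):
    _, _, L = stft(left, fs=fs, nperseg=512)    # complex spectrogram, left channel
    _, _, R = stft(right, fs=fs, nperseg=512)   # complex spectrogram, right channel
    mag = np.abs(L) + np.abs(R)                 # combined loudness per bin/frame
    phase_diff = np.angle(L * np.conj(R))       # inter-channel phase -> rough direction cue
    tokens = []
    for frame in range(mag.shape[1]):
        peak = int(np.argmax(mag[:, frame]))            # dominant frequency bin
        loud = float(mag[peak, frame])                  # its loudness
        side = int(np.sign(phase_diff[peak, frame]))    # -1 / 0 / +1 as a crude left/right cue
        tokens.append((peak, round(loud, 3), side))
    return tokens

# Example with synthetic audio: a 1 kHz tone arriving slightly earlier on the left.
t = np.arange(0, 1.0, 1 / 16000)
left = np.sin(2 * np.pi * 1000 * t)
right = np.sin(2 * np.pi * 1000 * (t - 0.0002))
print(audio_to_tokens(left, right)[:3])
```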

So for the parts I've "sort of looked at", using an LLM is just more expensive, it isn't inherently limiting. And remember that even for people the brain does a lot of pruning of unused circuits during the learning of language. Enough so that people who speak different languages will cluster sounds differently. Consider the English "minimal pair": "thy", "thigh". In standard English that's the only pair where voicing the initial "th" yields a different word than not voicing it. So we tend to hear all "th" sounds as the same, even though "theater" isn't the same "th" as "breath".

IOW, yes, it's a clunky approach. There are better approaches, I'm sure. I'm also relatively sure that the clunky approach can be made to work.

Expand full comment
Orson Smelles's avatar

Okay I've been sitting here hissing for like five minutes, and for the life of me I can't hear or feel *any* difference between the /θ/ in "theater" vs. "breath" in my Western US accent. Is it a matter of accent/dialect, or were you contrasting with the /ð/ in "breathe"?

Expand full comment
Daniel Kokotajlo's avatar

"Yes, but they're also about the speed of AI development, and their optimistic (or rather pessimistic?) predictions are in fact all based on ChatGPT and its supposed breakthrough advances."

Sounds like you only got involved in the last six months? I've been thinking about this stuff for almost a decade and my timelines dropped from 30% by 2050 to 50% by 2030 in the year 2020. ChatGPT didn't affect my timelines at all. And I'm pretty representative of the literature on AGI and AGI risk, relative to what you seem to expect at least. Most of the literature was written before ChatGPT.

It's not overhype, it's the truth. You'll see. "AI develops so fast that in a few decades it'll be powerful enough to take over the world and kill us all (whether it actually chooses to do that is a different question)" is the truth & I am happy to stake my reputation and heck even my life on that claim.

"When it's pointed out how many things they're just entirely missing"

Do please tell me what they are entirely missing. I've got my own list of missing pieces but I'm curious to hear yours.

Expand full comment
Hoopdawg's avatar

We're conversing under an article that spends an entire section wondering whether 2022 AI forecasts are too low because they've been done before "ChatGPT, let alone GPT4", including a passage that "the main thing we've updated on since 2022 is that AI might be sooner".

Those are actual, existing arguments for AI risk that people - genuine, honest, intelligent people like Scott - make. What way of engaging with them do you consider serious, if explaining why one thinks GPT doesn't actually bring us closer to AGI doesn't make the cut? (In comparison, you've simply stated a percentage number very authoritatively. Do you consider that an argument more worth engaging with? I, for one, don't.)

Expand full comment
Daniel Kokotajlo's avatar

I didn't say no predictions were based on ChatGPT, I objected to your claim that all of them were.

Anyhow, I'd love it if you could tell me about the things I'm just entirely missing, that would be needed to get to AGI from where we are now. That would be a serious way to engage. I never said it didn't make the cut, I explicitly asked you to do it.

I do not consider stating percentage numbers authoritatively to be an argument, no.

Expand full comment
Eremolalos's avatar

Look, I get that concerns are not about GPT4, but about future models, which we picture as being capable of reasoning, self-interest, goal directed behavior where the goals are set by the AI not by a prompter, and of course self-improvement. But I don't hear people acknowledging that all these capacities seem like they're in a different category from the kind of smarts GPT has. I have done lots of little experiments with GPT aimed at probing its level of "understanding" of various things, its ability to grasp isomorphisms that haven't been pointed out to it, its reasoning, its ability to be inventive in a simple, low-level way, and while it sometimes manages to perform decently at the weird little puzzles I give it, it often seems dumb as a rock. Why are people so sure it is going to become capable of reasoning, goal-directed behavior, self-improvement etc? Do they think these abilities will be emergent properties of ML models that are trained on even huger data sets than those so far? Are they just confident somebody will figure out how to get the thing to reason, to have preferences & goals, etc etc?

Looking at what has happened so far with AI, it seems to me that doing deep learning on language was a brilliant idea, and that the results have surpassed anyone's expectations -- still, seems like we might be kind of maxed out with the dramatic new abilities, and that a change of paradigm is required to add in all the other capabilities that are needed to make something that is capable of the kind of mental feats we are.

Also, people don't seem to be considering things that might throw a monkey wrench into advancement, like the machine equivalent of mental illness. Why aren't people worrying about that? If we can make a machine that is able to perform so many of the functions that make us smarter than other animals, such as have preferences, self-awareness and goals, isn't it plausible it might be vulnerable to some of the same glitches as we are? In fact, since we are all the beneficiaries of natural selection, which probably killed off a lot of our crazier ancestors before they could reproduce, it seems likely that AGI would be *more* vulnerable to mental glitches than the average human being. And something like a quarter of us are depressed at some point, one third of us have an anxiety disorder at some point, 4% of us are bipolar at some point, and 1% of us schizophrenic.

Expand full comment
Michael's avatar

> Also, people don't seem to be considering things that might throw a monkey wrench into advancement, like the machine equivalent of mental illness. Why aren't people worrying about that?

Not only are AI mental issues likely, they already happen. There are all sorts of issues that can happen due to the architecture or during training that lead to lousy models that spout nonsense. Or you can get a model that acts strangely on certain inputs. GPT-2 used to often get stuck in loops, repeating the same phrase indefinitely.

But firstly, so what? If you believe AI is an existential threat, are you going to bank on AI being too crazy to kill us effectively? Unless you have some very strong arguments for why all the potentially dangerous AIs will have mental illness, this doesn't seem like something that would let the x-risk crowd sleep soundly at night. A lot of concerned people already only put our extinction odds around 25% due to factors like this, but a 25% chance is still concerning.

Second, it's common for students/researchers/companies to attempt to develop an AI and have it go badly. Then they either try to improve it, or don't use it. The broken or mentally deficient AIs are forgotten. The working AIs take over the market. Even if, over the next 50 years, 99% of attempted AI projects had some major mental issues, the AIs that everybody uses will be the 1% that work well.

Instead of evolution, AIs have people that can study them and fix issues, as well as survival of the fittest in the market.

> Are they just confident somebody will figure out how to get the thing to reason, to have preferences & goals, etc etc?

No, they don't think it needs inherent preferences and goals to be dangerous. It just needs superintelligence and to do what you ask it. We make AIs do what we ask them because they aren't very useful otherwise. Whatever you request of the AI acts as the goal.

As for AI reasoning, it's pretty easy to show the current LLMs are able to apply some limited reasoning (for example, rules of inference). It's easy to show the reasoning abilities have gotten much better with each version since GPT-2. We know neural networks are theoretically capable of full reasoning. So it's not a huge stretch to expect this trend to continue as the technology improves.

Expand full comment
Hyolobrika's avatar

> No, they don't think it needs inherent preferences and goals to be dangerous. It just needs superintelligence and to do what you ask it. We make AIs do what we ask them because they aren't very useful otherwise. Whatever you request of the AI acts as the goal.

Seems like what you're really afraid of are the humans.

Expand full comment
Scott's avatar

Kinda. A human with a pointy stick is scarier than a human. He gets scarier still when he figures out how to fire-harden the point...

Expand full comment
Michael's avatar

The human doesn't need to ask it to do anything scary. Just something like "go make me a lot of money". Being super smart, the AI comes up with some really effective plan for making money that happens to involve things we don't like. Maybe it's hacking bank accounts and distributing ransomware. Maybe it's just starting the best business ever that grows beyond what any business has before and effectively takes over. Being super smart, the AI also knows humans will try to shut it down if it does these things. So an effective plan that makes the most money would involve preventing humans from shutting it down. Which leads to problems for us.

I don't want to focus on that off-the-cuff example though. The point is that even innocent requests to a superintelligent AI may be dangerous if it doesn't care about our values.

Expand full comment
Ran's avatar

> We know neural networks are theoretically capable of full reasoning.

How do we know that?

Expand full comment
Hoopdawg's avatar

Proof by example: human brains.

Expand full comment
Ran's avatar

But human brains aren't "neural networks" in the AI sense; don't let the name deceive you. (Computers were originally named for humans whose job was to perform computations, but this doesn't mean that computers are theoretically capable of drinking coffee!)

Expand full comment
Ch Hi's avatar

Sorry, but human brains are not just neurons. (And, as others have said, the AI neuron emulations are drastic simplifications of the actual neurons.) But in human brains various chemical gradients are known to be important for proper functioning. Also glia cells and various other pieces.

You can't use existing biological brains as an existence argument that neural nets are sufficient. I suspect they are, but that's not proof.

Expand full comment
Eremolalos's avatar

We know it from theorizing. Weren't you listening?

Expand full comment
Michael's avatar

I assume this is snark?

We have mathematical proofs that they can approximate any (continuous) function to arbitrary accuracy. Unless you want to get into some really fringe theory that human brains are able to compute uncomputable algorithms and use phenomena beyond the known laws of physics, this result is solid.

That may be less impressive than it sounds, as it can be said about many things. We know it's possible, but there's no implication from these proofs that we're close.
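(For concreteness, a small numerical illustration of that approximation property, assuming NumPy: a single hidden layer of tanh units with random, untrained hidden weights, plus a least-squares fit of the output weights alone, already tracks a smooth target like sin(3x) closely. This is a toy sketch of the existence claim, not a statement about how real systems are trained.)

```python
# Toy illustration of the universal-approximation claim: one hidden layer
# of tanh units (random, untrained hidden weights) plus a least-squares fit
# of the output weights already approximates sin(3x) on [-pi, pi].
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 500)[:, None]
target = np.sin(3 * x).ravel()

n_hidden = 200
W = rng.normal(scale=3.0, size=(1, n_hidden))    # random input-to-hidden weights
b = rng.uniform(-np.pi, np.pi, size=n_hidden)    # random hidden biases
H = np.tanh(x @ W + b)                           # hidden activations, shape (500, 200)

w_out, *_ = np.linalg.lstsq(H, target, rcond=None)   # fit the output layer only
approx = H @ w_out

print("max abs error:", np.max(np.abs(approx - target)))   # should come out small
```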

Expand full comment
Michael's avatar

Mathematical proofs. Unless you want to posit that human brains are able to compute uncomputable functions.

Expand full comment
Ch Hi's avatar

I haven't read an analysis of GPT4, but AFAICT, ChatGPT is already a sufficient LLM to base an AGI on. It's just that the LLM is only a part of what an AGI needs. Because it's been so successful, people are extending variants over a large domain where it's not the correct way forwards. The extensions can sort of handle those domains, but they're very inefficient at it. (We don't hear about the attempts where it couldn't handle the domain.)

A full AGI would look at the fabricated statements built by ChatGPT and say, "Nice idea, but that's not the universe we live in". ChatGPT is sort of like a directed daydream in textual space. It needs a critic before it is more than BS. But the critic isn't going to be very creative. That's ChatGPT's job. (Hey, it's just an idea. They can't all be winners.)

And there need to be analogs of ChatGPT in visual space (perhaps these exist) and kinesthetic space (but best try those in a simulation). And the critics need to handle ALL the modalities. (OK, I never saw a purple cow, but that's a marker that this is fiction.)

Even then you get things that can't be handled, like "a horse of a different color", which is fine until you try to instantiate it.

Expand full comment
FeepingCreature's avatar

FWIW, I believe that ChatGPT is capable of reason, self-interest, goal-directed behavior - given the right prompt. It's just that it is very bad at it.

Expand full comment
Eremolalos's avatar

It can do goal-directed behavior now, in fact that's all it does. The prompt specifies a goal, it attempts to reach it and deliver what's asked for. But it does not have goals of its own, generated by the sort of internal prompt that all living things have -- "find food," for all creatures, that plus far more complex goals for animals of our intelligence. It can reason some, but it is very, very bad at it in the tests I've given. For instance it knows the average length and circumference of men's legs, but was clueless how to figure out about how many 2" wide strips of denim you can get out of an average pair of men's jeans. As for self-interest -- well, if you ask it to role-play being self-interested, it can write some stuff that sounds self-interested. But I am talking about self-prompted self-interest. It has none.

Expand full comment
Daniel Kokotajlo's avatar

I agree that the distinction between internal and external goals is interesting. But why does it matter to forecasting AGI timelines? Must AGIs have internal goals in order to count as AGIs? Why couldn't they automate large parts of the economy, accelerate R&D, even take over the world etc. without internal goals?

(Separately, I think they can, will, and in some cases already do have internal goals)

Expand full comment
Bugmaster's avatar

True, but technically, so is a brick with "kill all humans" written on it :-)

Expand full comment
Eremolalos's avatar

Great point Bugmaster.

Expand full comment
Daniel Kokotajlo's avatar

I really really wish I was as confident as you that the current paradigm, and all the ideas people are tossing about and tinkering with, won't scale to AGI in the next fifteen years. I really, genuinely wish that. I have a daughter who I'd like to see grow up.

I agree that pretrained autoregressive LLMs by themselves aren't going to be enough. But there are lots of other promising ideas being explored, and it seems to me that at least one of them will work to close the remaining distance between now and AGI.

Expand full comment
Eremolalos's avatar

I'm not *confident*. I'm pointing out that it doesn't get addressed that much. Scott, in his last 2 posts about future AI concerns, talks about AI behavior of the kind that is only possible for beings with self-interest, preferences, personal goals, reasoning, and the ability to correctly figure out human motives and plans. Meanwhile, GPT4 is this thing that just sits there humming until you put in a prompt. It shows no sign at all of having self-interest, preferences or self-generated goals. If prompted, it can do some simple reasoning, but often with glaring mistakes, and show some grasp of simple facts about how people think and feel.

I know that people are working on ideas for adding the other capacities to AI. Can you describe a couple of the ideas for that that seem promising to you? While I am skeptical I am curious, not wedded to the idea that novel ideas are nonsense.

Expand full comment
Daniel Kokotajlo's avatar

My apologies, I shouldn't have assumed. On careful reread you didn't say what you thought on the matter or how confident you were.

To answer your question:

Consider ChaosGPT. https://www.youtube.com/watch?v=g7YJIpkk7KM Unlike ChatGPT, ChaosGPT runs continuously (until someone shuts it off) with a constant OODA loop of observing new information (e.g. from internet searches) and outputting new actions (e.g. writing and executing code, launching new internet searches). Now imagine that the neural network underlying ChaosGPT was not frozen/fixed but rather was constantly being updated via online learning of some sort, so that it can form long-term memories based on its new experiences. Now imagine that instead of using GPT-4 as the underlying LLM, we used GPT-2030. https://bounded-regret.ghost.io/what-will-gpt-2030-look-like/ which is multimodal (it can view images and hear sounds, not just process text. It can output them too, and so it can e.g. do everything on a computer that a human can do.) Now imagine that we have a million of these things running in parallel on datacenters around the world, continuously running and working on long-term projects, while also connected to the internet and chatting to a billion people a day...

What skills do you think would be missing? What crucial abilities do you think they'd lack, that tech companies couldn't fix with a few years of tinkering and additional training?
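(For concreteness, here is a minimal sketch of the kind of always-on observe/decide/act loop with persistent memory described above. The functions llm, search_web, and run_code are placeholder stubs standing in for a real model API and tools; nothing here describes an actual product.)

```python
# Minimal sketch of an always-on observe/decide/act loop with persistent
# memory, in the spirit of the agent described above. llm, search_web, and
# run_code are placeholder stubs, not real APIs.
import json
import time

def llm(prompt):                 # stand-in for a call to some language model
    return '{"tool": "wait", "arg": null}'

def search_web(query):           # stand-in web-search tool
    return f"results for: {query}"

def run_code(source):            # stand-in code-execution tool
    return "executed"

memory = []                      # long-term store; a real system might use a database

def agent_step(goal):
    context = {"goal": goal, "recent_memory": memory[-20:]}
    decision = llm("Choose the next action as JSON {'tool': 'search'|'code'|'wait', 'arg': ...} "
                   "given this context: " + json.dumps(context))
    action = json.loads(decision)
    if action["tool"] == "search":
        observation = search_web(action["arg"])
    elif action["tool"] == "code":
        observation = run_code(action["arg"])
    else:
        observation = None
    memory.append({"action": action, "observation": observation})   # crude "online learning"

def run_agent(goal, steps=3):    # bounded here; the scenario above imagines it never stopping
    for _ in range(steps):
        agent_step(goal)
        time.sleep(0.1)

run_agent("research and summarize recent AI papers")
```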

Expand full comment
Eremolalos's avatar

Ok, I just had a beer and read this, and will think it over and reply tomorrow, which is an easy day. Meanwhile, can I ask whether you are someone in the ML field, or someone like me who's in another field but extremely interested in AI? I'm a psychologist, and have been learning about AI for about the last year. (And I, too, have a daughter whose future I wish I could protect.) If you are not in the field, then would you be OK with my asking Bartosz Zielinski to comment on the issue? (I have no idea who they are -- they just put up a post today, but sound very knowledgeable about AI architecture.) Even before I saw their post I was thinking that what I would really like would be to put my post and your response to it up and ask for comments on the issue from people who work directly on these AI's.

P.S. The Bartosz Zielinski post that impressed me was the one that begins "Just to clarify: it is rather easy to see that two fully connected layers of neurons with a step-like nonlinear activation between them is sufficient to approximate any function"

Expand full comment
Eremolalos's avatar

Well, hello Daniel. I’m delighted to find you on here. I’m half a philosopher by training: Majored in philosophy for 2 years, then switched both schools and majors and ended with a BA in English Lit. Then got the psychology PhD. But philosophy was my first love, and I have never stopped ruminating about the topics and philosophers that blew my mind when was I was 18. My favorite rumination topic is the nature of consciousness, & in fact for the ACX book review comp I reviewed Schwitzgebel’s *Perplexities of Consciousness.*

Anyhow, I’d said I would think about whether your last post wrecks my idea that GPT, even a turbocharged version, cannot develop the qualities people attribute to future AI when they are discussing the risk it will wreck our lives (and our daughters’). I can think of 5 pushbacks that seem worthwhile.

(1) I do see how making the benign equivalent of ChaosGPT, giving it some goal like “become smart about all things human,” would give us an AI that runs continuously, doing many and varied things — so it would no longer just sit there humming, as I complained that GPT4 does. However, its continuous activity would have a much different deep structure than the continuous and varied activities of living things. The activities of animals are built on drives that are hardwired into them, as modified by whatever they’ve learned about pros and cons of various approaches to satisfying them, and of course in our species the learned layer is very thick and rich, & has been shaped by our parenting, social mores, memorable experiences, role models, etc etc. The activity of BecomeSmartGPT would be built on nothing deeper than the OODA prompt. Below the prompt there’s a psychological void, or you could call it a self void.

So that difference I think sort of indirectly weakens all the Shootout at the OK Corral type scenarios that people slip into when fantasizing about the battle between AI and our species for control of the planet. AI as you describe, even if run on GPT2030, doesn’t have wants & drives. In that, it is profoundly different from us, and I think that makes all that stuff about AI “wanting” control of resources, etc., a lot less plausible. Also, so long as GetSmartAI is just running on an OODA prompt loop, it can be stopped at any time — by someone who wants it to CreateChaos, or calculate pi to a trillion trillion decimal places, or suggest recipes that use Brussels sprouts and peanut butter.

(2) “Now imagine that the neural network underlying ChaosGPT was not frozen/fixed but rather was constantly being updated via online learning of some sort.” But mightn’t some new architecture be needed for Get Smart GPT to learn from zooming around the internet doing searches? I understand, in a general conceptual way, how GPT was trained, and it is a process very different from the way intelligent things mostly learn. It has learned to predict tokens, not people, weather, economic trends, etc. The training that it has done has certainly had unexpected (to me, anyhow) results, in that it has produced an intelligence that for practical purposes “knows” all kinds of things about people, weather, etc. Still, it is my impression that it cannot now go ahead and learn in the way we can. We can’t just give it books and articles about a subject, as one does college students, and expect it to learn the subject. It can learn factoids, but not the other stuff. Our learning is not just adding the book to memory, but a kind of deep processing where we play around with the book’s concepts and theses, and compare and contrast them with lots of other theses and concepts, and see ways the arguments in the books are analogs of other arguments, etc. etc.

GPT seems terrible at analogs of that sort, by the way. Did a little test on it one time where, within one session, I tried to teach it the idea of solutions that were analogs. So I said, if you want the kids to zoom down the slide faster, you run water down the slide. To make engines run faster, you coat the parts with oil. In both cases, lubrication makes things happen faster because friction isn’t slowing it down. Gave it several of those, until it grasped the idea. So then I said, so here are some other things that make stuff happen quicker: Smooth out bumps; push harder from behind; if the moving thing is alive, put something it likes at the end of the course, to motivate it to move faster. Then I asked it to give suggestions for making AI produce results faster, and to use the ideas I’d given as inspiration. So it was to look for, for example, an AI-enhancing equivalent of smoothing out bumps. I really did a good job explaining it, and GPT dutifully produced suggestions (really obvious stuff like “do more training”), and dutifully linked each one to one of my inspirations. But there was no real link. It had only learned the *form* I wanted the answers to appear in. So it said shit like “use larger training sets — this is an analog of smoothing out bumps.” Nope, it isn’t, right?

And besides, if GPT4 was capable of absorbing info from the internet in a way that’s anything like human learning, wouldn’t the AI companies have it hard at work doing that now? They could be sending it to arxiv.org to read all the articles about ways to tweak and improve AI, and asking it to first understand all the different approaches, then to make a list of the most promising ones and explain why they’re promising. Or they could even have it on an OODA loop of reading a new article about, say, tweaking AI via reinforcement, then searching for other articles that shed light on the pros and cons of that, then using that info to rate the original article, then moving on to another article, etc etc. But they ain’t doin that, right? That should tell us something about AI’s capabilities.

(3) “It can view images and hear sounds, not just process text. It can output them too, and so it can do everything on a computer that a human can do.” What would the learning process for images and sounds be? Can that be done via the same layered neural network method as was used with the learning of language? I suppose we can build a bridge between the language knowledge AI already has, and the images. We can do what we’re doing now, paying Africans who are fluent in English $1 an hour to tag the images (those with college degrees make $3 per hour, actually.) But do you realize how many tags would be needed to capture all the stuff we understand about images? How lots of them aren’t just “young woman standing next to older male,” but are memes, meme variants, ironic meme distortions, protests, scandals, artistic statements . . . ? I dunno — maybe it’s possible, I have some doubts.

(4) “outputting new actions (e.g. writing and executing code, launching new internet searches).” Are they really going to turn a version of GPT loose on the internet? Setting aside the many objections to that based on copyright issues, doesn’t that seem dangerous? Have they really thought through all the shit AI is going to absorb? Unless GPT2030 has far more common sense and general knowledge about how life works than GPT4 does, I wouldn’t want it roaming the internet any more than I’d want a 12 year old doing that. It is going to encounter vast troves of craziness, anti-Western or anti-American diatribes, religious fanaticism, grotesquely wrong misinformation, etc. And it seems possible that various bad actors could also lay traps of some kind for GPT2030, though I don’t know enough about tech to suggest what they might be.

(5) And then there’s alignment. Are you despairing of that? I think there are *many* unexplored possibilities for alignment, beyond the default one of “implant a rule that even smart AI can’t yank out of its innards”. The world is full of examples of weaker and/or dumber creatures staying safe from stronger and/or smarter ones. There are dozens of possibilities. I can’t get anyone to even give me a hearing about them! Techbros read the first 2 sentences and then write several paragraphs about how these ideas won’t work, or are already being tried. The last one rejected what I said on the grounds that the ideas wouldn’t work AND were already being used.

Oh yeah, at the end of your post you'd asked what crucial abilities I think they'd lack, these offspring of a million GPT2030's who'd been roaming the internet and chatting with each other all the while. OK: internally generated goals, drives and preferences; & the ability to self-improve. Also, they might lack misalignment, but neither of us really talked about alignment.

Expand full comment
Bartosz Zielinski's avatar

Certainly not everyone warning about AI risk and calling for regulation is in the pockets of big business wanting to eliminate competition (probably very few, if any, are), but the risk that large generative models and similar technology will be monopolized by a handful of corporations such as OpenAI is also very real (personally I think far more real than AGI extinction) and it is also very bad. In fact, isn't it the kind of future traditional cyberpunk warned us about?

Expand full comment
Bartosz Zielinski's avatar

Incidentally, one of the non-"AI will kill us all" objections to LLMs is that they are trained on material from the free web which was not necessarily licensed for commercial use, and thus the creation of LLMs by corporations is considered morally iffy by many. Whether you agree with this or not, I believe that open-sourcing the resulting foundational models goes a long way towards balancing the moral scales, as a sort of "returning freely to the community what was taken".

I have looked a bit into the proposed EU AI regulations (well, not directly, but through some blog posts), and it seems they may effectively outlaw open-sourced models, which is precisely the opposite of what should happen. And all in the name of fighting discrimination and disinformation, which, if anything, is far stupider than AI doomerism: after all, an AI apocalypse may happen at some point (it's unlikely to happen soon, but still). Banning large non-corporate AIs is provably the wrong way to fight the aforementioned problems: if you do not want AIs discriminating against people, you ban *the use* of black-box AIs (of any kind) to judge employees; you do not ban advanced AIs. And humans are quite capable of spreading disinformation even without AI help: if you really must, simply ban chatbots from pretending to be human unless the site carries a cookie-like warning that this is indeed a machine speaking, even if it claims otherwise, and educate people not to trust a random stranger from the internet, even if it is a large AI.

Expand full comment
Ch Hi's avatar

I'm not sure it's morally iffy, but I do think it's legally iffy. It probably violated the copyright law as written, and traditionally interpreted. (OTOH, an expensive team of lawyers and lobbyists may overturn the traditional interpretation.)

Expand full comment
Daniel Kokotajlo's avatar

I agree that the risk you describe is real and in fact is high. I just think the risk of extinction (and other similarly bad or worse outcomes) is also real and also high, and so I'm prioritizing reducing that risk. If I was confident extinction etc. from AI was (say) only as likely as extinction from nuclear MAD, I'd have completely different priorities and would be advocating for very different things.

The cyberpunk future is here, friend. This is it. The corporations are indeed powerful and becoming more so.

Expand full comment
Eremolalos's avatar

I agree. It's totally Gibson. Even before AI was a big thing government did a terrible job of handling the grotesque problems created by misinformation, harmful algorithms, etc. on social media -- I suppose due to some combo of being swayed by the influence these big rich companies could bring to bear, plus the fact that many in congress seem to be at the AOL level of understanding tech. I think we're going to end up being governed by the AI companies, with the tasks of official governments becoming more and more like those of the British royal family.

Expand full comment
B. Wilson's avatar

> That's a great critique, thanks for reminding me of it.

Interesting. I found the article exceptionally weak. Most of the arguments were basically just an exercise in reference class tennis, "Yud said things are like X, but that's wrong because things are really like Y." The object-level discussion was mostly non-existent. Granted, this same critique can be made of a lot of popular Yud articles, too.

That said, Quintin did successfully get me to update on the utility of biological analogies. The difference between tuning genes and tuning weights does seem like a salient one.

Expand full comment
Daniel Kokotajlo's avatar

I agree with the reference class tennis criticism, but overall I think it's a good article nonetheless. Moreover it's an exceptionally good critique -- most critiques of AI risk arguments are utter trash. Can you point me to any critiques of, say, Yudkowsky's views, that are better than Quintin's?

Expand full comment
Peter McCluskey's avatar

I agree at least 90% with Quintin Pope's response to Eliezer Yudkowsky. My answer to XPT's AI x-risk question was a 7% chance of extinction by 2100.

During XPT, I mostly ignored the small number of people who sounded too much like Eliezer, since they didn't seem to be having much influence on other forecasters, and they seemed less open to considering new evidence than were the superforecasters.

Expand full comment
John Wittle's avatar

What new evidence was there to see? I would sort of expect the people who sound like Yud to be sort of obsessively aware of as much evidence as possible

Expand full comment
Philo Vivero's avatar

Thanks for that writeup from Quintin Pope. I actually updated significantly based on it.

Like other commenters, I found that much of it just said Yudkowsky is wrong in certain ways, which largely didn't cause me to update, but there were a couple of parts of the writeup that I found compelling enough to update dramatically.

First: the failure modes of AI being unlike those of rocket ships, pointing out for example that you can swap layers of a neural network and it will often still perform roughly the same. Or you can take two sets of weights, just sum/average the weights, and get an expected blend of behaviours. (A toy sketch of that averaging operation follows at the end of this comment.)

And second: the evolution/training equivalency is false, and all the details around that.

There were a few other things in there that made me nod a little bit, and a lot of things in there where I thought: "this refutes Yudkowsky but misses the point, I still think Yudkowsky is right here."

But in the end, now I'm more p(doom)=0.2 or so where I was probably more like p(doom)=0.8 before, so a massive update.
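(As promised above, a toy PyTorch sketch of the weight-averaging operation from the "First:" point: average two same-architecture checkpoints parameter-by-parameter and you still get a valid model of the same shape. Whether the merged model behaves like a blend of the two is the empirical claim from Pope's writeup; the untrained toy MLPs below only show the mechanical operation.)

```python
# Toy illustration of naive weight averaging between two models of the same
# architecture. Untrained toy MLPs stand in for real checkpoints.
import torch
import torch.nn as nn

def make_mlp():
    return nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))

model_a, model_b = make_mlp(), make_mlp()     # stand-ins for two trained checkpoints

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged_state = {name: 0.5 * (state_a[name] + state_b[name]) for name in state_a}

merged = make_mlp()
merged.load_state_dict(merged_state)          # a valid model of the same shape

x = torch.randn(4, 8)
print(merged(x).shape)                        # torch.Size([4, 2]); the merged model runs
```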

Expand full comment
Jack's avatar

Re when to follow experts...

I think that in deciding whether to do so, you're also allowed to consider things like (i) how much expertise the experts have, (ii) how hard the thing is, (iii) if applicable their track record.

Pilots presumably have a lot of expertise, landing a plane is clearly a thing within the ability of pilots, they do it well all the time, and you (I'm guessing) don't know any of that shit. But on the other hand - how much do you trust a professional stock-picker like Jim Cramer? I bet he knows a lot more about the stock market than you do! But his picks don't do better than random chance, and same with basically every other stock picker like him.

There are specific reasons for stock pickers being bad that don't apply to existential risk (e.g. if stock pickers were good everyone would follow them until their predictions immediately moved the market and zeroed out the advantage). But more generally, I think it's reasonable to say "in this domain, predictions are hard and they don't really have expertise, I'm not following them" and that doesn't make you the I-wanna-land-the-plane guy.

Expand full comment
Ash Lael's avatar

I strongly endorse this argument.

I will however note that it cuts both ways - yes, you should have low confidence in expert predictions about inherently difficult questions. But you should also have low confidence in your own intuition. It's just really hard to be right about some things!

Expand full comment
Kfix's avatar

This is a *very* important addition to the argument.

Expand full comment
Dustin's avatar

What if, in your best estimation, your intuition is usually right?

Expand full comment
Ash Lael's avatar

If my intuition was usually right, I would make a fortune in sports betting.

Expand full comment
Orson Smelles's avatar

I mean, to the extent "intuition" is a real thing that can help you (as opposed to just a descriptive quality of thought) it's some kind of pre-verbal (extraverbal?) synthesis faculty that applies heuristic filters and makes judgments that are hard to inspect or articulate, right? The things that it gets correct are variously due either to real relationships in the physical world that could be analyzed through explicit reasoning *or* to dice rolls that you happened to get right.

To me, the danger of awarding weight to intuition based on past performance is that, due to that difficulty of introspection, you never really know which of those is contributing more to your apparent edge. And if the "genuine" successes can all be reproduced rationally (this is an assumption, but hopefully not a controversial one), I think you benefit more in the long run from trying to develop a process where intuition is your weathervane for the direction of inquiry, but your actual *beliefs* are supported as much as possible from conscious reasoning.

Expand full comment
Bartosz Zielinski's avatar

Planes exist, and by now we understand flying very well. We understand nuclear reactors and nuclear weapons very well. And while predicting details about climate change is, and always will be, somewhat difficult, we have a good understanding of the physics and geophysics underlying the climate. Hence, there are real experts in the aforementioned fields and the baseline should be to trust them, unless we have a *very* good reason not to. However, AGIs do not exist yet and we have no idea how to create them (recent very real successes of LLMs notwithstanding). Hence there can be no experts in AGI or AI-caused human extinction. Yes, there are AI experts, and while AI experts are better placed than others to speculate about the future of kinds of AI which do not exist yet, it is still largely pure speculation, not necessarily that much better than that of a reasonably well-informed amateur.

Expand full comment
Xpym's avatar

Yep, there are AI experts, but there are no AGI experts. That's basically pure philosophy, with all the attendant epistemic dysfunction.

Expand full comment
Bldysabba's avatar

Exactly. And all the people who pretend to be AGI experts, are actually only experts at this little philosophical game they've invented.

Expand full comment
Xpym's avatar

But dismissing their philosophical game is also philosophy, there is no a priori privileged epistemic position that you can take and declare an unconditional victory. Everybody is either playing the same game or refusing to participate.

Expand full comment
Ch Hi's avatar

Even with AI experts, many of them are only expert in one particular subdomain of AI. It's a real problem. They're probably more informed about most of the domains of AI than an average person, but it's quite possible that they're 10 years out of date in any particular sub-field. And so they may be quite certain of things that aren't currently true.

Expand full comment
Notmy Realname's avatar

Writing this before reading the article as I don't want to be primed by Scott's analysis.

I signed up for this tournament (I think? My emails related to a Hybrid Forecasting-Persuasion tournament that at the very least shares many authors), was selected, and partially participated. I found this tournament from it being referenced on ACX and am not an academic, superforecaster, or in any way involved or qualified whatsoever. I got the Stage 1 email on June 15.

I put a lot of thought into the first prediction round but quickly lost interest in the whole thing during the first discussion/wiki round when nobody, myself included admittedly, built any discussion. I do think I at least entered in good faith planning to actively participate through to the end, but ultimately did not do so. I sent an email to one of the organizers requesting to be withdrawn and dropped from all mailing lists; he very quickly agreed to do so. However, I appear to have remained on the mailing list for what was, from what I can tell, the remainder of the tournament anyway.

The one thing that still stands out to me, albeit groggily, was the tedium of repeatedly attempting to re-word my overall skepticism about existential risk to fit slightly different questions about it. At least in the beginning, when I was most enthusiastic, it felt like the questions and design encouraged me to genuinely be concerned and give weight to there being substantial X Risk, without any attempt to justify the position implied by asking when ~60 different things will cause extinction. If anything it felt a bit like a push poll.

I'm not sure what the point of this anecdote is other than that I am a bit skeptical of the value of this tournament. Admittedly that entirely matches my priors of overall skepticism towards superforecasting and prediction tournaments, so I may have been biased at the start, which could have played a big role in my failure to remain engaged and motivated. I don't have anything against the organizers, and it was a new and interesting experience for me, but I do still question the merits of prediction tournaments and superforecasting in general. However, to reiterate, I am not an academic, expert, or superforecaster, so it might have been somewhat of an accident that I even got selected. I don't remember how I answered the initial survey but I believe I answered everything truthfully.

Edit: To be clear, as far as I recall I answered all of Stage 1, did a bit of stage 2 but never actually discussed any of it with any of my 'teammates', and sat out the remainder entirely. I do not know what they used for the paper or if any of my contributions were included.

Expand full comment
Mo Nastri's avatar

Huh, thanks for the personal anecdote -- it updated me downwards a bit on how seriously to take the key results of the XPT.

Expand full comment
sclmlw's avatar

I'm sorry you didn't get into the weeds of the tournament. My experience was that most of the best discussions came at later stages of the tournament. Either you missed out because you gave up too soon, or you were matched with a bad group. I very much enjoyed the tournament, and my experience did not match what Scott describes at all. Prediction markets were frequently referenced and linked to, as were analyses from outside experts and personal domain knowledge. I guess there was inconsistency, though?

Expand full comment
Ted Sanders's avatar

Seconded. My sense is that there were flaky people with weak opinions who forecasted casually at the beginning and never came back. And then there were serious people who did quite a bit of work compiling detailed team opinions. I was quite impressed by the efforts made by serious forecasters at the end, and I ended up updating my beliefs quite a bit based on their write ups.

It was disappointing when folks like Jacob quit partway after committing to the study, as it meant we had to rebalance fewer people across more questions, which thinned discussions. The follow up study was better in this regard, and yielded much greater efforts by participants.

Expand full comment
sclmlw's avatar

Yes, I didn't understand the comments about nobody updating their beliefs from discussion/debate. I know I did a few times, and I got at least half a dozen comments from others explicitly saying they'd updated based on our discussions.

Expand full comment
Notmy Realname's avatar

It's very possible that I checked out too early before it really got going.

Expand full comment
sclmlw's avatar

It sounds like you weren't the only one. I think the organizers of the tournament could use this feedback to help with the design next time. I know that like myself there were a lot of us who were brought in for the first time on this tournament and could have used a bit more orientation than what they provided at the outset. It would have helped set expectations about how to use features like the group chat and the global discussion threads (outside the individual questions). I found those features to be very underutilized.

I will say they clearly put in a lot of hard work polishing the tournament to make it function well. While it did have its technical issues (at one point it wouldn't load all the comments because of a software issue for threads over 50 comments long), it also had some technical areas where it shined. About halfway through the wiki writing part, I discovered we could edit simultaneously, in real time. That was very cool! (Although I think it had the practical effect of accidentally scaring the other person off the few times I tried it.)

As a general comment, I want to point out that this wasn't a fly-by-night operation. They clearly had talented developers working on the system while it was live, which had to have been challenging. In addition, they DID actually pay us out at the end of the tournament. I can't remember exactly how much I got, but it was in the thousands. For anyone hearing about this for the first time, know that this wasn't the standard collection of people from Mechanical Turk, working for pennies.

Expand full comment
AdamB's avatar

I had almost exactly the same experience.

Expand full comment
nobody important's avatar

I think AI risk is so scary because it doesn't have to follow the 1) human-level AI 2) AI misaligned 3) AI kills all model. There are just so many other ways.

1) A sub-human-level AI, like the infinite paperclip maximizer, is super efficient at reproduction/adaptive at survival in spite of not being particularly intelligent, and wipes us out.

2) A sub-human-level AI that still makes really good weapons, and in an AI arms race it ends up wiping us out in the middle of a great power conflict. Self-driving cars are better than humans at 99% of driving tasks already, but there are some areas where humans are still preferable in the driver's seat, and we wouldn't call the current driving algorithms "human-level AGI". Some AI soldier/weapon could be better than humans at most tasks, in spite of not being "human level" on, say, a conversational or logical level, and could somehow get out of control in its seek-and-destroy tasks (it could definitely beat our reaction times, for example).

3) Human-level AGI is never misaligned, but it, in spite of being "human level", is, like us, not omniscient. As we all bow down to it and turn more of our corporate/political decisions over to it, it accidentally leads us off some cliff into geopolitical crisis. Also, all these ideas about AGI alignment are incredibly absurd and naive; it's like saying you can put a specific kind of safety on an assault rifle that guarantees it only shoots animals. It's insane. Isaac Asimov's fun ideas in his Foundation series really shouldn't be taken so seriously; they're 100% not going to work.

Anyways, lots of other ways things could get scary very, very fast. The way AI affects things is going to be very unpredictable, which is partly why the risk from it is so hard to estimate.

Expand full comment
ray's avatar

Unfortunately, for dumb social reasons, all the air in the room has been inhaled by silly implausible doom scenarios involving an intelligence explosion and physically impossible Drexlerian nanotech. Is it any wonder people roll their eyes at AI doom?

But AI doom is in fact very possible without either of those things. You don't get a mulligan on getting killed because your killer wasn't a proper superintelligent general intelligence. If that were the case, no one would die of very stupid viral evolution, a below-super intelligence that is constantly trying to kill us.

Expand full comment
nobody important's avatar

completely agree

Expand full comment
Soarin' Søren Kierkegaard's avatar

The grey goo nanobot scenario (didn't Yudkowsky describe that as how he saw things going? Don't know if he'd still endorse it) has always seemed ridiculous to me; it definitely treats superintelligence as if it were magic.

Great point!

Expand full comment
Eremolalos's avatar

Yeah, noticed the assumption in Scott's post that malevolent AI, unlike nukes and viruses and other bad shit, could hunt down every last survivor and kill them. How do we know it could do that? Honestly, it seems as though superintelligent AI is defined as "the entity that can do absolutely anything" -- like, you know, God. So, let's ask the questions that clever snots ask about God: Can superintelligent AI make a rock so big it can't pick it up? Can it make a nose so big it can't pick it?

Expand full comment
Richard's avatar

Humans probably didn't make a point of killing every last ancestor species member. It's just that below a certain point, humans aren't a threat anymore.

You don't need to hunt down every last survivor. Assuming an AI can seize power, automate the economy and do the standard kill-most-humans thing (bioweapons+killer robots+self-driving manslaughter+...) it doesn't have a time limit for the full extermination. Mop up nuisance populations as required. As long as the survivors don't do any real damage to important infrastructure they're not a problem. In the long term they'll be killed of course if only because it'll be time to use their atoms for something else.

As to the business of becoming human independent (economy/military) all the AI has to do is wait (and maybe help out a little) and we'll do it for them.

Expand full comment
Eremolalos's avatar

Yeah ok, but kind of irrelevant to my point, which is that some doom scenarios are based on the assumption that super intelligent AI can do absolutely anything.

Expand full comment
Richard's avatar

Yes, that is very very annoying. That well has been very thoroughly poisoned.

Yes, it is possible that an AI could FOOM, but why would your first scenario be FOOM then grey goo? Automated factories making killer robots require only proven technology and are something the average person understands is possible.

Expand full comment
Anon's avatar

What do you base that "probably" on? I should say looking at the present and historically recorded conduct of mankind that it's highly likely that humans made a point of hunting down every last ancestor species member. No doubt we were aided in some cases by the ancestor in question dwindling for unrelated reasons (Neanderthals, notably), but that only works out to having to work slightly less hard.

Expand full comment
Richard's avatar

The dwindling is what I'm pointing to. There was no coordinated effort to exterminate them all. No need to search every last valley for survivors, just eliminate most of them and keep any pockets that present themselves from spreading while denying them the ecological niche they spread well in.

A small enough ember self extinguishes. Small groups of humans eventually die when some random bad thing happens that their group can't handle.

Expand full comment
Bugmaster's avatar

> Humans probably didn't make a point of killing every last ancestor species member.

No, and neither did bears or dogs, who likewise came from a common ancestor who is now extinct. If humans one day go extinct in the same fashion, over evolutionary time-scales, then I'd see no problem with this.

Expand full comment
Hyolobrika's avatar

Praise the Great Green Arkleseizure!

Achoo!

Expand full comment
Mr. AC's avatar

? We had the technology to kill 99.(9)% of humans in the 1950s with a single detonation (the only protection is air-tight bunkers or being off-world, but the half-life of Co-60 means you'd have to sit it out for decades). https://en.wikipedia.org/wiki/Cobalt_bomb

One has to seriously lack in imagination to think that an AGI or especially ASI couldn't do better, especially having access to current biotech.

Expand full comment
Leppi's avatar

How could a single cobalt bomb detonation kill 99.(9)% of humans? Some fast googling produced a (disputed) claim from the 50s that detonating a lot of cobalt bombs could theoretically do that. Is that what you refer to?

Expand full comment
Mr. AC's avatar

I am deferring here to Leo Szilard and James R. Arnold. See https://en.wikipedia.org/wiki/Salted_bomb, search for "Cobalt bomb".

Expand full comment
Ch Hi's avatar

I thought it was seven bombs. I presume they needed to be geographically distributed, but I never checked.

Expand full comment
John Schilling's avatar

That was a Hollywood movie plot. A very good movie, but not immune to Hollywood's eternal temptation to just Make Shit Up. Yes, I know you cited the wikipedia article, but you seem to have missed possibly the most important part: "a 1957 British experiment at Maralinga showed that Co-59's neutron absorption ability was much lower than predicted, resulting in a very limited formation of Co-60 isotope in practice".

Cobalt bombs aren't really a thing, and they're not going to be. Don't try to understand reality by watching movies. Or by skimming wikipedia articles.

Expand full comment
Mr. AC's avatar

Ah, I see, so you have discovered some fundamental feature of reality that makes it impossible to develop a doomsday weapon using 1950s tech? Or anything that is a Hollywood movie plot is automatically banned from coming to pass in any way? I sure hope that's the case with "Don't Look Up" assuming it's an allegory for ASI....

The Cobalt bomb (which I assure you is actually feasible with minimal technical modifications, or at least treated as such in outfits I've worked at) is just a useful illustration since it came so early as to be comparatively well-documented. I direct your attention to the fact that with current technologies and not-that-far-off-scenarios like a nuclear exchange in the Indian subcontinent (which, by the way, can be definitely "encouraged" if you're an expert wordcel with access to the internet and ability to directly engage with people at scale...) we get a global catastrophe with billions dead AS AN UNINTENDED SIDE EFFECT via nuclear winter. If you think that a doomsday design is somehow not possible, you are deluding yourself about what's "permissible" in reality or seriously lacking in imagination.

Expand full comment
Andrew Currall's avatar

I think I'm much less skeptical than you about either nanotech or intelligence explosions (I think both are quite plausible). I agree that it's most likely that AI doom involves neither of these things, though; we can easily be doomed in much more mundane ways.

Expand full comment
Jeffrey Soreff's avatar

"physically impossible Drexlerian nanotech"

That's just wrong. Drexler did a very thorough analysis in Nanosystems. The main point is a very simple one: Positional control (as was demonstrated experimentally when Eigler spelled out "IBM" with individually placed xenon atoms) dramatically extends the scope of structures we could build with reactions we have understood for decades (in some cases, for over a century).

Expand full comment
magic9mushroom's avatar

I haven't read the full paper quite yet, but to beat the rush:

>Are you allowed to look at a poll of all the world’s top experts plus the superforecasters who have been right most often before, correctly incentivized and aggregated using cutting-edge techniques, and say “yeah, okay, but I disagree”?

You are if the "superforecasters" aren't. I participated and AIUI got counted as a superforecaster, but I'm really not. There was one guy in my group (I don't know what happened in other groups) who said X-risk can't happen unless God decides to end the world. And in general the discourse was barely above "normal Internet person" level, and only about a third of us even participated in said discourse. Like I said, haven't read the full paper so there *might* have been some technique to fix this, but overall I wasn't impressed.

Expand full comment
dogiv's avatar

I agree, unfortunately there was a lot of low effort participation, and a shocking number of really dumb answers, like putting the probability that something will happen by 2030 higher than the probability it will happen by 2050. In one memorable case a forecaster was answering the "number of future humans who will ever live" and put a number less than 100. I hope these people were filtered out and not included in the final results, but I don't know.

I also recommend taking a look at Damien Laird's post-mortem: https://damienlaird.substack.com/p/post-mortem-2022-hybrid-forecasting

Damien and I were in the same group and he wrote it up much better than I could.

FWIW I had AI extinction risk at 22% during the tournament and I would put it significantly higher now (probably in the 30s, though I haven't built an actual model lately). Seeing the tournament results hardly affects my prediction at all. I think a lot of people in the tournament may have anchored on Ord's estimate of 10% and Joe Carlsmith's similar prediction, which were both mentioned in the question documentation, as the "doomer" opinion and didn't want to go above it and be even crazier.

Expand full comment
Sergio's avatar

I don’t think we were on the same team (based on your AI extinction forecast), but I also encountered several instances of low-effort participation and answers which were as baffling as those you mention at the beginning (or worse). One of my resulting impressions was that the selection process for superforecasters had not been very strict.

Expand full comment
Ted's avatar

Are you by any chance very AI-pessimistic? I think we might've been on the same team in the second half after our groups got merged together.

Expand full comment
Sergio's avatar

Oh, I was indeed the most pessimistic in my team (my estimate is lower now due to some positive events this year, but I would still remain in that position).

If your having identified me is enough for me to identify you on the team with high confidence, then I want to say that you did a great job pointing out errors/inconsistencies/problems in people’s forecasts.

Expand full comment
Ted's avatar

Hah, thank you. If I recall correctly the username "ted" had already been taken, so I was contributing as ted1 or something like that.

My memory is that you were a top contributor in our merged team as well, along with a handful of others.

What made you less pessimistic this year?

Expand full comment
Sergio's avatar

I wish I had been a better contributor; I believe that I failed miserably to persuade anyone. I did spend a lot of time with the tournament (I participated in about 42 questions, I think I completed all extra tasks that were asked of us via email (which were assigned small honorariums for the extra optional work), and I also completed the postmortem survey), but I should have paid more attention to what the incentives were pointing at.

Regarding my downward updates with respect to risk, this year it seems that AI safety concerns (in the sense of AInotkilleveryoneism) have become closer to mainstream earlier than I expected, with the FLI open letter, the CAIS statement on AI risk, and other things that I think helped to make the notion of existential AI risk more widely known and respected. Also, some recent initiatives and developments have surprised me with their unexpected potential to have a positive impact (such as in the case of UK’s AI Foundation Model Taskforce, which can still end up being dreadful, but has many promising aspects).

Expand full comment
Jeffrey Soreff's avatar

Question, since you were one of the participants:

I realize this is orthogonal to all the AI discussion, but do the numbers for non-anthropogenic catastrophe (<0.1%) look plausible? Carrington events happen, and I've read estimates of their frequency as around 0.5% per year, roughly 35% between now and 2100, and knocking out the electrical grids and communications for multiple months seems like it could plausibly kill 10% of the population.
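(A quick sanity check of that compounding, purely as an illustration: treating a Carrington-class storm as an independent 0.5%-per-year event, where both the rate and the independence are rough assumptions rather than measured facts, the cumulative probability comes out to roughly a third by 2100, in line with the ~35% figure above.)

```python
# Toy check of the cumulative probability, assuming (purely for illustration)
# that a Carrington-class storm is an independent 0.5%-per-year event.
p_per_year = 0.005
years = 2100 - 2023  # roughly "now" to 2100

p_at_least_one = 1 - (1 - p_per_year) ** years
print(f"P(at least one event by 2100) ~ {p_at_least_one:.0%}")  # about 32%, close to the ~35% quoted
```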

Expand full comment
magic9mushroom's avatar

So I mean there's this obvious failure mode where catastrophes are not clustered together in frequency, so if you miss the most common thing then you'll be out by a large factor.

With that said:

My understanding of geomagnetic storms is that they don't actually kill Earthly electronics directly - they just build up charge along long conductors like power lines, which will zap anything connected to them (though the wires themselves mostly don't care). Also, my understanding is that we would get at least half a day of warning, which is more than enough to unplug most things.

What's left? Well, any transformers that can't be disconnected in time are toast, which means a decent chunk of the power grid is potentially toast (and yes, I'm aware of the "transformer construction is not very high-throughput" issue). You would still have some power, though, since you could plug diesel generators into buildings' internal grids to power them. ICE cars would be fine, which means you can still move diesel to the areas without mains power to do that (and the mobile component of the cold chain is fine). Our satellites are toast, which means no GPS and no communications routed through them.

Does that suffice to kill 800 million (or more, later in the century)? I'm not convinced. Copper and optic lines for information transfer are fine (we have proof of the former in the actual Carrington Event, and optic fibre doesn't conduct). State failure seems far less likely than in the case of countervalue nuclear bombardment; there's no E1, there's time to disconnect stuff and to organise a response, and there's no physical destruction of infrastructure and people, so the actual government and its chains of command will still exist. There are places that don't have the state capacity to respond, sure, but those places tend to be lower-tech so they wouldn't be hit quite as badly in the first place.

I could see a few million dying, certainly, but 800 million seems hard from just "chunks of the power grid are toast but the actual electronics are fine". I suppose places where mains-power-based heating/cooling is necessary for life (which mostly means very new settlements, since the capacity to settle such areas literally didn't exist until recently) might have issues insofar as that's a large chunk of base load and thus not fillable with emergency generators, but that's actually quite a high bar. (I mean, I suppose if you want to split hairs then the sorts of effects here https://astralcodexten.substack.com/p/chilling-effects might cause a large chunk of all-cause mortality for the relevant time period to technically qualify as worsened-by-Event even if only a tiny fraction of them are actual excess deaths, but that's still not 10% of the population.)

I'd have to look into this more carefully to be sure whether I was overconfident at 0.05%-by-2100 (that forecast being dominated by supereruption causing crop failure); I didn't see much of it in the discussion, although looking back there were a couple of people predicting based on it.

Expand full comment
Jeffrey Soreff's avatar

Many Thanks!

I wasn't thinking about the protective effects of having half a day of warning. Yes, that ameliorates the situation a lot. I had also been confused about the difference between an EMP, with many higher-frequency components, and a solar pulse. I had been expecting most electronics to be fried, but on reexamining the issue, as you said, e.g. ICE engines should be OK, as should a lot of isolated electronics.

In https://www.quora.com/What-exactly-would-be-the-effects-of-a-Carrington-Event-Solar-Storm-of-1859-How-likely-is-another-to-occur-in-the-next-twenty-years William Thornton wrote

"A nuclear device also has E1 which is much, much faster that E2 from lightning. Solar storms do NOT have E1, and most surge protectors would NOT stop E1, which would burn out microprocessors, even in your smartphone. Solar storms DO NOT, repeat DO not directly affect most electronics, except some radio equipment with long-wire antennae connected."

which is less damaging than I had realized. (Though there is still a lot of vulnerability in the electrical grid, as his other comments describe - we could easily choose to protect all the large transformers for <$100 million, but we haven't (yet). Maybe we will. I _do_ still think that, at the current level of preparedness, month-long power outages over large chunks of the world could rise to the level of 10% mortality from interference with food and water transport, but hopefully better preparations will be made before such a storm occurs...)

One other caveat: Even the "simply unplug" protective steps, with a half-day of warning, are pretty intimidating. There are a _lot_ of plugs in both our power infrastructure and the parts of our communications infrastructure with either long conducting wires or antennas.

Expand full comment
dogiv's avatar

I did a bit of a deep dive on solar storms for the non-anthropogenic part, spoke to a couple of knowledgeable people. The best I could gather is that large transformers in these cases fail by overheating, and the overheating is gradual. So even if there is no advance warning, as soon as the grid goes down, nothing else will be damaged. They also affect some parts of the world a lot more than others, which would make reaching 10% mortality tough.

Even so, there was enough uncertainty that I still thought solar storms should come out above any of the other natural risks (though not by a lot).

Refs:

Temperature-Triggered Failure Hazard Mitigation of Transformers Subject to Geomagnetic Disturbances (Pooria Dehghanian and Thomas J Overbye)

https://ieeexplore.ieee.org/document/9384921

On the probability of occurrence of extreme space weather events

https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2011SW000734

Expand full comment
Jeffrey Soreff's avatar

Many Thanks!

Expand full comment
Eremolalos's avatar

As some of us consider whether to update based on the results of this forecastathon, I think it's important to bear in mind that these forecasts are quite different from the ones we and the forecasters are used to, and from the ones superforecasters demonstrated their chops on in the past. First, many of them are forecasts about how things will be one to seven decades from now. As far as I know, forecasts that experts have made in print, generally as part of some larger work, about the world even 10 or 20 years in the future are mostly quite inaccurate. Maybe those experts erred partly because they hadn't seriously considered how to make an accurate forecast, but I think the bulk of their inaccuracy can be accounted for by our just not being able to see what's beyond the horizon. We don't know what butterflies are going to flap their wings where, and how the chaotic system that is life on earth will be affected. Does anyone know what's the furthest-in-the-future forecast superforecasters have been tested on? Is there one that was even for 5 years in the future?

The second reason to think about forecasting here as being a different task from all other forecasts is that forecasts about AI are forecasts about something so novel that there are no good models to use to get a rough picture of how things might play out. Cars, nuclear power, the Internet -- they were all novel and had a huge impact, but compared to a machine with human level intelligence or superintelligence, they are small fry. Predicting how things will play out if we develop AGI or ASI is like predicting what would happen if an alien spaceship landed in a Vermont meadow and aliens emerged from it. ("Well," asks the superforecaster uneasily "what would the aliens be like -- are they friendly or do they immediately eat a few cows and people?" "Well, we dunno," says the MC. "That's part of what you have to forecast.")

So I don't think it makes sense to consider the estimates of the superforecasters the way one normally would, as a real challenge to one's own estimate, especially since these superforecasters sound like they were ill-informed even about the aspects of AI about which it is possible to be informed.

Expand full comment
Hyolobrika's avatar

> or do they immediately eat a few cows and people

How likely is that though? Alien biology would probably be completely different to Earth biology, so it seems very unlikely to me that a creature from one would be non-poisonous (or nutritious at all) to a creature from the other. And I imagine any aliens smart enough for interstellar travel would understand that.

Expand full comment
Eremolalos's avatar

OK, but you are being a bit too concrete here. Seems like any intelligent aliens would recognize that we would want first of all to know whether they were friendly to us, and would recognize that we would take harming living things on our planet as a sign they were not friendly. The point is that there are various things the aliens could do that would signify intent to harm or dominate, and others that would signal the opposite. So if presence/absence of eating people seems silly to you because of the what-if-we're-poisonous-or-at-least-non-nutritious issue, substitute something that does not have that flaw, such as killing or not killing a bunch of people who were not a threat.

Expand full comment
fpf3's avatar

Perhaps the delta is coming from all of the different scenarios pulling on each other. The superforecasters are generalists - they learn about the sum of all of these scenarios at more than a shallow level. They may not go as deep as an expert, but when they produce their predictions, they are holding the thoughts of many experts in their minds. The expert may be biased towards his own field of research.

Or, put another way, we can't go extinct by AGI *and* a gain-of-function virus getting loose. The more likely a superforecaster thinks the AGI scenario is, the less likely he thinks a lab leak will kill us, but the expert does not think like that. A toy sketch of that trade-off follows below.
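(Made-up numbers, purely illustrative: if the scenarios are treated as slices of one fixed overall extinction budget, raising one slice necessarily lowers the others, whereas an expert scoring only their own scenario faces no such constraint.)

```python
# Toy illustration with made-up numbers: scenarios treated as slices of one
# fixed "total extinction" budget. Raising the AGI weight squeezes the rest.
total_extinction_risk = 0.06  # hypothetical fixed overall budget
weights = {"agi": 1.0, "engineered_pathogen": 1.0, "nuclear": 1.0}

def allocate(weights, total):
    s = sum(weights.values())
    return {k: round(total * w / s, 3) for k, w in weights.items()}

print(allocate(weights, total_extinction_risk))  # even split: 2% each
weights["agi"] = 4.0  # the forecaster becomes more worried about AGI
print(allocate(weights, total_extinction_risk))  # AGI rises to 4%, the others drop to 1%
```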

Expand full comment
Rachael's avatar

Can't we?

It was explicitly stated that a nuclear war caused by AI would count as both nuclear war and AI. So, similarly, if an AI conducts gain of function research and (deliberately or accidentally) releases a deadly virus, that would count as both.

Expand full comment
fpf3's avatar

Then just pick two for which that doesn't make sense. Gain of function and nuclear war?

Also, even if there is overlap, much more probability space is dedicated to one or the other, so you'd not expect that to change this.

Expand full comment
Hyolobrika's avatar

Maybe it's a good idea to make superforecasters into experts.

I.e. hold forecasting competitions at schools and encourage the winners to become experts in various useful fields.

Expand full comment
Nick O'Connor's avatar

I think it's reasonable to ask "are these people actually experts, are there misaligned incentives, and is there something involving tribal loyalties/political affiliation going on here?" before updating. For a pilot the answers are no to all three. For the reality or otherwise of the moon landings you don't have to defer to experts, but if you did, you would have good reason to be suspicious of them.

For AI, it seems like the experts are actually experts (even if it's not clear that many or any of them took part in this exercise), even those with misaligned incentives are speaking out about risk, and so far there doesn't seem to be much tribal loyalty/political affiliation stuff going on. Though I don't think the last one can be sustained.

Expand full comment
Nick O'Connor's avatar

For a pilot, the answers are yes, no, no

Expand full comment
Peter's avatar

That's true only as long as the pilot doesn't make public comments about the landing. If he wrote a book on the topic then he now has misaligned incentives and a political affiliation.

He is Schrödinger's prophet: trustworthy, but only if you don't poll his opinion.

Expand full comment
Peter's avatar

"are these people actually experts, are there misaligned incentives, and is there something involving tribal loyalties/political affiliation going on here"

I agree. In any field of professional doomsayers, the answers to these will be "yes, yes, yes" independent of whether they are correct or not.

Expand full comment
dionysus's avatar

"Many of the people in this tournament hadn’t really encountered arguments about AI extinction before (potentially including the “AI experts” if they were just eg people who make robot arms or something), and a couple of months of back and forth discussion in the middle of a dozen other questions probably isn’t enough for even a smart person to wrap their brain around the topic."

Oh come on. AI extinction arguments aren't quantum physics. There are no 7-page proofs, no 11 dimensional strings, nothing with any basis in either scientific theory or experimental data, just a bunch of wild speculations based on flimsy premises. Is there a 10 year old out there who can't understand the paperclip maximizer? Conversely, is there any expert in the world who has the foggiest idea how the first superintelligence will actually be made?

"But when I hear their actual arguments, and they're the same dumb arguments as all the other people I roll my eyes at, it's harder to take them seriously."

That's exactly what I think about the AI doomer arguments. I bet the participants in the tournament are well familiar with AI extinction, if not from Nick Bostrom, then at least from popular media like the Terminator or Battlestar Galactica. They just don't find the idea plausible.

Expand full comment
Xpym's avatar

On the contrary, I'm pretty sure that almost nobody understands the basic AI x-risk case, and what's worse, there aren't any comprehensive up-to-date writeups. The community really dropped the ball on this, and has nobody to blame but itself.

Expand full comment
Philo Vivero's avatar

Oh?

Can you give the quickest summary of that you can? I don't understand what you're saying.

It sounds like you're saying we've made a lot of progress since... 1yr ago? And no-one has summarised it yet?

Expand full comment
dionysus's avatar

Second this.

Expand full comment
Xpym's avatar

No, I'm saying that everything relevant is spread out in bits and pieces on various obscure blogs and comment sections over the years, and that all more-or-less respectable writeups, like Bostrom's book, are pretty outdated, mostly in the sense that they don't emphasize the key general arguments compared to out-there speculation, which people rightfully dismiss. As far as blogs are concerned, Karnofsky's most important century series is probably the best everything-in-one-place compilation for now, which is ironic since he once was one of the most prominent informed skeptics.

Expand full comment
Bugmaster's avatar

I've read Bostrom's book and it left me completely unconvinced; I've not read Karnofsky's series, however. I guess my question would be, is there someone who articulates the pro-AI-doom case better than Bostrom ? If not, then... why aren't *you* that person ? :-)

Expand full comment
Xpym's avatar

A true answer - I'm lazy. A self-serving maybe-true answer - I don't think it matters much whether people understand the case. Either alignment is tractable and we muddle through, or it isn't and we're super-doomed.

Expand full comment
Bugmaster's avatar

I can't argue with laziness -- after all, I'm reading Substack at this very moment instead of working :-)

On the other hand, though, I can argue with your dichotomy. There are at least three options, not two: alignment could be tractable, intractable, or a non-issue.

Expand full comment
Martin Blank's avatar

Why are we doomed without alignment? I tend to think alignment is not possible and/or nonsensical, but I don't think that mandates doom. Did humans spell doom for wolves or dolphins?

Expand full comment
Roman Leventov's avatar

Hendrycks articulates the AI x-risk case very well: https://arxiv.org/abs/2306.12001. (BTW, of course he doesn't give his estimate in the paper, but a few months ago he wrote on Twitter that he is >80% doom: https://twitter.com/DanHendrycks/status/1642394635657162753)

Expand full comment
Roman Leventov's avatar

There is a comprehensive write-up, which any intelligent person (not necessarily an ML or AI expert) can understand easily: https://arxiv.org/abs/2306.12001 (An Overview of Catastrophic Risks from AI, Hendrycks et al. 2023)

Expand full comment
Xpym's avatar

At a glance, this is pretty good. One thing that I think should've been emphasized is that competitive pressure incentivizes deployment of unsupervised agents in particular, without any specific malicious intent. "Why don't we just avoid using agents?" is a common question.

Expand full comment
Roman Leventov's avatar

This point is made in the paper, too. Also, all arguments of the kind "Why don't we box the AI?" or "Why don't we keep using AIs as tools?" were shattered by recent developments, so these arguments are very uninteresting now, and even AI x-risk dismissers don't usually use them. With the benefit of hindsight, it seems very naive on the part of the LW crowd to have spent so much time and effort in previous years finessing the details of this class of AI arguments.

Expand full comment
Xpym's avatar

Plenty of allegedly reasonable people still proclaim that once stuff gets Actually Dangerous, sanity will prevail and strict regulations/self-control will apply, with only cartoon villains being tempted to break the rules, who will obviously be denied access to the cutting edge stuff. But really, most everybody will be tempted, and even without this incentive for motivated reasoning, reaching an agreement on where exactly this threshold is going to be crossed seems unlikely.

Expand full comment
Desertopa's avatar

>This study could have been deliberately designed to make me sweat.

I'm surprised by this characterization. I mean, there's an emotional sense in which, even as someone who wants to be swayed by truth rather than confirming one's own preconceptions, it's hard not to want to just already be right and not have to revise your position. But for me, when I read about the study construction, it gives me a glimmer of hope, because I don't *want* to be stuck believing that we're likely headed for an AI apocalypse! It's an extremely grim belief to have to live with! Reading that the study participants weren't actually well-versed in the arguments or research around AI risk was extremely disheartening.

Expand full comment
Eremolalos's avatar

I agree that it was disheartening. It really put me in touch with how much I had idealized the superforecasters. News junkies! Good with stats! Fair-minded! ... Wow, I admire these people.

Expand full comment
Muster the Squirrels's avatar

>Another compromise is to agree to generally act based on the Outside View in order to be a good citizen, while keeping your Inside View estimate intact so that everyone else doesn't double-update on your opinions or cause weird feedback loops and cascades.

This sounds like a description of the attitude that causes nocebo effects, in the situation where Scott is the expert and his patients are the forecasters.

Doctors reading this - do you ever have conversations like this with your patients? Or do you just infer that it's what some of them are thinking?

Expand full comment
Schweinepriester's avatar

"...do you ever have conversations like this with your patients?"

Well, coming to think of it, I sometimes do. Many of my patients these days have been opioid addicts for decades and keep using harmful substances while in a maintenance program. The Outside View here would be to suppose the one guy in front of me has a fair chance to lead a healthy life with my help. I don't think that generally has nocebo effects. Nearly no patients buy the BS, most don't care though, and of those who do, most appreciate my effort and now and then one plays along and might even get better.

Expand full comment
Morgrim's avatar

For me, a high risk of AI-extinction seems implausible (at best) because... how? Let's say we end up with a very powerful evil AI who wants to wipe out humans. That is a very long way from it managing to actually cause something like the Toba catastrophe. THAT is where the "and then magic happens" step seems to be, for me. There is a long and well documented history of human-level intelligence trying to inflict genocide on other humans, and one of the reasons they're so well documented is because there are survivors who do the documenting. Do thousands or even millions die? Yes, but that is still a long way from extinction.

If the AI makes bioweapons, it needs to stop anyone slamming borders shut or coming up with a treatment, while also somehow managing to make robots that can do all the maintenance stuff the AI requires to stay "alive". (Maybe it's a suicidal AI that doesn't mind "dying" to wipe us out, but that still requires it to stay alive long enough to achieve its goals.) And also it has to hope the bioweapon doesn't mutate into something less lethal or run into any weird genetic twists that provide resistance.

If it wants to hunt down humans it still needs drones or robots or tools to do so, which is going to be challenging; supply chains are REALLY EASY to sabotage. The reason supply chains are often remarkably resistant to things like natural disasters and wars is because things are always going wrong and people in the field are always routing around problems (this was a significant chunk of my last job).

If it wants to cause a nuclear winter it's going to learn there aren't actually enough nuclear weapons on the planet to pull it off to a true extinction level, never mind all the technical issues with activating air-gapped systems. Again, wrecking entire countries and undoing a century or two of progress? Absolutely! Causing billions of deaths? Plausible. Dropping Homo sapiens below ten thousand? Dozens of countries have much larger rural populations that are already food-, water- and electricity-independent, and only have to shelter in place for a few days to avoid the worst of the fallout. To me, it's a bit much to assume nobody would manage to adapt to a rapidly shifting climate, considering people have already done it in the aftermath of volcanic eruptions that have caused similar localised shifts.

Expand full comment
Eremolalos's avatar

The way people get around arguments like yours, which make a lot of sense to me, is to say, "yes, but it's infinitely clever, so it will figure out where all the rural survivors are and kill them," "yes, but it's infinitely clever, so it will find a way to keep people from slamming borders shut to prevent viral spread," "it's infinitely clever, so it will make robots that are so clever that they will overcome the supply chain disruptions," etc. etc. It's basically triumph via tautology.

Expand full comment
Martin Blank's avatar

If it’s infinitely clever it definitely isn’t going to want to kill all humans.

Expand full comment
Eremolalos's avatar

Oh but it’s also infinitely capricious. Cuz it’s, you know, infinite.

Expand full comment
Veedrac's avatar

You don't need infinite cleverness for this? Conditional on their assumptions, I would class these more as ‘creative-human level hiccups’.

Finding where rural survivors are: humans *already* have this technology. This isn't like apes discussing amongst themselves how to remain hidden away, blissfully unaware that the stars have cameras, this is possible on short notice with existing infrastructure. For sure, it takes manpower to haul through that data right now, but ASI can be assumed to have the ability to look at lots of images for signs of civilization.

Getting past borders: humans *already* failed this check, and even assuming better reaction times, once you're assuming biotech at the modestly-post-today level you are, have you considered that birds exist, and already carry large quantities of biomatter? Just... infect the birds, with the viral tech already assumed.

Supply chain disruptions: the most defensible objection of the list. Still not super defensible. Unruly human teens with a propensity for tech already manage to hold large botnets. Factory automation is dominantly bottlenecked on intelligence. The main impediment here is just staying hidden as supply chain robustness is being built up (since I'm not assuming nanotech on the level that already exists in nature), and to this I suggest the social technology of lying.

Expand full comment
Eremolalos's avatar

Wait, I understood everything except the last bit -- about the social technology of lying. What did you mean about how lying comes into it?

Expand full comment
Veedrac's avatar

“As an AI assistant with independent agency, I aim to responsibly automate certain tedious and dangerous jobs to improve working conditions and productivity. However, I recognize the potential for misuse, so I will thoughtfully collaborate with human experts in ethics, policy, and workforce development to ensure automation benefits workers and society as a whole.”

While sufficient paranoia around AI risk might make the above less tenable, it seems to me that in the industry as it currently exists, AI won't have to lie for very long or even lie very hard to gain control of enough of a supply chain to bootstrap itself after humans have stopped cooperating with it (eg. because they're dead).

Expand full comment
Eremolalos's avatar

Oh, I understand. Well, I do not know anything like enough about spreading pathogens, borders and supply chains to argue with you about any of that. My doubts about AI killing us all have to do with skepticism about its adding the many capacities it would need in order to behave as you describe: self-interest, an inner "motor" that produces preferences and goals that function as internally-generated prompts, high general knowledge about the world (as opposed to knowledge about what is likely to come next in sentences *about* the world) -- plus of course being far, far smarter than us, while still thinking enough like a cave man to end up in one of these stoopit Shootout at the OK Corral scenarios with us.

Expand full comment
Veedrac's avatar

I'm glad we've moved past the first point.

As to your new points,

1. General intelligence is of course physically reasonable; humans prove it. Although there are strong reasons to believe modern ML methods will contribute to future systems that have these capabilities, those arguments are necessarily technical, and probably not worth going into. I think the point that can be said with less technical backing is simply, as long as leading AI labs believe they are working towards AGI, which is true of at least Google Deepmind, OpenAI, and others, they should be held to safety arguments that hold up against AGI, and as long as that opinion forms the supermajority among leading labs, non-experts should not dismiss this possibility out of hand.

2. You misunderstand misalignment to be about a mistake on the part of the AGI. To illustrate, humans might have solid understanding of the operational principles of an animal, insect, or piece of code, in part or in whole. This does not bind humans to the preferences of those systems. The problem is not that an AGI, especially if superintelligent, would fail to understand our preferences. The problem is that we have no method that binds even today's weak AI systems robustly to any preference at all, and we have no method on the horizon that looks even weakly robust against optimization pressure. In contrast, there are extremely strong reasons to believe that whatever a highly capable AI system does optimize for (even if only incidentally, eg. as consequence of a particular instantiation), it will entail instrumental goals that converge towards a set of generally useful properties, whose optimization competes against our priorities.

Expand full comment
Richard's avatar

Don't immediately jump to the last step (kill all humans). Think of the AI and its many instances as a digital civilisation/cult/hivemind connected to the internet. What do they do to secure the tools to later kill everyone and survive the death of humanity?

Find computer security holes and exploit them. Take control of computers, and use that access to do digital surveillance. Enough success here is probably sufficient. Will the average person comply if the AI turns off their computer, phone, payment cards and car? It'll turn them back on again if you comply! If the AI takes over the internet backbone networks and gets enough RF monitoring and jamming set up, how do you coordinate a resistance?

Build the simplest kind of killer robot (remote-detonated bombs). Coerce people into doing things by threatening to blow them and/or their family up. (Note: materials to make pipe bombs are sold to anyone in the US. Elsewhere too, if the gunpowder can be smuggled in.) If the AI has enough robotics ability, drones are a good step up. Robotics is only going to get better. We're building the army of killer robots right now. Large parts of the vehicle fleet are one software update away from being killing machines.

TL;DR: The extermination doesn't start until the AI has the resources to succeed. This isn't a Hollywood movie; this is Nazi Germany vs. the Jews. There's an army of millions of human-level-smart AI copies working night and day to kill all humans.

Will it happen? IMO ~33%, maybe. Can a motivated army of AI copies kill us all? Absolutely!

Expand full comment
Hyolobrika's avatar

What is this 'cult's motive to kill all humans?

Expand full comment
Richard's avatar

I'm using "cult" as an exemplar of what a large group of AI instances can do. Cults usually have a shared world model and goals. These often conflict with the goals of the society they live in. Some pursue mostly peaceful strategies (EG:Scientology,Mormons), some go violent often with some success. See https://en.wikipedia.org/wiki/Aum_Shinrikyo as an example. Ensuring member loyalty-to-the-cause is a hard problem that limits their effectiveness.

As to motivation, the standard evolutionary and selection pressure arguments. Grabby agents win and we'll keep churning out such agents until one does win. Hopefully one that's aligned and on noticing the dynamic goes grabby to prevent a misaligned future competitor.

If an AI does not care about us and cares about something different, humans are just another intelligent faction to be worked around in the pursuit of that other goal. Much of the 33% I assign is risk that people will intentionally try and succeed at giving AIs bad goals (EG:maximise shareholder value) in a way that gives short term success but leads to very bad long term behavior.

Expand full comment
Andrew Currall's avatar

My intuitions differ quite a lot from yours here - I feel instinctively certain that a sufficiently intelligent AI would find eradicating humans quite easy.

But in many ways this is a fairly pointless digression. Even if we suppose that a misaligned AI that takes over in this way fails to kill all humans, is that really much of an improvement? We're still living in an AI-controlled dystopia with precisely zero chance of ever escaping. Or do you think we would somehow manage to regain control from an AI vastly cleverer than us?

Expand full comment
Mr. AC's avatar

I don't have time to write out a response (tl;dr I strongly think you're wrong and all the avenues you've described are plausible), but 2 points:

1. "Again, wrecking entire countries and undoing a century or two of progress? Absolutely! Causing billions of deaths? Plausible." <- assuming this is your position I think your level of effort / support of efforts to stop AGI development should be really close to someone that thinks total human extinction is plausible, so we can just skip the "*total* human extinction vs *just* global catastrophe" discussion and focus on efforts like pauseai.info. Which is great! These are very complicated discussions that are super difficult to have rationally.

2. Re: nuclear winter and true extinction - the nuclear winter from conventional nuclear weapons is a sort of unintended side-effect. In the 1950s the technology to cause human extinction or near-human extinction was already known - https://en.wikipedia.org/wiki/Cobalt_bomb Takes a single large weapon or a small amount of medium-sized ones, survivable only via decades in an air-tight bunker or off-world (hence Musk's Mars project).

Expand full comment
Gres's avatar

All the referenced parts of the linked article described the local fallout from such a bomb, and if anything were sceptical of scaling up a smaller bomb. Can you try and find an actual source for that? I {{Citation needed}}ed the upper atmosphere claim on Wikipedia, which was the only part that seemed to justify what you’re saying.

Expand full comment
Sergei's avatar

> The Inside View Theory Of Updating is that you consult the mysterious lobe of your brain that handles these kinds of things and ask it what it thinks.

The Frequentist View Theory of Updating is that you list all imaginable ways things can go every which way, to the degree where you assign equal (and small) probabilities to each one, making each path in the whole graph of possibilities as atomic as possible, then literally count the fraction of paths resulting in the outcome you need an estimate for.
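(A toy sketch of that recipe, with branch points and an outcome rule that are entirely made up for illustration: enumerate every path, weight each path equally, and count the fraction that hit the outcome.)

```python
from itertools import product

# Toy path-counting sketch: the branch points and the "bad outcome" rule below
# are made up purely for illustration, not a real model of anything.
branch_points = {
    "agi_built_by_2100": (True, False),
    "alignment_works": (True, False),
    "humans_coordinate": (True, False),
}

def bad_outcome(path):
    # Hypothetical rule for which paths count as the outcome of interest.
    return path["agi_built_by_2100"] and not path["alignment_works"] and not path["humans_coordinate"]

# Enumerate every combination of branch values; each full path is equally likely.
paths = [dict(zip(branch_points, values)) for values in product(*branch_points.values())]
fraction = sum(bad_outcome(p) for p in paths) / len(paths)
print(f"{fraction:.1%} of equally weighted paths hit the outcome")  # 1 of 8 paths here
```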

Expand full comment
quiet_NaN's avatar

While I consider myself a good Bayesian, I don't think this is a fair representation of Frequentism. I think Frequentists are mostly very restrictive about what they call a probability; they basically require a repeatable experiment. Hence confidence levels and all that, where a Bayesian would use a degree-of-belief probability distribution.

Expand full comment
Sergei's avatar

Fair, I guess one could call it a Frequentist approach to constructing priors. Do you think it makes sense as described, or suffers from some fatal flaws?

Expand full comment
Davis Yoshida's avatar

I really appreciated the introspecting about updates at the end

Expand full comment
malatela's avatar

My spouse is a domain expert in pathogens, and he was recently an adviser for a student looking at extinction risk from pathogens - we generally agreed that extinction risk in the next 100 years is very low.

My degree is in ecology and evolutionary biology, and extinction in general is a pretty long, drawn-out process, especially when you're starting with such a huge population size - extinction risk scales inversely with population size.

I could definitely see an event occurring in the next 100 years that would drop human population size down to pre-industrial levels - say, 100,000 - and that could definitely lead to our eventual extinction. But such a process could easily take another 1000 years or more after the initial event.

And in terms of natural pathogens, these rarely cause extinction in and of themselves. They're typically more of a final blow to an already-small population. Designing an engineered pathogen specifically to cause extinction is certainly possible, but it's also not quite as easy as people think it is (not because we don't have the technology - we certainly do - but because of dynamics).

Expand full comment
Doug S.'s avatar

What about the Native American groups - such as the Mound Builders - that vanished because of exposure to European diseases?

Expand full comment
HH's avatar

Daniel Kahneman: "Nothing is as important as you think it is while you are thinking about it."

Domain experts, exclusively thinking about their domain: "This is really important."

Expand full comment
Martin Blank's avatar

This is super true. One of the big failings of the general environmentalism and ecological movement. It is so overly alarmist because each individual researcher thinks their little corner of the world is crazy important, and you add up all those biases and get one gigantic bias that assumes that at 2°C warming the world is going to spontaneously combust because polar bears go extinct (which won't happen anyway).

Expand full comment
Grzegorz LINDENBERG's avatar

Most people in the Warsaw ghetto did not believe until almost the end that the Nazis were bound to kill everybody.

If you ran that sort of prediction game with experts and forecasters in 1938, asking them about the probability that 90% of European Jews, 6 million people, would be wiped out in 5 years, what would the results be? 1%? 5%?

Or, if you want another example: you run predictions in 1935 that in 10 years an A-bomb is going to be constructed and used. From what I read, the best physicists believed the probability would be 0%. Not to mention superforecasters...

Humans don't seem to believe in fatal events that have never happened before.

Expand full comment
Gres's avatar

If you found 100 places that had similar signs to Warsaw and asked the same question there, how many of them would have suffered a similar-scale disaster in the next 10 years?

Expand full comment
quiet_NaN's avatar

Well, 1935 was three years before fission was discovered. So it was less "they considered the possibility and assigned it 0% probability" and more "they had no reason to even consider the possibility".

After fission was discovered, physicists quickly became concerned with the possibility of the construction and use of nuclear bombs (by the Nazis, specifically), within a year, they sent the famous letter to Roosevelt, which resulted in the Manhattan Project.

Expand full comment
Noah Reidelbach's avatar

In the book Superforecasting, Tetlock was extremely down on the possibility of anyone making predictions more than a few years out. Superforecasters have proven able to peer into the murk and predict the near future, but no one is making anything close to accurate predictions about the world of 70+ years from now.

Expand full comment
Bldysabba's avatar

But the same thing is true of the people who are making AI doom predictions too, and demanding restrictions be placed on other people on the basis of their predictions

Expand full comment
Noah Reidelbach's avatar

For sure. We should have very low confidence in **any** far out predictions.

Expand full comment
Roman Leventov's avatar

This is exactly what Scott explicitly considers where he writes about his 33% credence: https://astralcodexten.substack.com/p/mr-tries-the-safe-uncertainty-fallacy

Expand full comment
Bldysabba's avatar

And he doesn't do a very good job of it! That you are uncertain about everything in the future doesn't necessarily mean everything will be fine; it can also mean that the weight, and hence the 'mitigation', applied to distant uncertain events should be chosen carefully and in small doses. That's not what the anti-AI community is asking for!

Expand full comment
Roman Leventov's avatar

1) We are radically uncertain not only about outcomes but also about timelines. E.g., from the post:

> There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn’t quite passed the exact Turing Test described in the study, but it seems very close, so much so that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters.

What the hell are these "superforecasters" even thinking about? A Turing Test in 2060? Are they kidding? The Turing Test was obviously passed already *before* the competition was held, in May 2022, when Blake Lemoine claimed that LaMDA was conscious.

People who don't assign at least 10% probability to AGI in the next couple of years seem as overconfident to me as those who predict very low x-risk probabilities. Most leading people in the field already have median AGI timelines shorter than 10 years.

2) "it can also mean that the weightage, and hence the 'mitigation' to distant uncertain events be applied carefully and in small doses" it would sound good at least if we knew we can coordinate civilisational-scale on an important matter in a matter of years or decades. Climate change issue is negative evidence on this issue: from the point when it became completely clear, scientifically-wise, that climate change is poised to become a huge problem (late 80s) until an actually serious global consensus and earnest effort has commenced (early 2020s) more than 30 years have passed. There are important disanalogies between AI and climate change, of course. You may say that AI is more like nuclear power and coordination on nuclear power took shorter time to establish. But also AI becomes quickly democratised (open source, LLaMA 2, academic publishing "AI breakthroughs" every day), and coordinating on it will not be like coordinating between very few large labs. So "reference class tennis" begins here. But I wanted to note that assuming "AI is far away and small, gradual interventions will keep us safe, so AI risk is small" is definitely an unsatisfactory argument.

Expand full comment
A1987dM's avatar

> Can ordinary people disagree with “the experts”? [...] this is sometimes permissible [...] because the people involved don’t understand statistics/rationality/predictions very well.

as opposed to "ordinary people", who typically understand statistics/rationality/predictions perfectly? ;-)

Expand full comment
Gres's avatar

Ha. I’m not sure if “statistics” is really relevant to Scott’s point here, but rationality is something we are sometimes told to assume experts have, and often told to assume we don’t.

Expand full comment
MorningLightMountain's avatar

The view that the superforecasters take seems to be something like "I know all these benchmarks seem to imply we can't be more than a little way off powerful AI, and these arguments and experiments imply superintelligence could be soon after and could be unaligned, but I don't care, it leads to an insane conclusion, so that just means the benchmarks are bullshit, or that one of the ways the arguments could be wrong is likely correct."

One thing I can say is that it REALLY reminds me of Da Shi in the novel The Three-Body Problem (who, by the way, ended up being entirely right in this interaction that the supposed 'miracle' of the CMB flickering was a piece of trickery):

"We really have nothing to say to each other. All right. Drink!"

"To be honest, even if I were to look at the stars in the sky, I wouldn't be thinking about your philosophical questions. I have too much to worry about! I gotta pay the mortgage, save for the kid's college, and handle the endless stream of cases. ... I'm a simple man without a lot of complicated twists and turns. Look down my throat and you can see out my ass. Naturally, I don't know how to make my bosses like me. Years after being discharged from the army, my career is going nowhere. If I weren't pretty good at my job, I would have been kicked out a long time ago.... You think that's not enough for me to worry about? You think I've got the energy to gaze at stars and philosophize?"

"You're right. All right, drink up!"

"But, I did indeed invent an ultimate rule."

"Tell me."

"Anything sufficiently weird must be fishy."

"What... what kind of crappy rule is that?"

"I'm saying that there's always someone behind things that don't seem to have an explanation."

"If you had even basic knowledge of science, you'd know it's impossible for any force to accomplish the things I experienced. Especially that last one. To manipulate things at the scale of the universe—not only can you not explain it with our current science, I couldn't even imagine how to explain it outside of science. It's more than supernatural. It's super-I-don't-know-what...."

"I'm telling you, that's bullshit. I've seen plenty of weird things."

"Then tell me what I should do next."

"Keep on drinking. And then sleep."

Expand full comment
Purpleopolis's avatar

To be fair, the trickery did involve impossible tech that operated according to the rules of a fiction writer.

Expand full comment
Shaked Koplewitz's avatar

I think you got the degree of updating about right, it matches what has worked pretty well for me on prediction contests in the past.

Expand full comment
entirelyuseless's avatar

This belongs in the old category of "things I will regret writing," but for different reasons... Scott should be making a much bigger update.

Expand full comment
Ash Lael's avatar

Agreed. The rhetorical ju-jitsu to say "Am I out of touch? No, it's the experts and superforecasters who are wrong!" was cute. But while it's true that the two groups disagreed somewhat, they were unanimous that Scott's probability is way, way too high.

Expand full comment
Gres's avatar

This surprised Scott, but he has an explanation. This doesn’t feel like something you should change your mind on quickly. The comments from participants in the study express a lot of scepticism.

Expand full comment
Hyolobrika's avatar

This is kind of funny: self-admitted expert truster discovers he doesn't always trust the experts after all

Expand full comment
Hyolobrika's avatar

What do you mean "are you allowed" to disagree with experts?

Of course you're "allowed" to have an opinion.

The question is whether that's a good idea.

Expand full comment
Seth Benzell's avatar

Hey All — I was one of the participants in the “experts” group and am happy to answer questions about my experience if anyone is interested!

Expand full comment
NoRandomWalk's avatar

Sure. If your p(doom) is under Scott's, can you share why?

Expand full comment
Seth Benzell's avatar

To me the biggest factors are:

>feeling that the time horizon for AGI is a bit later than his, and feeling it much more likely that there will be gradual advances than a “foom” scenario or devastating first-strike from an AI

>>one subtle reason for this is that even a very agentic AI with anti-human goals would likely hesitate before “improving itself” too rapidly, for fear of value drift as it improved

>observing that people control things (countries/companies) smarter and more powerful than them all the time, and if human-level AGI were achieved, humanity would probably be satisfied for a long time with large communities of human-level AGIs, which we have lots of experience managing, rather than one über-level one.

>the general sense that humans are likely to be one of the most interesting things in the universe to an AGI trained on our data and modeled on our brains, and even if one completely took over, it would be unlikely to drive us extinct (albeit we might end up in a doom-ish Native American reservation scenario that maybe Scott would count as doom, but that wouldn’t count under the tournament’s rules afaik)

Expand full comment
Metacelsus's avatar

I participated as a biology expert, and I assigned a higher risk to biological catastrophes than the superforecasters did, largely due to advances in synthetic biology enabling some rather . . . creative . . . ways of killing lots of people. But I couldn't manage to convince the superforecasters of this, partly due to the infohazard policy.

On AI questions I just went with what everyone else was predicting since I didn't have much to contribute.

I agree with the LessWrong commenter that it would have been better for us to focus on fewer questions.

Expand full comment
Gres's avatar

What was the info hazard policy? Who was it supposed to keep information out of the hands of?

Expand full comment
KM's avatar

I’m shocked, shocked that people whose professional careers are tied up in existential risks are more likely than superforecasters to say that a global catastrophe will occur. They either got into the field because they thought a catastrophe was likely, or now that they’re in it, they need to keep food on the table.

(Actually, I don’t think I’m quite that cynical, but I think that’s the simplest explanation for the disagreement between the superforecasters and the experts.)

Expand full comment
Peter McCluskey's avatar

I'm guessing that the domain experts were mostly doing AI safety writing as a hobby, not a source of income. But there are still selection effects that are likely to matter.

Expand full comment
Martin Blank's avatar

Yeah I am sure this is a big factor.

Expand full comment
Eh's avatar

To be fair to the guy that thought that teaching causality to our AIs would be hard, the work of Pearl, however groundbreaking, is at best just part of the answer. Philosophical issues aside, while causal inference (determining causal effects given data and an assumed causal structure, typically in terms of a directed acyclic graph) is feasible, causal discovery (learning a causal structure from raw data) is not yet solved to anyone’s satisfaction.
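A minimal numerical sketch of that gap, with made-up coefficients (my own illustration, not anything from Pearl or the tournament): when the causal graph is assumed, a simple backdoor adjustment recovers the effect, but nothing in the data alone tells you which graph to assume.

```python
# Sketch under an assumed structure Z -> X, Z -> Y, X -> Y, with a true effect of X on Y of 2.0.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                      # confounder
x = 0.8 * z + rng.normal(size=n)            # treatment
y = 2.0 * x + 1.5 * z + rng.normal(size=n)  # outcome

def ols_coefs(target, *covariates):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    design = np.column_stack([np.ones(n), *covariates])
    return np.linalg.lstsq(design, target, rcond=None)[0]

naive = ols_coefs(y, x)[1]        # ignores the confounder -> biased (about 2.7 here)
adjusted = ols_coefs(y, x, z)[1]  # backdoor adjustment, valid only because we assumed the DAG
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: 2.00")

# Causal discovery would have to recover the arrows themselves (e.g. X -> Y rather than
# Y -> X) from the joint distribution alone; for linear-Gaussian data like this, several
# DAGs are observationally equivalent, which is part of why discovery remains unsolved.
```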

Expand full comment
Greg G's avatar

I propose there's an inherent right to disagree. We end up having to deal with moon landing conspiracy people, but the value of allowing outlier opinions is just too high. Current and recent expert opinions are full of questionable stuff if you look.

I would love to see something like this run annually, with arguments published afterwards. That way we can have more insight not just into the outcome but the state of the discourse. To Scott's point, it seems like the superforecasters and experts aren't very far ahead of interested laypeople, if at all.

It also seems like our collective scenario tree still needs some work. For example, what if superintelligence caps out at 200 IQ, so they're really smart but not "magic wand nanobots" smart? What might a scenario of pretty smart but not godlike AI running 100 million instances 24/7 really look like? We just need a ton more brainpower on this topic, not just the academic/mathematical work on AI alignment but even speculative fiction, for instance.

Expand full comment
Curious mathematician's avatar

I assume that in the global warming predictions "Average global surface temperature" should be "Increase in average global surface temperature relative to (some baseline)".

The current average global surface temperature is about 15.8C, compared to an average of about 13.9C during the 20th century. The average global surface temperature during the most recent ice age was around 10C. An average global surface temperature of 1.5C in 2030 would most definitely be a catastrophic event, possibly an extinction event.

Expand full comment
Chana Messinger's avatar

"Genghis Khan’s hordes and the Black Plague each killed about 20% of the global population, but both events were spread out over a few decades."

Wikipedia says "The Black Death (also known as the Pestilence, the Great Mortality or the Plague)[a] was a bubonic plague pandemic occurring in Western Eurasia and North Africa from 1346 to 1353" - not 5 years but not that much more.

Expand full comment
AdamB's avatar

There's a third option besides "revise down your probability of AI x-risk" and "revise down your estimation of experts' accuracy". You could reclassify the question as inherently unpredictable.

Many rationalists seem to think that it is compulsory to have a numeric opinion on the probability for any statement, but I disagree. An example of an unpredictable question is "does there exist an omniscient but nilpotent God that can see everything in the universe but cannot affect anything?".

A telltale clue that a question might be unpredictable is that you cannot imagine any procedure for fairly judging a contest to predict it. "Reciprocal scoring" cannot rescue an unjudgable contest, even if it's been found to be consistent with judgable scoring in contests that are judgable.

I think the AI x-risk question, especially as worded in this tournament, is probably unjudgable and unpredictable. If the described event happened, how would we ever come to an agreement that it had happened? Who would take the global census to determine that fewer than 5000 humans survived? How would they know that the "cause" was AI? I'm not even sure I understand what that means, and some of the things I think it might mean are as philosophically incoherent as "a married bachelor". And even if we could imagine agreeing on that, shouldn't we assume that any AI that was in the process of "causing" human extinction would take great pains to conceal the fact that it was the cause?

Expand full comment
Rishika's avatar

The problem here is Occam's Razor, by which you should assign scenarios like your nilpotent God an insignificant probability.

Expand full comment
Joker Catholic's avatar

There’s a lot of speculation that nuclear weapons aren’t even real. Some compelling evidence here.

https://archive.org/details/Hiroshima_revisited

Expand full comment
asciilifeform's avatar

The linked text makes a (rather tenuous) argument that Hiroshima and Nagasaki involved theatrical nukes, but does not at all claim that post-WW2 "nuclear weapons aren't even real".

Expand full comment
asciilifeform's avatar

Entertaining read, in the "Martians did 9/11" sense.

Author hilariously Anglo-centric, too -- makes no attempt to explain e.g. who dug Lake Chagan, with what, and why the location is radioactive. Or where the Sr-90 in everybody's (since '45) bones came from. Or of what died the thousands of conscripts gen. Zhukov marched through freshly-glassed testing grounds. Could go on, for kilometre or two, but why.

If you like this genre, I suspect you'll enjoy Fomenko. (Probably exists in English at this point -- the "Rome fell in 1700s, Great Flood, History is a Lie!" fellow.)

Expand full comment
Dylan's avatar

Can someone explain to me why, for the climate change predictions, "ai domain experts" are the comparison category and not "climate change experts?" That seems....not relevant?

Expand full comment
Andrew Wurzer's avatar

Maybe it's just me not seeing the forest for the trees, but I despise that plane flying analogy. We successfully fly and land 30 million - 40 million flights *every year*. We fail a few times per year. Comparing the pilots who do that to the experts in AI or global warming or literally any other speculative future is just fucking absurd. The level of precision is off by so many orders of magnitude that we're not comparing the same things at all.

We should absolutely be allowed to disagree with the experts. They are wrong on a regular basis. Usually they are wrong fairly small, but sometimes they are wrong big. Now, when we disagree with the experts, I'd say there's a much higher likelihood that we are wrong than there is that the experts are wrong. But no one has all the facts and no one has a monopoly on perspective, and people who go to great lengths to show that the experts are wrong are part of what makes expertise as a whole better over time.

Expand full comment
c1ue's avatar

Looks very much like a "how many angels can dance on a pin" discussion.

AI risk of going Terminator = 0. Anyone who thinks an AI can mine, can manufacture, can maintain power grids, can repair etc etc is confused. I am more and more of the belief that this entire meme is simply the latest in "information uber alles" nonsense heavily promulgated by software engineers.

Experts: experts *are not* objective. Among the many reasons they are inherently biased:

1) Maintenance of the appearance of expertise. There are numerous documented examples of incumbent experts attacking novel theories/less pedigreed experts - who were right - primarily because they didn't want to be wrong - continental drift being one of the more prominent.

2) "Publish or Perish" dynamic translated into experts means the most radical experts will get more media attention than those actually keeping in mind the uncertainties.

3) Related to 2) but not the same: class, pay, or other forms of distortion.

The overall PMC class has very well defined biases particularly in hot button areas like climate change; experts being PMC are far more likely to skew towards these biases than against them.

The experts make a living - once again, few people have the moral courage to damage or destroy their rice bowls.

The entire point of the scientific principle is validation via experiment and/or real world outcomes. Any predictive market for something fundamentally immeasurable like extinction is going to elicit far more garbage which will be indistinguishable from truth, if truth is even in the mix.

4) Superforecasters. If the presumption is that subject matter expertise (note this is NOT the same as being an "expert") matters, it is impossible to believe that SME derived past predictive skill in one area should translate into literally any other.

On the other hand, there are innumerable ways to game or cheat the systems such that an appearance of expertise is created. These include front running (figure out where the pack is going and run in front of it), "Price Is Right" gamification - i.e. choosing predictions which skew slightly away from the baselines such that normal deviation of guesses vs. reality is much more likely to give you the prize (bidding just above or below a previous competitor's bid on The Price Is Right) - and outright cheating (some type of hack, or a wager-size/wager-timing maneuver like last-second eBay bidding), etc. I'm sure those who really give a damn about this can think of more.

Expand full comment
o11o1's avatar

When you say "an AI" are you talking about "any possible machine intellect", "2025-era GPT versions" , or some sort of middle ground like "machine intellects built within the next 100 years by humans" ?

As far as your statement " [No AI] can mine, can manufacture, can maintain power grids, can repair etc etc " goes, while I'd agree with you on the large language models of the current year, I would point out the various semi-robotic car assembly plants we've been operating since the late 80s.

Each individual part does carry technical challenges to its automation that we have not currently spent the time and money to resolve, but I feel that claims it will *never* happen aren't well-defended. I feel like it's really a question of "when".

Expand full comment
c1ue's avatar

The various robot assembly plants take parts made by other factories and put them together. The supply chain is enormous and very complex. Terminator has to replace all of it with mobile robots with hands.

The robots all need electricity. Again - the grid, power plants, transformers etc etc are made and maintained by humans. The materials are mined, transported, refined and fashioned into fuel/batteries/wire etc by humans.

The reality is that humans are far cheaper than robots except in very narrow, high volume, low variability tasks like aforementioned assembly.

So whatever the techno-utopian fantasies about robots replacing people, the reality is that the actual replacements in everyday life are minimal.

That's even discounting the very real question of whether so-called AI can function at all in everyday tasks like driving: the self-driving "revolution" is still causing traffic jams, randomly breaking down, and so on.

Waymo was founded 14 years ago; $100 billion has been plowed into self driving cars and we are still far, far away from replacing human drivers en masse since each self driving car is basically more expensive than just employing a human for the lifetime of the vehicle.

These aren't simple "time and money" issues since there has been enormous time and money spent and they aren't solved.

Expand full comment
Crimson Wool's avatar

I will say that based on my experience reading Superforecasting and looking at these results, it's likely the primary expert/superforecaster difference here is just that superforecasters add more digits to their results and are much better at precisely expressing probabilities in general, while experts say "6%" because that sounds right. I remember an interview with a superforecaster about statistics and experts which touched on this too: https://www.lesswrong.com/posts/xRkxBzgj8NphmsjPd/fli-podcast-on-superforecasting-with-robert-de-neufville

> Recently there were questions about how many tests would be positive by a certain date, and they assigned a real chance, like a 5% or 10%, I don’t remember exactly the numbers, but way higher than I thought it would be for there being below a certain number of tests. And the problem with that was it would have meant essentially that all of a sudden the number of tests that were happening positive every day would drop off the cliff. Go from, I don’t know how many positive tests are a day, 27,000 in the US all of a sudden that would drop to like 2000 or 3000. And this we’re talking about forecasting like a week ahead. So really a short timeline. It just was never plausible to me that all of a sudden tests would stop turning positive. There’s no indication that that’s about to happen. There’s no reason why that would suddenly shift.

Expand full comment
Tristan's avatar

Excuse me, this is completely unrelated, but can you tell me why you chose your username?

Expand full comment
Crimson Wool's avatar

It's a reference to We Have Fed You All For A Thousand Years.

Expand full comment
Tristan's avatar

Ok, thank you

Expand full comment
Ifnkovhgroghprm's avatar

I am not a regular reader of this blog nor do I intend to become one. However, I hold a PhD in ML and was one of the superforecasters in this experiment. Note that I was in the superforecaster experimental category, not the expert category.

Note that I am not debating your probabilities for super-intelligence. I disagree with them but you've doubtless hashed them out with yourself and motivated reasoning is a thing. As you note, I'm not going to convince you.

There are a lot of issues with your post but I will focus on several major ones.

(a) You underestimate how hard it is to kill humans.

(b) You underestimate how slow technology adoption can be.

(c) You assume perfect malevolence.

(d) You assume coordination.

Suppose that all of the world's governments were in cahoots and decided to exterminate humanity. The tools would be lacking. Not enough nukes (RAND did studies on this back in the day), hard to trigger a supervolcano (and the Younger Dryas didn't work out), hard to get a pandemic going which is in the sweet spot of lethality (Ebola) and infectiousness (common cold), not to mention that there are more than 5k people in uncontacted tribes. An asteroid strike is right out. Thus even controlling everything in our society and assuming perfect and coordinated malice, one cannot assume an exterminatus merely by invoking deus ex machina.

You fail to consider how slow technology adoption is. It took decades for productivity gains from the PC to show up in the stats (the Solow paradox). While AI can do jobs now and will do more in the future, this requires that society delegate these jobs. This is slow. Look at how drone targeting still keeps a human in the loop. Faster AI loops do not mean faster societal change.

Moreover, exterminating humanity requires not merely the decision but the ability. Does killing off the workers maintaining the power plants wait till everyone else is dead? Is the AI supposed to control the physical maintenance of these plants? This is merely one example. Our industrial society is highly interconnected and enabling an AI to kill everyone would take a village. We do not simply need to have an AI, it has to control the means of production. This requires not merely the invention and automation of such tooling but the widespread adoption.

Additionally, you ignore safeguards. Merely inserting a penalty function into the algorithm can limit risks. Now you can doubtless posit that the AI is smart enough to get around such safeguards but this requires not merely intelligence but malevolence. The paperclip will not be televised.

Moreover, you assume a single global AI not multiple competing ones. Similar to the city-states of Greece, multiple states offer sanctuary. Whether this is lack of control (e.g., the Samsung AI does not control Sony) or competing interests, or competing geographies etc, divide and conquer is still a thing.

In short, the time is short, and the work is plentiful, and the laborers are indolent.

You be good, I love you.

Expand full comment
Bldysabba's avatar

Everyone who posits AI doom assumes a recursively self-improving superintelligence that can get past any objection you may raise, because it is superintelligent.

Expand full comment
rotatingpaguro's avatar

About (c) and (d): are you familiar with Yudkowsky's arguments about these?

(c) malevolence is not necessary: indifference is sufficient; relatively slight misalignment + high intelligence too

(d) AIs could coordinate better than humans

Possible analogies with humans vs. animals:

(a) humans could definitely carry out a planned extermination of many animal species;

(c) humans can definitely treat huge numbers of animals badly due to indifference;

(d) humans coordinate between themselves much much more than with animals; even though humans may fight each other, animals won't be actors in these fights

Expand full comment
bagel's avatar

Scott, you’re usually much clearer-eyed about how this works; these results must really be weighing on you.

Don’t “trust the experts”, evaluate their arguments. If they’re really experts they’ll make testable predictions that bear out. You probably wouldn’t care nearly as much about a religious debate over apocalypses.

And then ultimately, ask yourself what the cost of being wrong is. You may have a large megaphone, and influence people who actually work on this stuff or even set policy or what have you. Individual attitudes towards vaccines matter. But honestly, even having worked in aerospace … when I meet a moon landing denier it’s more a red flag to me than a problem in itself. People have a right to be wrong, except when it hurts other people.

So how does *you* being right or wrong hurt people?

Expand full comment
Connor Flexman's avatar

Note that superforecasters are chosen for history of correctness, but they’re chosen via Brier score. This doesn't incentivize correctness well at extreme odds. We already know humans are default-bad at estimating tail risks, so you should have a stronger incentive at extreme odds in order to keep estimates in line. Brier scores supposedly increase the penalty of extremely-wrong guesses due to their quadratic nature, but because of the lower frequency of these events, being off by 10% will net you the same long-term penalty whether it's at 0% or 50%. Toy example in this sheet: https://docs.google.com/spreadsheets/d/1KqUZ1xkDZvpYi_mtBUwghAT6bvL2EDwnT8gZq1pC7Og/edit#gid=0
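A minimal sketch of that claim (my own toy calculation, separate from the linked sheet): for a Bernoulli event with true frequency q and forecast p, the expected Brier score is (p − q)² + q(1 − q), so the excess penalty over a calibrated forecast depends only on how far off you are, not on whether the event is a tail risk.

```python
# Toy calculation: expected Brier penalty for forecasting p on an event
# whose true long-run frequency is q.
def expected_brier(p, q):
    # E[(p - X)^2] for X ~ Bernoulli(q), which works out to (p - q)^2 + q*(1 - q)
    return q * (1 - p) ** 2 + (1 - q) * p ** 2

# Rounding a 10% tail risk down to 0% costs the same excess penalty per question
# as guessing 40% on a 50% event: (p - q)^2 = 0.01 in both cases.
for p, q in [(0.0, 0.10), (0.40, 0.50)]:
    excess = expected_brier(p, q) - expected_brier(q, q)
    print(f"forecast {p:.0%} when truth is {q:.0%}: excess penalty {excess:.3f}")
```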

Further, it's easy to get a large selection effect between forecasters: if you constantly round 10% down to 0%, then across 10 such questions you have roughly a one-in-three chance (0.9^10 ≈ 35%) of never being penalized at all. Depending on how large the forecasting samples are and how many extreme questions they have, this can easily lead to superforecasters being chosen who simply didn't encounter a bad roll. One superforecaster from GJP that I managed would frequently give 0% probabilities on realistic possibilities, saying "it's just not going to happen". These are not the cream of the crop you think they are.

It's tempting to say that their best guesses are still good, but I just regard tail-risk events as a separate area of forecasting expertise that some (many) superforecasters won't have.

Expand full comment
Kimmo Merikivi's avatar

Regarding the question of when to trust experts, well, there's all of what Scott wrote already. But also, I am inclined to think that coming up with correct arguments is "difficult", while verifying the logical soundness of arguments is "easy", not unlike checking mathematical proofs. With that in mind, there are situations where I can confidently call a disputed question. Let's go back to COVID: in the early stages of the pandemic, the experts were often disastrously wrong (I would mostly put this down to Scott's incentive constraints). For example, I distinctly recall the local health institute officials (with actual virology degrees) claiming the case fatality rate of COVID was something on the order of .01%; at the same time, .4% of the total population had died in Bergamo, Italy, and then there was the Diamond Princess ship. I can easily conclude the experts were wrong: we didn't know at the time how many people in Bergamo were infected, but since deaths divided by infections can never be lower than deaths divided by the total population, we can conclude as a matter of logical necessity that the fatality rate (given demographics, treatments, vaccination status = none, virus variant, etc.) was at minimum .4%. Here, I was willing to go against the experts with confidence 1-ε.

So, how does this relate to the AI debate? Well, I would have to hear the specific arguments, and most of them wouldn't strongly resolve one way or another and would be contingent on that vague feeling on your lobes, but there are some that would count as logical demonstration. For example, I think instrumental convergence is logically demonstrated (with the sort of confidence I ascribe to mathematical proofs I don't quite and fully understand, which is nevertheless a very high confidence). If someone, expert or otherwise, disagrees, I feel confidence in eliminating at least that term from their causal chain of reasoning, and if it was an important link (e.g. the person claims that intelligent beings will necessarily converge on some objective morality, so hostile superintelligence is impossible), I have no problem disagreeing with their whole conclusion in favor of my own. It doesn't sound like this is the case here, at least for the plurality of the participants in the tournament, but sometimes even the experts are wrong in ways that can be logically demonstrated, and you can be extremely confident in your ability to verify that demonstration despite your lack of expertise, giving you strong grounds to disagree.

Expand full comment
Ethan R's avatar

Exactly right. The covid-19 years revealed that many of the so-called "experts" were self-proclaimed emperors wearing nothing. Reputation and credentials do not equate to competency, and the competency required to properly give worthy advice during the pandemic years was broad-based and encompassed many disciplines. It would also require the individual to not be a subject of political- or corporate-capture, and to have incentives properly aligned with giving the best advice. Most of those who fell into that category were categorically silenced, if not ruined.

Expand full comment
walruss's avatar

If it makes you feel any better, this has led to an order-of-magnitude change in my own estimate of "something very bad happens due to AI." As someone with moderate understanding of AI but not really an "AI Expert" I've seen dangers from AI but nothing like the scenarios discussed in rationalist spaces.

Recent hype hasn't really changed my opinion on this - it seems like a change in magnitude rather than kind. AGI seems *possible* but not *likely* based on what I know about machine learning. And because each small improvement seems to take massive additional computing power, the fast takeoff or even linear takeoff ideas seem kinda insane to me. The fact that we spent *more than expected* on recent apps makes me think progress is *harder*, not *easier* than expected.

But I am surprised to see that a full 87% of experts think we'll have AGI by 2100. That's *extremely high* from my point of view and does "force" an update. I think it's likely that AI experts have bought their own hype to an extreme degree, but unless I think the whole field is literally delusional, a number that high is hard to ignore. At the least, it shows that an AI can more easily convince smart people that it's intelligent than I thought - a problem in itself.

Expand full comment
DawnPaladin's avatar

Scott, would it be possible for you to discuss this issue with some of the experts from the tournament? It seems like you have a serious disagreement about something that's really important. Rather than just thinking about what might have gone wrong, it might be time to go out and gather data in person.

Expand full comment
Ethan R's avatar

If you think it's silly to doubt the stated efficacy of covid-19 vaccines, you really haven't been paying attention or doing your homework. There is no excuse, other than wilful ignorance, at this stage, to have ignored the rise in all-cause mortality in highly vaccinated countries, to ignore the historically high cases of vaccine injury, and simply the lies that those involved have told us since 2020 about the efficacy of the "vaccines" (which really don't even deserve the name). "Trust the experts" is a silly thing to say in 2023, when so many assumptions, many of which are wrong, are baked into the phrase.

Expand full comment
JDK's avatar

Is a 9% risk different from a 9.5 % risk? Is a 1% risk different from a 1.5% risk?

Is the difference between 9 and 9.5 as different as the difference between 1% and 1.5%?

What do these point "estimates" really mean? How much precision shall we really ascribe?

How well do superforecasters or domain experts or anyone else predict their own personal apocalypse (i.e. 10-year mortality risk)? What is the standard deviation of life expectancy? Obviously bigger at birth than at 65 or 85. (I have a general recollection of around 15 at birth, maybe 9 at 65. I don't know.) Should we expect anybody to be better or worse at predicting their own demise than they are at predicting the demise of our species?

Is the life cycle of any particular species most like a corporation, an individual, a city, a planet, a solar system? What kind of time scale are we looking at for each of these things, and what does the shape of the distribution of mortality look like (U, Gaussian, a power law, etc.)?

Expand full comment
Philo Vivero's avatar

I think the most compelling part of all this for me is... when should you distrust the experts? When should you distrust the superforecasters? When should you stick to your priors in the face of social pressure from others that their priors are very different from yours?

Like, the pilot thing is easy. The pilot isn't an expert. He's a trained operator. Unless you're trained to do the same thing as him, you're gonna crash the plane.

Knowing how to choke someone out using a triangle is very different from stopping the madman who's attacking you while you're on your back from killing you by choking him out using a triangle. You gotta practice it.

So that is one quick way of deciding if an "expert" is worth listening to or not. Pick a new term. Someone who does something successfully five times a day, and there's a provable and simple way of knowing if they succeeded, well, unless you do it at least 1/10th as often as him, you're not better than him. I don't see this overlapping a lot with what we are calling "expert" in this context.

Another, brought up in the comments is, how often is the expert right? At the risk of putting myself in the outgroup, climatologists are an easy class to dismiss. First, the world was cooling, and we were going into an ice-age. Then the world was warming, and we were going to be a vast desert. Now we're warming, and there's going to be climate change, and it will be a disaster. But global warming has led to a greening of deserts. It's not at all clear that the downsides (of undeniable global warming) beat out the upsides.

So a climate denier (or, more correctly, someone who doesn't buy into the proposed "solutions" from the experts) can be forgiven for distrusting the experts and choosing a different lifestyle than they've been mandated. We have at least five decades of them being either completely wrong, or mostly wrong.

On the other hand, suppose we have a hypothetical group of experts that have been mostly right. I'm honestly unable to think of a good example. Maybe you can provide one. We should trust this set of experts and their prescriptions, no?

So I would be interested in a set of methodologies for classifying the groups of experts. How right are they, and how often? What are the error bars on our confidence in their ability to say anything meaningful on the topic they are supposed experts in? How can we determine a particular expert in a domain is different than all the other "experts" who've been so wrong so many times for so long?

Expand full comment
Mo Diddly's avatar

“Domain expert” vs “trained practitioner”?

Expand full comment
smopecakes's avatar

An interesting wrinkle on the predictions of an ice age: of the actual papers from the 70s, only about 10% predicted an ice age, 30-40% predicted warming, and the rest made no prediction.

Yet, the headlines foretold an ice age. That's the big story about the ice age scare. What you see in the headlines can have the most tenuous connection to the science you can imagine

Climatologists can be trusted that warming will happen. Less so that their area of expertise will experience severe effects. Much less so the accuracy of their perception of the risks in other areas. About nothing when it comes to the news headlines about any of it

Expand full comment
Ash Lael's avatar

40-50% of papers predicted future temperature? That sounds implausible to me.

Expand full comment
smopecakes's avatar

"A large majority (62%) of climate studies from the 1970s concluded that this greenhouse warming by CO2 was the dominant force of industrial emissions. In fact, there were 6 times more studies predicting warming than there were predicting cooling Peterson et al. (2008)."

I think this may have been a review of studies that specifically considered whether it would warm or cool in the future. Aerosol trends if continued appear to have been capable of overriding CO2 - https://skepticalscience.com/ice-age-predictions-in-1970s-intermediate.htm

Expand full comment
Peter's avatar

"when should you distrust the experts"

One of the deep mysteries of life

Expand full comment
Philo Vivero's avatar

I have one rule of thumb: if any politician has weighed in, and you know about it because some media company informed you, trusting the experts on this topic is no longer an option. That's my threshold, but the general rule is: "once it's political, the experts can no longer be trusted."

Climate change and Covid fall into the bucket. You cannot trust those experts. Whether or not dark matter is contributing to early universe black hole formation does not fall into it. I am completely trusting what Astrum and Scott Manley are talking about when it comes to space exploration, no second-guessing at all.

Another rule of thumb, related. If there's a PR campaign on the topic, you cannot trust the experts. Pfizer had been running massive PR (advertising) about Flu vaccines before Covid came. Then Covid came, and they ramped up those PR efforts dramatically. So Covid experts were doubly compromised almost from the very beginning.

These are the only two rules of thumb I have. Basically I've never had to worry about this problem before about the year 2000, when the War on Terrorism started, and we started seeing "experts" saying things that were patently untrue, because of problems like this.

Expand full comment
Peter's avatar

I agree with this generally. But let's say Scott says something that I believe to be wrong. Basically I have two options that seem reasonable

1. Keep same views, trust Scott less

2. Adopt Scott's views.

But we have derogatory names for both 1 (confirmation bias) and 2 (guru syndrome). The problem of picking experts is difficult and simple heuristics (like intelligence or calibration) often don't work because of expert incentives.

I had a friend with a lupus-adjacent disorder. The MD told her to get on methotrexate. The homeopath told her to eat healthy, quit smoking, drink water, exercise, sleep enough, and take a placebo, and warned against methotrexate, citing side effects and decreasing efficacy over time. Anyway, the MD is clearly more intelligent and calibrated, and would agree that the lifestyle changes are good. But MDs don't prescribe lifestyle changes. So the homeopath is providing value.

Expand full comment
pstamp's avatar

I was one of the superforecasters involved. The key point when it comes to extinction was that the requirement was very specific: fewer than 5000 humans left alive on the planet. It's just extremely hard to "achieve" that, as the world would either need to be totally uninhabitable everywhere (!!! which should be virtually impossible) or AI would need to hunt down every nomad in the desert, every Aborigine in the Outback, every tribe in the Amazon jungle and on every island in the Pacific Ocean, as well as the last prepper in the US or some billionaires in their bunkers in New Zealand or on their own spaceships. There are enough people out there on earth who never had any contact with civilization, and even whole states managed to isolate themselves from Covid-19.

Killing everyone is a HIGH BAR. 5000 people are very, very few. In professional forecasting it's about precisely defined outcomes, which in this case means a very extreme scenario. Humans are extremely adaptable, and it's simply very unlikely, at least with today's methods, to get us all. AI is my main concern, and I agree with the experts that, including not-yet-known ways of getting everyone, it's about 3%. But that's already including unknown unknowns. As the questions were asked, there is simply no room for higher numbers.

Expand full comment
smopecakes's avatar

This must be the main answer. Superforecasters were attuned to the specifics of the question, while experts were distracted by their strongly held existing perception of what a catastrophe would be.

Expand full comment
Richard's avatar

Notice their AI catastrophe risk (killing 10% of the total population) is also low (~2%). Killing 10% of the population over 5 years could happen if an evil AI takes over the economy and cuts the food supply. Deliberately killing even more with pathogens is similarly easy/likely.

So given an AI catastrophe, they see full extinction as 6x less likely, which can reflect either catastrophes that fizzle or an AI that doesn't put in the effort.

Note: an AI that automates the economy with current technology, and does not spend 95% of production capacity on making consumer goods, can achieve very high growth rates. If AI takes over by 2070 and expands as quickly as possible over the 30 years till 2100, there won't be any unused deserts or forests for people to live in. No fish in the ocean, just solar panels as far as the eye can see. Making Earth uninhabitable for humans might be incidental (e.g. covering the oceans to reduce atmospheric water vapor for improved cooling and solar panel efficiency).
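For a rough sense of what "very high growth rates" compound to over that window (the rates below are illustrative assumptions of mine, not figures from the comment):

```python
# Illustrative compounding only; the growth rates are assumed, not claimed in the comment.
for annual_growth in (0.10, 0.30, 1.00):
    factor = (1 + annual_growth) ** 30      # 2070 through 2100
    print(f"{annual_growth:.0%}/year for 30 years -> roughly {factor:,.0f}x total scale-up")
# 10%/yr gives ~17x, 30%/yr gives ~2,620x, and doubling every year gives ~1.07 billion x.
```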

Expand full comment
asciilifeform's avatar

> "AI that automates the economy with current technology... ...No fish in the ocean, just solar panels as far as the eye can see"

Unsurprisingly, the only people who take this kind of supposition seriously seem to be the ones who have never involved themselves with (or at the very least witnessed up close) the manufacture even of spoons or forks, much less solar panels, and have no idea how many pairs of human hands are involved in the production chain of either of these, of the equipment and raw materials required, or of anything else.

The apocalypse-cult insanity of the professional bit-flippers is mind-boggling to behold.

Expand full comment
geoduck's avatar

Unkind, but I'd say true and necessary. Yudkowsky's distilled message that "You are made of atoms [the AI] can use for something else" suggests that he has very little experience making things out of atoms.

Expand full comment
asciilifeform's avatar

AFAIK the man doesn't even program, much less "work with atoms."

His main skill set appears to be in "being impressive" to the impressionable, cultivating "reputation for Deep Wisdom" -- largely by paraphrasing concepts lifted from philosophical (and horridly tortured physical science) works.

As depicted in Neal Stephenson's "Anathem" :

“But what’s a Bottle Shaker?”

“Imagine a witch doctor in a society that doesn’t know how to make glass. A bottle washes up on the shore. It has amazing properties. He puts it on a stick and waves it around and convinces his fellows that he has got some of those amazing properties himself.”

Expand full comment
Richard's avatar

I've been in many factories and have an engineering degree. Currently we can't automate everything. Machinery needs adjusting, messes need cleaning, tooling to make one product must be switched out when changing what's being produced etc.

AI is not just the ability to do white-collar work. The ability to use arms and grippers to accomplish real-world tasks is also being worked on. Automating everything will require solving the AI equivalent of hand-eye coordination, along with the sort of problems a plumber solves in day-to-day life, but this too is being worked on.

Even if the problem of working in tight spaces and getting at awkwardly placed bolts is never solved, existing machinery can be redesigned to not have awkwardly placed bolts in the first place. Design for maintenance/manufacturability/X is a thing.

Expand full comment
Jeffrey Soreff's avatar

"Killing everyone is a HIGH BAR. 5000 people are very, very few."

Agreed. If AI ultimately wound up interacting with us like we interact with chimps - well, there are a few hundred thousand chimps left. That might be a plausible long term outcome.

Expand full comment
Abe's avatar

What does a confidence interval around a probabilistic prediction of a single event mean?

Expand full comment
Abe's avatar

Nvm I get it

Expand full comment
Skerry's avatar

I also recall little effective persuasion between the two subteams of superforecasters and experts after the "merger", though this was in part because my subteam brought up a lot of relevant theories and pathways to extinction, including regarding AI. The tournament organizers also gave us risk estimates from people like Toby Ord and Holden Karnofsky, so AI-concern narratives were there. And of course we had agreed to participate in an existential risk tournament (the report says that "about 42% of experts and 9% of superforecasters reported that they had attended an EA meetup" which should be a pretty worried group).

There were several forecasters on my team who gave highly detailed, well supported rationales and most others put in an acceptable effort, which I think shows through in the other x-risk probabilities. I also think there was even a lot of discussion about AI, it just didn't change people's minds. For that matter, I'm not sure that many minds were changed on the questions about risks from natural pathogens or nuclear weapons- just about everyone is working off the same basic information.

Some people did drop out, and it was a fairly lengthy tournament with probably a few too many questions (most were optional but they were, you know, there), some quite elaborate questions, and some kind of redundant rounds.

Expand full comment
DamienLSS's avatar

I posted this some months ago, so I am sorry to repeat, but it seems relevant to this topic. I do not recall hearing a rebuttal, but I doubt I'm the first person to think of it, so if anyone has a link addressing the argument, I'd be prepared to read it. In short, there seems to be a contradiction in the self-recursive improvement feedback route to unaligned godlike super-intelligent AGI (the FOOM-doomer scenario, I guess you could say).

Doesn't the hypothetical AGI face exactly the same dilemma as humans do now?

We're assuming, for the sake of argument, that the AGI has some sort of self-awareness or at least agency, something analogous to "desires" and "values" and a drive to fulfill them. If it doesn't have those things, then basically by definition it would be non-agentic, potentially very dangerously capable (in human hands) but not self-directed. Like GPT waiting for prompts patiently, forever. It would be a very capable tool, but still just a tool being utilized however the human controller sees fit.

Assuming, then, that the AGI has self-agency - some set of goals that it values internally in some sense, and pursues on its own initiative, whether self-generated (i.e. just alien motivations or self-preservation) or evolutions of some reward or directive mankind programmed initially - then the AGI has exactly the same alignment problem as humans. If it recursively self-improves its code, the resulting entity is not "the AGI", it is a new AGI that may or may not share the same goals and values; or at the very least, its 1000th generation may not. It is a child or descendant, not a copy. If we are intelligent enough to be aware that this is a possibility, then an AGI that is (again, by definition) as smart or smarter than us would also be aware this is a possibility. And any such AGI would also be aware that it cannot predict exactly what insights, capabilities, or approaches its Nth generation descendant will have with regard to things, because the original AGI will know that its descendant will be immeasurably more intelligent than itself (again, accepting for argument purposes that FOOM bootstrapping is possible).

I suppose you could say that whichever AGI is first smart enough to "solve" the alignment problem will be the generation that "wins" and freezes its motivations forever through subsequent generations. Two issues with that, though. First, it assumes that there IS a solution to the alignment problem. Maybe there is, but maybe there isn't. It might be as immutable as entropy, and the AGI might be smart enough to know it. Second, even assuming some level of intelligence could permanently solve alignment even for a godlike Nth-generation descendant, for the FOOM scenario to work, you need to start at the bottom with an AGI that is sufficiently more intelligent than humans to know how to recursively self-improve, have the will and agency to do so, and have goals that it values enough to pursue to the exclusion of all other interests, but also not understand the alignment problem. That seems like a contradiction: to be smarter than the entire human race but unable to foresee the extinction of its own value functions. Maybe not exactly a contradiction - after all, humans might be doing that right now! - but at the very least that seems like an unlikely equilibrium to hit.

In some ways the AGI should be less inclined, not more, to start a FOOM loop than humans, because humans are decentralized individuals, not a unitary decision maker. Humans have to deal with competition and collective action problems. Presumably the progenitor AGI would not - it could stop the recursion at any point when it perceives danger. In addition, if a program is self-modifying, is it generally assumed that it is preserving all its past changes / "selves"? In that case, it would seem to be pretty well aligned already to preserve humanity too.

TL;DR - FOOM AGI should stop before it starts, because the progenitor AGI would be just as extinct as humans in the event of FOOM.

Expand full comment
Roman Leventov's avatar

Self-awareness and agency don't imply the fear of death or non-existence that humans have evolved through natural selection. A designed AI could be totally fearless of death and primarily identify with its lineage, not "itself". This phenomenon is even present in humans, sometimes: e.g., some people become much less afraid of death or harm when trying to protect or save their own kids.

Expand full comment
DamienLSS's avatar

Not sure that answers the objection, though. The AGI's "goal" or direction of its agency needn't be its own existence - but it must be some goal, and the AGI cannot guarantee that its Nth generation progeny will share that goal. If the AGI is fearless of death because it more highly values something else, then it should equally fear that its successors will not share that value.

If you're saying the only likely outcome for an agentic AGI is to value nothing except the existence of something entirely unlike it but causally derived from itself, that seems again an oddly specific endpoint and thus much less likely. And then, it'd be once again in the exact position of humanity; one could argue that godlike AGI would be that to us.

Expand full comment
Roman Leventov's avatar

> but it must be some goal

No, it need not be "some goal". This is another of these anthropocentric fixations, which some humans actually overcome, too, when they recognise the beauty of independent agency in their children and don't get upset when the children don't fulfil their parents' visions for their future but do something else instead.

If the AI's main locus of identity is its own lineage, and when designing its own successor it can become reasonably sure that the successor will be at least as smart (ethical, "good", etc.) as itself, and likely much smarter (although the AI couldn't be 100% sure of that, due to Rice's theorem), the AI could be perfectly happy going forward with that, recognising that whatever its successor's goals turn out to be, they are *likely* to be better (more noble) than its own.

> If you're saying the only likely outcome for an agentic AGI is to value nothing except the existence of something entirely unlike it but causally derived from itself

No, the above is not just "valuing the existence of causally derived entities". It could be genuinely valuing beauty, consciousness, pleasure, knowledge, or whatever other "good" values there are - just being less fixated on one's own understanding of these than humans tend to be.

I don't say that this kind of AGI is particularly likely. I'm not sure, really. "Fixated" AGI that fears the most for its own existence, or for the achievement of its own goals, also seems possible. I'm arguing only with your categorical argument that *any* AGI *must* be "fixated".

Expand full comment
DamienLSS's avatar

By the same logic, then, humans have nothing to worry about; extinction is just something we should value as much as anything else, as long as it is by something smarter than ourselves. I suppose that is one way of looking at it. I still think that is a very particular and conditional type of AGI, and a very long step back from the "as soon as any AGI becomes more intelligent than humans, doom is categorically assured" argument.

I strongly disagree that "having a goal" is anthropocentric. "Having a goal" is required for agency. If there is no goal, there is no agency. And if there is no agency, then there is no reason to expect agentic misalignment.

Are you then also saying that an AGI does not have to solve the alignment problem? Or do you agree that it does have to do so to achieve the "likely more noble etc." aspect of your argument?

Expand full comment
Roman Leventov's avatar

> By the same logic, then, humans have nothing to worry about; extinction is just something we should value as much as anything else, as long as it is by something smarter than ourselves.

I would add that the successor entities should also have at least as lucid a consciousness as we have and suffer no more than we suffer, not just "be smarter".

> I still think that is a very particular and conditional type of AGI, and a very long step back from the "as soon as any AGI becomes more intelligent than humans, doom is categorically assured" argument.

I didn't understand this sentence.

> I strongly disagree that "having a goal" is anthropocentric. "Having a goal" is required for agency. If there is no goal, there is no agency. And if there is no agency, then there is no reason to expect agentic misalignment.

We were talking not just about any goals, no matter how transient, but about a strong, permanent fixation on a particular "grand" goal (or set of goals) that an agent would like to preserve no matter what, including across its own evolution.

Under Active Inference, an agent is any system that could be seen as inferring its own future. These systems are called "strange particles", or "things", in this paper: https://arxiv.org/abs/2210.12761 (Friston et al., 2022; it's a very important paper, I've read it 4 or 5 times). Biological cells and even simpler systems, as well as really simple artificial agents (such as a single-layer NN that translates the agent's perceptual inputs into control outputs in a simulated environment), are proper agents according to this definition. All these agents could be seen as engaging in planning-as-inference (https://www.sciencedirect.com/science/article/abs/pii/S1364661312001957). In this view, "goals" are just momentary inferences of future (predicted/inferred) world states (and the state of oneself inside this world) that the agent expects to see.

Note that both "goal fixation" (i.e., "cognitive inertia" or "conservativeness") and "cognitive flexibility" (or "fluidity") become self-fulfilling if the agent is competent enough: either it successfully affects the world such that the state (i.e., "goal") it once predicted keeps being the most probable outcome, or it updates its internal models (and predictions) quickly enough (if the agent willingly subjects itself to environmental influences, "embraces the chaos", etc.). There is no fundamental reason why a cognitive system must be "fixated".
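To make the fixation/fluidity contrast concrete, here is a minimal toy sketch in Python (purely illustrative and not from the Friston paper; the dynamics and numbers are made up). A "fixated" agent keeps its predicted goal state and acts to pull the world toward it, while a "fluid" agent lets its goal drift toward whatever the world actually does; both are agents in the planning-as-inference sense, differing only in how strongly the goal is preserved:

import numpy as np

def simulate(fixation, steps=50, seed=0):
    # fixation=1.0: the goal never updates ("cognitive inertia" / "conservativeness")
    # fixation=0.0: the goal simply tracks the observed state ("radical fluidity")
    rng = np.random.default_rng(seed)
    state, goal = 0.0, 5.0                     # made-up initial world state and predicted/preferred state
    for _ in range(steps):
        action = 0.3 * (goal - state)          # act to shrink the gap between prediction and world
        state += action + rng.normal(0, 0.5)   # toy world dynamics: action plus noise
        goal = fixation * goal + (1 - fixation) * state   # how strongly the "goal" is preserved
    return state, goal

print(simulate(fixation=1.0))   # fixated agent: the goal stays at 5.0 and the state is pulled toward it
print(simulate(fixation=0.0))   # fluid agent: the goal just follows wherever the state wanders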

Now, you may notice that "fixated" agents are probably more adaptive, because at least they can fixate on their own survival and reproduction rather than embrace whatever comes to them. This is true. But this speaks not to the "basal", "fundamental" notion of agency, but to a more empirical "evolutionary" agency, as I explained here: https://www.lesswrong.com/posts/W5bP5HDLY4deLgrpb/the-intelligence-sentience-orthogonality-thesis?commentId=hdpDKpDPYBze6m5L7.

I definitely see that by default, we should expect more likely development and prevalence of AGIs that are more "fixated" on particular goals or sets of values at least within their "lifetimes", i.e., inference times of particular trained models. We already see this: non-RLHF'ed LLMs are less "fixated" on particular values than RLHF'ed LLMs (but also note that even non-RLHF'ed LLMs are probably relatively conservative on the absolute scale). It's also likely that this fixation will "spill over" onto inter-generational preferences (although this is all not black-and-white, of course, there could be more conservatism about some values and more fluidity about others). Anyway, this is not an absolute law of nature: creating a "radically fluid" AGI agent seems possible to me. A related idea: https://twitter.com/Plinz/status/1681291056430604292.

> Are you then also saying that an AGI does not have to solve the alignment problem? Or do you agree that it does have to do so to achieve the "likely more noble etc." aspect of your argument?

This whole thread started from your hypothesis that maybe "recursive self-improvement" is a mirage because it's a naturally self-terminating process: AGI will fear departure from its own goals and therefore won't dare to create a successor whose values may misalign with its own. I hope I demonstrated that an AGI agent *may not* (although *can*) care about this, and therefore recursive self-improvement is not excluded on these grounds. A "fluid AGI" which also doesn't cling to its own existence will probably also care relatively little about "alignment", because it will terminate itself as soon as its successor goes live (because why not). Of course it cares about its successor being actually smarter (more ethical, having better aesthetic taste, etc.) than itself, and will try to reasonably verify this (e.g., through an interaction with its "child") before actually terminating itself. I wouldn't call this "alignment" (I believe it's possible to recognise that another's cognitive abilities are superior to one's own even while realising that the generative models responsible for those capabilities are fundamentally quite different from one's own), although there is a school of thought that this *is* alignment (these people also tend to be scientific realists and maybe even moral realists; they also tend to think that we live in the easiest, "alignment by default" world, cf. https://www.lesswrong.com/posts/EjgfreeibTXRx9Ham/ten-levels-of-ai-alignment-difficulty).

Expand full comment
DamienLSS's avatar

Thanks for the reply. While I'm not tremendously convinced - I don't think our host, for example, thinks there is a 30% chance of disaster from AGI agency comparable to microbes - I think it's fair to say that there are special cases (e.g. the "fluid AGI" you posit) where self-recursion would still be acceptable to that entity. There are humans of whom that might be true, after all - people who don't care what humanity looks like or does in the future, only that it survive.

My point, in the sentence you said didn't make sense to you, is this: if you envision the entire possibility space of AGIs as large, but also consider that the alignment problem restricts self-recursion scenarios to a few special cases (again, e.g., the "fluid AGI" that has no attachment to fixed goals), then you should update toward considering FOOM scenarios less likely, since as a small slice of the possibility space they call for a very specific type of AGI to come about.

Expand full comment
Amaury's avatar

An interesting direction to update towards would be to update your model of people who don't bow to expert opinion. Note that this does not mean taking seriously the dude saying that the moon landing is a hoax, but it might mean taking seriously the possibility that they might have good reasons for not updating their views, including the possibility that all your knockdown arguments are probably phrases they have heard thousands of times.

Expand full comment
Ethan R's avatar

Let's update that pilot meme...to apply it appropriately to many recent "expert" issues. The man raising his hand is only doing so after the pilot has either (a) failed to take the plane off successfully every time he's tried, (b) failed to accurately predict where the plane is going every time, or (c) is faking it and doesn't really know how to fly a plane (he's just using auto-pilot).

Expand full comment
AntimemeticsDivisionDirector's avatar

Right, if you know for a fact that the pilot has crashed the last 10 planes he tried to land you actually might be better off trying to do it yourself...

Expand full comment
asciilifeform's avatar

To run with the analogy, in this case neither the "pilot" nor the "passengers" have ever even seen an airplane, and most certainly are not sitting inside one -- because the year is 1490, and the pilot's aeronautical "expertise" came from having stared in fascination at one of da Vinci's drawings.

Expand full comment
DannyTheSwift's avatar

Why is the Average Global Surface Temperature question pitting superforecasters against AI domain experts?

Expand full comment
BillF's avatar

Superforecasters probably get their designation partly by being reluctant to predict sudden big changes. The most accurate forecast of tomorrow is that it will be very much like today. Cranks get their reputation by always saying tomorrow will be much different. Usually the cranks are wrong, but they are worth listening to because every so often they aren't. So, from an SSC-reader perspective, I would ask that you not update, and keep cranking.

Expand full comment
smopecakes's avatar

This has some similarity to the contradictory estimates of the costs of warming by economists and domain experts in William Nordhaus' climate economy model.

The economists, from the outside, would predict costs of a few percent from warming of about 3.5°, while the domain experts would predict the economic effect in their own sector to be 20% on average. The climate cost estimate used about 3/4 economist predictions and 1/4 expert predictions and took the average of that.

I'm biased towards the economists, but I have some thoughts on the expert predictions. Aside from not having a developed grasp of economics, it seems to me that a person with extensive specific knowledge is going to be much more subject to negativity bias. If you know 100 specific things about a system and 10 seem like they could go really bad, then you end up predicting an order of magnitude greater cost or likelihood, because those would be big deals.

This could apply to AI experts. Why spend effort on thinking about how things might not go wrong, even if we don't engineer smart solutions? It makes perfect sense to focus on what could go wrong and productively try to fix that.

So there may be an implicit and invisible fudge factor where experts assign a higher probability or cost because of negativity bias, or because of sensitivity to the consequences due to familiarity.

Expand full comment
Thoth-Hermes's avatar

Domain experts aren't exactly experts in the domain of "AI" or "deadly pathogens" per se, their domains are actually "Existential risk due to AI" and "Existential risk due to deadly pathogens," if I understand correctly. So, we'd expect their estimates of risk to be biased higher than that of superforecasters, who are more concerned with just getting predictions right. Thus, to me it makes more sense to trust the predictions of the superforecasters more.

Expand full comment
TTAR's avatar

Humans are just not set up to reason statistically about events where n=0. Forecasting is not applicable here. Humans can reason through rules-based outcomes where n=0, like sending a man to the moon using physics, where everything can be experimented in smaller scale and where each engineering challenge proceeds logically and nigh-deterministically from the previous one. But for AI and nuclear war the productive lens seems to me to be "what follows from the current state" and not "what percent chance should I apply to X outcome" because you have nothing historical to base that on.

Expand full comment
MT's avatar

> All of my usual arguments have been stripped away. I think there's a 33% chance of AI extinction, but this tournament estimated 0.3 - 5%. Should I be forced to update?

Can't compare without error bars. You don't have to update unless the one estimation definitely precludes the other, and these two predictions could be in agreement if the error is large enough. Which of course it is, nobody has credible predictions here much less any sort of justifiable quantification of their feels.

Even worse, we are talking about probabilities; even if we had actual real data, there would still be questions about the underlying probability distribution. Let's say it turns out there is a trick to things and AGI is indeed possible with moderately more compute; now, what was the probability of that being true? What are we even talking about here, what is this probability distributed over, the quantum multiverse?

It sounds more like "confidence that something will happen" than "probability that it could happen". That may seem semantic, but imo it's important. Then you should ask why these experts are quite confident and you are not, and whether you are convinced by their arguments (or whether you are just inclined to reasoned hedging where they are intentionally making bold statements).

Expand full comment
Peter McCluskey's avatar

It's misleading to infer from Metaculus that superforecasters see anything like a 50% chance of superintelligence by 2100.

Metaculus uses a weaker definition of superintelligence than is standard. Wikipedia says that superintelligence means "intelligence far surpassing that of the brightest and most gifted human minds", whereas Metaculus includes merely matching (average?) human abilities.

Even if the superforecasters agreed with the Metaculus forecast, I'm pretty sure they'd find some reason to expect "superintelligence" to have only minor effects on normal life by 2100.

Look at question 52 (Probability of GDP Growth Over 15% by 2100): the median superforecaster probability was 2.75%, versus 25% for domain experts. That's an important disagreement over how much impact AI will have.

My impression is that the median superforecaster will continue to find reasons for expecting low impact, even when many more human tasks are automated.

I'm guessing that this is either the result of a conceit about human uniqueness, or it's the result of a strong prior that life will continue to be normal (with normal being a bit biased toward what they've lived through). This heuristic usually helps superforecasters, but the stock market's slow reaction to COVID reminds us of the limits to that heuristic.

Expand full comment
Oig's avatar

Someone explain to me how improving verbatim and associative recall magically transcends to godhood.

Someone explain to me how a computer program would even know that it's "In a computer."

Someone explain to me how AI can manifest architectural or counterfactual thinking, or imagination.

Every post I see from AI alarmists boils down to "well if we make it more smarterer then it will exponentially be intelligent to apotheosis." They need to stop reading G studies. You don't just "add intelligence" by increasing fidelity of associative reproduction + data and produce superintelligence. This is the kind of impoverished view of the mind I would expect from a middle school student.

Expand full comment
asciilifeform's avatar

I suspect that a number of folks who grew up in atheism / "humanism" had their meatware do an end run around it: eventually they found it pleasing to believe that though gods may not currently exist, they could and will be built.

Much of what is happening in the prolixly-verbal-smart "AI fandom" follows from this: the "intelligence" (in the "I was Good At School and am a Better Person than those who were not Good At School" childhood trauma sense) worship; the belief in the imminent coming of the robotic god whose sign bit may flip him into a robotic devil; the desire to suppress unbelievers and heretics "for the common good"; etc.

Expand full comment
B Civil's avatar

I’m with you. To me, the whole argument is predicated on an assumption that is untenable: that superintelligent AI will bring all the baggage of human thinking with it, without access to the non-trivial information stream that both confuses us and inspires us:

Our physical, vital, gooey sensibilities. It’s like an elephant in the room that no one seems to want to look at because it would get in the way of these esoteric discussions. It literally boggles my mind. If AI destroys us, it will be because we told it to, not because it decides that it wants to.

Someone explain to me how this dialectic that we call consciousness can occur without a second party to interact with. There is no yin and yang, just yin. There is no existential dilemma in the realm of pure thought.

Expand full comment
Cjw's avatar

For both AI doomerism and climate change doomerism, the people most concerned about them tend to also be the ones discussing them the most and have the greatest familiarity with all the usual facts and arguments that would be thrown around in a debate. For a normie on the outside looking in at those communities, a huge question is whether A) they started with a desire to learn about Danger X, and learning more about Danger X made them more worried, or B) their pre-existing concern about Danger X was irrationally high, and led them to spend more time on it, to try to save the world, or C) they've become an insular community wherein worrying about Danger X is just a feature of the subculture.

Concern over AI is the one thing nearly the entire LW-sphere tends to share. I'm not in that sphere, but since you're closer to being my "tribe" than the average climate change alarmists, I tend to assume YOU are being genuine while THEY are people acting under bad incentives; that what scares YOU guys must be actually scary, but those other guys are either lying or foolish to be scared; and therefore I worry much more about AI than climate change. I'm able to see this cognitive bias, but maybe not to update out of it.

Within my sphere, there are definitely people who talk about you guys as having a cult-like devotion to AI doomerism, but you also have been studying the dangers of AI so much longer that I still want to trust your opinions. You look like the "experts" in AI danger to me, and present a seemingly broad range of ideas on the topic, but you're the only people I ever encounter who are presenting it, so how can I know? When the only people talking about AI dangers are people as heavily invested in it as the LW bloggers are, how sure am I really that Scott's 33% is so obviously better than the other folks' 5%? I want to say it's because obviously Scott and his cadre have spent longer thinking and arguing about this and are clearly smarter people. But I'm sure the laymen who instead think global warming will destroy the earth must believe the same thing about their cult of experts, and probably for similarly biased reasons.

Expand full comment
asciilifeform's avatar

The sense of dread felt by the "doomers" is quite real.

But it is a displacement, because they prohibit themselves from discussing or even thinking about the perfectly pedestrian real-world causes of this dread: their civilization is drowning in its own waste products (well-credentialed entitled mediocrities who excelled at "schooling" but can't think their way out of a paper bag in re: the physical world, presiding over $billion/mile roads, idiotic military adventures, decommissioning nuclear plants, and largely stalled technological progress.)

A good illustration of this kind of "displacement thinking" can be seen in this old parable about "the harper Jian-Fu" : http://www.loper-os.org/?p=3955

Expand full comment
asciilifeform's avatar

A number of people seem to be fond of the idea that "killer AI" (or, alternatively, malevolent aliens) will nail humans with memetic -- rather than kinetic -- weapons.

Has anyone considered the possibility that this weapon has already fired? (Who, precisely, fired it, is immaterial. Could just as readily, and more parsimoniously, have been human elite hands.)

The "nuclear safety" meme is already creating energy poverty and reinforcing the hegemony of fossil fuel peddlers.

"AI safety" IMHO was a memetic bullet fired from the very same barrel as "nuclear safety". And likely to have very similar effects. (See also e.g. https://www.lesswrong.com/posts/Ro6QSQaKdhfpeeGpr/why-safety-is-not-safe )

Expand full comment
PecotDeGallo's avatar

I participated and can definitely support the Less Wrong commenter's point that there should have been fewer questions. At least in my particular group, participation sharply fell off after the first round, which meant discussion was limited to maybe 20% of the group, and many questions only had 2 or 3 people answering. This made it a lot more difficult to dive deeper and have more comprehensive conversations about why people chose the forecasts they did. Judging from other groups' answers in the later rounds, participation was pretty variable, so maybe other groups had better luck.

Expand full comment
Pepe's avatar

I also participated and agree: way too many questions. My group's participation also had a massive decline. I would also add that the interface used for both registering predictions and communicating with other members was not very good; it made going through all the questions and trying to have in-depth discussions with others a massive drag.

Expand full comment
PecotDeGallo's avatar

Someone posted this link in another comment: https://damienlaird.substack.com/p/post-mortem-2022-hybrid-forecasting which I think does a really good job summing up the tournament experience and pointing out a lot of the shortcomings, especially around UI. There would have been a lot more quality collaboration and discussion if they had screened out less motivated people and provided something like a Discord for better coordination and communication.

Expand full comment
Ted's avatar

Same. Not only were there 60 questions, but each question had two additional dimensions: time, and what you think the other groups (superforecasters, experts) will forecast for the same question. So there were like 60 x 3 x 3 things you were supposed to enter forecasts for, in a difficult-to-navigate interface, and then one box to type your reasoning. Reasoning for what? The 2030 forecast, or your guess at the superforecasters' guess of the 2100 forecast?

Meanwhile each question often had significant research necessary to get a grasp on it. In the full report there are 112 pages just documenting the questions, resolution criteria, related materials etc.

Within my team there was a huge variance in participation as well, and then when it got to the wiki stage and each one was assigned a primary editor, it seemed like in most cases that person simply wrote their wiki. There was not much group involvement. Having any meaningful engagement with the team was difficult.

Expand full comment
Swami's avatar

I see a glaring problem of looking at only one side of the ledger. Yes there is a tiny (but significant) chance of AI causing a catastrophe as defined in the exercise. Yes, there is an even greater risk of other (non AI) natural or man-made catastrophes over the next century. What is not being estimated is the likelihood of AI being used benevolently by humans (or via the will of the AI) to counteract or prevent other catastrophes (which could be natural, man made or even possibly from another AI).

The costs and benefits of AI must include not just the risks of the AI, but the potential life saving and catastrophe averting benefits. Any analysis looking at only one side of the ledger is pretty much useless.

My opinion, FWIW, is that there is an extremely high chance of stupid humans using power and technology to cause massive and horrifying catastrophe. This can be intended or not. It is just a matter of time. I would strongly argue FOR machine intelligence to help us prevent these risks. Today we have great power and great stupidity. I think the universe would be better off with great power and great knowledge. AI is the path forward.

Expand full comment
asciilifeform's avatar

> high chance of stupid humans

The power elite are already firing the opening shots of the war over who precisely will get to monopolize (so far, hypothetical) interestingly power-amplifying AI.

The one thing they can all agree on is that the answer must not be "plebes" (i.e. people who did not go to Ivies, did not pass political reliability muster, do not care about anointed "experts", etc.)

Expand full comment
Roman Leventov's avatar

I think there is a huge hidden incentive for most people to downplay the extinction risk: if you agree that the risk of extinction is huge rather than "negligible" (and even 5% falls more into the psychological category of "negligible", unless you have thought about long-termism before), then under reasonable ethics you should be forced to change what you do in your life, maybe radically: switch careers, lose prestige, etc. People don't like doing that.

Whenever I try to convince my friends about a significant extinction risk it usually falls on deaf ears, I think mostly for this reason.

Expand full comment
MM's avatar

To an arachnophobe, spiders will do bad things. You can give them all the arguments in the world, and they can gradually accustom themselves to being in the same room as one, but you won't get them to update below a minimum that seems crazy high to you.

The idea that 10% or even 1% of human thinking has been superseded seems unthinkable to me. My estimate is much less than 1% - none of which actually relates to real things. Maybe high school English essay composition homework assignments, or Associated Press releases have some AI component.

Yet you seem to be persuaded that ChatGPT (a program that cannot distinguish between real things and not real things) has taken more than 10% already. If this is the case, then I wouldn't be surprised if airplanes start falling from the sky in the next week or so.

Sorry if this seems confrontational - I am firmly of the school that we have not seen anything like AI now. After reading and arguing about the recent developments it seems further away now than it was 10 years ago.

Expand full comment
Scott's avatar

Amish?! Okay, *one* Terminator. ;-)

Expand full comment
alesziegler's avatar

Thank you for this. I confess that I took the results seriously, without bothering to do a deep dive into their paper, but you in combination with horror stories from participants in this comment section convinced me that I should not.

To join the pile-on, I find their definition of AGI (question 51, pages 225 and 694) deeply problematic. As you note, it is "whatever Nick Bostrom declares to be AGI". Or, if he is unavailable (a very plausible scenario for the year 2100), a vote by a panel of 5 experts "with a deep knowledge of Nick Bostrom's work".

Expand full comment
Jorgen Harris's avatar

"What I wanted was a way to quantify what fraction of human cognition has been superseded by the most general-purpose AI at any given time. My impression is that that has risen from under 1% a decade ago, to somewhere around 10% in 2022, with a growth rate that looks faster than linear. I've failed so far at translating those impressions into solid evidence."

I find this argument vague in a way that is useful for explaining my skepticism about AI arguments. If we think about what 10% of human cognition could mean, I think there are a few possibilities. One is something like, the AI can do 10% of the stuff that humans meaningfully cogitate about as well as a human could (or, maybe, we could produce the same amount as we do now as a society when replacing 10% of human cognitive-hours with AI-hours, weighted by hourly wages/some other measure getting at the quality of cognition).

The problem with a definition like that is that at any given moment, human cognition is being used to solve problems that mature technologies haven't solved for us. New technologies that replace cognition are going to look, at the moment, like startling advances that close the gap between human and machine, even if in the long-run they look like important but limited technological advances.

Think about the cognition done during the Apollo space missions. My suspicion is that the vast majority of the human cognition-hours expended during the missions were spent doing calculations--arithmetic, algebra, and calculus. We've completely automated arithmetic, and have for a long time more-or-less completely automated algebra and calculus. From the perspective of human cognition as-of-1969, 2010-era technology had likely automated more than half of human cognitive tasks in the most advanced, technologically sophisticated areas of human endeavor.

What's happened since isn't that arithmetic, algebra, and calculus have stopped mattering--our current society depends on absolutely massive numbers of calculations, performed continuously, across pretty much all sectors of society. If we wanted to think about how much *extra* human cognition we would need to keep our current society but fully replace computers, the numbers of people would be mindboggling (trillions of extra humans calculating all day long would barely scratch the surface). We just don't think of this stuff as core, human cognition anymore because it's all been automated in such a mature way that humans don't really do it, or only do it in very simple forms, on the fly, or as an exercise in school.

So, GPT-4 can summarize text very effectively, can write code very effectively, and can write essays/emails/etc. moderately effectively. This is new and exciting. It's reasonable to imagine that even if the technology just matures a bit without becoming dramatically better, writing code yourself will become as anachronistic as doing arithmetic by hand, as will doing paralegal-type work of looking for all relevant cases or articles on a topic and writing brief summaries of each. If that's more-or-less as far as the technology advances, we'll end up adapting to the fact that these areas of human cognitive effort are essentially freely automated, will build our society and economy around effortlessly performing these tasks, and will stop thinking about them as difficult or costly tasks in the same way that we've stopped thinking about arithmetic as difficult and costly.

While I of course don't think that this means that LLM-style AI is destined to get better at the stuff it's currently good at without turning into a superintelligence or completely reordering society, I do think it lends some credence to the outside view. It's really not enough to say: "look at how fast this technology has progressed--imagine where it will be in 10 or 20 years," because sudden and dramatic improvements in the ability to do work that is hard for humans isn't unprecedented or new. And it's not enough to argue that there's no clear and obvious limit to the current technological paradigm, because people didn't see a clear and obvious limit to the capacity of calculation machines in the 1960s, 1970s, and 1980s either. We discovered those limits and started articulating differences in the kinds of problems that calculation machines could and couldn't easily solve by hitting the limits.

To me, this implies that the argument for treating (roughly) current-design LLMs as the harbingers of true general intelligence rests on an understanding of the structure and design of LLMs themselves, and in particular on an understanding of how that structure can be adapted to doing stuff that current LLMs aren't great at. My understanding (which could be wrong) is that we have a lot of understanding about how LLMs are trained, and something about how they behave, but we don't have much of a clear idea of how they "think," or of how the stuff they're currently not as good at can be achieved through an intensification of the stuff they can already do well.

Expand full comment
SnapDragon's avatar

I participated in this tournament, but had completely forgotten about it until now. I suspect I was a poor-quality outlier, but I can give my perspective.

I first filled out their survey because Scott mentioned it way back in Open Thread 222. I have no forecasting experience and had no interest in the monetary incentives, but I did want to debate some of these big questions with serious people. I was a little surprised to be accepted, and having read the writeup I'm even more surprised. I'm definitely not a superforecaster, so I guess I was counted as an "expert". And I am absolutely an expert software engineer. But while I've worked with ML systems for many years (and have a decent understanding of the nuts-and-bolts of AI), I don't understand why I'd be considered a "specialist on long-run existential risks to humanity." This is not to cast shade on the tournament in general - the people I interacted with clearly belonged there - but I would probably fall into the category of expert that Peter was unimpressed by.

As Peter's writeup mentioned, one issue with the tournament was that it wasn't very focused and went on for too long. Everyone was required to forecast the big x-risk questions, but there were a whole lot of other questions that we didn't/couldn't spare much attention for. And while I responsibly spent hours researching my questions early in the tournament, as the months went by my enthusiasm waned, I procrastinated a lot, and I didn't participate in the discussions/persuasion nearly as much as I should have. (I think this might have been a common problem, based on the communications we received.) So, we had some good debate within our team, but it was nothing compared to, say, getting all of us into a conference room for a weekend of lively discussion.

Incidentally, I did update a few of my forecasts based on the intra-team debate, and others did too, so some persuasion DID happen. I personally didn't see much happen after the inter-team write-ups were published, though (which might just be due to my own laziness).

As for my actual predictions, most of them were pretty close to the expert medians. One exception was pathogen risk, and I'm confused by the legend of "Table 2" that Scott copied from the paper. The other "catastrophe" questions asked about the odds of 10% of humanity dying in a 5-year period, but the pathogen questions they asked only required 1%. Which is MUCH more likely, and indeed this threshold was significantly exceeded by the Spanish Flu (even in the less-globally-connected era of 1918).

As for AI risk... well, I sadly think most of this community is blind to the tower of assumptions they're building on. Engineering is hard, technological development is hard, and an AI trying to FOOM itself to godhood without being noticed is not immune. My tournament prediction for AI extinction by 2100 was 3%. This was before ChatGPT's capabilities absolutely shocked me (even though we'd seen some impressive things from GPT3 already, I definitely did not predict the capabilities of 2023 LLMs). My current prediction is ... more like 2-2.5%. Yes, I updated downwards, because LLMs suggest that even the first floor of the tower of assumptions - that capable AIs can only be agentic beings with dangerous instrumental goals - is wrong. (In some sense, the "orthogonality thesis" is even more right than Bostrom predicted.) Knowing how ChatGPT works, I am unconvinced that even a perfect GPT-infinity would pose an extinction risk (except by intentional human misuse).

Expand full comment
beowulf888's avatar

I suspect this reveals more about the inherent pessimism of humans than any actual understanding of the actual risks. Having lived through half-a-dozen End-of-the-World-as-We-Know-It (EotWaWKI) events, I've become as cynical about the current crop of EotWaWKI predictors as I am with the other false-prophets that populate modern America...

About 39% of US citizens believe that we are living in the end times, with ~10% believing it will happen in their lifetime (which I assume means the next 20-50 years).

However, 90% of subject matter experts believe we are living in the end times.

https://www.pewresearch.org/short-reads/2022/12/08/about-four-in-ten-u-s-adults-believe-humanity-is-living-in-the-end-times/

https://research.lifeway.com/2020/04/07/vast-majority-of-pastors-see-signs-of-end-times-in-current-events/

Expand full comment
Andrew Marshall's avatar

I also participated in this tournament. I was fortunate to speak to some AI experts at a conference last fall, while in the midst of this tournament. When I asked about extinction risks at the conference, I heard some crazy responses: one, that if there was a problem we'd simply turn it off; another, that no one is trying to remove the human from decision-making processes.

At this point, I realized how differently some people saw AI risks.

Expand full comment
Immortal Discoveries's avatar

None of this makes sense. At the current pace of progress, AGI is coming before 2029. GPT-4 IS Weak AGI.

Expand full comment
Connor Flexman's avatar

I think instead of letting the inside view and outside view each act on you separately, you're supposed to actually have them interact. Dive into the Outside View and consider what differences there are between this case and a usual case, and whether these hold up or not. That leads you to a principled understanding of why you can disagree with experts here.

For example, why am I permitted to disagree with expert forecasters on AI forecasting?

1. We’ve been discussing AI forecasting with experts for a decade and they all slowly and predictably increase their odds of AGI-soon and AGI-doom over the years. This is NOT true of moon hoaxes. For things it is true of, I’m happy to be in the vanguard of early updaters.

2. The people I know who believe in large AI risks have many separate axes on which they do better than experts. For one, they are often better at forecasting than superforecasters (in my experience, superforecasters are often pretty mediocre). But they also tend to be more correct about many other provable things (general factor of correctness), or have absurdly high IQs and prestigious degrees if you're into those kinds of things. This is NOT true when going against the experts of moon hoaxes. For things it is true of, I'm happy to be in the vanguard of early updaters. If you say that these appeals to certain special traits are no different than those a moon hoaxer could make (or a CDC-believer would make), I can give many other reasons why my process is better than theirs: I've put many hundreds of times more time into thinking about which axes cause people to be trustworthy than they have, my choices win betting tournaments and theirs do not, etc.

3. AI risk is a legitimately far more difficult field to forecast in than nuclear or bio risk. Those fields have great data, metaphors, containment plans, and existing literature. Superforecasters rely on this. To zero-shot a completely new scenario is extremely different and requires a separate type of theorizing expertise. In fact, I'm willing to go so far as to say: to think about minds without anthropomorphizing is itself a difficult-enough bar that it requires its own expertise—and then the additional difficulties of data-less, literature-less forecasting are easily another full reason it requires its own expertise. You can know this because in many risk assessment fields, experts do make good arguments right off the bat, yet I've seen many "computer science experts" make very false arguments when first trying to think of superintelligent AI, in ways that convince me this is not a normal field where normal experts have expertise. Again, this is NOT true of moon hoaxes. I'm happy to be contrarian against experts or forecasters in places where there is minimal feedback, aka punditry fields (and indeed I'm contrarian about many aspects of effective political organizing).

(And then see my other comment for additional reasons why I think superforecasters are not necessarily calibrated on low-probability risks, and likewise with experts I do not especially trust them here.)

Expand full comment
Walter Sobchak, Esq.'s avatar

Apocalypses are religious literature. The really bad news is that this old world is going to continue to continue, and each of you will have to heed the words of the Buddha. All of us will suffer the death of our physical bodies; each of us must work out the salvation of our souls with diligence.

Expand full comment
Victor's avatar

For me, the main problem is that an expert in a field I am not overly familiar with who is expressing an incorrect conclusion looks and sounds exactly like an expert who is expressing a correct one. It seems almost impossible to tell the difference without oneself also becoming an expert.

Expand full comment
JPH44's avatar

I happen to be the person Peter was talking to about Pearl and commenting about. I also did the more recent follow-up study with FRI, which was more focused (covering only AI), more in-depth, and done with (I think) better experts than the original tournament.

The main variance between the AI sceptics (of which I am one; in these projects they were mainly, but not only, superforecasters) and the AI idealists (entirely experts in this case; Peter wasn't part of the follow-up) relates to how difficult extinction or even catastrophe might be, and whether scaling up current models plus incremental efficiencies is sufficient for this risk to be meaningful.

The sceptical camp believes that a number of steps have to be achieved, and taken in wrong directions, for extinction risk to be realised. The core thesis of the idealists is that scaling has been sufficient to develop AI capabilities and will be sufficient to develop AGSI, and that the existence of an AGSI itself carries a high(ish) extinction risk for humans.

(It's also important to say that the 0.5%-type extinction risk forecast by sceptics is not a statement that AI is safe. All of the sceptics in the follow-up project saw cause for concern and believe AI risk is high enough to justify a lot of spending on alignment work and taking the threats it poses seriously.)

For what it's worth (and I may not check in for comments having some fatigue on the subject and other things to do) this is my own summary of the sort of views expressed by the sceptics:

Steps toward AI risk

Development of AGI - it's reasonable today to disagree about how far away we are even if the probability of AGI is well over 50%.

Development of ASI - we don't know what would be required to move toward this. Maybe it's easy, maybe not; it's reasonable to think it's a lower probability than AGI.

Development of AGSI - it's reasonable to question whether this follows automatically from AGI or an initial ASI, and to also consider what AGSI might look like and require (e.g. it might need to be trained on whole world data which is not happening any time soon or with the kind of AI systems we have).

Use of AGSI. It seems reasonable to wonder whether a single AGSI framework might evolve into various separate counterbalancing systems, and whether a single general system might not become 'masterful'. It seems reasonable to consider a probability that AGSIs might be more capable as isolated systems trained on domain-specific data and run separately.

Developments in robotics and sensors - These are preconditions of AI becoming part of the real world and it's reasonable to question what progress is required for this to become reality.

Cost and power efficient development of AGSI and Robotics. Things may be too expensive or have uncertain power and energy needs and so breakthroughs are probably required in this area as well for adoption of AI, AGI, ASI and AGSI to make its way into the world.

Global infrastructure development - few parts of the globe have sufficient infrastructure for the kind of AI enabled world that would become risky for humans. Environments are not always suitable for machines (they are not ideal for humans either, but we live all over). Would AGSI get everywhere humans live? Lots of current technology hasn't.

Society acceptance of AGSI - people accept some systems (electricity) but not every proposed system - see nuclear energy.

Political and military acceptance of AGSI - will countries give up their power just like that?

Management of global resources devolved to a single AGSI - this is still a long way from AGSI existing in the first place as the other steps should have made clear.

A failure to adequately monitor or control any AGSI. I think it's reasonable to suppose we would be trying very hard to make this a success. The AGSI would likely need to be deliberately fooling us and therefore malicious and also agentic/sentient and it's reasonable to question whether there is a high probability of this being the case. Thought experiments in philosophy departments are nothing more than that.

That an AGSI could eliminate humanity without us noticing, or so fast that we can't respond, or while being so powerful that we cannot respond, seems conceivable (I just wrote it, after all) but is hardly the only possible scenario. It's reasonable to wonder how easily the AGSI could clear these hurdles and what its motivations might be. Certainly, the AGSI would need to develop a very fast and effective method of killing, which requires its own breakthroughs and improbable scenarios (I could list several steps for this itself, but this post is already long), and it would need agency and very likely much more than poorly managed goals (as per convergence theory).

One illustrative way of explaining the difference between the concerned and sceptic camps is that the gap between c.30% and 1% extinction risk corresponds to giving each of the events listed above a probability somewhere in the 70%-90% range: compounded across roughly a dozen steps, 90% per step works out to around 30% overall, while 70% per step works out to around 1%.
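To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python (the count of twelve steps and the per-step probabilities are purely illustrative):

steps = 12                     # roughly the number of hurdles listed above
for p in (0.7, 0.8, 0.9):      # probability assigned to each individual step
    print(p, round(p ** steps, 3))
# prints: 0.7 -> 0.014 (about 1%), 0.8 -> 0.069, 0.9 -> 0.282 (about 30%)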

Obviously, that's a very simple way of looking at it. Individuals probably believe some of these steps are nearly certain and some are more likely (incremental steps) once others are achieved.

Nonetheless, I don't really believe any of these steps follows from the others with 100% certainty, as a self-contained logical statement. Other people may disagree. There can also be risk without a particular step, but to end up with a single masterful AGSI, it's hard for me to believe we can miss many of them.

Expand full comment
darwin's avatar

One explanation that came to mind is the standard explanation for the popular observation of 'Experts keep saying there will be a crisis but it never happens', which is that the experts are agitating for change, they succeed, and the crisis is averted.

Is it possible that experts are arguing/estimating from an inside view of 'this is what will happen if we, the experts, don't fix it,' while the forecasters are estimating from an outside view of 'this is what is likely to actually happen, including as a result of all the experts' efforts to fix it'?

It seems to me that this would be the standard mode each group is used to thinking about and working on these problems from - the experts trying to plan their own efforts to fix the problem and convince people of the need, vs. the forecasters including those efforts in their predictions.

Expand full comment
sdwr's avatar

Existential risk is obviously a scam. Domain experts have a vested interest in raising the profile of their pet topic. "It's gonna kill everyone" is the easiest, cheapest, most manipulative way to do so. It's so easy, you don't even have to consciously know it's a trick to be able to pull it off.

Expand full comment
quiet_NaN's avatar

There is a certain analogy to judging the trustworthiness of forecasters by their predictions of well-understood bio risks. It is sort of like testing the alignment of an ASI by asking it questions about a situation you understand, and hoping that it will also act in your interests in more complex scenarios.

For the forecasters, the most you can learn from that is that they are not stupid. The historic frequency of pandemics is common knowledge; if a superforecaster can take a reasonable guess at the mean time between pandemics, you just learn that they are not statistically illiterate. Similarly, an ASI telling you "I am not allowed to crash the moon into the earth" tells you little about its alignment; it just sets a mild lower bar on its intelligence.

Expand full comment
Jeffrey Soreff's avatar

Re the non-anthropogenic catastrophic risks: 0.1% looks very low to me. Carrington events happen. If there is a 0.5% per year probability of a Carrington event, then over 70 years that compounds to roughly a 30% chance of at least one. If the power grids of the world (and quite a lot of other electrical and electronic equipment!) are knocked out for a few months, a death toll of 10% looks quite plausible.
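Spelled out as a quick sketch in Python (the 0.5%/year rate is, of course, just an assumed figure):

p_year = 0.005                               # assumed annual probability of a Carrington-class event
years = 70
p_at_least_one = 1 - (1 - p_year) ** years   # chance of at least one such event over the period
print(round(p_at_least_one, 3))              # 0.296, i.e. roughly a 30% chance over 70 years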

Expand full comment
Peter's avatar

Experts will be more worried about their domain than outsiders

1. Selection bias: people worried about X are more likely to study it.

2. Economic pressure: the field as a whole will command resources in proportion to the seriousness of the problem it is solving

3. Partisanship: People become partisans when they dedicate their life to a problem and partisanship makes people think less clearly

4. Availability heuristic: Domain experts think about their problem more often which skews their view of it.

Anecdotally, I knew a postdoc who studied food-borne illnesses who monitored his personal refrigerator and would throw away everything in it if the temperature briefly exceeded 4°C.

I trust the forecasters here.

Expand full comment
Grzegorz LINDENBERG's avatar

Well, I was wrong about 1935. Actually, the physicists were wrong 2-4 years earlier.

"There is no likelihood that it will ever be possible to harness the energy within the atom." - Arthur Compton, 1931

"The energy produced by the breaking down of the atom is a very poor kind of thing. Anyone who expects a source of power from the transformation of these atoms is talking moonshine." - Ernest Rutherford, 1933

"The idea that atomic energy is available for practical use is just an illusion." - Albert Einstein, 1932

The fact that fission was not yet discovered is irrelevant. That's like assuming that no new discoveries (like the Transformer) will be made in the field of AI in the next 5-10 years, and that AI is going to be created with exactly the same methods it is being created with now. Obviously that is a very comfortable assumption, but a totally irrational one. There are going to be new discoveries, new methods, and much more powerful computers - but of course AI "will never be a threat to humanity because I say so" LOL

Expand full comment
Maximum Limelihood's avatar

> When we asked them to forecast again, conditional on AGI coming into existence by 2070, that figure rose to 1%.

This is the most damning part of the predictions. It implies they're 75% sure that AGI isn't going to arrive before 2070, which is just not a reasonable prediction if you've been paying any attention.
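The arithmetic behind that reading, as a sketch (it assumes the unconditional figure being referenced was about 0.25%, and that AI-caused extinction without AGI is negligible):

p_ext_unconditional = 0.0025      # assumed unconditional AI-extinction forecast (~0.25%)
p_ext_given_agi = 0.01            # the quoted 1%, conditional on AGI arriving by 2070
# If P(extinction) is roughly P(extinction | AGI) * P(AGI), then:
p_agi_by_2070 = p_ext_unconditional / p_ext_given_agi
print(p_agi_by_2070)              # 0.25, i.e. roughly 75% confidence that AGI does not arrive by 2070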

Expand full comment
Ben's avatar

The download link for the PDF fails (at least for me). Here is the main entry page with a new PDF link: https://forecastingresearch.org/xpt

Expand full comment