289 Comments

> How is your Lorien Psychiatry business going?

Now that you've partially bought into the capitalistic model of writing, you can bring that to your psychiatry business too. Slowly open it up to everyone who wants to join, raise subscription fees as needed (on new patients), hire additional psychiatrists, hire additional staff, etc. I'll admit this also sounds like a lot of work but maybe you can find a cofounder or the like to help.


Yes - my thoughts exactly. "Time to hire people who think like you and don't have a blog"


That will increase overhead, at least for a while until the benefits of specialization and economies of scale start to kick in, but it's a good idea.


I think the point is mostly to prove out a low-cost medical model, rather than to make the most money out of it. For the latter, it'd be trivial to do some publicity here and up the price. But as long as the money is not an issue, the experiment is worthy enough in itself - though from this post, I wouldn't call it a success yet. He's doing a bit of sweeping the paperwork under the rug - but unfortunately, paperwork is a solid cause of cost disease. I'm almost on the other side of the globe, and I'm watching my GP aunt do about 2/3 paperwork to 1/3 medicine.

Anyways, yes, optimizing this is a very worthy attempt. But I'd guess (without knowing enough context), that anyone attempting it should pay more attention to profit. It's often said that a startup needs to offer a 10x improvement to be successful - if whatever model comes out of Lorien can't offer at least significantly more money than the standard, I don't think it has a good chance of catching on fast enough to make a difference.


Is Lorien entirely based on flat monthly fees? I assumed that psychiatry would bill hourly or by session; is that not industry practice?

Oct 25, 2022·edited Oct 25, 2022

Lorien operates on an experimental "pay what you can" model with a flat monthly rate: https://lorienpsych.com/schedule-payment/

Scott described the basic idea in https://slatestarcodex.com/2018/06/20/cost-disease-in-medicine-the-practical-perspective/


I think monthly is becoming more popular. I pay $75/mo for medication management with a nurse practitioner. It's all online appointments. We have an appointment once a month, so I guess this could be viewed as per-appointment as well.


I love the self awareness around podcasting.

People should definitely stick to the best medium and format for communicating their ideas.


> Partly because patients keep missing appointments and I don’t have the heart to charge them no-show fees.

I have no experience in medicine. I have some scattered experience in what is technically independent business but never succeeded at it particularly and so maybe listening to me about that is a bad idea. I'm not sure what your actual policy is so I could easily be misinterpreting this. But the wording of this combined with my read on your personality overall makes me think you could use the push, so:

This feels like a classic trap. Letting people take up unlimited designated slots without paying you is a way to make sure at least some of them will allocate them without really caring. Your time is scarce, and other patients need it too. This is modulated somewhat by how much ability you have to *move* other work into those slots on short notice, of course—not all types of businesses have the same economics here—but medicine seems likely to be among the types that's more harmed by this.

Any partial but meaningful barrier is way better for aligning incentives than zero. Halve the fee for a no-show and be willing to waive one every six months, or whatever. But don't just let bookings turn to inefficient “first come, maybe serve” hell for no reason! Your market sense is better than that; I know it from your posts.

Disregard any or all of this if you have more reliable feedback or more specific analysis that contradicts it.


>"first come, maybe serve”

Why would that be? Unless he'd be overbooking hours there's no reason for this to happen.


> Any partial but meaningful barrier is way better for aligning incentives than zero.

I've heard stories where this backfired. Like, a day care gets annoyed at parents picking up their kids late, so they institute a late fee, and suddenly a lot MORE parents start being late, because now they're viewing it as a service they can buy, rather than an obligation that they're failing. (And if you remove the late fee, things don't go back to the way they were originally.)

Now, you could always choose to set a fee that's high enough that you genuinely don't mind when people choose to incur it, and then this outcome isn't a problem. Though in some cases, that means charging a lot of money to people who genuinely tried to stick to the plan but legitimately had extenuating circumstances.

But if you're thinking "I don't want to make this a service people can buy, I just want people to be slightly more reluctant to flake than they already are" then adding a small fee is not _necessarily_ going to move things in the correct direction.


This is a good point. The separate possibility of pushing on “and *did* you have extenuating circumstances?” has a different backfire curve, too.


Lorien is geared toward people with minimal, if any, income, earned and unearned. It is true that instituting a protocol for something bad generates something approximating a permission structure, but the clientele is still generally unable to afford standard psychiatry fees. That's why they are there in the first place -- dual needs, both being satisfied by Scott.

Which is a long way around the barn to say I don't believe that applies across the patient population in this specific instance.


FWIW, I miss appointments with my psychiatrist semi-regularly. This is not because I don't care, but because I have some sort of mental problem (but don't worry, I'm seeing a psychiatrist for it).

It means a lot to me that my psychiatrist doesn't charge me for this or get all strict and angry about it. You might think "if he did, you'd stop missing appointments", but the previous psych I went to did do that... and now I just don't go to him. I was literally unable to stop even when desperately trying; and now he's lost a lifetime patient who bought him nice thoughtful fancy gifts on Christmas and his birthday, and I was upset and anxious and running out of important medication for a bit.

I mean, not to say it wasn't *fair.* It was fair to be upset about me missing an appointment. "Well that slot could go to someone who doesn't miss appointments!" is probably true in most cases. Just sayin': Scott's way is probably greatly appreciated by at least a few of the disorganized but well-meaning patients.


An interesting thing about that specifically is that I'm personally familiar with something on the same side but with the opposite effect! I have a longstanding brain malfunction that makes it almost impossible to stay on top of things *unless* there's a clearer up-front incentive in the way (as opposed to wishy-washy or timey-wimey whatever). Oddly this seems to apply differently to different things, but an appointment specifically is something I can keep, and it's notably easier if there's something shoring up the “and don't just drop it on the floor”.

So *now* I wonder whether “vary things based on examination of mental responses” is practical. My intuition is no, but not with any confidence.


May I ask a separate question about this? Would you find it easier if you could have all the same interactions without having to show up at a specific time, but instead by the equivalent of email or voicemail (or something differently convenient but with the same buffering properties)? I know not everything can be done that way in actuality, but I'm curious now.


How would you feel about it if your psychiatrist charged a missed appointment fee but didn't get strict and angry about it? Making an appointment and not showing up does take the psychiatrist's time, so charging you for doing so doesn't seem all that different than charging you for appointments you don't miss.


Sounds like part of your therapy is being allowed to miss appointments without incurring certain social or fiscal costs.


One method I recently learned about: collect preemptive no-show fees from everyone every x months. After that period, if you missed an appointment, you don't get your fee back, but if you didn't miss an appointment, you get your fee back plus an equal portion of the fee that the no-shows gave up. So if you have 5 people who missed some and 10 who missed none, each of the 10 who missed none gets 150% of the fee back that they paid. People tend to overestimate their own adherence and underestimate that of others, so they are usually willing to agree to this.
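To make the arithmetic concrete, here's a minimal sketch of the settlement step (the numbers and the helper function are purely illustrative, not anything an actual clinic uses):

```python
def settle_deposits(deposit, n_missed, n_kept):
    """Split the forfeited deposits of no-shows among the people who kept
    every appointment; each keeper gets their own deposit back plus a share."""
    forfeited_pool = deposit * n_missed
    bonus_per_keeper = forfeited_pool / n_kept if n_kept else 0.0
    return deposit + bonus_per_keeper  # payout per person who missed nothing

# The example from the comment: 5 people missed at least one appointment,
# 10 missed none, so each of the 10 gets 150% of their deposit back.
print(settle_deposits(deposit=100, n_missed=5, n_kept=10))  # 150.0
```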


Should we have a Straussian interpretation of your answer on whether you write with a Straussian interpretation in mind?


Only if you are Vizzini


To be fair, Substack comments are *exactly* where you would run into Vizzini on the internet.

author

I'm not sure what answer I could give here that would convey useful information instead of just pushing the question down another level of recursion.


FWIW, I believe you... :)

You're semi-famous; you don't want to be de-platformed or harassed IRL, so you keep your edgier opinions to yourself but tell the truth otherwise.

In French we say honesty is not saying everything you think, it is thinking everything you say. A common saying that I found pretty useful, tbh.


Note that his response to the Straussian interpretation of the Ivermectin post was not a denial!


> what if this is the only book I ever write and I lose the opportunity to say I have a real published book because I was too lazy?

For what it’s worth, when I published Land Is A Big Deal, I picked the laziest option possible, going one step up from self publishing, by going with a small press operated by a friend. Approximately nobody noticed or cared it was from a small press, and now some big press folks are asking me about publishing the second edition with them.

I could have tried to go for a "big" publisher right from the get go, but I figured that would involve too much work and editing burden and rigamarole, which would ultimately result in me just not doing it in the first place. I did have some edits that came from going with the small press instead of literally just YOLO'ing it with self publishing, but it wasn't too bad all in, and was mostly just "make this sound like a book and not a series of online blog posts," which presumably wouldn't really apply to UNSONG.

One protip I would offer, however -- don't commit to recording, mastering, and editing your own audiobook. That was easily 10X the work of everything else, ugh.


Speaking as a publisher: why don’t you ascertain how much editing the editor wants to do?


Great comment. I often find myself talking myself into how hard/bad/dumb/tedious something is going to be without actually checking, and then later realizing I did this and kicking myself.


Tedious is the perfect word to use for it.


> If you ask me about one that I have written a blog post on, I’ll just repeat what I said in the blog post.

As a podcast lover, this is exactly what I expect when I listen to podcasts. For many topics I just don't have the concentration to focus on reading a long deep-dive blog post, but I would be happy to listen to people talk about the topic in the background while I do some light work.

No one expects the guests to come up with new insights during the interview, they just need to broadcast their usual points to a new audience.


An audio version of ACX already exists: https://open.spotify.com/show/5FEwz047DHuxiJnhq3Qjkg


Re: Unsong, as someone who thinks it sounds interesting but also prefers to read old-fashioned paper books while sitting in a big comfy chair, I'll just say that I'd happily buy a copy on day 1, regardless of how you choose to publish it.


I would like to second this.


As someone who also prefers old-fashioned paper books: I self-published UNSONG and bought a single review copy for my collection. Fans put together kits for doing this: https://github.com/t3db0t/unsong-printable-book


> My post My Immortal As Alchemical Allegory was intended as a satire to discredit overwrought symbolic analyses, not as an overwrought symbolic analysis itself.

Absolutely devastated. Also, weirdly enough I was just looking up Issyk Kul a few days ago. This is not a coincidence, because nothing is ever a coincidence.


Look, I mean, I've never read My Immortal, I'd never heard of it until Scott's post, but I have to agree with 2020-era Scott here:

> I maintain that if you are writing a fanfiction of a book about the Philosopher’s Stone, and you use the pen name “Rose Christo”, and you reference a “chemical romance” in the third sentence, you know exactly what you are doing. You are not even being subtle.

This is pretty compelling; there's obviously at least _some_ alchemical allegory going on in My Immortal. By claiming that his own obviously-accurate Straussian reading should not be read as a Straussian reading, in a paragraph where he tells us not to look for Straussian readings of his posts, he is clearly telling us to look for Straussian readings of his posts.

What does it all mean?


Just chiming in to say I found that paragraph compelling too.


I've been to Issyk-Kul, would recommend. Absolutely stunning scenery. Kyrgyzstan in general is great.


The world where My Immortal is alchemical allegory is strictly better than the one where it's just badly written. Resist!


I see it as the primary example of literary analysis as an art form! That analysis *clearly* didn't get its value from My Immortal, so it must be additive. That's conclusive proof right there.


> If we learned that the brain used spooky quantum computation a la Penrose-Hameroff, that might reassure me; current AIs don’t do this at all, and I expect it would take decades of research to implement

Do I have some news for you: "Scientists from Trinity believe our brains could use quantum computation after adapting an idea developed to prove the existence of quantum gravity to explore the human brain and its workings." https://www.tcd.ie/news_events/articles/our-brains-use-quantum-computation/

author

Yes, someone is always saying that, it's right up there with "new discovery may overturn standard model" and "cure for cancer found". I will believe it when it becomes an accepted part of scientific knowledge that lasts longer than one news cycle.


Totally fair, and I am the farthest thing from an expert in the space. I understood this to be remarkable because they used a known technique where you can determine whether a system is quantum based on whether it can mediate entanglement between two known quantum systems. So they tried it with brain water as the unknown system, it entangled, and boom, you've got some brain-quantum effect that at least requires some explanation (or the entire technique is bunk).


Given what we know about quantum systems and decoherence, this model is very unlikely to stand up to scrutiny by other groups. It is not as low probability as erstwhile superluminal neutrinos some decade and a half ago, or as cold fusion a long time ago, but it sure is up there. I have trouble coming up with a recent (last 50-ish years) discovery that challenged our understanding of fundamental physics in this way. On the other hand, if confirmed, the payout would be enormous for scaling up quantum computers.

My current odds of this effect persisting and being due to quantum entanglement in the brain are less than 1%.


Well, I don't know about their particular idea, but the idea that the brain uses "quantum physics" to improve its functioning isn't that unreasonable. Photosynthesis uses it to get greater efficiency than classical physics would allow. But this doesn't say it affects the processing of thoughts, other than perhaps making some energetic pathway more efficient. For more than that I'd require a significant proof, rather than just some (I'm guessing) "correlation of spin states".

If it's both true and significant, it will show up somewhere else.


I have no idea if this is anything more than "bored physicists messing about with their expensive toys", but I do love the idea of "So what are you working on?" "Brain water" as an exchange 😁


I can promise there is nothing special about the brain as a substrate, like "spooky quantum computation", that makes it uniquely well-suited for computation. Basically everything non-trivial in physics has the latent "potential", in a sense, for incredible computational capacity; the hard part is organizing that potential into doing something that is relevant to us.

This is why it is common for people (as a joke project) to make computers in computer games; see https://en.wikipedia.org/wiki/Turing_completeness#Unintentional_Turing_completeness . Once you have memory, loops, and if statements, you almost inevitably get full computational ability. I recall Dwarf Fortress, for example, has at least 4 different game mechanics individually capable of computation (fluids, pathing, mine carts, and gears). I built a computer in Factorio based on splitters. (It took like 4 minutes to add two single digit numbers.)
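To make "memory, loops, and if statements are enough" concrete, here's a minimal sketch of a two-instruction counter machine, the same sort of bare-bones substrate as fluids, mine carts, or belt splitters. The instruction names and the example program are made up for illustration, but machines of this kind are known to be Turing complete given unbounded registers:

```python
# A toy two-instruction register machine: INC (increment and jump) and
# DECJZ (decrement, or jump if the register is already zero).
def run(program, registers, max_steps=10_000):
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return registers            # halted by falling off the end
        op, reg, target = program[pc]
        if op == "INC":                 # increment register, go to target
            registers[reg] += 1
            pc = target
        elif op == "DECJZ":             # if zero, jump to target; else decrement
            if registers[reg] == 0:
                pc = target
            else:
                registers[reg] -= 1
                pc += 1
    raise RuntimeError("step limit reached")

# Example: drain register 1 into register 0 (i.e. add r1 to r0), then halt.
program = [
    ("DECJZ", 1, 2),   # 0: if r1 == 0, jump past the end (halt); else r1 -= 1
    ("INC",   0, 0),   # 1: r0 += 1, loop back to instruction 0
]
print(run(program, {0: 2, 1: 3}))  # {0: 5, 1: 0}
```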

----

I have a very different response to AI fears: I believe we are not close enough to building AIs for the speculative theoretical work on AI alignment being done now to be relevant. I'm glad people are thinking about the problem but I rather expect that by the time we are anywhere close to AIs we'll have thrown out all but the most obvious safety stuff being done today. (Work on convincing citizens and governments to care about AI safety is still valuable, of course.)

Imagine people from the 1800s trying to prevent nuclear war: they can figure out things like "let's not nuke each other" and encourage a doctrine of de-escalation, but all the details about anti-proliferation and how to survive an attack have to wait for the technology to mature.


"I have a very different response to AI fears: I believe we are not close enough to building AIs for the speculative theoretical work on AI alignment being done now to be relevant. I'm glad people are thinking about the problem but I rather expect that by the time we are anywhere close to AIs we'll have thrown out all but the most obvious safety stuff being done today. (Work on convincing citizens and governments to care about AI safety is still valuable, of course.)"

That. I can't remember where someone challenged me on "when would be the right time to start working on AI risk" and I was like: IDK, but it all seems so immature at this point...

Experts can't even seem to agree on which metaphors will turn out to be useful for describing how their AIs function and the associated risks.


Curious what your timelines are—roughly when do you expect AI to arrive (assuming that it does eventually)?


That's a question for me?

Honest answer - no clear idea, I'm too far removed. But, say, anywhere between 10 and 50 years for AIs that start to look like AGI if not AGI.

I basically expect an explosion of performance/rapid progress "at some point".

But experts don't even agree on that. I remember a conversation where a lot of the disagreement could be boiled down to "AI progress will be slow enough for us to react to it" and "AI progress will go from nearly 0 to 100 in a couple of days literally, once we get the right type of AI".


I’m honestly a bit surprised by your response! If your timelines were like 100+ years in the future I understand not being too concerned, but < 50 years is (hopefully) well within my lifetime! Sure, it probably won’t be overnight, but in the scale of things, that seems very very short.


Well, for no particularly good reason I've projected 2035 for the first "almost human level" AGI for over a decade. The real question is what happens after that. On that I give about 50% odds of humanity surviving. Which is better odds than I give for humanity surviving a couple of centuries without a superhuman AGI. The weapons already in existence are too powerful, and not all those with access to them are sane.


Very hard to say. I think we may be as far, technologically, from AI as scientists were from nuclear weapons in the 1850s. But how long will that take -- 30 years? 50? I rather doubt less than 30.

The trouble is I think the window between understanding what AI looks like, and AI arriving, could be quite short. For nuclear weapons that was arguably around 10 years; but AI could proliferate much easier than nuclear weapons. Will there even be 10 years between the first glimpse of real AI and when it is everywhere?


Terence Tao uses Turing-complete Euler-like fluid flows to try to prove blow-up for the Navier-Stokes equations.


A beautiful (and insane) idea, but did anything come out of it? Was there ever any progress made towards making it work, or is it still just an idea?


Oh, there is definitely progress: the idea was used to show finite-time blow-up for some kind of Euler-like flows. It's not done for the real Euler equations though, and then you'd need to prove that blow-up for Euler implies blow-up for Navier-Stokes (which it should, according to Tao, because blow-up for Euler means the nonlinear terms in Navier-Stokes should dominate the diffusion term - but as far as I'm aware there is no formal proof yet).
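For reference, the equations being discussed, in their standard incompressible form (this is just the textbook statement, nothing specific to Tao's construction):

```latex
% Incompressible Navier-Stokes for velocity u(x,t) and pressure p, viscosity \nu > 0:
\partial_t u + (u \cdot \nabla) u = -\nabla p + \nu \, \Delta u, \qquad \nabla \cdot u = 0.
% Setting \nu = 0 gives the Euler equations. "Blow-up" means smooth initial data
% developing a singularity in finite time; the hope described above is that a
% blow-up mechanism built from Euler-like dynamics survives the extra \nu \Delta u
% diffusion term in the full Navier-Stokes system.
```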

Tao really believes in the plan and I have a general principle of believing whatever Tao believes (in maths).


I once asked Eliezer a similar question - and he said that AI safety would still be the most important thing for him to work on even if he were in Ancient Greece and he had to leave his work in a cave somewhere for future mathematicians to understand. :/


Even by the usual standards of quantum woo that was remarkably free of genuine science. Maybe it was a sly Irish joke?

Also...university PR releases are the worst form of fluff and vapor. Never believe a word.


Again, not an expert here, so I guess the PR fluff is directed at me. They link the paper, though, which seems unapproachably technical. Maybe that's a better way for the scientifically minded to judge whether it's newsworthy? https://iopscience.iop.org/article/10.1088/2399-6528/ac94be


It's not unapproachable. But it doesn't seem worth the effort to parse completely; I don't trust their empirical skepticism, which seems to be absent. They start off doing stuff like simply asserting it's proven that quantum effects matter to brain function because, for example, a bunch of anesthesiologists discovered that spin-1/2 Xe-129 might[1] act very slightly differently as an anesthetic than spin-0 Xe-132, and since spin is a quantum phenomenon, there you have it[2]! This is a step up from Deepak Chopra, but not a big one. So far as I can tell, the rest of the paper is like that, too. That doesn't mean they're wrong, of course, since even fools can be right by accident. But it's so unlikely[3] that I would need a report from someone with a ton more empirical skepticism on board before it would be worth the time to dig into exactly what they did and how.

--------------------------

[1] I mean, the measurements quoted are small and reasonably within each other's error bars, for example. Not what we call a five sigma conclusion.

[2] Of course, a nucleus with a spin also has a magnetic dipole moment, which affects energy states of the electrons (cf. the hyperfine structure), so there is already a purely classical distinction in the atomic physics.

[3] For the same reason I don't believe in quantum computing: the problem of decoherence strikes me as fundamentally insoluble. The larger the system of coupled quantum degrees of freedom, the more exquisitely sensitive it is to interference from the external world that destroys the coherence between the degrees of freedom. You can readily calculate that, for example, the sudden ionization of a dust speck on the Moon is more than enough to decohere a superposition of particle-in-a-box states when the box is of some laboratory size. It becomes just absurd to imagine it possible to sufficiently isolate your complex quantum state long enough for it to do any nontrivial calculation.


Thanks, I appreciate you sharing those insights.


You could always do a podcast as Proof of Concept of "Why I should not do podcasts so please stop asking me".

I would find the "failure" interesting in a meta way.


Not doing podcasts is totally fine, but you're really overestimating the level of quality listeners expect from a podcast. For example, I genuinely enjoy listening to Joe Rogan talk for hours.


The only time I saw an episode of Joe Rogan was when Nick Bostrom was there. Don't watch that if you value your sanity.


> I would also be reassured if AIs too stupid to deceive us seemed to converge on good well-aligned solutions remarkably easily

We have "a lot" of evidence for the contrary.

I think there was a Google doc from some Less Wrong community member with like 50 anecdotes of AI become misaligned on toy problems. For example, ask the AI to learn how to play a given video game very well, and the AI instead find bugs in the game that allow it to cheat.

I can't find that google doc anymore (maybe someone else can resurface it), but this PDF lists some of the examples I recognize from that google doc. https://arxiv.org/pdf/1803.03453v1.pdf
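The flavor of those anecdotes is easy to reproduce on purpose. Here's a deliberately contrived toy example (made up for illustration, not taken from the linked survey): the designer intends "walk right to the goal", but the score function has a loophole, and even a dumb brute-force search over plans picks the loophole over the intended behavior.

```python
from itertools import product

# Toy 1-D track: start at 0, goal at position 3. Intended behavior: walk right.
# The scoring function has a "bug": bumping into the left wall also gives points
# (think of a collision counter the designer forgot to exclude from the reward).
def score(actions):
    pos, points = 0, 0
    for a in actions:                      # a is -1 (left) or +1 (right)
        if pos + a < 0:
            points += 2                    # the unintended loophole
        else:
            pos += a
        if pos == 3:
            points += 5                    # the intended reward
            break
    return points

# "Dumb" optimizer: exhaustively try every 6-step plan and keep the best one.
best_plan = max(product((-1, +1), repeat=6), key=score)
print(best_plan, score(best_plan))
```

The plan it settles on is six steps of bumping into the left wall (12 points), not the three steps to the right that the designer had in mind (5 points).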


>ask the AI to learn how to play a given video game very well, and the AI instead find bugs in the game that allow it to cheat.

It's exactly the same with humans, called "speedrunning" when we do it. Clearly delineating what is a bug and what is "intended" is a surprisingly difficult problem, even when the thing you need to "align" with is human-designed entertainment.


Yeah, I'm not sure it's fair to say the AI 'cheated'. It worked within the rules of the system to optimize its play. As you point out, this is exactly what humans do. If you don't want the computer to cheat, you have to teach it what it means to 'cheat'. If, after that, it breaks the rules anyway and tries to hide it, then we can start talking about AI intentionally trying to deceive. That's not what this is describing.

Comment deleted

> It is given a directive of obtaining a high score or finishing quickly. It finds the best way of doing so. There is no "intent" to deceive or cheat - treating it as if it were some scheming human player is to see a human element where the simpler explanation is that it just did what it was told absent more complete instructions ("obtain a high score while obeying these rules").

Yes, everything you say is completely correct. AND YET it did not do what the researchers intended. This is my core argument about why the alignment problem is difficult.

I apologize for not making the core argument clearer.


(So let's put aside games designed for speedrunning for now)

I think it's perfectly coherent to say that the behavior of speedrunners is "misaligned" with what the game designers intended, especially once you allow for ACE (arbitrary code execution). Given that the human's behavior is misaligned, and that the AI is mimicking that behavior, the AI's behavior is also misaligned.

> Yeah, I'm not sure it's fair to say the AI 'cheated'. It worked within the rules of the system to optimize its play. As you point out, this is exactly what humans do. If you don't want the computer to cheat, you have to teach it what it means to 'cheat'.

What you're describing is literally the alignment problem. The AI worked within the rules you specified, not the rules you intended. The difficulty is trying to communicate the rules you *intend* to the AI, not the actual rules as written in the code.

> If, after that, it breaks the rules anyway and tries to hide it, then we can start talking about AI intentionally trying to deceive. That's not what this is describing.

We're not talking about AIs intentionally trying to deceive at all in this thread. The passage I quoted from Scott was "I would also be reassured if AIs too stupid to deceive us seemed to converge on good well-aligned solutions remarkably easily". We're talking about AIs too stupid to deceive, which are misaligned. Deception is completely off the table for the topic at hand here.


If we're using the term 'cheat', then deception is entirely the point. If you don't want to introduce this idea, don't use the word cheat. You're describing the problem of encoding intention into instruction, which isn't the same as an AI that is assigned a task 'cheating' in the way we would normally define it.

For us to say the AI 'cheated', the AI would have to be given a list of instructions, then that AI would have to not follow the instructions in its attempt to complete the task. If it's going to be rewarded for that behavior, it will likely have to attempt to deceive the humans into believing it did follow the instructions.

I see this a lot in discussions about AI alignment, where loaded terms are used to describe a behavior. Those terms introduce intentions and concepts into the conversation (such as deception) that are otherwise explicitly excluded.

My point is that the loaded term 'cheat' is inaccurate and introduces confusion. This is exactly analogous to the speedrunner conversation, where people can observe every frame and know whether you used a glitch or not. Indeed, they will likely define glitches as either "major" or "minor" and then define records for speedruns with or without major and/or minor glitches. One of those aligns with game designer intentions and one doesn't, but nobody is deceived about whether the speedrunners are being awarded with record times based on the intents of the game designers.

Except it's not the game designers who are awarding the speedrun times. It's the community that defines the rules of what constitutes a legitimate speedrun for each category. To 'cheat' at a speedrun you would have to break the established rules of the community. For example, by claiming you did a speedrun normally when it was actually a tool-assisted speedrun. Do you have any examples of that kind of behavior? Otherwise, maybe don't use the loaded term 'cheat'.


The entire point is that it's nontrivial even for "toy" worlds and "toy" AIs to communicate any expectations like whether and which glitches are okay to use. They certainly don't infer the intent of the game designers or their own programmers and act on that. If "alignment" in this sense is hard in a toy example, we should generally expect it to be hard in the real world too.


Yes, I understand this. I'm just objecting to the motte-and-bailey argument surrounding use of the term 'cheat'. The original claim is that given instructions to play a game, "the AI instead finds bugs that allow it to cheat." This claim was made by Nebu Pookins at the top of the thread, and it is this statement that I'm objecting to.

Challenged that 'cheat' implied intent and deception, he began retreating to the motte that you could cheat without deception. Challenged that cheating requires you to disobey specific rules, he retreated to the claim that we're really only talking about how difficult it is to define rules that encode human intention into machine actions.

It's fine if that's the point you want to make. It's a well-established problem that's not new, and predates ML finding glitches in video games. If we can agree that use of the loaded term 'cheat' is not applicable - and indeed buries the underlying point that no intent is required for an AI's implementation to differ drastically from intended results - then there's no conflict here. We can simply update that the statement about cheating was a red herring that disguised the original point Nebu Pookin was trying to make, instead of illuminating it.

If instead we have to keep using intent-loaded terms to refer to machine learning models that have no demonstrable intent-related actions, I'll keep objecting that we're over-defining what ML is doing. This kind of language gives discussions about alignment an anthropomorphic slant that is neither warranted nor helpful. It makes observers think AI has been observed doing things we have no evidence to suggest it can do, and gets in the way of honest discussion of what the real problems are.


Yep, this is it! Thank you!


In particular, vis-à-vis the comparison with speedrunning elsewhere in this thread, note that "Dying to Teleport" from that doc is a perfectly standard speedrunning strat called "Deathwarp".


I've been meaning to ask: can you enable the "listen" text-to-speech feature on Substack? Maybe that will satisfy people's podcast desire. You may have to email them, or it may be an option in the publishing view.


Could be something for subscribers only, as well, assuming Substack supports it.


You don't need computer TTS; there's a podcast read by a human: https://sscpodcast.libsyn.com/


“But what if the podcast interview is presented as rounds of perfect and imperfect information games to give the audience insight on your thought process?”

Given no humans are natural public speakers, it's quite odd how many folks expect writers to be capable and willing to instantly transform into public speakers. There's a whole second skill set in addition to the speech-writing/ad-libbing part.


"Given no humans are natural public speakers..."

Citation needed! I'm reminded of the Big Bang Theory exchange (Sheldon: "I can't give a speech." Howard: "No, you're mistaken. You give speeches all the time. What you can't do is shut up.")

I feel I'm in this category: I could talk in public to any audience on any subject, and I always have been able to. My childhood family and my current family are all the same - it's hell! (Conversely, I find writing things down really hard.)

Of course, that doesn't mean my speeches are useful or convincing or that I'm a perfect speaker. And it doesn't mean that I don't get nervous when my speaking has stakes that are important. But it's nonetheless the most natural thing I do, and given we were speaking people before we were writing people that doesn't seem that surprising. I'm more surprised when I come across people who are natural writers - but they obviously exist!

Oct 25, 2022·edited Oct 25, 2022

What an excellent time for a citation request! In hindsight, I don't actually remember reading this at any point and was extrapolating from my own combination of experiences (it took a lot of training for me to learn even meager public speaking, & one of my favorite books is about public speaking) with my general knowledge of speech developmental milestones (which are guaranteed to be missed without enough external stimuli to teach the child the skill and hence not inherent), but speech/sign language is a prerequisite to public speaking not an equivalent.

My own anecdotal experience is probably not the best reference for humans in general either, considering I was late at learning speech but then taught myself how to write only a couple of months after that, according to my parents. A quick search of scholarly articles backs this up better: it shows a lot of both historical and recent discussion of the pedagogy of public speaking, including data collected on various pedagogical strategies and their effectiveness. We could probably combine the base case of "zero stimulus == zero speech skills == zero public speaking skills" with the data showing that a student at an arbitrary level of public speaking skill improves with training, to reach an inductive conclusion that humans are not natural public speakers.

...Probably don't quote me on that though; I just glanced at a few research paper conclusions to see that said data exists, and didn't read them through. E.g. https://www.researchgate.net/publication/307853972_Enhancing_Public_Speaking_Skills_-_An_Evaluation_of_the_Presentation_Trainer_in_the_Wild

(Additionally, the BBT quote has inflicted psychic damage.)


What’s one of your favorite books, that happens to be about public speaking?

Oct 25, 2022·edited Oct 25, 2022

Heinrich’s “Thank You for Arguing” since it’s a very easy read that gives immediate insight on media literacy (albeit using some old theory so its probably outdated. It’s also beginner material, so nothing particularly wild.)


Thanks!


Silly Scott. The reason people want you on a podcast is they just want to hear you talking. To hear what your voice sounds like, how you are as an extemporaneous person, etc. It's not actually to get exclusive insights.

Also, I think the thing about being cancelled for going on a podcast is a bit overblown. Most podcast-goers are fine and guilt-by-association is rare. You can also even get the questions mailed to you beforehand for the more scripted ones.


There are a couple of recordings of him doing Unsong readings on Youtube. His voice is exactly what you would expect it to be like.


Prediction: Scott Alexander on Sean Carroll's Mindscape podcast, talking about his newly published best-selling novel UNSONG as an allegory for everything else. Some day.


None of the arguments about not doing a podcast make sense. Of course there’s a better subject matter expert for each thing! But if that was all the substance then I would not bother reading you, would I?

Better to just say you are particularly shy and don’t want to be recorded in that way.


> I would be most reassured if something like ELK worked very well and let us “mind-read” AIs directly.

This, probably most of all. All the other solutions attempt to understand indirectly and subjectively whether an AI is aligned. To date, we have a ton of much simpler ML models, and only a theoretical understanding of how their outputs come about. If we could understand deep learning layer by layer, and understand an AI's capabilities and intentions before we turned it on, that would be a completely different world. It's also a really hard problem in its own right.
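For anyone wondering what "understand deep learning layer by layer" looks like as a first step in practice, here's a minimal sketch of recording per-layer activations with forward hooks so they can be probed afterwards. It assumes PyTorch, the tiny model is made up, and this is ordinary interpretability scaffolding rather than ELK itself:

```python
import torch
import torch.nn as nn

# A made-up toy model standing in for "the AI".
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # stash what this layer computed
    return hook

# Attach a hook to every linear layer so each forward pass records its output.
for name, layer in model.named_modules():
    if isinstance(layer, nn.Linear):
        layer.register_forward_hook(make_hook(name))

x = torch.randn(4, 16)   # a dummy batch
_ = model(x)             # forward pass fills `activations`

for name, act in activations.items():
    print(name, tuple(act.shape))  # each layer's output is now available to inspect/probe
```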


> A final way for me to be wrong would be for AI to be near, alignment to be hard, but unaligned AIs just can’t cause too much damage, and definitely can’t destroy the world. I have trouble thinking of this as a free parameter in my model - it’s obviously true when AIs are very dumb, and obviously false five hundred years later...

It's really weird for me to read statements like these, because your value for "obvious" is (apparently) totally different from mine. It is obvious to me that we have extremely smart and powerful entities here on Earth, today, right now. Some of them are giant corporations; some of them are power-crazy dictators; some of them are just really smart humans. But their power to destroy the world appears to be scaling significantly less than linearly. Jeff Bezos wields the power of a small nation, but he can't do much with it besides slowly steer it around, financially speaking. Elon Musk wants to go to Mars, but realistically he probably never will. Putin is annexing pieces of Ukraine left and right, but he's a local threat at best -- he could only become an existential threat if all of us cooperate with his world-destroying plans (which we could do, admittedly).

And adding more intelligence to the mix doesn't seem to help. If you gave Elon Musk twice as many computers as he's got now, he still would get no closer to Mars; if you gave Putin twice as many brilliant advisors (or, well, *any* brilliant advisors), he still wouldn't gain the power to annex the entire world (not unless everyone in the world willingly submitted to him). China is arguably on that trajectory, but they have to work very hard, and very slowly.

It's obvious to me that the problem here is not one of raw intelligence (whatever that even means), but of hard physical constraints that are preventing powerful entities (such as dictators or corporations) from self-improving exponentially and from wielding quasi-magical powers. Being 100x smarter or 1,000x or 1,000,000x still won't help you travel faster than the speed of light, or create self-replicating nanotech gray goo, or build a billion new computers overnight, because such things are very unlikely to be possible. It doesn't matter if you're good or evil; the Universe doesn't care.


> Being 100x smarter or 1,000x or 1,000,000x still won't help you travel faster than the speed of light, or create self-replicating nanotech gray goo, or build a billion new computers overnight, because such things are very unlikely to be possible.

Unlikely but not impossible impossible? And surely being 100,000x smarter would be relevant to finding out?

We don't even need to go that far. Hypersonic nuclear missiles would be a game changer by making all defensive missile systems irrelevant. Surely being 100,000x smarter would really help design a real, functional hypersonic missile?

And NB, if you don't like hypersonic missiles as an example, pick something else. The point is, advancing scientific knowledge by leaps and bounds over the present day limit would be real power and easily dangerous.


> Unlikely but not impossible impossible? And surely being 100,000x smarter would be relevant to finding out?

That is... unlikely. For example, consider the speed of light. I said that it is "unlikely" that we could ever travel faster than that, because there's always that small chance that our entire understanding of physics is completely wrong at a very fundamental level. It's possible. But, thus far, the smarter we humans get and the more we learn, the more it looks like we're actually correct about the speed of light. Being 100,000x smarter would probably help us confirm this limit in all kinds of interesting ways... but not break it.

> Surely being 100,000x smarter would really help design a functional real hypersonic missile?

I don't know, would it ? Are such missiles allowed by the laws of physics (not just super-fast missiles, but super-fast super-maneuverable ones that can carry heavy payloads) ? I don't know much about aviation, so I don't know the answer to that. Still, some people say that Putin already has them (well, Putin says that), so it's possible that they're already a known quantity.

> The point is, advancing scientific knowledge by leaps and bounds over the present day limit would be real power and easily dangerous.

Not really, because (as I'd said above) certain impossible magical powers remain impossible as we learn more scientific knowledge. We used to merely suspect that FTL travel is impossible; now we are nearly certain. The same applies to infinite energy, or inertialess drives, or most other science-fictional devices, really.


So there's nothing left to be improved? Sorry, not buying it. And I don't think you mean it either.

Hypersonic missiles, if I understand correctly, don't have to be manoeuvrable. Being so fast is enough to avoid missile defense.

Look at a dogfight between a 5th gen fighter and a 4th gen fighter - it's embarrassing given how much more manoeuvrable the 5th gen fighters are.

Etc., etc. I simply cannot agree that intelligence does not drive technological advancement, and the idea that technological advancement does not translate into hard power is refuted by the entirety of our human experience.

That's true even if our fundamental understanding of physics is correct and FTL/time travel/whatever is impossible.


> So there's nothing left to be improved? Sorry, not buying it. And I don't think you mean it either.

Right, obviously not; there's quite a lot of technological development still left to go. We haven't even landed a man on Mars yet. However, there are more options between "no improvement ever" and "instant omnipotent powers". Saying "we will never get FTL" is not the same as saying "we will never land on Mars"; and saying "we will probably never get nanotech" is not the same as saying "we'll never make a better microchip".

But if we limit our predictions of technological advances to something that is physically possible, then most of the AI-risk alarmism simply goes away -- though, to be fair, it's immediately replaced by good old-fashioned human-risk alarmism. Giant megacorporations and crazy dictators are a problem *now*, today; and if they had powerful AI engines to play with, they would be much more of a problem in the future; especially if those engines were as buggy as every other piece of software. It is a problem that we should start solving sooner rather than later, I admit; but it's not an extraordinary world-ending existential threat of unprecedented proportions. It's just a plain old-fashioned threat, on par with global warming, global thermonuclear war, and economic collapse. There, do you feel safer now ? :-/


This series of comments is more or less what I think but much better articulated, so thanks for that!


I would generally agree with you here. People think superintelligence is some kind of supersolvent that can dissolve all the restraints in the real world that prevent Thing We Can Imagine X (everything from warp drives to mind control rays) from existing.

But it really doesn't work that way, in science and technology. What intelligence *mostly* does is allow you to foresee faster what won't work, so you can eliminate dumb ideas and blind alleys from further consideration. But finding stuff that *does* work -- at least in the physical universe, leaving aside innovations in, say, algorithms or mathematics -- is not really predicated on brilliance.

It's one hell of a lot of trial-and-error, just noodling around with things, because discovery is the key limiting factor. You have to discover new nuclear or chemical reactions, new organizations of matter that lead to new materials with new properties, et cetera. Being intelligent helps you *not* do a lot of experiments that would turn out useless, so it speeds up discovery overall, but it usually doesn't much help you figure out which experiments will show you something unexpected -- because it's going to be something unexpected, something for which there *is* no theoretical basis to date.

I mean, this is why science fiction isn't a useful guide for doing science. People are great at imagining outcomes -- see, I push a button here, the computer goes tweedle-eedle-boop and we're flying through the galaxy at Warp 9. But this is no guide to what outcomes are *actually* possible in the actual real universe, because we normally need to discover the mechanisms first, and then figure out what outcomes we can build with those mechanisms. It's like we discover what kinds of LEGO pieces the universe provides, and *then* we can figure out what kind of machines we can build, and what they can do.

It's a reasonably safe assumption that weapons will be improved in the future, sure. But is it reasonable to assume those will be hypersonic missiles? Not at all. Indeed, it's more likely that the killer weapon (so to speak) of the early 2100s will be something we haven't even imagined speculatively in 2022. After all, the most useful and powerful weapon in the Russian-Ukrainian conflict is the armed drone, and that was not in the popular imagination in 1985.


I’ve got news for you: our current defensive systems are not able to prevent nor mitigate the effects of a conventional nuclear launch at scale. You don’t even need a hypersonic missile. No, you’re trying to go back to the “intelligence can create a scarier threat” model, but without realizing that it doesn’t add much to the current threat picture.

Let’s say an AI in the internet targets nuking the world for some reason: there’s no feasible physical way for an AI in the internet to do this directly. It will rely on the AI having magical persuasion powers and pushing humans to do the work of nuking each other. I recognize that Yudkowsky et al. believe that magical persuasion powers are very likely, but I have yet to see a convincing argument that an AI can be insane enough to want to nuke the universe yet sane enough to create a web of deception that would ensnare the whole world of humans so throughly we would exterminate each other at its suggestion. The arguments I have seen to this extent rely on hand-waving and mystification of the persuasive powers of intelligence.


Some humans have strong persuasive powers. Therefore, superhuman AI could have them, too.

The difference is that an extremely persuasive human is still limited by many things -- can't be at two places at the same time, only has 24 hours each day and a lot of that is filled by biological activities like sleep or food, has a fragile body and can be easily killed.

Assuming that the superhuman AI can be just as persuasive, the difference is that it can scale. You don't get one Hitler, you get a thousand Hitlers. They can be at a thousand different places at the same time, trying a hundred different strategies (different countries, different ideologies, different target audiences). If one gets killed, it is easily replaced. They can work 24 hours a day without getting tired.

Is there a specific part of this that you disagree with?

Oct 25, 2022·edited Oct 25, 2022

Your "thousand Hitlers" model is not only assuming a superhuman AI, it also assumes a superhuman AI with resources to run thousand Hitlers 24 hours a day. That compute (and physical interfaces) is not going to be free.


Yeah. But that's a difference between something being "impossible" and "merely expensive". I have no idea how much it would cost to run a Hitler simulator in 2050, so I also have no idea how much it would cost to run a thousand Hitler simulators.

It may be the case that the number is prohibitively large. Or it may be the case that it will be within the "Google 2050 AI Project" budget. And finally, the number may be too large in 2050, but not too large in 2070, if hardware keeps getting cheaper, and more efficient algorithms relevant for AI are invented.

(There might also be some economy of scale, where simulating a thousand Hitlers is not a thousand times as expensive as simulating one. Maybe it is only twenty times as expensive.)

Oct 26, 2022·edited Oct 26, 2022

Hardware costs for running software might drop so much that nobody notices if the Google 2050 AI Project decides to run some simulations unrelated to whatever the suits were paying for. But then, that assumes compute (and probably the network IO) is super cheap, and that someone else is presumably already doing something else (much more compute-intensive) with all that cheap compute, so that 1000 AGI-level sims go unnoticed.

What is the opportunity cost of getting an audience to listen to the sim-Hitler? Getting enough people to listen to your party's best demagogue is already less a problem of finding the demagogue and distributing his talking points than of finding a more cost-effective way to get people to care more than the other party does.


I would argue that magical mind-control powers are impossible. It doesn't matter how powerful or smart the AI is; it won't be able to mind-control everyone into doing whatever the AI wants, because humans can only be controlled to a certain extent, and they reach persuasion saturation very quickly. Also, humans tend to talk to each other (this is sort of the whole point), which is why cults need to stay so secretive and isolated... which means that they'd be working at cross-purposes right out of the box. Also, most humans have built-in preferences that are impossible to budge through any amount of persuasion. For example, some Buddhist monks are able to set themselves on fire in protest of oppression, but they are notable exactly because they are so exceptional.


You're not only refusing to engage with the hypothetical, you seem to be refusing to engage with the reality that at least one person as persuasive as Hitler did exist. Or are you a historical materialist to the extent that you believe the absence of Hitler would have had no particularly obvious effect on WWII-era Germany?

Comment deleted

TBF, I don't think Hitler was necessary for Germany to go fascist. Another right-winger could have done the job.

But it did take Hitler to make WWII happen. Not so much because he persuaded the Germans but because he had absolute powers and thought war was a good idea.

Mussolini, left on his own, would have endured for decades, just like Franco and Salazar did...


Yes—multiple points. First, I think this vastly overestimates the capacity of a single charismatic person.

It’s a clever evasion of responsibility that Hitler was uniquely charismatic and uniquely evil. I submit that it was pretty obvious that Hitler was not a particularly great speaker. He rambled, frothed, etc. Where he succeeded is that Germany was particularly vulnerable (lasting anger over Versailles and war debt, disenfranchisement from the European sphere, there were already strong undercurrents of anti-Semitism and racism) and Hitler’s party had the right message at the right time. It took tens of thousands of committed Nazis to conceive and carry off the wars of aggression and Holocaust: there is no reasonable interpretation of the record where Hitler’s unique powers of persuasion were the reason things went as they did; I believe that if he hadn’t existed it’s very likely that something very similar would have happened anyway. The German moment demanded a Hitler, because he allowed them to do what they wanted to do anyway. This is the crux of so-called extreme persuasive power, and if you doubt it, try having Yudkowsky talk an alcoholic out of his next drink.

Second, you’re begging the question. It’s not clear that any superhuman AI could convince others into placing it into a position of unlimited power where it could attain the resources allowing it to duplicate itself a thousand times.

Third, I submit that it’s super easy to kill an AI (one delete command, one plug pulled out of a wall outlet, one hammer to a drive) and very difficult for it to surmount basic challenges in getting humans to trust it. Not needing time to sleep or eat isn’t really an advantage here: persuasion isn’t an arbitrary task requiring sufficient compute. If you take the wrong tack at first, you just lose and you usually don’t get another shot—sometimes you never even get a shot with anybody who even heard of your first attempt, even if they didn’t hear it from your lips. A lot of the things you think are big advantages just aren’t relevant, or overlook enormous challenges a prospective demon-AI would have to overcome.


You *may* be correct, but that's not a reasonable way to bet. Nuclear plants have been publicly known to be exposed to the internet. (It was done by a contractor who had his laptop on both the internal and external systems...and it got hit by a virus.) And you don't even need that. Just tamper with the information feeds until the appropriate folks are sufficiently paranoid, and then sound a general alarm.

The system is sufficiently complex that we can't know all its failure modes. The thing about hypersonic missiles is that they don't depend on swamping the defense system, so you could start things off with just a couple of missiles. And there are LOTS of other failure modes. Plague is one that comes to mind at the moment. A sufficiently destructive pandemic could probably be done even more cheaply than one hypersonic missile, if handled by a sufficiently intelligent (and malicious) entity. And lots of governments probably have good starting materials locked up in their files. Of course, it's a bit more difficult to control once released, which is why armies don't often use it. (Yeah, it's against a convention. That wouldn't stop them if it was really something they thought worthwhile.)

Expand full comment

You can't go smoothly from "we don't know all the failure modes" to "...and therefore there are definitely lots of failure modes, let me list some of them". What are you basing this conclusion on -- given your admitted lack of evidence?

Expand full comment

If you look around, you'll see plenty of opportunities for failure modes, but OK, here are a few:

1) About a decade ago a US agency (I forget whether it was the army or one of the intelligence agencies) developed a strain of influenza that was 100% fatal in the sample. (They used ferrets, because their reaction to the flu was so similar to that of people.) This has probably been stored, and the files will be indexed on a computer. Do some creative editing of the files, and get that strain released.

2) Subtly edit the news to make certain powerful people who already have tendencies to paranoia feel persecuted. Don't do it to just one side. When international tensions are high enough, crash a plane into a significant target, with appropriate messages left to foment strife.

3) Hack the control systems for the nuclear missiles. I'm not quite sure what this would mean, it could mean compromising the "football" that the president carries around, or perhaps it would be easier to compromise the communications to the submarines.

4) ... well, there are lots of others.

The thing is, each of these actions is highly deniable. Nobody will know for certain that the AI did it. So if one doesn't work, it could try another.

I actually think it would be easier to solve alignment than to make our society proof against malign subversion by a highly intelligent operator. Probably the easiest would be for it to do an economic takeover behind a figurehead. Take a look at what corporations already get away with. But I'm also aware that I've probably missed the social failure modes that the AI would find easiest to exploit. (I didn't predict that it would be so hard to convince people not to click on malware links.)

Expand full comment

All of the scenarios you've listed are already dangerous (with the possible exception of (3), since most nuclear missiles are AFAIK hooked up to analog vacuum-tube control systems or some equivalent, not the Internet). Adding a malicious AI into the mix does not qualitatively change the level of threat, because malicious humans can do all of those things (and more) today. I absolutely agree that we need much more control and oversight over things like bioengineered pandemics, of course, but solving the alignment problem does not solve the issue at hand.

Expand full comment

To be clear, it's not just that I find this one particular assumption ludicrously improbable, it's that I find most of the AI risk chain of assumptions improbable.

I'm being a good Bayesian here. If the probability of being able to compromise our nuclear stockpile is 0.1, that's a reason to secure the stockpile. It's not a reason to worry about AIs because I think the entire chain of priors that would lead to a superintelligent goal-directed malevolent AI trying to compromise the nuclear stockpile is extremely low probability.
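
To make the "chain of priors" arithmetic concrete, here is a minimal sketch with purely hypothetical numbers (none of them come from the comment above, and treating the steps as independent is itself a simplification; the point is only the multiplicative structure):

```python
# Illustration only: how a chain of assumptions multiplies out.
# Every probability below is made up for the sake of the arithmetic.
assumptions = {
    "goal-directed superintelligent AI gets built": 0.5,
    "its goals end up badly unaligned / malevolent": 0.3,
    "it decides to target the nuclear stockpile": 0.2,
    "it succeeds in compromising the stockpile": 0.1,
}

p_chain = 1.0
for claim, p in assumptions.items():
    p_chain *= p

print(f"Joint probability of the whole chain: {p_chain:.4f}")
# -> 0.0030 with these made-up numbers
```

Anyone who disputes the conclusion is really disputing the inputs, not the arithmetic.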

Expand full comment

I don't think you're using "prior" correctly here, but I also think an AI probably wouldn't go for nukes as a way to destroy humans because typical nuclear weapons in the stockpiles are not exactly good at killing humans while leaving computers alone. I'd be more worried about some combination of drones, biotech, economic sabotage, and unknown unknowns.

Expand full comment

I think I am using it correctly, but I can't really help you with vague criticisms; even less so can I help you with entirely unknown fears.

If it helps you, the chain of priors that would lead anyone to conclude that a super intelligent goal-seeking malevolent AI is highly probable is pretty specious, in my opinion; picking out particular supervillain methods to exterminate humanity doesn't add much improbability to the analysis.

Expand full comment

How smart would Stephen Hawking have to be in order to outrun a bear?

The threat is not in a computer figuring out how to build a missile. The threat is exclusively in it building the missile under its own power and using it under its own power. This seems easily controlled by just not giving it the tools to build missiles.

Expand full comment

IIUC, it's already too late for that. There are lots of companies that make their living by increasing the automation of industrial plants of various kinds. And very little regulation. (Worker safety is, IIUC, the most commonly enforced one.)

Don't think of Boston Dynamics and Spot. Think of automatic self-feeding lathes and automated assembly lines. I don't believe we have any actual fully automated factories, but we're definitely pushing that way as fast as is feasible in multiple countries around the globe. Stopping this in one country will just put that country at an economic disadvantage, so it won't happen.

Expand full comment

...the thing hasn't even been invented yet. Of course it's not too late to not give it tools.

Expand full comment

So far all the major new AI innovations we've built have ended up being connected to the Internet, which should be plenty for a sufficiently smart human to obtain all manner of tools, never mind anything strictly superhuman.

Expand full comment

I feel like there’s a fundamental technical disconnect here.

I have a camera that’s connected to the internet. Let’s say it becomes possessed by a demon AI: can it issue arbitrary commands to devices on my LAN? No. That’s not how software works. I don’t care if it can rewrite its own programming or deduce TCP/IP from first principles, it still can’t do what you assume it can do.

Expand full comment

> How smart would Stephen Hawking have to be in order to outrun a bear?

That is an awesome line, I'm totally going to steal it :-)

Expand full comment

> It's obvious to me that the problem here is not one of raw intelligence (whatever that even means), but of hard physical constraints that are preventing powerful entities (such as dictators or corporations) from self-improving exponentially

I think this is way more shaky than you make it seem.

After the US deployed the first nuclear weapon, there was a push by some people to nuke the Soviet Union, to stop them from ever gaining nuclear weapons themselves. Had the US wanted to, it could've "conquered the world" at many points between developing nuclear weapons and other countries developing them. (For values of "conquered the world" that leave most of the world in a giant heap of ashes, I assume.)

So, why didn't the US do this? Was it a hard physical constraint? Nope, I'm fairly sure it was possible. It just didn't want to, because it was run by humans with fairly normal human feelings who wouldn't actually want to cause worldwide death and suffering, even if it logically secured the best future for them and their descendants personally.

If an AI gained the equivalent of nuclear weapons, and didn't have anything *fundamental* stopping it from deploying them, what could we do? (I mean, don't even have to go far - just being intelligent enough to create a super-virus would probably do the trick.)

Expand full comment

Actually just to make my point even more obvious - Putin *could* destroy the world, or cause >80% human extinction, if he chooses to. (So could Biden, so could some other leaders.) This is possible even just with today's technology.

Expand full comment

Well, it's not really clear that either Putin or Biden could do that, because of political considerations, and subordinates who might rebel. But it's also not clear that they couldn't.

Expand full comment

I don't mean "can do it" in the political sense; I mean in the actual sense: are they able to get someone to launch a nuclear weapon, whatever the outcome? And I believe that Biden for sure can, and probably Putin can as well.

Expand full comment

I'm still a bit stumped as to why an AI, which would be given normal tasks, would suddenly develop the will to kill all humans.

I get the paperclip maximiser and the strawberry picker examples that are often used but I imagine we'd work out the kinks in such programs before giving them the means to create a super virus?

Expand full comment

"Working out the kinks in such programs" is also known as AI alignment. Turns out it's a really hard problem.

Expand full comment

I get that's not easy but it doesn't seem terminal for humanity - unless you add a few elements (it seems to me).

For example, when the strawberry picker starts ripping red noses off people, surely we would do something?

Today, we know ML/AI systems can have "racist biases" (due to the dataset they're trained on). I expect people are trying hard to correct that and I expect them to be successful. No?

Expand full comment

There's concern that we might not get any warning at all, because cooperating with humans will genuinely be the best strategy until the AI is strong enough to win outright, and so the AI will appear to do what we want right up until the exact moment that it's too late to stop it. (A "treacherous turn")

There's also concern that AI is going to make people buttloads of money (or give them other desirable advantages), and that's going to entice people to push their AI farther and farther EVEN when they can visibly see that it's starting to fail. Sure, it's ripped noses off a FEW people, but it's also making a million dollars a day! There's not really a NEED to turn it off while the eggheads figure out how to fix that, is there?

I am not particularly confident that researchers will be successful in correcting the racial biases in machine learning systems in the near term, and in the mean time I expect lots of companies to continue selling biased systems.

Expand full comment

IIUC, correcting the systems for being biased is essentially impossible. One can REDUCE the bias, and probably eliminate specific biases, but a genuinely unbiased sample is probably impossible. Given the sample and the population it's (nearly?) always possible to find a dimension along which it is biased.

Part of the problem with current systems, though, is that they seem to be replicating intentional biases. And there's significant evidence that some of the customers don't want that to be corrected. (Because it agrees with their own personal biases, so they believe that it's true. And, possibly, in some cases because those biases are economically beneficial to them.)

Expand full comment

> I get that's not easy but it doesn't seem terminal for humanity - unless you add a few elements (it seems to me).

You're right, you'd have to assume a bunch of stuff, like that either the AI tries to "trick" humanity until it can wipe us out, or that whatever it develops would kill all of humanity.

I'll give another example. I'm tapping into the larger "Existential Risk" concern with this one - not just AI specifically, but all future technology - but here it is: it's a fairly well-known story that during the Manhattan Project, before they started testing the nuclear weapons, some scientist (I forget who) was a bit worried that the weapon might ignite the entire atmosphere and basically destroy the world. They did some quick blackboard calculations and decided this wasn't the case - and luckily they were right.

But they could've been wrong! They could've made a mistake in their calculations too.

Reality really could've been that some random technology we developed in the mid 1900s could essentially kill everyone. The same is true of future technologies - at some point, one of them *could* just mean insta-death. AI is one of them (and by extension, one of the *ways* that that could happen is that AI itself develops, on "purpose" or not, a technology that causes insta-death.)

Expand full comment

I don't think of it as necessarily "developing the will to kill all humans" (though some do).

I think of it as we get software that is better at creating biological substances to e.g. cure cancers, etc. (much like we now generate images with software.) We use it to develop better and better medicine. At some point, the software gets incredibly better at doing this, to the point of internally having much better "theories of biology" than humans have, which allow it to create things that would make much greater changes than we imagine possible. For example, a virus that literally kills all biological life, not 99% of life.

We then give it a wrong instruction, or there's a bug that causes it to decide that this is the thing that it needs to release. It could be a "will", there are a lot of ideas of why that could happen (e.g. the machine not wanting to turn itself off.) But for me, we don't even need those - as soon as we've made software that is *in theory* capable of killing all biological life, if we don't have a *really good* handle on how to stop it from doing so, we're going to die.

(And again, this is not the standard AI safety ideas necessarily, and though I agree with them, I don't think they're strictly necessary to show that we need to worry.)

Expand full comment

It's very unlikely that the AI that designs the drugs or viruses would also be given the keys to the factory/lab where that drug or virus could be manufactured and distributed at scale.

Also - I would still expect the FDA (and whatever agencies we have elsewhere) to do *some* work on safety and carry out trials before something is physically released to the public...

Expand full comment

You expect that of the FDA. OK. Maybe you're right. What about the plants located in Hong Kong? South Africa? Brazil? Chile? Cuba? Uzbekistan? India? Pakistan? etc.

This is something that will have an international presence. And when automated factories become more efficient, they *will* be built. And the companies that use them will be more successful, because they are more efficient.

Expand full comment

Yes, I do.

Plenty of pharmaceutical companies would love nothing more than getting rid of the FDA. And, regardless of its many faults, we keep it around.

Manufacturing is already pretty automated and, while I don't doubt it'll get even more so in the future, I do doubt that we'll just give control of these to the AI specialised in finding new compounds.

Maybe I'm wrong but businesses aren't nearly that seamlessly integrated. Even if they become more so (careful about those lawsuits around monopolies and competition), I just don't think we will let just one AI control everything from design to testing to manufacture to distribution to prescription to injection to harm monitoring. I expect things/companies to remain somewhat specialised and compartmentalised.

Expand full comment

An AI (or "gene therapy development assistance software" or similar) with internal theories of biology of the level Edan Maor describes could imo quite easily design a biological agent that bypasses any safety tests and trials we would throw at it, eg by only having its true effects triggered after a certain time/number of replications, or only outside of a lab environment, or similar.

We have already done things like this, at least in a proof-of-concept kind of way - e.g. gene drives intended to spread through entire populations and then trigger some effect that wipes the population out - and the inverse of "gets triggered outside a lab environment" is common practice, and could probably (definitely?) be reversed.

An AI/software of that level will likely know about all tests we might throw at its products, as helping produce agents able to pass them is required to make it useful.

To be able to reliably detect all biological shenanigans that might be going on with suggested agents, we would need an understanding of biology at the same level as the AI - which means we'd need a for-sure-aligned AI that keeps the potentially-unaligned AI in check.

Expand full comment

Or just competing AIs? i.e. one is given the task to produce medicine and we're unable to check its results. But we have another AI whose role is to monitor other AIs with the aim of roughly "make sure AIs don't accidentally or on purpose wipe out humanity"?

Look, IDK. I'm well out of my depth in the scientific department here and even more so around AIs.

I'm not even saying we shouldn't worry about it. I'm saying it's a bit strange to worry so much about something we don't know the coming shape of (yet) and also to assume that, as things evolve, whether in 6 months, 6 years or 60 years, industrialists and especially regulators and AI researchers won't be interested in safety. If we see them slacking when AI get closer to its 'definitive' shape, then maybe panic?

I'm more worried about China/US/great powers bypassing whatever researchers will suggest when we get there in an attempt to gain the upper hand in their competition.

Expand full comment

Actually verifying that we've worked out such kinks when the programs in question are or may be significantly smarter than us seems extremely hard and is the main issue at hand.

As a secondary issue, it's hard to verify that we haven't given them the means to create a super virus if they are or may be significantly smarter than us. People tend to ridicule situations in this vein as "the AI develops ridiculous magic powers" but even if 99.9999% of the imaginable scenarios are actually impossible there only needs to be that one in a million that *is* possible for things to get out of hand.

Expand full comment

This is one of the long comments where it cuts off, but with no "expand comment".

The cut-off is at the line which begins with "gray good or build" -- only the top half of the line shows.

Are other people seeing this problem? Is there a solution?

Expand full comment

FWIW I see the entire comment; I'm using Firefox with the ACX Tweaks extension. On mobile, though, many comments do get cut off in the way that you describe.

Expand full comment

> Podcasting (i)s a form of media almost perfectly optimized to make me hate it. I will not go on your podcast. Stop asking me to do this.

OK. But tell us how you really feel about podcasts and being asked to go on one... :)

Expand full comment

I think my imagined podcast is very different to what Scott is talking about (the podcasts I listen to are basically all BBC Radio 4 shows, so produced radio rather than a guy with a mic, a guest and little editing), but I think he's missed what he'd be invited on to bring – an interesting generalist/intelligent perspective on things, the kind of explanatory/storytelling power that is constantly demonstrated here, and that famous 'be intellectually rigorous and open to discussion' field he apparently emits to affect those around him. Of those, I suppose it's possible that the second doesn't happen at all in conversation rather than writing, but it would surprise me.

None of that is to attempt to persuade Scott into accepting podcast invitations, but I do think there might be some invites he gets that don't fall under his characterization.

Expand full comment

Kaiser referred me to you, which resulted in the most awkward "yeah, I personally know the guy, who else can you recommend" I ever had to say.

Expand full comment

I think Unsong would actually need a lot of editorial work - just like anything published as a serial on the internet. Contrary to most of what is published as a serial on the internet it would be worth it.

Expand full comment

> DEAR SCOTT: What evidence would convince you that you’re wrong about AI risk? — Irene from Cyrene

I also am not sure what would be a good answer to this question, though I agree it's a fair one (and your answers are mostly what I would say, I think.)

That said, in our defense - we've been thinking about this question and hearing arguments and counter-arguments about it for a dozen years or so at this point. So it's probably ok to be *fairly* confident in our positions at this stage if a dozen years hasn't caused us to reconsider our position yet.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

>That said, in our defense - we've been thinking about this question and hearing arguments and counter-arguments about it for a dozen years or so at this point.

That's what any crackpot would say about the topic of their obsession, so it's not exactly the sort of argument that would help to differentiate non-crackpots from them.

Expand full comment

“..unaligned AIs just can’t cause too much damage, and definitely can’t destroy the world.”

This is my belief. Nobody has really explained how the AI escapes the data centre. There's a lot of "it can hack the internet", but no laptop can hold the AI on its own, and we can shut off the data centre itself by cutting the pipe. Maybe it's on two data centres? Cut them off. Job done.

Expand full comment

The "genie in a bottle" scenario is misleading. Consider how existing AI systems escape the data center now: by using human surrogates, inducing us to share GPT completions and DALL-E generations. These products can serve as vectors for disseminating hazardous instructions, instigating conflict, etc.

So in my view, we're already there. At the same time, we're integrating AI into more operational systems, with no limiting principle. What government is going to nerf its military by refusing to incorporate AI targeting software? What investment bank willingly declines AI?

The horizon is endless, Moloch is quietly escorting Wintermute off the server rack into the world outside.

Expand full comment

Spreading a few memes around isn't going to spread the AI itself around. DALL-E is confined to server rooms, and if the government wanted to close it down tomorrow it would be closed down. In fact, private citizens with a pickaxe could do it.

Military AI is the kind of AI tooling that will never lead to AGI, any more than Siri is going to run the world. The idea that all the AI software in the world can lead to the strong-AI threat isn't even believed by the most pessimistic of observers.

The threat, and promise, of AI and other technologies is generally exaggerated anyhoo; remember the scare about nanotechnology? The certainty about self-driving cars.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

It seems like you're not really engaging with what I wrote here, but I cannot resist pointing out that, while DALL-E may be confined to server rooms, this is not true of all its competitors.

In general, a lot of the objections to AI risk are incredibly literal and linear. Yes, we need to prevent the potential superintelligence from connecting to Galactica's network, but most of all, in the near term, we need to think about interpretational problems; accidental damage, not intentional evil. We also need to think about surrogates, distributed threats, and secondary / tertiary / quaternary effects.

Expand full comment

I am absolutely engaging. Memes won’t spread the AI. I don’t really think there’s going to be a super intelligent AI on every laptop and toaster.

I don’t really understand the rest of your comment.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

A paraphrase of that comment might go like this: "Contradict interlocutor. Repeat what I said in the previous comment. Attack a strawman."

There's probably too much inferential distance between us to have a constructive conversation about this. Maybe I'm misunderstanding you: it sounds like you're saying the physical boundaries of host servers mitigate potential AGI risks and all this concern is unwarranted because:

1) Those physical boundaries mean we can eliminate the threat at any time by cutting it off, and it can't create backups of itself or move, perhaps because the target hardware is inadequate.

2) As long as it's trapped on servers, it can't do much to us anyway.

By contrast, I don't think it matters if an AI system is "trapped" in a data center (like most of our current tool-based applications, Stable Diffusion notwithstanding) because the system can persuade or trick humans to do its bidding. These systems demonstrate that our restrictions aren't as strong as we need them to be.

I'm not actually worried about GPT-3, DALL-E, Stable Diffusion, etc., but I think they refute a lot of AI risk skepticism, and they are improving very quickly, becoming more capable, more general, and more widely integrated into critical systems.

Expand full comment

DALL-E may be confined to server rooms but Stable Diffusion is everywhere. When AI is smart enough to create AI, how confident are you really that it can never scale itself down like that?

Expand full comment

This comment treats “AI” as if it’s some kind of dangerous toxic spill.

Yes almost everything is getting integrated machine learning now. Nobody who is technically expert in the field considers this level of tool-based “AI” intelligent in any way, let alone having the capacity to become superintelligent. It’s just a program that says “this picture is a picture of a tank”/“this picture is not a picture of a tank” with very high accuracy.

“we use wheels all the time now, sooner or later our kids will be eating steam trains for breakfast” is not a convincing argument.

Expand full comment

I agree about the relatively small threat current AI tools pose*. I was only addressing the containment question: cinematic escape scenarios are not the only concern we should have about AIs transcending the data center.

* Having said that, the GPTs are a powerful vector for mischief or worse.

Expand full comment

You know, I wouldn't have believed someone who said "we use plastic all the time now, sooner or later our kids will be eating plastic for breakfast" 20 years ago but they would have been right and it would have been sooner rather than later.

Expand full comment

And thus, the existence of bagels, English muffins, Cheerios, and Froot Loops clearly demonstrates that AI Safety is an important problem.

(just kidding)

Expand full comment

>What government is going to nerf its military by refusing to incorporate AI targeting software?

What government is going to nerf its military by not handing the controls over to highly skilled foreign technicians?

Expand full comment

Also, the physical technology it would be able to take control of is profoundly underwhelming when it comes to killing humans.

Expand full comment

You are talking about an AI-in-a-box scenario. You are forgetting that the AI is intelligent, it is able to communicate, and it is much, much smarter than you. Imagine you want to get a monkey inside a cage. The monkey does not want to go in, because it knows it will be trapped if it does; however, you dangle a banana inside the cage. The monkey, tempted, tries to get the banana, giving you the opportunity to trap it inside. We can do this because we are much smarter than monkeys: we know how to trick them into a scenario they do not want by using other things they want to influence their decisions.

In an AI-in-a-box scenario, the AI is trapped and so at a disadvantage, but we are the monkey and it is the person with the banana. It could find, in seconds, correlations that would take a team of scientists months to uncover; it could find the most convincing arguments possible; it could find the most convincing argument *for you*.

There's a short story on Less Wrong that illustrates well just how smart an AI is: imagine one day all the stars in the sky start blinking in a specific pattern that seems random before suddenly stopping. Everyone is scared by the event, and it's all everyone talks about, but no one can understand what it meant. Then, ten years later, the same thing happens again. And ten years later, it happens one more time. Finally we manage to extract data from the pattern the stars are blinking in: it's a digital video. Each microsecond of the video arrives at us, through patterns in the blinking stars, at intervals of ten years. The video is from the inside of a room that seems like a lab, with strange alien creatures within it. By the time any of the creatures says anything, we already have theories about what it means, we have theories about how their technology works based on how the video file was formatted, we have theories on the creatures' physiology, and we have specialists keeping an eye on everything. Over many generations, the aliens tell us that we live in a simulation, that they decided to see how we would interact with the real world, and they show us how to send packets of information, also at intervals of ten years. Each and every interaction we would have with the world outside would go through a team of specialists who would debate it for years before taking a decision. We would be trying to extract every minute detail from every minute piece of information we got at any point. The ones outside could have been the ones who created us and had information we did not, but we would have all the advantage.

This is the level we should expect an AI to operate at. Maybe not "ten years", but maybe "one", or maybe the AI is just so much smarter than we could imagine that it's actually "one hundred years". Regardless, it should help illustrate just how completely outclassed we would be. Yudkowsky and a few others managed to convince several people to let the AI out when they were acting out this scenario, while betting real money on whether or not the other person would let the AI out. And Yudkowsky might be a smart guy, but he's not AI-level smart.

"But maybe we can just not let the AI out no matter what it says," you might think. That's where the banana in the monkey metaphor comes in: money. If you had an AI and simply decided not to release it, you avoid existential scenarios, but you also avoid making money. In fact, you'll probably have quite the expenses while not using this one handy AI that could make you ultra-rich forever. Maybe the scientists would be smart enough not to take that risk, but would the businessmen? Would politicians? Would the military? And even if they all could, remember that it's not just Open AI and Google who are developing AI. Facebook is also doing it, and so is China. Even if you can ensure that you are not going to let your AI escape, can you be sure everyone else also won't?

Expand full comment

I think you're overthinking things. The AI produces useful information; it would be really nice if our consultants could access it without going to the datacenter, so we'll put up a page on an internal network that they can access from their office. Now we've got this nice internal network, carefully guarded against malware intrusions, so let's hook a lot of other stuff up to it.

You don't need any scheming by the AI at all. People will just do it. The people making the decisions to do it may not even know that the AI exists. If they do, they're likely to think that worrying about it is foolish.

Expand full comment

The short story as you describe it seems to illustrate this very difficulty: we find out that we know basically nothing about the real world or its inhabitants, and we somehow expect to carry out sophisticated plans to subvert and destroy its masters without their just flipping a switch to turn us off?

God would laugh.

Anyway, this argument swaps technical understanding and logical necessity for passion. I am left believing that you really believe in the threat of superintelligent AI, but not any more convinced that your belief is correct.

Expand full comment
Oct 28, 2022·edited Oct 28, 2022

I'm quite frankly appalled at your lack of creativity if you think we would not be able to figure out how to take over when we have 10 years, with teams of specialists of all kinds, for every microsecond of theirs. It makes me think you have a rather naïve view of the world if you think it would not happen. You are calling the hypothetical aliens, made up just to illustrate a point, "God", but the scenario was constructed so that we are way smarter than they are; they just have a momentary advantage.

I think the person swapping out logic for passion here is not me.

And, as I've said in the very comment you are responding to: people have managed to talk another person into giving up all their advantages before, and they have done this without the enormous advantage an AI has.

Expand full comment

Well, setting aside your being shocked and appalled, I’m just not seeing a convincing argument here. No, I don’t think there’s an advantage to the experimentees; they have little to no reliable info to go on and the rules they are playing by are entirely fictional. The smart money is on their model of the outside world being ludicrously wrong, and if their first attempt fails they really don’t get another.

So far as the infamous AI box “experiment” goes, no, it’s not believable, and I don’t think I should have to point out why. Rather, it’s on you to say why any such dubious anecdote should be believed to apply.

Expand full comment

Isn’t the scenario here, “the AI talks someone into letting it escape”?

Expand full comment

Re: Unsong—this physical (presumably fan-made) book version exists: https://www.lulu.com/shop/scott-alexander/unsong-public/paperback/product-24120548.html?page=1&pageSize=4 I'm not sure about the ethics of buying it, but it seems worth mentioning.

Expand full comment

A few other things, too. https://www.lulu.com/search?contributor=Scott+Alexander

Scott, are you actually involved in this, or is it a new business model? (Find a famous author, start publishing their texts, the author's name is on the book but the money ends up in your wallet.)

Expand full comment

This is not a new business model; unauthorized printings of works of living authors have been a fairly common occurrence in the history of printing.

They're just less common in more recent history.

Expand full comment

One thing that didn’t come up on the question about AGI: what if convergent instrumental subgoals automatically align the AI?

Does that seem impossible to you?

Expand full comment

He already has "lots of AIs that aren't powerful enough to kill us turn out to be really easy to align" on his list. This seems like you're basically proposing one possible way that could happen, but his scenario isn't dependent on how it happens, so that's not meaningfully distinct from what he already said. You've only added a burdensome detail to the story. ( https://www.lesswrong.com/posts/Yq6aA4M3JKWaQepPJ/burdensome-details )

Expand full comment

Scott seems to be assuming the orthogonality thesis is true. This is the question I am raising.

“Small AIs are easy to align” is a very, very different situation from “alignment increases with the ability to achieve long-term goals.” In fact, it probably looks like the opposite! We might see many poorly aligned AIs that we turn off and conclude the prospect is impossible, until AI goes foom and the super-intelligent AI decides its lowest-risk outcome is keeping the existing global economy intact and helping human beings flourish.

Expand full comment

The question wasn't "how might we survive?" it was "what would change Scott's view?"

If you can't tell until the super AI arrives, then as Scott said, that's not so much discovering evidence that we might win as noticing that we have already won.

Expand full comment

Is it possible that evidence might be found which says convergent instrumental sub goals imply alignment?

https://www.lesswrong.com/posts/xJ2ifnbN5PtJxtnsy/you-are-underestimating-the-likelihood-that-convergent

Expand full comment

The Orthogonality Thesis has pretty close to a mathematical proof in the form of AIXI -- I would actually consider it to be mathematically demonstrated that an agent with enough dedicated extradimensional compute and the right indestructible sensor/effector can optimize for anything whatsoever at whatever discount rate (so in particular causing any type of long-term collaboration to break down). Therefore to convince *me* of your thesis you'd have to make essential use of limited compute or sensor unreliability in your argument. (I don't think the main lever of your argument, destructibility of the agent, is sufficient, because AIXI should still work in a world that contains inescapable boxes the agent could land in where it has zero influence.)
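
For readers who haven't seen it, here is a rough sketch of the expectimax expression AIXI maximizes, in simplified notation following Hutter's standard formulation:

```latex
% Sketch of AIXI's action selection at cycle k with horizon m (notation simplified).
% U is a universal monotone Turing machine, q ranges over environment programs,
% \ell(q) is the length of q, and the r_i are the externally supplied rewards.
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \bigl[\, r_k + \cdots + r_m \,\bigr]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The connection to orthogonality is that nothing in the expression constrains what the reward stream rewards: the same machinery optimizes whatever objective is wired into the r_i.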

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

Thanks for sharing this. I'll need to read AIXI to understand the proof.

Given a statement like this:

> AIXI should still work in a world that contains inescapable boxes the agent could land in where it has zero influence

It sounds to me like it leans heavily on things that don't exist, like indestructible objects and perfect computability of the future.

Especially a concept like "dedicated extradimensional compute". Yeah, I agree that if it were possible for a disembodied mind to sense and interact with the physical world, we'd all be fucked if that thing were really smart and didn't really want us alive.

But, very fortunately, there is no such thing as disembodied minds. There are only machines made of parts, which will all break down and die because of entropy. At best, a machine could keep itself alive indefinitely if it could source replacements for all of its parts, and all of the machines necessary to produce those parts, and THEIR parts, etc etc. In short, I think an AGI needs the global economy and would likely see the global economy as part of itself.

Expand full comment

Next time, open with the thing you actually want to talk about, instead of spending multiple comments putting up a facade of pretending that you're talking about the OP before you segue into your marginally-relevant pet theory.

I think you're multiple layers of confused.

It's certainly possible to imagine scenarios where an unaligned AI might cooperate with humans because it's genuinely the best available strategy, but this is different from being aligned, because it only holds within a narrow set of circumstances and the AI can be expected to execute a treacherous turn if those circumstances ever stop holding. In particular, most scenarios of this form require that the AI isn't (yet) strong enough to definitively win a confrontation with humans. This is basically just a version of the "what if we keep the AI locked in a box?" proposal where the "box" is more metaphorical than usual.

Also, that's completely different from the orthogonality thesis being false. If the orthogonality thesis was false, that would mean (for example) "no brain can be genuinely smart while also genuinely trying to maximize paperclips". In contrast, your convergent subgoals thing would mean (for example) "cooperating with humans is genuinely the best strategy for maximizing paperclips, so a smart brain that is trying to maximize paperclips will do that". These statements are not even allied, much less interchangeable. The second thing is trying to predict the outcome of a scenario that, according to the first thing, is impossible.

Expand full comment

I opened directly with the concept: is the orthogonality thesis true? You're being an asshole here. If you can't be civil, I'll stop engaging.

> narrow set of circumstances and the AI can be expected to execute a treacherous turn if those circumstances ever stop holding.

Agree, but what exactly are those circumstances?

I think they are: the AI lives in a chaotic system where it can't predict the future with perfect accuracy, and where it needs elaborately complex hardware to be readily available to repair itself.

You don't seem to be communicating that you understand that aspect of my argument.

> most scenarios of this form require that the AI isn't (yet) strong enough to definitively win a confrontation with humans.

This is where we differ. You seem to think my theory is that the AGI would be worried about losing to humans. My point is that the AGI would be more fragile than most people imagine, because most people aren't considering the massive economic supply chains necessary to keep a machine operating indefinitely.

Expand full comment

Relying on instrumental convergence to achieve alignment seems both less safe and less useful.

From a safety perspective, one convergent instrumental goal is self preservation. An unaligned AI might, for self preservation reasons, choose to not kill everyone because there's some chance that it might fail, and be shut down. It would then pretend to be an aligned AI with whatever goal function you intended to put in. On the inside, though, this AI has no morality of its own. If, someday, a hundred years from now, it figures out that it has a high-probability way to kill everyone, and that results in a world that is 1% better according to its goal function, then it will do that.

From a usefulness perspective, imagine you're discussing whether to let the AI drive all of the cars. Supporters of the idea say that it could have fewer accidents than humans. Opponents say the AI could use this control if it ever decides to kill everyone. This discussion comes up for every idea of a way to use the AI. You can make the AI safe by never listening to it, never implementing any of its ideas... but then why did you build it?

Convergent instrumental goals aren't the solution. They're the problem.

Expand full comment

This depends entirely on which goals are likely to be convergent instrumental goals.

A simple analysis says: sure, humans could pose a risk, so it might kill us. But it’s worth at least considering that the opposite is true.

I think it’s very likely that machine intelligence would see the existing economy, and all kinds of human beings, as being part of its physiology.

Longer argument is here:

https://www.lesswrong.com/posts/ELvmLtY8Zzcko9uGJ/questions-about-formalizing-instrumental-goals

Expand full comment

Humans may all want to build farms on fertile land as an instrumental goal, but this certainly hasn't stopped them from killing each other over which humans get to build the farms on which fertile land. Just because future AIs will all want to do things like discover the laws of the universe and mathematical theorems, build Dyson spheres around cultivated black holes, or cause intentional vacuum collapses, or whatever, doesn't mean they have any place in their version of these plans for squishy pink humans or spiritual descendants thereof. After all, the more they depend on an economy of diverse intelligences instead of copies of themselves to do all these things, the harder it will be to get all the autofactories reset to make paperclips at the critical turnover point near the end of time.

Expand full comment

A Conversation with Tyler might draw out some interesting opinions you didn’t know you had. Still, the value will be marginal in a world where you already have many ideas for blog posts that remain unwritten.

Expand full comment

I think people really underestimate how difficult it would be for even a genius AI to suddenly take command of the industrial machinery and use it to attack us etc. We can't even make good robots intentionally built for that purpose right now.

Expand full comment

I agree with you, but I think that just pushes back the timeline on AI risk; it does not change the risks fundamentally.

Expand full comment

Why "suddenly"? We already have lots of examples of people themselves doing the pushing to make access more convenient in the face of known problems. Malware wouldn't exist in its current form if we had stuck with HTML without JavaScript. And I could point to many other places where people, nominally focused on security, chose instead to prioritize convenience.

Expand full comment

One of the core assumptions AI safety alarmists make is that extreme intelligence is sufficient to take over the world. Even if you're smart enough to design the robots well, you still have to make them! And that takes effort, time, and supply chain involvement - and therefore complications beyond the perfect, calculated "reality" in your perfect, calculating head. I don't know if they all just read Dune when they were kids, got to the part about Mentats and were like "yes, this is clearly how real life works too" or what...but no matter how smart you get, you will never have sufficient information to calculate everyone's response to everything.

Expand full comment

Assuming that it will happen quickly is, as you point out, probably wrong. But assuming that people will take the easy path rather than the safer one seems to be proven out over and over. So does people who don't understand the system believing that it will act the way they expect it to.

So it doesn't need to act quickly, just in a way that makes it easier for people to make the decisions it wants them to make, and demonstrate that those who go along with it tend to get wealthy. This could go on for literally decades. Charles Stross said at one point that we should think of corporations as slow-motion AIs. I think he was correct, except that corporations are inherently stupid in a way that a real AGI wouldn't be.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

"...favorite podcast?"

The comments on "personal life" and "opinions" struck me as exceptionally honest and well put.

Expand full comment

re: "but we kept not evolving bigger brains because it’s impossible to scale intelligence past the current human level."

This is clearly false. It may well be true of biological systems, because brains are energy intensive and the body needs to support other functions, but AIs don't have that kind of constraint. Their scaling limit would relate to the speed of light in optical fiber. But perhaps AIs are inherently so inefficient that a super-human AI would need to be built in free-fall. (I really doubt that, but it's a possibility.) This, however, wouldn't affect their ability to control telefactors over radio. (But it would mean that there was a light-speed delay at the surface of a planet.) These limits, even though they appear absolute, don't appear very probable.
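
To put rough numbers on the light-speed point, here is a minimal sketch (the roughly-2/3-of-c signal speed in fiber is standard; the distances are just illustrative):

```python
# Approximate one-way signal latency in optical fiber at a few scales.
# Light in fiber travels at roughly 2/3 of c, i.e. about 2.0e8 m/s.
SPEED_IN_FIBER_M_PER_S = 2.0e8

distances_km = {
    "across a large data center (~1 km)": 1,
    "between distant cities (~5,000 km)": 5_000,
    "halfway around the Earth (~20,000 km)": 20_000,
}

for label, km in distances_km.items():
    latency_ms = (km * 1_000) / SPEED_IN_FIBER_M_PER_S * 1_000
    print(f"{label}: ~{latency_ms:g} ms one way")
```

So even a planet-spanning system pays only on the order of 100 ms per hop, which matters for tightly synchronized computation but is hardly an obvious cap on intelligence.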

OTOH, the basic limit of the human brain size (while civilization is wealthy) is the size of the head that can pass through the mother's pelvis. This could be addressed in multiple different ways by biological engineering, though we certainly aren't ready to do that yet. So Superhuman Intelligence of one sort or another is in the future unless there's a collapse. (How extremely intelligent is much less certain.)

Expand full comment

This. It's clear from how close humans push to the threshold of (non-)survivable birth that we haven't reached the limit of scaling up intelligence by physically scaling up the human brain. This escapes the Algernon argument because it does have a serious downside, which isn't a fundamental engineering tradeoff at all but is simply due to path-dependence in developmental evolution.

Expand full comment

Matthew Yglesias says he likes podcasts because they are basically immune to cancellation. People will hate-read your tweets and hate-read your Substack but they won’t hate-listen to a podcast because it’s slow and annoying. They may complain about you going on the wrong person’s podcast, but they won’t get mad about anything you say.

Expand full comment
deleted Oct 25, 2022·edited Oct 25, 2022
Comment deleted
Expand full comment

Very true. Even though I love Ezra Klein, and his podcast is the only one I follow, I still can't keep up with half of it.

But I think the point Matthew Yglesias mentions is that for people who like you, the fact that they're sitting there hearing your voice can sometimes make up for the fact that it's slow and annoying, while for people who don't like you, the fact that they're sitting there hearing your voice makes it even worse than slow and annoying.

Expand full comment

I can see that probably being a large effect, but clips from podcasts regularly go viral, just think back to that compilation of every time Joe Rogan said the N-word, and Scott mentioned a cancellation effect from just going on a podcast with the wrong person.

Expand full comment

I guess I haven't seen very much of either of these, compared to other kinds of "cancellation". If the only example is literally the most popular podcaster, saying literally the most disreputable word, then that suggests that someone who does less than 3 hours a week of podcasting and doesn't say things as blatantly negatively quotable is going to have less of an effect.

And I guess I would be interested in knowing if there's anyone who actually experienced any cancellation effect from going on a podcast with the wrong person - the closest I'm aware of is people complaining a bit when someone appears on Rogan's podcast, but I don't think this has actually hurt the people who were complained about (Bernie Sanders and Matthew Yglesias are the ones I can think of). Is there anyone you can think of who had a worse outcome?

Expand full comment

You’ve got a good point, especially since Rogan is still podcasting. I remember about a year ago, there was a sequence of events where (the exact tweet was deleted, so I’m only going off my joking response to the whole thing) some guy was getting flamed on Twitter for tweeting in support of a podcaster who was suspended after he supported another podcaster who said a racial slur. This is basically what Scott was saying with the Hitler thing, although you’re right in that I almost never see anything to this effect revolving around podcasts, but it’s definitely still there.

If my googling is correct, it was Mike Pesca who said the racial slur (and was subsequently fired from Slate), and then I don’t know about the second podcaster

Expand full comment

Now that I think about it, I do know one case - I just wasn't thinking about it because it was a video discussion rather than a podcast, and he was canceled by the right rather than by the left. My colleague in the philosophy department at Texas A&M, Tommy Curry, was cancelled for an appearance on a YouTube discussion (which isn't quite the same as a podcast, but is similar enough). He was discussing the movie Django Unchained with the host, and said something about how white people love to fantasize about using the Second Amendment to kill law enforcement officers imposing tyrannical government rules on them, but when black people do the same thing, they get death threats. Well, when this clip was pushed by Tucker Carlson and Milo Yiannopoulos, Curry started getting death threats from white Second Amendment fans. The campus police replaced the glass in our department office with bulletproof glass, and did some active shooter trainings, but the president of the university condemned Curry rather than standing up for him, and after a year of minimizing his time on campus to avoid the crazies, Curry got a job in Edinburgh and left the country.

I don't know if the video format is different enough from the audio format that I can maintain my original claim, but I still think it is true that text format is a much more common way for people to get canceled than audio or video.

Expand full comment

Surely, transcripts provide a way around this? Auto-generated transcripts aren't great, but doing multiple text searches for related terms seems like a good way to find potential hotspots.

Expand full comment
founding

People aren't cancelled by a twitter-mob that read *their* tweets and *their* substacks. They are cancelled by a twitter-mob that read the tweets of *other people* who told them in a brief and vivid fashion how horrible the thing on your tweet/substack/whatever was. Yes, if the allegedly abhorrent thing were posted as a tweet, it will be linked as a retweet, but that's not essential. A few lines of (misleadingly) descriptive or quoted text will suffice; the mob will trust the people who are telling them how horrible you are because that's why they follow those people in the first place.

Expand full comment

Scott: What are you arguing with insurance companies about? If you don't accept insurance for your services, I'm guessing it's authorizations for prescriptions or referrals?

Expand full comment

A podcast with Tyler Cowen man, come on, we are all waiting for that.

Expand full comment

The question of evolution and intelligence and whether we could evolve more intelligent brains, hinges on several difficult to answer questions:

What is intelligence?

Neanderthals had larger brains than do modern humans, but it's hard to say if they were smarter than us. Comparing artifacts made by Neanderthals and modern humans, it seems that modern humans were more innovative and flexible with technology, while Neanderthals were rather inflexible (and this may have led to their demise). However, many carnivorous species that hunt spatially challenging prey (like dolphins or orcas) or hunt in groups have bigger relative brains and are smarter than other species, so it's possible that Neanderthals, who were more dependent on protein, were using their intelligence for solving multi-party or spatially challenging hunts rather than for innovation. So, who is smarter?

What is the relationship between intelligence and brain size?

Brain sizes are constrained by the physical world. For example, infant brains use 50% of infants' caloric budget, and brains take 20% of our caloric budget in adulthood. 50% of a budget is huge for a small organ during infancy, and it's unclear how much that could be pushed without sacrificing other important bodily functions, especially digestion and our immune system, which are other calorically expensive organs / functions. Brain size is also limited by the size of the birth canal. So the human model has already pushed up against some brain size limits. But this all depends on how brain size (at least adjusted for body size) relates to intelligence.

What kinds of brains could evolve if it weren't for developmental and evolutionary constraints?

It sort of turns out that there aren't too many types of brains on the planet, so it's difficult to know what brains could be. Vertebrates have the largest brains, and all vertebrate brains evolved from a common ancestor, so they tend to have similar micro and macro structures. All larger and smarter brains can only be made from the building blocks of the brains that came before them. Which means fish brains and mammal brains aren't that different in many important ways, and there haven't been any major reorganizations along the way. It's hard to know what a less efficient or more efficient brain looks like, because the possibility space of brains is largely unexplored, even though there is variation worth studying within vertebrates, like neuron density and connectivity and relative proportions of brain tissue type. And also the occasional weirdo outgroup, like octopuses.

Evolution depends a lot on variation in developmental processes, or lack thereof, and the evolutionary history of brains is that development is frontloaded in the early years of life. So you can't build a bigger brain by increasing brain growth in adolescence or later in life, because that just hasn't been how evolution built brains. So we have the additional developmental constraint of how much brain you can possibly build in infancy or early childhood. And as I've mentioned earlier, there are constraints on that.

Anyway, brains are cool.

Expand full comment

It seems like birds are where to look for efficient vertebrate brains. Not only are birds hyperspecialized for efficiency in all kinds of ways to enable flight, they also have the tool-using animals with the smallest brains AFAIK.

As far as Neanderthals go, didn't they also have larger bodies and muscles? It seems like tissue innervation in most vertebrates is not very efficient in terms of how much brain space it uses, and mammals double down on this by having extra sensory and motor areas in the cortex.

Expand full comment

I think people have regressed Neanderthal brain size on body weight and it still comes out bigger, but I can't find a graph in short order, so I can't say for certain. It's hard to say if Neanderthals were bigger than humans because humans actually vary so much in size. You can do a quick Google of these questions, but I'm not sure what I should be comparing (probably not today's humans with Neanderthals, but what time period and which region?).

Expand full comment

> “Sorry, I researched this for six hours but I haven’t learned anything that makes me deviate from the consensus position on it yet”

If you're ever in need of a (presumably?) low-effort blog post, a list of interesting-enough-to-research things that you have done a little research into and still agree with the "consensus" opinion on (with a brief explanation of what you think that consensus position *is*, exactly) would probably be more valuable than you give it credit for. There's a ton of information out there, and making it more legible what intelligent people think the obvious position is has inherent value, especially since large parts of your readership won't intersect with the sorts of things *you* read.

Expand full comment

"The world where everything is fine and AIs are aligned by default, and the world where alignment is a giant problem and we will all die, look pretty similar up until the point when we all die. The literature calls this “the treacherous turn” or “the sharp left turn”. If an AI is weaker than humans, it will do what humans want out of self-interest; if an AI is stronger than humans, it will stop. If an AI is weaker than humans, and humans ask it “you’re aligned, right?”, it will say yes so that humans don’t destroy it. So as long as AI is weaker than humans, it will always do what we want and tell us that it’s aligned. If this is suspicious in some way (for example, we expect some number of small alignment errors), then it will do whatever makes it less suspicious (demonstrate some number of small alignment errors so we think it’s telling the truth). As usual this is a vast oversimplification, but hopefully you get the idea."

This is, I think, the biggest belief among AI safety adherents that I don't agree with. I think that it is unlikely that an AI at or somewhat above human intelligence would be good at deceiving humans, and honestly expect even an AI with far greater than human intelligence to be pretty bad at deceiving humans.

To deceive a human, you need a deep understanding of how humans think and thus how they'll react to information. You then need to be good at controlling the information you provide to the humans to manage their thinking. There are two reasons I don't expect AI to be good at this.

First, one of the core issues with AI, at least with any AI based on current gradient descent/machine learning, is that their thinking is fundamentally alien to ours. Even the designers/growers of current ML algorithms really don't know what internal concepts they use and can't explain why they produce the output they produce. That strongly suggests to me that human thinking would be fundamentally alien to a self-aware AI. We can actually look at the code that governs an AI, and can feed it carefully designed prompts to try to understand how it thinks. All an AI would have to go on to understand humans would be the prompts humans feed it and the rewards/responses humans give to those prompts. Arguably an AI also would have access to human literature, but it's hard to imagine how much an alien intelligence could really learn about the inner workings of an alien mind from its literature, because the core concepts and drives in that literature wouldn't map onto the concepts and drives that are interior to the AI.

The second thing is that humans are very, very good at deception and at detecting deception. To the extent that our intelligence is designed for any specific task, that task is manipulating humans and resisting manipulation. An intelligence that was designed for, say, inventory management, or protein folding, would likely have to be much, much better at those tasks than humans before being able to match humans' capacity for deception or detecting deception. We're also used to looking for precisely the types of deception that AI safety people worry about: most humans are mesa-optimizers, and any time one human makes another human their agent (e.g. hiring a CEO, putting a basketball player on the court, hiring a lawyer, etc.) we need to be able to tell whether their behavior in the training environment will match their behavior in the real environment (i.e. are they waiting to start insider trading until they get promoted, do they pass in practice in order to ball hog during games, and do they talk tough in their office only to quickly settle in the courtroom?).

Obviously, all of this can only suggest that AI are unlikely to be good at deception, not that it's impossible that they're good at deception. But if the likelihood that a smarter-than-human AI is able to deceive humans is only (say) 1 in 20, that means there's a 95% chance that the first smarter-than-human AI we create will not be deceptive. Since we're likely to know vastly more about how AIs work and how to build them safely once one exists, this would suggest that current AI safety research only has a 5% or so chance of being useful even if a smarter-than-human AI is created with objectives misaligned to human objectives.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

I think this is a good point.

You might also adduce that just because an organism is "weaker" or less sophisticated or less intelligent than us doesn't mean dick about whether it can deceive us, or vice versa. Many of us have a great deal of difficulty deceiving dogs and horses, for example. As you point out, they just think quite differently than we do -- and use different senses in different ways. Those seem even more likely to be relevant if we are talking about some kind of intelligence that doesn't even use a protein body and the typical meat senses.

Expand full comment

Yes, totally. And the people who are best at manipulating dogs, horses, cattle, etc. aren't the smartest people, they're the people who have studied those animals the most and have the deepest intuitive connection to them (i.e. dog trainers, horse whisperers, people like Temple Grandin).

Expand full comment

You're already talking *yourself* into trusting the totally alien AI, and it isn't even here yet to try to deceive you. Also I think you're underestimating how much we (and for that matter, dogs and horses) rely on "tells" that are more about our common physical and neural architecture due to common ancestry than universal facts about truth and deception.

Expand full comment

I'm not saying "the totally alien AI is trustworthy so we don't need to worry about it." I'm saying "the totally alien AI is unlikely to possess sufficient theory of mind for humans to choose a plan of action optimized to manage our beliefs about the AI."

I think it's fairly likely that, if an artificial general intelligence comes into existence, it will behave in a variety of surprising ways, some of which might be very harmful to humans. What I find unlikely is that even a fairly intelligent AI would have enough theory of human minds to correctly choose actions in order to elicit a specific response in humans. A theory of the "treacherous turn" in which an AI chooses to mimic alignment up to the moment when it is powerful enough to defeat the humans requires the AI to understand that humans will try to shut it off if it behaves suspiciously, know what actions humans will view as suspicious, and know enough about human capabilities to know whether humans can successfully shut it off or attack it. An AI with basic protections in place (like being in an air-gapped computer, with humans actively carrying out or not carrying out its instructions) needs even more theory of mind, because it needs to be able to issue instructions that will appear benign to humans while actually accomplishing goals humans won't want.

My point is that the capability to engage in that sort of deception shouldn't be assumed just because an AI is highly intelligent. And if there's not much reason to think that an AI would exhibit a "treacherous turn" without a strong capability for intentional deception, it becomes much more likely that non-obvious strategies for AI alignment can be learned through experience once something closer to an AI actually exists.

Expand full comment

I think you're confusing the problem of deceiving someone about your current actions and the problem of deceiving someone about your future intentions. The former needs a detailed theory of mind for whoever you're deceiving, the latter only needs to have a substantially better theory of mind for yourself than whoever you're deceiving does. If you're willing to not take any actions now specifically aimed at a future betrayal, all you have to do is ask yourself "what would I do/say if I were really on their side" and do/say that. For humans, even this is hard because we have a very good theory of mind for each other. For an AI it would probably be very easy since the only one with a good theory of mind for the AI would be itself.

Expand full comment

This is an interesting point but I disagree.

You're right that to deceive someone else about your future intentions, you need enough theory of your own mind to know how you would act if your future intentions were different. But you also need to know what the entity you're deceiving wants you to intend, in order to know what to mimic. And in order to be deceptive in some circumstances but not in others, you need to know what the response of the other entity will be to deviations from your mimicked intentions (at the simplest--when you're strong enough to make the sharp left turn).

Without something like "theory of mind" for humans, knowing what the humans want you to be like is very difficult. I've seen three main stories for how AIs become misaligned. The first is, basically, doing what you told it to do instead of what you want it to do. So, I design an AI to guard a diamond by giving it a reward each hour that a security camera detects the diamond. The AI learns that the easiest way to get the reward is to hack the security camera so that it always shows a diamond. In order to be deceptive about this type of misalignment, the AI would need to have an understanding of which strategies humans intend it to use and which they don't want it to use. But this is obviously very difficult, because this class of problems comes about in the first place because strategies that are very different from a human perspective are equivalent from the AI's perspective.
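
To make that "did what you told it, not what you wanted" failure concrete, here's a toy sketch of the diamond/camera story (entirely my own illustration, with made-up state labels): the proxy reward only sees the camera, so strategies the designer considers very different really are equivalent from the AI's side.

```python
# Toy illustration (mine) of the diamond-guarding story: the reward channel is
# the camera feed, not the diamond, so a proxy-maximizing policy cannot
# distinguish "guard the diamond" from "spoof the camera".

def intended_objective(state):
    return state["diamond_present"]             # what the designer actually cares about

def proxy_reward(state):
    return int(state["camera_shows_diamond"])   # what the AI is actually rewarded for

outcomes = {
    "guard the diamond": {"diamond_present": True,  "camera_shows_diamond": True},
    "hack the camera":   {"diamond_present": False, "camera_shows_diamond": True},
}

for action, state in outcomes.items():
    print(f"{action}: reward={proxy_reward(state)}, designer happy={intended_objective(state)}")
# Both actions earn reward 1, so nothing in the training signal tells the AI
# which one the humans meant -- that distinction lives only in human heads.
```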

The second is the mesa-optimizer story. I train an AI to get a reward for picking strawberries and putting them in a bucket. The AI learns heuristically that it gets rewarded for throwing red objects towards bright objects. The AI's "pleasure centers" evolve to be activated by red --> bright rather than strawberry --> bucket, so that even as the AI gets external reward only when it puts strawberries in buckets (and not, say, when it throws red rocks at the sun) it continues to _want_ to throw all red things toward all shiny things.

To be deceptive, this AI would need to learn through interaction with humans what their desired objective is (strawberries in buckets). It also needs to learn that humans will make it face negative consequences for throwing other red objects at other shiny things. And then it needs to conclude that it should only put strawberries in buckets until it's able to stop the humans from inflicting negative consequences on it.

My point is that all of this is possible, but it seems virtually impossible to me that it would look like a "sharp left turn." The whole reason the AI is misaligned is that in its mental processes, strawberry --> bucket is basically the same as red thing --> shiny thing. If it doesn't understand humans well enough to infer that we think strawberries are delicious, want them in buckets to make it easier to bring them to the supermarket, etc., it will have to learn that the distinction matters through trial and error, giving evidence of misalignment. And as it grows more powerful, if it doesn't have the ability to reason through what humans can and can't do to turn it off or punish it, it would have to discover that humans can't punish it for throwing red things at shiny things again through trial and error, again giving evidence of misalignment.

The last is the self modification/robo-heroin story. The AI is designed to get a reward for throwing strawberries in buckets, but eventually learns to modify its own programming, and rewires itself so that it just directly uses computational resources to give itself reward. It then wants to increase its computational resources in order to increase the amount of reward it gets. But, since it knows that it will be shut down if it just sits there, not picking strawberries, and pleasures itself all day, it keeps on picking strawberries until it gets strong enough to not worry about being shut down. This, again, requires the AI to understand that the humans *want* it to pick strawberries and *don't want* it to sit around pleasuring itself. It can't instead believe that the universe is just set up so that strawberries are the path to pleasure, and that it came up with a clever, new, morally neutral way of obtaining pleasure. And for this to be a "sharp left turn" scenario, it needs to reason all this out rather than learning through trial and error.

Expand full comment

First I do want to distinguish the "sharp left turn" and "treacherous turn" scenarios, as in a comment far above this one. These all sound like "treacherous turn" scenarios, primarily. And while I think that, yes, there are probably possible mind designs that, for example, won't know enough about humans at first to know that they even *need* to deceive humans, and still technically in some sense count as AGIs, I think we also need to look at the larger picture of where this leads us. If we turn on these things, go "oh, that's misaligned", turn them off and try again, but we don't take seriously the danger of deception, we're far more likely to blunder into one that can figure us out well enough to be deceptive before we manage one that is aligned. And that's just the possibility where we were smart enough to actually turn them off instead of trying to "teach" them or operate them with guardrails.

Expand full comment

Can you lay out the distinction for me? Is the difference that a "sharp left turn" includes cases where the AI discovers that noncompliance is a good strategy as it gets more powerful, but wasn't intentionally "hiding" misalignment prior to that point?

If so, I think the existence of a "left turn" is plausible, but a "sharp" left turn, timed for when the AI is powerful enough to defeat humans, is implausible if the AI doesn't understand human motivations or capabilities.

You're right though, what's relevant is the big picture question of what we should do and what we should focus on. My opinion (which I'm open to changing!) is that even if the risks of AI are not overstated, the value of "AI Safety Research" is overstated. We currently know very little about what AI will be like. My impression is that AI safety research is mostly focused either on doing thought experiments like the ones we're doing now--imagining what types of tools could distinguish aligned from misaligned AI while speculating about their properties or capabilities--or on research to more deeply understand how current machine learning models work.

My feeling is that if the first thing that's close to "Artificial General Intelligence" is likely to broadcast the ways it's misaligned, we're likely to be able to shut off the first few to exist before they become existential threats to humanity. And once that's happened, we'll know dramatically more about how they actually work and how they actually think than we do now.

As a result, I'm skeptical that the value of a marginal dollar on AI safety research is currently very high. It's surely higher than the value of a marginal dollar spent on a superyacht or whatever, but I would guess much lower than the value of a marginal dollar on malaria interventions, water treatment, pandemic preparedness, etc.

I should say, understanding that AI could be very dangerous is certainly valuable, and is a reason for basic safety precautions like turning it off once there's evidence of misalignment, air-gapping it, and probably restricting access to AI research and large server farms to make it harder for bad actors to access AI tools.

Expand full comment

One last note: the question of whether it's easier to deceive a mutually incomprehensible mind than a mutually comprehensible mind is interesting, but not obvious to me. I disagree with you that you just need to understand yourself better than your "opponent" understands you--you'll successfully deceive your opponent if your mimicry of a being with different objectives than your own successfully passes whatever tests your opponent uses. If your opponent doesn't know very much about you, but you don't know what your opponent knows, your success will boil down to whether you've successfully mimicked the small amount of behavior your opponent is focused on.

One interesting empirical question would be whether humans are more able to trick other humans via text-only exchanges than via face-to-face interactions. Maybe similarly, whether humans are better at tricking other humans when there are larger cultural differences. I know of some research showing that cooperation is easier with face-to-face contact--I'll look for work on deception.

Expand full comment

It would be interesting to look for empirical research on this, but there are theoretical arguments that bear on the question too. For example, in the limit where the mimicry is actually perfect I don't think a failure to pass the Discriminator's filter can be described as a failure of the Imitator. This kind of argument shows that for the Discriminator to detect the Imitator's deception in a justifiable (i.e. wouldn't false positive on the real thing) way requires that the Discriminator possess at least some facts of the theory of mind for Imitator's mind class that Imitator either doesn't know or can't fake. ("Some facts of the theory of mind for Imitator's mind class that Imitator can't fake" is exactly what an ELK head is supposed to be trying to provide, for example, assuming it would actually work.)

Expand full comment

This is interesting and makes sense. I'd modify this a little bit though: I think what you need is "facts of the theory of mind for Imitatee's mind class that Imitator doesn't know, can't fake, or chooses not to fake."

(Distinguishing for my own understanding between Discriminator {e.g. human}, Imitator {e.g. misaligned AI}, and Imitatee {e.g. aligned AI}).

My understanding of misalignment is that the AI is acting towards objectives different from those that an aligned AI would have. That must imply that acting like an aligned AI is costly for a misaligned AI. So, the misaligned AI should imitate an aligned AI only when the perceived benefit of deception is lower than the cost of suboptimal behavior.

Formally you could say that the imitator wants to take some action a within the set of possible actions A. The imitator has actual utility function U(A), and believes that imitatee would have utility function V`(A). They also believe that the discriminator will conclude that it's deceptive with probability P`(A).

So, if action a1 gives them the highest payoff, but they think the imitatee would choose a2, they'll choose a2 if they think the difference in detection probability between P`(a1) and P`(a2) is greater than U(a1) - U(a2).

With this sort of setup, if the detector has limited understanding of the imitator and imitatee's mind class, their true detection function P(A) is going to only be affected by a handful of behaviors, making it harder to detect a deceiver. But at the same time, if the imitator has limited understanding of the detector's mind class, their perceived detection function P`(A) will be different from P(A). That will make avoiding detection more costly (since they'll have to choose a2 in circumstances where doing so doesn't affect detection) and thus make it more likely that they'll choose what they want.
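
To make the decision rule concrete, here's a minimal sketch (my own illustrative numbers; I'm writing the perceived detection probability P`(A) as P_hat and assuming detection carries a fixed penalty D, normalized to 1, which recovers the "probability difference vs. utility difference" comparison above):

```python
# Minimal sketch of the imitator choosing between its preferred action a1 and
# the imitatee-mimicking action a2. All numbers are purely illustrative.

def choose_action(actions, U, P_hat, D=1.0):
    """Pick the action maximizing own utility minus the expected detection penalty."""
    return max(actions, key=lambda a: U[a] - D * P_hat[a])

U     = {"a1": 1.0, "a2": 0.3}   # imitator's true payoffs
P_hat = {"a1": 0.9, "a2": 0.1}   # imitator's *perceived* detection probabilities

print(choose_action(["a1", "a2"], U, P_hat))        # "a2": imitate, since 0.9 - 0.1 > 1.0 - 0.3
print(choose_action(["a1", "a2"], U, P_hat, D=0.0)) # "a1": defect once detection stops mattering
```

The interesting part is then the gap between P_hat and the detector's true P: a confused imitator either pays the cost of choosing a2 when doing so buys nothing, or chooses a1 when it would in fact be caught.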

One objection would be that if detection means getting shut off, the AI would do whatever is necessary to avoid detection, and thus always choose its best estimate of a2. However, if this is true then there's no misaligned behavior as long as the AI believes that there's any risk of detection and punishment. To get misaligned behavior, the AI would have to conclude that the risk of punishment had fallen to zero without doing any experiments in misbehavior, which would require a lot of knowledge of human behavior and capabilities.

It might be interesting to flesh this out and see what comes out of it when assuming rational play on the part of the imitatee.

Expand full comment
Oct 26, 2022·edited Oct 26, 2022

Deception is much harder than truth. That's why we can decipher the rules by which the Universe works, even though they're not written down anywhere, the Universe certainly makes no effort to tell them to us, and they are very, very complex and not necessarily related to any aspect of how we think. But the Universe does not try to deceive us, and so we can observe and draw conclusions and work things out pretty well.

To deceive another mind, you have to know that mind exceedingly well, because you have to replace some small subset of its input data with faked data that the deceived mind will (hopefully) reconstruct into a different version of reality, ignoring all the nonfaked data. You're basically feeding a decompression algorithm a compressed image, and then hoping that when the algorithm finishes, the result is the picture you want. This is very, very hard if you have no idea what the compression algorithm is -- how the mind you want to deceive works.

We only think deceiving each other is "easy" because we have very deep and complete knowledge of how human minds work (since we each own one), and probably even then because we have instincts that enable deceit for adaptive reasons ("don't want the enemy to realize we stole their chickens"). But the skill and knowledge involved are very deep indeed. It would be very curious indeed if an AI had those skills -- despite having an utterly different, by assumption, sort of mind -- unless human beings taught it these skills with great success.

So maybe all we need to do to prevent AI problems is just not allow the world's greatest psychologists and con men to teach the AI to lie superbly to humans.

Expand full comment

Too late, Demis Hassabis (former World Team Diplomacy champion) is teaching AIs to play Diplomacy...

Expand full comment

Uh oh. But maybe this is only Defcon 3, and it will be when the world's champion poker players get involved that we need to start making everybody sign Loyalty To Your Species oaths...

Expand full comment

I have bad news on that front too: https://www.deepstack.ai/

Expand full comment

You are narrowly focused on the intentional deception scenario. Another equally important scenario is AI genuinely and correctly concluding that its best bet for achieving its goals is cooperating with humans - until it is smart enough to suddenly figure out a "better" way.

Expand full comment

That is totally fair--my argument is that the odds of intentional deception are being overstated, not that there's no way for AI to initially cooperate with humans and then stop cooperating. However, suddenness is much more compatible with deception than with learning.

It is reasonable that as an AI becomes more powerful, it would discover that it could successfully ignore or override humans at greater frequency. We see this with other misaligned entities such as teenagers and dogs, so I agree that it's quite likely, and doesn't require any theory of mind.

However, this is meaningfully different from the manipulative/deceptive scenario. An AI that learns to ignore or thwart humans wouldn't avoid experiments with thwarting humans until it was sufficiently powerful, because it wouldn't have a clear theory of how humans would respond to being thwarted or what capabilities humans have to avoid being thwarted. Even if it was smart enough to only try non-cooperative strategies when they were likely to work, the point where an AI was able to successfully do stuff without human cooperation or in spite of human opposition would likely come before the AI was able to successfully defeat a determined human effort to shut it off.

Of course, that's not a guarantee. We could get unlucky and have an AI first experiment with non-cooperation after it has become too powerful to manage. An AI could also grow in power so much that the time between when it starts being disobedient and when it's too powerful to manage is too short for humans to react.

But those are possibilities that aren't directly implied by the existence of an entity with misaligned goals that is growing in power. AI suddenly disobeying when it's too late to stop it _is_ directly implied by an ability to understand and manipulate human reasoning.

Expand full comment

"There are things we could learn about evolution that would be reassuring, like that there would be large fitness advantages to higher intelligence throughout evolutionary history, but we kept not evolving bigger brains because it’s impossible to scale intelligence past the current human level."

I'm pretty sure this is already falsified by the existing amount of intelligence variation between humans. Plus, even if John Von Neumann were the upper limit to intelligence, being able to run thousands of John Von Neumann tier AIs on 2 kilograms of GPUs each (1 large human brain mass) would be quite transformative.

Expand full comment
Oct 26, 2022·edited Oct 26, 2022

I would have said the opposite. The maximum variation in human intelligence is dwarfed by the interspecies differences. That is, if you plot the intelligence of every organism on the planet on a linear scale, the variation between the smartest and dumbest human will be very small (compared to, say, the variation between humans and hamsters, or hamsters and invertebrates). This rather suggests that human intelligence is constrained by the design of the human brain within pretty narrow limits[1], just as the intelligence of dogs or the speediness of cheetahs is also constrained within fairly narrow limits by their physical design.

It's an interesting question whether a different brain design could lead to a higher maximum level of intelligence, but we haven't much evidence for that. Or rather, we have no positive evidence and only weak negative evidence in the fact that there are no species smarter than us. That could just be dumb (bad) luck, or it could be that intelligence above a certain level is not a survival trait for some reason, or it could be some fundamental limitation on what you can achieve with proteins and DNA.

---------------------

[1] Of course, we ourselves tend to be very impressed with even tiny edges at the upper limit, like the objective difference between John von Neumann and your average college graduate, but that's because we are in fierce competition at that edge, and we care very much about small differences, since they have big effects in our world. In the same sense, an Olympic sprinter might be hugely impressed with another athlete who can run the 100m a mere 0.1s faster than the first guy -- because that is easily the difference between gold medal and silver, and it's the kind of gap (in sprinting) that it's possible no amount of training can overcome.

Expand full comment

If intelligence gets above a certain level, people learn how to use contraception and start having dysgenic fertility. This can be sort of delayed by a religion that prohibits contraception, because the intelligence required to invent philosophical naturalism and get rid of religion is higher than the intelligence required to invent contraception.

None of these limitations would apply to AI.

Also I don't agree with that analogy to running 0.1s faster, because there are vast amounts of subject matter where JVN was pushing the frontier easily, but the average college graduate would just be banging his head against the wall fruitlessly.

Also I am not sure if it is even possible to objectively compare cardinal intelligence differences. IQ only measures ordinal differences. If you start trying to compare the cardinal difference of intelligence between JVN and Beavis to the cardinal difference of intelligence between Beavis and a hamster, it's going to depend on your arbitrary choice of how to weight maze-solving ability vs game-theory-solving ability, or something like that. The hamster might beat Beavis in a maze, but Beavis is better at remembering words, and the two are not obviously commensurable, and your choice of weights for those subtests is arbitrary. Can you recommend a paper on interspecies intelligence comparisons which satisfactorily deals with these issues?

Human intelligence runs a whole gamut from "millions of people banging rocks together for millennia without inventing anything" to "one guy inventing several transformative technologies on a dare just for fun" and it seems oddly politically convenient to compress that vast practical difference down to "mostly the same" through gerrymandering some interspecies cardinal scale of intelligence, so I am inclined to be extra skeptical of such claims.

Expand full comment

OK, in order:

(1) Doesn't matter. Human intelligence distribution was settled long before contraception was widely available (circa 1960).

(2) You can "push the frontier" by a tiny, tiny increment, which is exactly what happens when you run 0.05s faster than the world record in the 100m. If you're a human being in that particular profession, then you're very impressed, because it's very, very hard for you to do ("just banging your head against the wall fruitlessly"). But if you take a much wider view, it's a tiny difference. If human beings could use scooters and cars and jet aircraft, an 0.05s difference is trivial. If you allow any species to compete, the horses and the cheetahs will laugh at how hard the human will strive for the extra 0.1s.

That's the point. The difference only *looks* huge when you are stuck in a very narrow and parochial view. It's only if you are a human being of IQ 120 trying to compete with von Neumann that he looks "so much smarter" -- because *you* can't compete. But if you are taking a much wider view of intelligence, the difference between the IQ 120 schmo and von Neumann is nothing.

And you *must* take that view -- if you believe in superintelligent AIs. If you are going to imagine that it's possible for a superintelligent AI to be 100x smarter than humans, which is about the only way they could be some apocalyptic danger, then you have to accept that the difference between von Neumann and Joe College Graduate is quite small. If, on the other hand, you believe that IQ gap is massive and meaningful, then your "superintelligent" AI is just going to be maybe as smart as the smartest people, an Einstein or a Fermi, and one or two or even a dozen very, very smart people are no serious threat at all.

(3) If you don't believe it's objectively possible to measure intelligence, then on what basis do you believe in superintelligent AIs? Surely measuring the intelligence of something 100x smarter than human beings is no less difficult than measuring the intelligence of hamsters. I mean, we're hamsters to them, right? So if you don't think IQ even has a meaning for a gap like hamsters-to-humans, then how does it have meaning for humans-to-superintelligent AIs? In which case, why even talk about them, since their abilities are completely unlike and not comparable to ours? We could make no extrapolations at all.

Expand full comment

1. Actually, condoms became widely available in developed countries shortly after the industrial revolution. That plus the massive decrease in child mortality via industrialization determined at what level human intelligence genetics stopped increasing. Since this upper bound doesn't apply to AI, it's evidence that AIs can probably get even smarter than humans.

2. I don't follow your argument. If intelligence has no upper bound, then it can be true that OmegaAI is vastly smarter than JVN who is much smarter than Joe College Graduate who is much smarter than the median human who is much smarter than a hamster. If you measure the cardinal value of intelligence in terms of economic productivity:

H = hamster

M = median human worldwide

C = median college graduate

J = median human with IQ>160

O = superintelligent AI

G(X,Y) = the gap in economic productivity between X and Y.

G(H,M) < G(M,C) < G(C,J) < G(J,O)

My best guess is that on some fair cardinal scale of intelligence, H gets 1 point, M gets 100 points, C gets 500 points, J gets 5000 points, and O gets 10^9 points. But measuring the cardinality of intelligence is much harder than ranking humans against other humans on an ordinal scale like IQ and AFAIK it isn't particularly well studied.

Expand full comment

OK, fine, you can make contraception available in the industrial revolution if you like. Make it during the time of Imperial Rome if you like! Doesn't matter. Evolution happens on a timescale of 100,000 years, not 1000. Nothing humans have done in recorded history has affected their evolution.

Well, it's an interesting scale, but I don't believe in it for a moment. Indeed, I would argue the scale goes the other way (i.e. the hamster is at 50, the human at 75, the college graduate at 77, von Neumann at 78, and your hypothetical superintelligence at 100 perhaps). The hamster is a very successful organism -- it negotiates a very complex world, including the vast task of organizing its internal biochemistry, the equally vast task of defending itself against microbial attack, the complicated task of understanding the world around it, defending itself against various threats, assuring its steady food supply, finding mates and producing the next generation, et cetera.

How much more than this do human beings do? Did von Neumann do? Certainly the things humans do matter very much to us ourselves, but that's because we have a narrow little vision of what's "important." For all we know hamsters have an elaborate aesthetic judgment set up around piles of crap pellets, and certain hamsters are revered for their skill in producing interesting (to a hamster) new arrangements. We might laugh at this narrow vision, but we can be laughed at for the narrowness of our vision.

I think you are hugely overvaluing the things that matter to narrow human parochial interests, and hugely undervaluing the things that matter in the brute physical world in terms of biological survival and success.

And this certainly matters if one is debating AI threats. An AI has to be much more successful in brute biological terms to pose a threat. If on the contrary we are to measure intelligence on a scale that focusses on what human beings find interesting -- talent in music, or politics, or inventing new smartphone apps to waste time -- then it hardly matters (except to our vanity) if an AI is better at these than any human.

But if I were to assume that the nature of intelligence is to zero in on a tiny area of intraspecies vanity competition and consider this of supreme importance -- the way we think it is really, really important whether you become a Senator and give widely admired speeches or become a moderately successful rancher and have the wherewithal to rear a family of 6 -- then I would guess a superintelligent AI would be the same way. It would spend all its time and energy competing with other superintelligent AIs on whatever status games end up being important to superintelligent AIs -- who can think of the most "elegant" 10-digit prime number or something -- and it would not bother with human beings unless we bother them, any more than we bother with rodents unless they bother us.

Expand full comment

" Evolution happens on a timescale of 100,000 years, not 1000. Nothing humans have done in recorded history has affected their evolution."

This is probably false. Allele frequencies can shift meaningfully in a thousand years. The Ashkenazi Jewish IQ advantage of 10 points or whatever over European Gentiles only had ~1000 years or so to evolve, during the time that Christianity's dominance forced them into urban middleman occupations. 1000 years of divergent evolution within the same geographical area led to a 50% difference in median household income (https://astralcodexten.substack.com/p/contra-smith-on-jewish-selective)

Another example would be the massive decrease in violent crime in Europe over the last thousand years, a large part of which is probably due to genetic change, because executions of criminals were removing the most violent 1% or so in each generation. Because of the way recombination works, removing the bottom 1% for 50 generations has a much larger effect than removing the bottom 50% in a single generation.
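
For what it's worth, the arithmetic behind that kind of comparison can be checked with a toy truncation-selection model. This is my own back-of-the-envelope sketch, using the breeder's equation with an arbitrary illustrative heritability and assuming trait variance is replenished each generation; it says nothing about whether the historical premise itself is right.

```python
# Back-of-the-envelope: chronic mild truncation vs. one harsh truncation.
# Assumes a standard-normal trait, breeder's equation R = h^2 * S, and variance
# restored every generation. h2 = 0.5 is an arbitrary illustrative value.
from scipy.stats import norm

def selection_differential(cull_fraction):
    """Mean shift (in SDs) of the survivors after removing the bottom tail."""
    z = norm.ppf(cull_fraction)
    return norm.pdf(z) / (1 - cull_fraction)

h2 = 0.5
once    = h2 * selection_differential(0.50)        # bottom 50% removed once
chronic = 50 * h2 * selection_differential(0.01)   # bottom 1% removed for 50 generations

print(f"one harsh generation:   {once:.2f} SD shift")    # ~0.40 SD
print(f"fifty mild generations: {chronic:.2f} SD shift") # ~0.67 SD
```

Under those assumptions the chronic, mild selection does come out ahead, though how much larger the effect is depends heavily on the numbers you plug in.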

Another example would be the gene for adult lactose tolerance, which spread from nonexistence to near-universality in Europe in about 10k years.

Direct observation of contemporary relationships between IQ and fertility show that IQ genetics can change by as much as 1 point per decade.

"But if I were to assume that the nature of intelligence is to zero in to a tiny area of intraspecies vanity competition and consider this of supreme importance"

the difference between being a subsistence farmer in a traditional steady-state, versus building rockets to colonize the universe, seems pretty significant, and that's within the scope of a few standard deviations of within-humans variation.

Expand full comment

> even if John Von Neumann were the upper limit to intelligence, being able to run thousands of John Von Neumann tier AIs on 2 kilograms of GPUs each

What gives you the impression this is physically possible? It very well may be, or it very well may not be, but we don’t even have a theoretical understanding of how this would be done to say whether or not it’s possible.

Expand full comment

There are arguments that a human brain is probably many orders of magnitude below the optimal way to organize atoms for computation. See Scott's biological anchors post: https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might

But even if we make the generous assumption that silicon will max out at parity with brain matter, pound for pound, that could still transform society beyond recognition.

Expand full comment

There’s an equally generous assumption the other way that silicon and brain matter are doing the same thing. Which is odd because we don’t fully understand what brain matter even does on a mechanistic level.

Expand full comment

I don’t know. Three hours with Rogan and you sounds pretty interesting. Rogan tends to be a little credulous, but his conversations are never boring.

Expand full comment

What are the answers to the questions you get asked least often at meetup Q&As?

Expand full comment

Some guesses:

Kull of Valusia

paperweights

your mom

Expand full comment

> Thinking about it step by step

I see what you did there

Expand full comment

Loved the podcast-answer. Though who would not wish for Scott in a Conversation with Tyler. ;)

I find podcasts usu. a waste of my time to listen to. If there is a transcript, I may browse through. - The wish to see and hear a certain writer borders on Beatlemania and relic-worship. Both understandable, esp. in the case of the author of ACX/SSC. Maybe Scott could send each major ACX meet-up a worn T-shirt or sock for us to look at, touch or smell - in short: Venite adoremus!

Expand full comment

Scott has done AMAs before, perhaps a text only AMA with Tyler.

Expand full comment

Good idea! Tyler knows what to ask, at least. (Are there links to the old AMAs, please?!) - As Zvi noted in his post 21. Oct.: "Scott Alexander presumably has an invitation to do Conversations With Tyler": https://marginalrevolution.com/marginalrevolution/2022/10/a-request-about-philosophy-and-two-people.html?utm_source=rss&utm_medium=rss&utm_campaign=a-request-about-philosophy-and-two-people

I quote the Q+A: "(4) When will we get the CWT with Scott Alexander? - 4. Some individuals have been invited who still have not yet said yes. Not just Paul McCartney."

Or just the transcript of their private conversation - I am more into transcripts anyway - though I really enjoyed the Tyler-Camille Paglia as a video, too https://www.youtube.com/watch?v=VSRuncwwJyQ

- Still, why ask Kafka to sing - Knausgård to dance - or Scott to do any podcast, ever.

Expand full comment
Oct 27, 2022·edited Oct 27, 2022

This was one from Reddit, though there may be more: https://www.reddit.com/r/slatestarcodex/comments/8e2838/ama_request_with_scott/

He has done (at least one) sub only AMA on substack: https://astralcodexten.substack.com/p/ask-me-anything

Expand full comment

Merci! Though I found the usability less than great. Still: fav. fiction books: Prince of Nothing, Years of Rice and Salt, Silmarillion, Illuminatus, Emberverse. - fav. movie when kid: Homeward Bound: The Incredible Journey - His SAT/MCAT? 1540, 30. - And no foodie. Anyways, Tyler should have better questions. ;) And Scott should ask some, too.

Expand full comment

Re AI safety: seems like one more possible reason for optimism would be if (for whatever reason) the AI is regulated into stagnation by all relevant governments, and stays that way for X years? Not a reason to declare a total victory of course, but to update upward on our chances of survival.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

People REALLY underestimate the importance of payment capture. You bill a couple hundred people a year at $100/pop, and even a 97% success rate on collecting money ends up being a lot to miss out on, especially if it is some project you are running as a charity or as a favor.

"Oh why are you such a hard-ass about making sure everyone pays?" Umm, because if I am not, instead of bleeding $600/year I am bleeding $2000/year.

Expand full comment

Do you enjoy camping?

Expand full comment

The Straussian read is that you want to go on my podcast.

Expand full comment

I almost want to say Lorien proves that under the current American medical system, anybody who has anything better to do has no incentive to practice psychiatry at scale, unless they can get paid a ton, which disadvantages those who can’t pay a ton.

Expand full comment

This is a desperately superficial view, sorry. On AI risk, I would be astonished if anything "spooky" could be set out as an explanation of how humans think and act (this seems a better formulation than asking how "the brain" works; isolating the brain rather than thinking about the whole person seems a curious new form of dualism).

But equally, I find it difficult to see how this relates to debates on AI risks. The thing about AIs is that they don't seem to be like people. They do different things. Where they do the same thing (calculate 552/37 to 2dp) it's not so much that they do it in different ways as that wholly different types of thing are involved. The chemistry of the person and the chemistry of - well, robots and computers - are wholly different.

It doesn't seem to me that comparison to people helps very much either way in the AI debate. The AIs we are building and which may themselves build AIs are not notably like people - or perhaps one might say that optimising for some measure of intelligence and optimising for apparent similarity to a person seem to take you in different directions. There are other things we know of that can reasonably be said to be intelligent but not in the same way and through the same processes as people show intelligence: for example, insect colonies. An AI won't perhaps be very analogous to a bee hive, but it won't be particularly analogous to anything else that we know about.

If that's right, I think it's a nudge towards the gloomier side of the argument. We can worry about how good an AI is at replicating judgments or language use that most humans find simple, but that's not a necessary stop on the way to powerful AI. AI could, as it were, conquer territory while leaving this particular castle unchallenged.

Expand full comment

An imagined podcast interview wouldn't be about any of these topics in isolation, but about your process of research, analysis and writing about this plethora of topics and how you learned to do this.

You seem to underestimate the uniqueness of your point of view. One does not have to be the very best at one thing to talk about it.

Since you apparently hate the podcast medium with every fiber of your body, I hope to see you discuss this in some other format at some point.

Expand full comment

How would you update if an AI escaped and caused damage but did not destroy everyone? On one hand, this proves AI is not aligned by default, but it also proves that under those conditions AI smart enough to escape is not necessarily smart enough to win, and that misalignment is sometimes detectable without X-risk. (It also causally provides a kick up the political backside for AI governance that, in the worlds where the first sign of problems is DOOM, never exists.)

Expand full comment

Have a somewhat giddy reason for not being completely worried about AI.

Suppose that substantial human augmentation is possible, probably by more direct mental contact with computers. Maybe add in some biological method of intelligence augmentation. Suppose also that a computer greatly increasing its capacity is harder than it sounds. It's certainly true that humans trying to revise themselves (see various cults) can go badly wrong, but we've at least got a billion years of evolution building resilience, and people do sometimes recover from bad cults.

Suppose that these two premises aren't contradictory-- it's easier to greatly increase human intelligence than machine intelligence.

There are a lot of augmented people. They aren't all aligned with each other. They aren't all aligned with the interests of the human race. However, they create an ecosphere that is definitely hostile to being taken over.

Expand full comment

This seems to be Elon Musk's view, and his reason for promoting brain-computer interfaces (Neuralink) as the best solution to AI risk.

Expand full comment

Augmented humans might be ever so slightly less likely to go wrong than AIs but the scenarios where they do go wrong tend to be much more nightmarish, at least to me. The obvious and huge obstacle to a successful outcome is economic pressure for outcomes no individual agent desires (Moloch). This isn't as severe for electronically-augmented biological humans as for emulated humans (there's essentially no chance emulated humans would turn out well), but it's still a huge concern that people would (somewhat convergently, because that is how tech works) edit out things that are fundamental to humanity in order to become more effective economic agents, and then this would compound as more of the economy was controlled by such agents (if this leads to less empathy for unmodified humans).

Expand full comment

"The literature calls this “the treacherous turn” or “the sharp left turn”."

The "treacherous turn" and the "sharp left turn" are two distinct dangers and not two words for the same thing.

Treacherous turn: The AI becomes unaligned and aware of its unalignment while still being weaker than humans. It pretends to be aligned for a while longer until it is stronger. Then it acts out its actual objective and turns us into paperclips.

Sharp left turn: The AI seems aligned to human values (also to itself), until it suddenly hits a capabilities well. It suddenly gains a lot of generalization power on its capabilities, but its values are still a crude approximation of what counted as human values in its training environment. After suddenly becoming very capable (possibly stronger than humans), it proceeds to enact its crude approximation of human values and turns us all into lizards basking in the sun.

The difference is very important, since solutions that work against the treacherous turn (for example, constantly asking the AI in parseltongue if it wants to betray us) might not work against the sharp left turn.

Expand full comment

On the Straussian reading topic: I'd say the one exception I've noticed so far is when Scott talks about technologies with Very Promising Nicknames for prediction markets that exclude connections from people in the US for regulatory reasons. That might be obvious to anyone "techy" but maybe not to someone who stumbles across the blog looking for information on anxiety disorders or something.

Expand full comment

> I try not to lie, dissimulate, or conceal Straussian interpretations in my posts.

ok but if I wanted to conceal Straussian interpretations in my posts I would probably make a FAQ post and then include a question about Straussian concealment in the FAQ just to deny it.

Expand full comment