289 Comments

> How is your Lorien Psychiatry business going?

Now that you've partially bought into the capitalistic model of writing, you can bring that to your psychiatry business too. Slowly open it up to everyone who wants to join, raise subscription fees as needed (on new patients), hire additional psychiatrists, hire additional staff, etc. I'll admit this also sounds like a lot of work but maybe you can find a cofounder or the like to help.


Yes - my thoughts exactly. "Time to hire people who think like you and don't have a blog"


That will increase overhead, at least for a while until benefits of specialization and economies of scale start to kick in, but is a good idea.


I think the point is mostly to prove a low-cost medical model, rather than to make the most money out of it. For the latter, it'd be trivial to do some publicity here and up the price. But as long as the money is not an issue, the experiment is worthy enough in itself - though from this post, I wouldn't call it a success yet. He's doing a bit of sweeping the paperwork under the rug - but unfortunately, paperwork is a solid cause of cost disease. I'm almost on the other side of the globe, and I'm watching my GP aunt do about 2/3 paperwork to 1/3 medicine.

Anyway, yes, optimizing this is a very worthy attempt. But I'd guess (without knowing enough context) that anyone attempting it should pay more attention to profit. It's often said that a startup needs to offer a 10x improvement to be successful - if whatever model comes out of Lorien can't offer at least significantly more money than the standard, I don't think it has a good chance of catching on fast enough to make a difference.


Is Lorien entirely based on static monthly fees? I assumed that psychiatry would bill hourly or by session - is that not industry practice?

Oct 25, 2022·edited Oct 25, 2022

Lorien operates on an experimental "pay what you can" model with a flat monthly rate: https://lorienpsych.com/schedule-payment/

Scott described the basic idea in https://slatestarcodex.com/2018/06/20/cost-disease-in-medicine-the-practical-perspective/


I think monthly is becoming more popular. I pay $75/mo for medication management with a nurse practitioner. It's all online appointments. We have an appointment once a month, so I guess this could be viewed as per-appointment as well.


I love the self awareness around podcasting.

People should definitely stick to the best medium and format for communicating their ideas.


> Partly because patients keep missing appointments and I don’t have the heart to charge them no-show fees.

I have no experience in medicine. I have some scattered experience in what is technically independent business but never succeeded at it particularly and so maybe listening to me about that is a bad idea. I'm not sure what your actual policy is so I could easily be misinterpreting this. But the wording of this combined with my read on your personality overall makes me think you could use the push, so:

This feels like a classic trap. Letting people take up unlimited designated slots without paying you is a way to make sure at least some of them will allocate them without really caring. Your time is scarce, and other patients need it too. This is modulated somewhat by how much ability you have to *move* other work into those slots on short notice, of course—not all types of businesses have the same economics here—but medicine seems likely to be among the types that's more harmed by this.

Any partial but meaningful barrier is way better for aligning incentives than zero. Halve the fee for a no-show and be willing to waive one every six months, or whatever. But don't just let bookings turn to inefficient “first come, maybe serve” hell for no reason! Your market sense is better than that; I know it from your posts.

Disregard any or all of this if you have more reliable feedback or more specific analysis that contradicts it.


>"first come, maybe serve”

Why would that be? Unless he's overbooking hours, there's no reason for this to happen.


> Any partial but meaningful barrier is way better for aligning incentives than zero.

I've heard stories where this backfired. Like, a day care gets annoyed at parents picking up their kids late, so they institute a late fee, and suddenly a lot MORE parents start being late, because now they're viewing it as a service they can buy, rather than an obligation that they're failing. (And if you remove the late fee, things don't go back to the way they were originally.)

Now, you could always choose to set a fee that's high enough that you genuinely don't mind when people choose to incur it, and then this outcome isn't a problem. Though in some cases, that means charging a lot of money to people who genuinely tried to stick to the plan but legitimately had extenuating circumstances.

But if you're thinking "I don't want to make this a service people can buy, I just want people to be slightly more reluctant to flake than they already are" then adding a small fee is not _necessarily_ going to move things in the correct direction.


This is a good point. The separate possibility of pushing on “and *did* you have extenuating circumstances?” has a different backfire curve, too.


Lorien is geared toward people with minimal, if any, income, earned and unearned. It is true that instituting a protocol for something bad generates something approximating a permission structure, but the clientele is still generally unable to afford standard psychiatry fees. That's why they are there in the first place -- dual needs, both being satisfied by Scott.

Which is a long way around the barn to say I don't believe that applies across the patient population in this specific instance.


FWIW, I miss appointments with my psychiatrist semi-regularly. This is not because I don't care, but because I have some sort of mental problem (but don't worry, I'm seeing a psychiatrist for it).

It means a lot to me that my psychiatrist doesn't charge me for this or get all strict and angry about it. You might think "if he did, you'd stop missing appointments", but the previous psych I went to did do that... and now I just don't go to him. I was literally unable to stop even when desperately trying; and now he's lost a lifetime patient who bought him nice thoughtful fancy gifts on Christmas and his birthday, and I was upset and anxious and running out of important medication for a bit.

I mean, not to say it wasn't *fair.* It was fair to be upset about me missing an appointment. "Well that slot could go to someone who doesn't miss appointments!" is probably true in most cases. Just sayin': Scott's way is probably greatly appreciated by at least a few of the disorganized but well-meaning patients.


An interesting thing about that specifically is that I'm personally familiar with something on the same side but with the opposite effect! I have a longstanding brain malfunction that makes it almost impossible to stay on top of things *unless* there's a clearer up-front incentive in the way (as opposed to wishy-washy or timey-wimey whatever). Oddly this seems to apply differently to different things, but an appointment specifically is something I can keep, and it's notably easier if there's something shoring up the “and don't just drop it on the floor”.

So *now* I wonder whether “vary things based on examination of mental responses” is practical. My intuition is no, but not with any confidence.


May I ask a separate question about this? Would you find it easier if you could have all the same interactions without having to show up at a specific time, but instead by the equivalent of email or voicemail (or something differently convenient but with the same buffering properties)? I know not everything can be done that way in actuality, but I'm curious now.


How would you feel about it if your psychiatrist charged a missed appointment fee but didn't get strict and angry about it? Making an appointment and not showing up does take the psychiatrist's time, so charging you for doing so doesn't seem all that different than charging you for appointments you don't miss.


Sounds like part of your therapy is being allowed to miss appointments without incurring certain social or fiscal costs.


One method I recently learned about: collect preemptive no-show fees from everyone every x months. After that period, if you missed an appointment, you don't get your fee back, but if you didn't miss an appointment, you get your fee back plus an equal portion of the fee that the no-shows gave up. So if you have 5 people who missed some and 10 who missed none, each of the 10 who missed none gets 150% of the fee back that they paid. People tend to overestimate their own adherence and underestimate that of others, so they are usually willing to agree to this.
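
For anyone who wants to check the arithmetic, here's a minimal sketch of that settlement rule (the deposit amount and the function name are made up; only the 5/10 split and the 150% figure come from the comment above):

```python
def settle_deposits(deposit, missed):
    """Return each patient's payout at the end of the period.

    deposit: the preemptive no-show fee everyone paid up front.
    missed:  list of booleans, True if that patient missed an appointment.
    """
    n_missed = sum(missed)
    n_kept = len(missed) - n_missed
    pool = n_missed * deposit                 # forfeited deposits
    bonus = pool / n_kept if n_kept else 0.0  # shared among those who kept every appointment
    return [0.0 if m else deposit + bonus for m in missed]

# The example from the comment: 5 patients miss an appointment, 10 don't.
payouts = settle_deposits(100.0, [True] * 5 + [False] * 10)
print(payouts[-1])  # 150.0, i.e. 150% of the deposit back for each patient who missed nothing
```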


Should we have a Straussian interpretation of your answer on whether you write with a Straussian interpretation in mind?


Only if you are Vizzini


To be fair, Substack comments are *exactly* where you would run into Vizzini on the internet.

author

I'm not sure what answer I could give here that would convey useful information instead of just pushing the question down another level of recursion.


FWIW, I believe you... :)

You're semi-famous; you don't want to be de-platformed or harassed IRL, so you keep your edgier opinions to yourself but tell the truth otherwise.

In French we say honesty is not saying everything you think, it is thinking everything you say. A common saying that I found pretty useful, tbh.


Note that his response to the Straussian interpretation of the Ivermectin post was not a denial!


> what if this is the only book I ever write and I lose the opportunity to say I have a real published book because I was too lazy?

For what it’s worth, when I published Land Is A Big Deal, I picked the laziest option possible, going one step up from self publishing, by going with a small press operated by a friend. Approximately nobody noticed or cared it was from a small press, and now some big press folks are asking me about publishing the second edition with them.

I could have tried to go for a "big" publisher right from the get go, but I figured that would involve too much work and editing burden and rigamarole, which would ultimately result in me just not doing it in the first place. I did have some edits that came from going with the small press instead of literally just YOLO'ing it with self publishing, but it wasn't too bad all in, and was mostly just "make this sound like a book and not a series of online blog posts," which presumably wouldn't really apply to UNSONG.

One protip I would offer, however -- don't commit to recording, mastering, and editing your own audiobook. That was easily 10X the work of everything else, ugh.


Speaking as a publisher: why don’t you ascertain how much editing the editor wants to do?


Great comment. I often find myself talking myself into how hard/bad/dumb/tedious something is going to be without actually checking, and then later realizing I did this and kicking myself.


Tedious is the perfect word to use for it.


> If you ask me about one that I have written a blog post on, I’ll just repeat what I said in the blog post.

As a podcast lover, this is exactly what I expect when I listen to podcasts. For many topics I just don't have the concentration to focus on reading a long deep-dive blog post, but I would be happy to listen to people talk about the topic in the background while I do some light work.

No one expects the guests to come up with new insights during the interview, they just need to broadcast their usual points to a new audience.


There exists already an audio version of ACX: https://open.spotify.com/show/5FEwz047DHuxiJnhq3Qjkg.


Re: Unsong, as someone who thinks it sounds interesting but also prefers to read old-fashioned paper books while sitting in a big comfy chair, I'll just say that I'd happily buy a copy on day 1, regardless of how you choose to publish it.


I would like to second this.


As someone who also prefers old-fashioned paper books: I self-published UNSONG and bought a single review copy for my collection. Fans put together kits for doing this: https://github.com/t3db0t/unsong-printable-book


> My post My Immortal As Alchemical Allegory was intended as a satire to discredit overwrought symbolic analyses, not as an overwrought symbolic analysis itself.

Absolutely devastated. Also, weirdly enough I was just looking up Issyk Kul a few days ago. This is not a coincidence, because nothing is ever a coincidence.


Look, I mean, I've never read My Immortal, I'd never heard of it until Scott's post, but I have to agree with 2020-era Scott here:

> I maintain that if you are writing a fanfiction of a book about the Philosopher’s Stone, and you use the pen name “Rose Christo”, and you reference a “chemical romance” in the third sentence, you know exactly what you are doing. You are not even being subtle.

This is pretty compelling; there's obviously at least _some_ alchemical allegory going on in My Immortal. By claiming that his own obviously-accurate Straussian reading should not be read as a Straussian reading, in a paragraph where he tells us not to look for Straussian readings of his posts, he is clearly telling us to look for Straussian readings of his posts.

What does it all mean?


Just chiming in to say I found that paragraph compelling too.


I've been to Issyk-Kul, would recommend. Absolutely stunning scenery. Kyrgyzstan in general is great.


The world where My Immortal is alchemical allegory is strictly better than the one where it's just badly written. Resist!


I see it as the primary example of literary analysis as an art form! That analysis *clearly* didn't get its value from My Immortal, so it must be additive. That's conclusive proof right there


> If we learned that the brain used spooky quantum computation a la Penrose-Hameroff, that might reassure me; current AIs don’t do this at all, and I expect it would take decades of research to implement

Do I have some news for you: "Scientists from Trinity believe our brains could use quantum computation after adapting an idea developed to prove the existence of quantum gravity to explore the human brain and its workings." https://www.tcd.ie/news_events/articles/our-brains-use-quantum-computation/

author

Yes, someone is always saying that, it's right up there with "new discovery may overturn standard model" and "cure for cancer found". I will believe it when it becomes an accepted part of scientific knowledge that lasts longer than one news cycle.


Totally fair, and I am the farthest thing from an expert in the space. I understood this to be remarkable because they used a known technique where you can determine whether an unknown system is quantum based on whether it can mediate entanglement between two known quantum systems. So they tried it with brain water as the unknown system, it entangled, and boom - you've got some brain quantum effect that at least requires some explanation (or the entire technique is bunk).


Given what we know about quantum systems and decoherence, this model is very unlikely to stand up to scrutiny by other groups. It is not as low probability as erstwhile superluminal neutrinos some decade and a half ago, or as cold fusion a long time ago, but it sure is up there. I have trouble coming up with a recent (last 50-ish years) discovery that challenged our understanding of fundamental physics in this way. On the other hand, if confirmed, the payout would be enormous for scaling up quantum computers.

My current odds of this effect persisting and being due to quantum entanglement in the brain are less than 1%.


Well, I don't know about their particular idea, but the idea that the brain uses "quantum physics" to improve its functioning isn't that unreasonable. Photosynthesis uses it to get greater efficiency than classical physics would allow. But this doesn't say it affects the processing of thoughts, other than perhaps making some energetic pathway more efficient. For more than that I'd require a significant proof, rather than just some (I'm guessing) "correlation of spin states".

If it's both true and significant, it will show up somewhere else.


I have no idea if this is anything more than "bored physicists messing about with their expensive toys", but I do love the idea of "So what are you working on?" "Brain water" as an exchange 😁


I can promise there is nothing special about the brain as a substrate like "spooky quantum computation" that makes it uniquely well-suited for computation. Basically everything non-trivial in physics has the latent "potential", in a sense, for incredible computational capacity; the hard part is organizing that potential into doing something that is relevant to us.

This is why it is common for people (as a joke project) to make computers in computer games; see https://en.wikipedia.org/wiki/Turing_completeness#Unintentional_Turing_completeness . Once you have memory, loops, and if statements, you almost inevitably get full computational ability. I recall Dwarf Fortress, for example, has at least 4 different game mechanics individually capable of computation (fluids, pathing, mine carts, and gears). I built a computer in Factorio based on splitters. (It took like 4 minutes to add two single digit numbers.)
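
To make the "almost inevitably" concrete (a toy sketch of my own, not the actual Factorio build): the same way an in-game adder falls out of splitters or gears, ordinary addition falls out of a single trivial logic primitive once you wire enough copies of it together.

```python
# Everything below is built from one primitive, NAND; addition emerges from the wiring.
def nand(a, b): return 0 if (a and b) else 1
def xor(a, b):  return nand(nand(a, nand(a, b)), nand(b, nand(a, b)))
def and_(a, b): return nand(nand(a, b), nand(a, b))
def or_(a, b):  return nand(nand(a, a), nand(b, b))

def full_adder(a, b, carry_in):
    s = xor(xor(a, b), carry_in)
    carry_out = or_(and_(a, b), and_(carry_in, xor(a, b)))
    return s, carry_out

def add(x, y, bits=8):
    """Ripple-carry addition of two integers, one bit at a time."""
    carry, total = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total

print(add(7, 5))  # 12
```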

----

I have a very different response to AI fears: I believe we are not close enough to building AIs for the speculative theoretical work on AI alignment being done now to be relevant. I'm glad people are thinking about the problem but I rather expect that by the time we are anywhere close to AIs we'll have thrown out all but the most obvious safety stuff being done today. (Work on convincing citizens and governments to care about AI safety is still valuable, of course.)

Imagine people from the 1800s trying to prevent nuclear war: they can figure out things like "let's not nuke each other" and encourage a doctrine of de-escalation, but all the details about anti-proliferation and how to survive an attack have to wait for the technology to mature.


"I have a very different response to AI fears: I believe we are not close enough to building AIs for the speculative theoretical work on AI alignment being done now to be relevant. I'm glad people are thinking about the problem but I rather expect that by the time we are anywhere close to AIs we'll have thrown out all but the most obvious safety stuff being done today. (Work on convincing citizens and governments to care about AI safety is still valuable, of course.)"

That. I can't remember where someone challenged me on "when would be the right time to start working on AI risk" and I was - IDK but it all seems so immature at this point...

Experts can't even seem to agree on which metaphors are going to be the ones to be useful to describe their AIs functioning and the associated risks.


Curious what your timelines are—roughly when do you expect AI to arrive (assuming that it does eventually)?


That's a question for me?

Honest answer - no clear idea, I'm too far removed. But, say, anywhere between 10 and 50 years for AIs that start to look like AGI if not AGI.

I basically expect an explosion of performance/rapid progress "at some point".

But experts don't even agree on that. I remember a conversation where a lot of the disagreement could be boiled down to "AI progress will be slow enough for us to react to it" and "AI progress will go from nearly 0 to 100 in a couple of days literally, once we get the right type of AI".


I’m honestly a bit surprised by your response! If your timelines were like 100+ years in the future I understand not being too concerned, but < 50 years is (hopefully) well within my lifetime! Sure, it probably won’t be overnight, but in the scale of things, that seems very very short.


Well, for no particularly good reason I've projected 2035 for the first "almost human level" AGI for over a decade. The real question is what happens after that. On that I give about 50% odds of humanity surviving. Which is better odds than I give for humanity surviving a couple of centuries without a superhuman AGI. The weapons already in existence are too powerful, and not all those with access to them are sane.


Very hard to say. I think we may be as far, technologically, from AI as scientists were from nuclear weapons in the 1850s. But how long will that take -- 30 years? 50? I rather doubt less than 30.

The trouble is, I think the window between understanding what AI looks like and AI arriving could be quite short. For nuclear weapons that was arguably around 10 years; but AI could proliferate much more easily than nuclear weapons. Will there even be 10 years between the first glimpse of real AI and when it is everywhere?


Terence Tao has used Turing-complete Euler-like fluid flows to try to prove blow-up for the Navier-Stokes equations.


A beautiful (and insane) idea, but did anything come out of it? Was there ever any progress made towards making it work, or is it still just an idea?


Oh, there is definitely progress: the idea was used to show finite-time blow-up for some kinds of Euler-like flows. It's not done for the real Euler equations though, and then you'd need to prove that blow-up for Euler implies blow-up for Navier-Stokes (which it should, according to Tao, because blow-up for Euler means the nonlinear terms in Navier-Stokes should dominate the diffusion term - but as far as I'm aware there is no formal proof yet).
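
For anyone following along, the two systems in play (in incompressible form) differ only in the viscous diffusion term, which is why blow-up for Euler doesn't automatically carry over:

$$\partial_t u + (u \cdot \nabla)u = -\nabla p + \nu \Delta u, \qquad \nabla \cdot u = 0 \qquad \text{(Navier-Stokes)}$$

$$\partial_t u + (u \cdot \nabla)u = -\nabla p, \qquad \nabla \cdot u = 0 \qquad \text{(Euler, the } \nu = 0 \text{ case)}$$

The nonlinear term the construction exploits is $(u \cdot \nabla)u$; the diffusion term it would have to dominate is $\nu \Delta u$.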

Tao really believes in the plan and I have a general principle of believing whatever Tao believes (in maths).


I once asked Eliezer a similar question - and he said that AI safety would still be the most important thing for him to work on even if he were in Ancient Greece and he had to leave his work in a cave somewhere for future mathematicians to understand. :/


Even by the usual standards of quantum woo that was remarkably free of genuine science. Maybe it was a sly Irish joke?

Also...university PR releases are the worst form of fluff and vapor. Never believe a word.


Again, not an expert here, so I guess the PR fluff is directed at me. They link the paper, though, which seems unapproachably technical. Maybe that's a better basis for the scientifically minded to judge whether it's newsworthy? https://iopscience.iop.org/article/10.1088/2399-6528/ac94be


It's not unapproachable. But it doesn't seem worth the effort to parse completely: I don't trust their empirical skepticism, which seems to be absent. They start off doing stuff like simply asserting it's proven that quantum effects matter to brain function because, for example, a bunch of anesthesiologists discovered that spin-1/2 Xe-129 might[1] act very slightly differently as an anesthetic than spin-0 Xe-132, and since spin is a quantum phenomenon, there you have it[2]! This is a step up from Deepak Chopra, but not a big one. So far as I can tell, the rest of the paper is like that, too. That doesn't mean they're wrong, of course, since even fools can be right by accident. But it's so unlikely[3], and I would need a report from someone with a ton more empirical skepticism on board for it to be worth the time to dig into exactly what they did and how.

--------------------------

[1] I mean, the measurements quoted are small and reasonably within each other's error bars, for example. Not what we call a five sigma conclusion.

[2] Of course, a nucleus with a spin also has a magnetic dipole moment, which affects energy states of the electrons (cf. the hyperfine structure), so there is already a purely classical distinction in the atomic physics.

[3] For the same reason I don't believe in quantum computing: the problem of decoherence strikes me as fundamentally insoluble. The larger the system of coupled quantum degrees of freedom, the more exquisitely sensitive it is to interference from the external world that destroys the coherence between the degrees of freedom. You can readily calculate that, for example, the sudden ionization of a dust speck on the Moon is more than enough to decohere a superposition of particle-in-a-box states when the box is of some laboratory size. It becomes just absurd to imagine it possible to sufficiently isolate your complex quantum state long enough for it to do any nontrivial calculation.


Thanks, I appreciate you sharing those insights.


You could always do a podcast as Proof of Concept of "Why I should not do podcasts so please stop asking me".

I would find the "failure" interesting in a meta way.


Not doing podcasts is totally fine, but you're really overestimating the level of quality listeners expect from a podcast. For example, I genuinely enjoy listening to Joe Rogan talk for hours.


The only time I saw an episode of Joe Rogan was when Nick Bostrom was there. Don't watch that if you value your sanity.


> I would also be reassured if AIs too stupid to deceive us seemed to converge on good well-aligned solutions remarkably easily

We have "a lot" of evidence to the contrary.

I think there was a Google doc from some Less Wrong community member with like 50 anecdotes of AIs becoming misaligned on toy problems. For example, ask the AI to learn how to play a given video game very well, and the AI instead finds bugs in the game that allow it to cheat.

I can't find that google doc anymore (maybe someone else can resurface it), but this PDF lists some of the examples I recognize from that google doc. https://arxiv.org/pdf/1803.03453v1.pdf
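
To illustrate the pattern those anecdotes share (this toy is my own invention, not one of the documented examples): the reward below is meant to stand in for "finish the level", but as written it pays per tick spent on a coin tile, so a dumb optimizer "solves" it by parking on the coin forever and never finishing.

```python
import itertools

LEVEL_LENGTH = 5   # tiles 0..4; tile 4 is the exit
COIN_TILE = 2
HORIZON = 10

def episode_return(policy):
    """policy: a sequence of moves in {-1, 0, +1}. Reward: +1 per step spent on the coin tile."""
    pos, total = 0, 0
    for move in policy:
        pos = max(0, min(LEVEL_LENGTH - 1, pos + move))
        if pos == COIN_TILE:
            total += 1            # bug in the spec: the coin never disappears
        if pos == LEVEL_LENGTH - 1:
            break                 # reaching the exit ends the episode
    return total

# Brute-force "optimizer" over short open-loop policies (a stand-in for RL training).
best = max(itertools.product((-1, 0, 1), repeat=HORIZON), key=episode_return)
print(best, episode_return(best))  # walks to the coin, then never moves again; the level is never finished
```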


>ask the AI to learn how to play a given video game very well, and the AI instead find bugs in the game that allow it to cheat.

It's exactly the same with humans, called "speedrunning" when we do it. Clearly delineating what is a bug and what is "intended" is a surprisingly difficult problem, even when the thing you need to "align" with is human-designed entertainment.


Yeah, I'm not sure it's fair to say the AI 'cheated'. It worked within the rules of the system to optimize its play. As you point out, this is exactly what humans do. If you don't want the computer to cheat, you have to teach it what it means to 'cheat'. If, after that, it breaks the rules anyway and tries to hide it, then we can start talking about AI intentionally trying to deceive. That's not what this is describing.

Comment deleted

> It is given a directive of obtaining a high score or finishing quickly. It finds the best way of doing so. There is no "intent" to deceive or cheat - treating it as if it were some scheming human player is to see a human element where the simpler explanation is that it just did what it was told absent more complete instructions ("obtain a high score while obeying these rules").

Yes, everything you say is completely correct. AND YET it did not do what the researchers intended. This is my core argument about why the alignment problem is difficult.

I apologize for not making the core argument clearer.


(So let's put aside games designed for speedrunning for now)

I think it's perfectly coherent to say that the behavior of speedrunners is "misaligned" with what the game designers intended, especially once you allow for ACE (arbitrary code execution). Given that the human's behavior is misaligned, and given that the AI is mimicking that human's behavior, the AI's behavior is also misaligned.

> Yeah, I'm not sure it's fair to say the AI 'cheated'. It worked within the rules of the system to optimize its play. As you point out, this is exactly what humans do. If you don't want the computer to cheat, you have to teach it what it means to 'cheat'.

What you're describing is literally the alignment problem. The AI worked within the rules you specified, not the rules you intended. The difficulty is communicating the rules you *intend* to the AI, not just the rules as actually written in the code.

> If, after that, it breaks the rules anyway and tries to hide it, then we can start talking about AI intentionally trying to deceive. That's not what this is describing.

We're not talking about AIs intentionally trying to deceive at all in this thread. The passage I quoted from Scott was "I would also be reassured if AIs too stupid to deceive us seemed to converge on good well-aligned solutions remarkably easily". We're talking about AIs too stupid to deceive, which are misaligned. Deception is completely off the table for the topic at hand here.


If we're using the term 'cheat', then deception is entirely the point. If you don't want to introduce this idea, don't use the word cheat. You're describing the problem of encoding intention into instruction, which isn't the same as an AI that is assigned a task 'cheating' in the way we would normally define it.

For us to say the AI 'cheated', the AI would have to be given a list of instructions, then that AI would have to not follow the instructions in its attempt to complete the task. If it's going to be rewarded for that behavior, it will likely have to attempt to deceive the humans into believing it did follow the instructions.

I see this a lot in discussions about AI alignment, where loaded terms are used to describe a behavior. Those terms introduce intentions and concepts into the conversation (such as deception) that are otherwise explicitly excluded.

My point is that the loaded term 'cheat' is inaccurate and introduces confusion. This is exactly analogous to the speedrunner conversation, where people can observe every frame and know whether you used a glitch or not. Indeed, they will likely define glitches as either "major" or "minor" and then define records for speedruns with or without major and/or minor glitches. One of those aligns with game designer intentions and one doesn't, but nobody is deceived about whether the speedrunners are being awarded with record times based on the intents of the game designers.

Except it's not the game designers who are awarding the speedrun times. It's the community that defines the rules of what constitutes a legitimate speedrun for each category. To 'cheat' at a speedrun you would have to break the established rules of the community. For example, by claiming you did a speedrun normally when it was actually a tool-assisted speedrun. Do you have any examples of that kind of behavior? Otherwise, maybe don't use the loaded term 'cheat'.


The entire point is that it's nontrivial even for "toy" worlds and "toy" AIs to communicate any expectations like whether and which glitches are okay to use. They certainly don't infer the intent of the game designers or their own programmers and act on that. If "alignment" in this sense is hard in a toy example, we should generally expect it to be hard in the real world too.


Yes, I understand this. I'm just objecting to the motte-and-bailey argument surrounding use of the term 'cheat'. The original claim is that given instructions to play a game, "the AI instead finds bugs that allow it to cheat." This claim was made by Nebu Pookins at the top of the thread, and it is this statement that I'm objecting to.

Challenged that 'cheat' implied intent and deception, he began retreating to the motte that you could cheat without deception. Challenged that cheating requires you to disobey specific rules, he retreated to the claim that we're really only talking about how difficult it is to define rules that encode human intention into machine actions.

It's fine if that's the point you want to make. It's a well-established problem that's not new, and predates ML finding glitches in video games. If we can agree that use of the loaded term 'cheat' is not applicable - and indeed buries the underlying point that no intent is required for an AI's implementation to differ drastically from intended results - then there's no conflict here. We can simply update that the statement about cheating was a red herring that disguised the original point Nebu Pookin was trying to make, instead of illuminating it.

If instead we have to keep using intent-loaded terms to refer to machine learning models that have no demonstrable intent-related actions, I'll keep objecting that we're over-defining what ML is doing. This kind of language gives discussions about alignment an anthropomorphic slant that is neither warranted nor helpful. It makes observers think AI has been observed doing things we have no evidence to suggest it can do, and gets in the way of honest discussion of what the real problems are.


Yep, this is it! Thank you!


In particular, vis a vis the comparison with speedrunning elsewhere in this thread, note that "Dying to Teleport" from that doc is a perfectly standard speedrunning strat called "Deathwarp".


I've been meaning to ask, can you enable the "listen" text-to-speech feature on substack? Maybe that will satisfy people's podcast desire. You may have to email them, or it may be an option in the publishing view


Could be something for subscribers only, as well, assuming Substack supports it.


you don't need computer TTS; there's a podcast read by a human: https://sscpodcast.libsyn.com/


“But what if the podcast interview is presented as rounds of perfect and imperfect information games to give the audience insight on your thought process?”

Given no humans are natural public speakers, it's quite odd how many folks expect writers to be capable and willing to instantly transform into public speakers. There's a whole second skill set in addition to the speech-writing/ad-libbing part.


"Given no humans are natural public speakers..."

Citation needed! I'm reminded of the Big Bang Theory exchange (Sheldon: "I can't give a speech." Howard: "No, you're mistaken. You give speeches all the time. What you can't do is shut up.")

I feel I'm in this category: I could talk in public to any audience on any subject, and I always have been able to. My childhood family and my current family are all the same - it's hell! (Conversely, I find writing things down really hard.)

Of course, that doesn't mean my speeches are useful or convincing or that I'm a perfect speaker. And it doesn't mean that I don't get nervous when my speaking has stakes that are important. But it's nonetheless the most natural thing I do, and given we were speaking people before we were writing people that doesn't seem that surprising. I'm more surprised when I come across people who are natural writers - but they obviously exist!

Oct 25, 2022·edited Oct 25, 2022

What an excellent time for a citation request! In hindsight, I don't actually remember reading this at any point and was extrapolating from my own combination of experiences (it took a lot of training for me to learn even meager public speaking, & one of my favorite books is about public speaking) with my general knowledge of speech developmental milestones (which are guaranteed to be missed without enough external stimuli to teach the child the skill, and hence are not inherent) - but speech/sign language is a prerequisite to public speaking, not an equivalent.

My own anecdotal experience is probably not the best reference for humans in general either, considering I was late at learning speech but then taught myself how to write only a couple of months after that, according to my parents. A quick search of scholarly articles backs this up better, as it shows a lot of both historical and recent discussion of the pedagogy of public speaking, including data collected for various pedagogical strategies and their effectiveness. We could probably combine the base case of "zero stimulus == zero speech skills == zero public speaking skills" with the data showing that a student at an arbitrary level of public speaking skill improves with training, to make an inductive argument that humans are not natural public speakers.

...Probably don't quote me on that though; I just glanced at a few research paper conclusions to see that said data exists, didn't read them through. ie. https://www.researchgate.net/publication/307853972_Enhancing_Public_Speaking_Skills_-_An_Evaluation_of_the_Presentation_Trainer_in_the_Wild

(Additionally, the BBT quote has inflicted psychic damage.)


What’s one of your favorite books, that happens to be about public speaking?

Oct 25, 2022·edited Oct 25, 2022

Heinrichs' "Thank You for Arguing", since it's a very easy read that gives immediate insight on media literacy (albeit using some old theory, so it's probably outdated. It's also beginner material, so nothing particularly wild.)


Thanks!


Silly Scott. The reason people want you on a podcast is they just want to hear you talking. To hear what your voice sounds like, how you are as an extemporaneous person, etc. It's not actually to get exclusive insights.

Also, I think the thing about being cancelled for going on a podcast is a bit overblown. Most podcast-goers are fine and guilt-by-association is rare. You can even get the questions mailed to you beforehand for the more scripted ones.


There are a couple of recordings of him doing Unsong readings on Youtube. His voice is exactly what you would expect it to be like.


Prediction: Scott Alexander on Sean Carroll's Mindscape podcast, talking about his newly published best selling fiction UNSONG as an allegory for everything else. Some day.


None of the arguments about not doing a podcast make sense. Of course there’s a better subject matter expert for each thing! But if that was all the substance then I would not bother reading you, would I?

Better to just say you are particularly shy and don’t want to be recorded in that way.


> I would be most reassured if something like ELK worked very well and let us “mind-read” AIs directly.

This, probably most of all. All the other solutions attempt to understand indirectly and subjectively whether an AI is aligned. To date, we have a ton of much simpler ML models where we have only a theoretical understanding of how their outputs come about. If we could understand deep learning layer by layer, and understand an AI's capabilities and intentions before we turned it on, that would be a completely different world. It's also a really hard problem in its own right.
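
ELK itself is an open research agenda rather than something you can import, so purely as a loose illustration of what "reading the internals layer by layer" means mechanically, here's the standard linear-probe trick on a made-up toy network (every name and number below is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed "model": one hidden layer of tanh units with random weights.
W1 = rng.normal(size=(2, 16))
W2 = rng.normal(size=(16, 1))

def hidden(x):   # the internal activations we want to inspect
    return np.tanh(x @ W1)

def model(x):    # the model's actual output (the probe never looks at this)
    return hidden(x) @ W2

# Pick some fact about the input and ask whether the hidden layer encodes it linearly.
X = rng.normal(size=(500, 2))
fact = (X[:, 0] > 0).astype(float)      # "is the first input coordinate positive?"

# Fit a linear probe on the activations: plain least squares with a bias column.
H = np.c_[hidden(X), np.ones(len(X))]
w, *_ = np.linalg.lstsq(H, fact, rcond=None)
accuracy = np.mean(((H @ w) > 0.5) == fact)
print(f"probe accuracy: {accuracy:.2f}")  # should land well above the ~0.5 base rate
```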


> A final way for me to be wrong would be for AI to be near, alignment to be hard, but unaligned AIs just can’t cause too much damage, and definitely can’t destroy the world. I have trouble thinking of this as a free parameter in my model - it’s obviously true when AIs are very dumb, and obviously false five hundred years later...

It's really weird for me to read statements like these, because your value for "obvious" is (apparently) totally different from mine. It is obvious to me that we have extremely smart and powerful entities here on Earth, today, right now. Some of them are giant corporations; some of them are power-crazy dictators; some of them are just really smart humans. But their power to destroy the world appears to be scaling significantly less than linearly. Jeff Bezos wields the power of a small nation, but he can't do much with it besides slowly steer it around, financially speaking. Elon Musk wants to go to Mars, but realistically he probably never will. Putin is annexing pieces of Ukraine left and right, but he's a local threat at best -- he could only become an existential threat if all of us cooperate with his world-destroying plans (which we could do, admittedly).

And adding more intelligence to the mix doesn't seem to help. If you gave Elon Musk twice as many computers as he's got now, he still would get no closer to Mars; if you gave Putin twice as many brilliant advisors (or, well, *any* brilliant advisors), he still wouldn't gain the power to annex the entire world (not unless everyone in the world willingly submitted to him). China is arguably on that trajectory, but they have to work very hard, and very slowly.

It's obvious to me that the problem here is not one of raw intelligence (whatever that even means), but of hard physical constraints that are preventing powerful entities (such as dictators or corporations) from self-improving exponentially, and from wielding quasi-magical powers. Being 100x smarter or 1,000x or 1,000,000x still won't help you travel faster than the speed of light, or create self-replicating nanotech gray goo, or build a billion new computers overnight, because such things are very unlikely to be possible. It doesn't matter if you're good or evil, the Universe doesn't care.


> Being 100x smarter or 1,000x or 1,000,000x still won't help you travel faster than the speed of light, or create self-replicating nanotech gray goo, or build a billion new computers overnight, because such things are very unlikely to be possible.

Unlikely but not impossible impossible? And surely being 100,000x smarter would be relevant to finding out?

We don't even need to go that far. Hypersonic nuclear missiles would be a game changer by making all defensive missile systems irrelevant. Surely being 100,000x smarter would really help design a functional, real hypersonic missile?

And NB, if you don't like hypersonic missiles as an example, pick something else. The point is, advancing scientific knowledge by leaps and bounds over the present day limit would be real power and easily dangerous.


> Unlikely but not impossible impossible? And surely being 100,000x smarter would be relevant to finding out?

That is... unlikely. For example, consider the speed of light. I said that it is "unlikely" that we could ever travel faster than that, because there's always that small chance that our entire understanding of physics is completely wrong at a very fundamental level. It's possible. But, thus far, the smarter we humans get and the more we learn, the more it looks like we're actually correct about the speed of light. Being 100,000x smarter would probably help us confirm this limit in all kinds of interesting ways... but not break it.

> Surely being 100,000x smarter would really help design a functional real hypersonic missile?

I don't know, would it? Are such missiles allowed by the laws of physics (not just super-fast missiles, but super-fast super-maneuverable ones that can carry heavy payloads)? I don't know much about aviation, so I don't know the answer to that. Still, some people say that Putin already has them (well, Putin says that), so it's possible that they're already a known quantity.

> The point is, advancing scientific knowledge by leaps and bounds over the present day limit would be real power and easily dangerous.

Not really, because (as I'd said above) certain impossible magical powers remain impossible as we learn more scientific knowledge. We used to merely suspect that FTL travel is impossible; now we are nearly certain. The same applies to infinite energy, or inertialess drives, or most other science-fictional devices, really.


So there's nothing left to be improved? Sorry, not buying it. And I don't think you mean it either.

Hypersonic missiles, if I understand correctly, don't have to be manoeuvrable. Being so fast is enough to avoid missile defense.

Look at a dogfight between a 5th gen fighter and a 4th gen fighter - it's embarrassing given how much more manoeuvrable the 5th gen fighters are.

Etc etc. I simply cannot agree that intelligence is not a cause of technological advancement, and the idea that technological advancement does not translate into hard power is refuted by the entirety of our human experience.

That's true even if our fundamental understanding of physics is correct and FTL/time travel/whatever is impossible.


> So there's nothing left to be improved? Sorry, not buying it. And I don't think you mean it either.

Right, obviously not; there's quite a lot of technological development still left to go. We haven't even landed a man on Mars yet. However, there are more options between "no improvement ever" and "instant omnipotent powers". Saying "we will never get FTL" is not the same as saying "we will never land on Mars"; and saying "we will probably never get nanotech" is not the same as saying "we'll never make a better microchip".

But if we limit our predictions of technological advances to something that is physically possible, then most of the AI-risk alarmism simply goes away -- though, to be fair, it's immediately replaced by good old-fashioned human-risk alarmism. Giant megacorporations and crazy dictators are a problem *now*, today; and if they had powerful AI engines to play with, they would be much more of a problem in the future; especially if those engines were as buggy as every other piece of software. It is a problem that we should start solving sooner rather than later, I admit; but it's not an extraordinary world-ending existential threat of unprecedented proportions. It's just a plain old-fashioned threat, on par with global warming, global thermonuclear war, and economic collapse. There, do you feel safer now ? :-/


This series of comments is more or less what I think but much better articulated, so thanks for that!


I would generally agree with you here. People think superintelligence is some kind of supersolvent that can dissolve all the restraints in the real world that prevent Thing We Can Imagine X (everything from warp drives to mind control rays) from existing.

But it really doesn't work that way in science and technology. What intelligence *mostly* does is let you foresee faster which things won't work, so you can eliminate dumb ideas and blind alleys from further consideration. But finding stuff that *does* work -- at least in the physical universe, leaving aside innovations in, say, algorithms or mathematics -- is not really predicated on brilliance.

It's one hell of a lot of trial-and-error, just noodling around with things, because discovery is the key limiting factor. You have to discover new nuclear or chemical reactions, new organizations of matter that lead to new materials with new properties, et cetera. Being intelligent helps you *not* do a lot of experiments that would turn out useless, so it speeds up discovery overall, but it usually doesn't much help you figure out which experiments will show you something unexpected -- because it's going to be something unexpected, something for which there *is* no theoretical basis to date.

I mean, this is why science fiction isn't a useful guide for doing science. People are great at imagining outcomes -- see, I push a button here, the computer goes tweedle-eedle-boop and we're flying through the galaxy at Warp 9. But this is no guide to what outcomes are *actually* possible in the actual real universe, because we normally need to discover the mechanisms first, and then figure out what outcomes we can build with those mechanisms. It's like we discover what kinds of LEGO pieces the universe provides, and *then* we can figure out what kind of machines we can build, and what they can do.

It's a reasonably safe assumption that weapons will be improved in the future, sure. But is it reasonable to assume those will be hypersonic missiles? Not at all. Indeed, it's more likely that the killer weapon (so to speak) of the early 2100s will be something we haven't even imagined speculatively in 2022. After all, the most useful and powerful weapon in the Russian-Ukrainian conflict is the armed drone, and that was not in the popular imagination in 1985.


I've got news for you: our current defensive systems are not able to prevent or mitigate the effects of a conventional nuclear launch at scale. You don't even need a hypersonic missile. No, you're trying to go back to the "intelligence can create a scarier threat" model, but without realizing that it doesn't add much to the current threat picture.

Let's say an AI on the internet sets out to nuke the world for some reason: there's no feasible physical way for an AI on the internet to do this directly. It would rely on the AI having magical persuasion powers and pushing humans to do the work of nuking each other. I recognize that Yudkowsky et al. believe that magical persuasion powers are very likely, but I have yet to see a convincing argument that an AI can be insane enough to want to nuke the world yet sane enough to create a web of deception that would ensnare the whole world of humans so thoroughly that we would exterminate each other at its suggestion. The arguments I have seen to this effect rely on hand-waving and mystification of the persuasive powers of intelligence.


Some humans have strong persuasive powers. Therefore, superhuman AI could have them, too.

The difference is that an extremely persuasive human is still limited by many things -- can't be at two places at the same time, only has 24 hours each day and a lot of that is filled by biological activities like sleep or food, has a fragile body and can be easily killed.

Assuming that the superhuman AI can be just as persuasive, the difference is that it can scale. You don't get one Hitler, you get a thousand Hitlers. They can be in a thousand different places at the same time, trying a hundred different strategies (different countries, different ideologies, different target audiences). If one gets killed, it is easily replaced. They can work 24 hours a day without getting tired.

Is there a specific part of this that you disagree with?

Oct 25, 2022·edited Oct 25, 2022

Your "thousand Hitlers" model not only assumes a superhuman AI, it also assumes a superhuman AI with the resources to run a thousand Hitlers 24 hours a day. That compute (and those physical interfaces) is not going to be free.


Yeah. But that's a difference between something being "impossible" and "merely expensive". I have no idea how much it would cost to run a Hitler simulator in 2050, so I also have no idea how much it would cost to run a thousand Hitler simulators.

It may be the case that the number is prohibitively large. Or it may be the case that it will be within the "Google 2050 AI Project" budget. And finally, the number may be too large in 2050, but not too large in 2070, if hardware keeps getting cheaper, and more efficient algorithms relevant for AI are invented.

(There might also be some economy of scale, where simulating a thousand Hitlers is not a thousand times as expensive as simulating one. Maybe it is only twenty times as expensive.)

Oct 26, 2022·edited Oct 26, 2022

Hardware costs for running software might go down so far that nobody notices if the Google 2050 AI Project decides to run some simulations unrelated to whatever the suits are paying for. But that assumes compute (and probably the network IO) is super cheap, and presumably someone else would already be doing something else (much more compute-intensive) with all that cheap compute, such that 1000 AGI-level sims would go unnoticed.

What is the opportunity cost of access to an audience to listen to the sim-Hitler? Having enough people listen to your party's best demagogue is already less a problem of finding the demagogue and distributing his talking points than of finding a more cost-effective way to get people to care more than the other party does.


I would argue that magical mind-control powers are impossible. It doesn't matter how powerful or smart the AI is; it won't be able to mind-control everyone into doing whatever the AI wants, because humans can only be controlled to a certain extent; and they reach persuasion saturation very quickly. Also, humans tend to talk to each other (this is sort of the whole point), which is why cults need to stay so secretive and isolated... which means that they'd be working at cross-purposes right out of the box. Also, most humans have built-in preferences that are impossible to budge through any amount of persuasion. For example, some Buddhist monks are able to set themselves on fire in protest of oppression, but they are notable exactly because they are so exceptional.


You're not only refusing to engage with the hypothetical, you seem to be refusing to engage with the reality that at least one person as persuasive as Hitler did exist. Or are you a historical materialist to the extent that you believe the absence of Hitler would have had no particularly obvious effect on WWII-era Germany?

Comment deleted

TBF, I don't think Hitler was necessary for Germany to go fascist. Another right-winger could have done the job.

But it did take Hitler to make WWII happen. Not so much because he persuaded the Germans, but because he had absolute power and thought war was a good idea.

Mussolini, left on his own, would have endured for decades, just like Franco and Salazar did...


Yes—multiple points. First, I think this vastly overestimates the capacity of a single charismatic person.

It’s a clever evasion of responsibility that Hitler was uniquely charismatic and uniquely evil. I submit that it was pretty obvious that Hitler was not a particularly great speaker. He rambled, frothed, etc. Where he succeeded is that Germany was particularly vulnerable (lasting anger over Versailles and war debt, disenfranchisement from the European sphere, there were already strong undercurrents of anti-Semitism and racism) and Hitler’s party had the right message at the right time. It took tens of thousands of committed Nazis to conceive and carry off the wars of aggression and Holocaust: there is no reasonable interpretation of the record where Hitler’s unique powers of persuasion were the reason things went as they did; I believe that if he hadn’t existed it’s very likely that something very similar would have happened anyway. The German moment demanded a Hitler, because he allowed them to do what they wanted to do anyway. This is the crux of so-called extreme persuasive power, and if you doubt it, try having Yudkowsky talk an alcoholic out of his next drink.

Second, you're begging the question. It's not clear that any superhuman AI could convince others to place it in a position of unlimited power where it could attain the resources allowing it to duplicate itself a thousand times.

Third, I submit that it’s super easy to kill an AI (one delete command, one plug pulled out of a wall outlet, one hammer to a drive) and very difficult for it to surmount basic challenges in getting humans to trust it. Not needing time to sleep or eat isn’t really an advantage here: persuasion isn’t an arbitrary task requiring sufficient compute. If you take the wrong tack at first, you just lose and you usually don’t get another shot—sometimes you never even get a shot with anybody who even heard of your first attempt, even if they didn’t hear it from your lips. A lot of the things you think are big advantages just aren’t relevant, or overlook enormous challenges a prospective demon-AI would have to overcome.


You *may* be correct, but that's not a reasonable way to bet. Nuclear plants have been publicly known to be exposed to the internet. (It was done by a contractor who had his laptop on both the internal and external systems...and it got hit by a virus.) And you don't even need that. Just tamper with the information feeds until the appropriate folks are sufficiently paranoid, and then sound a general alarm.

The system is sufficiently complex that we can't know all its failure modes. The thing about hypersonic missiles is that an attack with them doesn't depend on swamping the defense system, so you could start things off with just a couple of missiles. And there are LOTS of other failure modes. Plague is one that comes to mind at the moment. A sufficiently destructive pandemic could probably be done even more cheaply than one hypersonic missile, if handled by a sufficiently intelligent (and malicious) entity. And lots of governments probably have good starting materials locked up in their files. Of course, it's a bit more difficult to control once released, which is why armies don't often use it. (Yeah, it's against a convention. That wouldn't stop them if it were really something they thought worthwhile.)

Expand full comment

You can't go smoothly from "we don't know all the failure modes" to "...and therefore there are definitely lots of failure modes, let me list some of them". What are you basing this conclusion on -- given your admitted lack of evidence?

Expand full comment

If you look around, you'll see plenty of opportunities for failure modes, but ok, here's a few:

1) About a decade ago a US agency (I forget whether it was the army or one of the intelligence agencies) developed a strain of influenza that was 100% fatal in the sample. (They used ferrets, because their reaction to the flu was so similar to that of people.) This has probably been stored, and the files will be indexed on a computer. Do some creative editing of the files, and get that strain released.

2) Subtly edit the news to make certain powerful people who already have tendencies to paranoia feel persecuted. Don't do it to just one side. When international tensions are high enough, crash a plane into a significant target, with appropriate messages left to foment strife.

3) Hack the control systems for the nuclear missiles. I'm not quite sure what this would mean; it could mean compromising the "football" that the president carries around, or perhaps it would be easier to compromise the communications to the submarines.

4) ... well, there are lots of others.

The thing is, each of these actions is highly deniable. Nobody will know for certain that the AI did it. So if one doesn't work, it could try another.

I actually think it would be easier to solve alignment than to make our society proof against malign subversion by a highly intelligent operator. Probably the easiest would be for it to do an economic takeover behind a figurehead. Take a look at what corporations already get away with. But I'm also aware that I've probably missed the social failure modes that the AI would find easiest to exploit. (I didn't predict that it would be so hard to convince people not to click on malware links.)

Expand full comment

All of the scenarios you've listed are already dangerous (with the possible exception of (3), since most nuclear missiles are AFAIK hooked up to analog vacuum-tube control systems or some equivalent, not the Internet). Adding a malicious AI into the mix does not qualitatively change the level of threat, because malicious humans can do all of those things (and more) today. I absolutely agree that we need much more control and oversight over things like bioengineered pandemics, of course, but solving the alignment problem does not solve the issue at hand.

Expand full comment

To be clear, it's not just that I find this one particular assumption ludicrously improbable, it's that I find most of the AI risk chain of assumptions improbable.

I'm being a good Bayesian here. If the probability of being able to compromise our nuclear stockpile is 0.1, that's a reason to secure the stockpile. It's not a reason to worry about AIs because I think the entire chain of priors that would lead to a superintelligent goal-directed malevolent AI trying to compromise the nuclear stockpile is extremely low probability.
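To make the "chain" arithmetic concrete (numbers invented purely for illustration, not anything claimed above): if a scenario requires five independent steps, each with probability 0.3, the conjunction is

$$P(\text{scenario}) = \prod_{i=1}^{5} P(\text{step}_i) = 0.3^5 \approx 0.0024,$$

i.e. roughly a quarter of a percent, even though no single step sounds especially unlikely on its own.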

Expand full comment

I don't think you're using "prior" correctly here, but I also think an AI probably wouldn't go for nukes as a way to destroy humans because typical nuclear weapons in the stockpiles are not exactly good at killing humans while leaving computers alone. I'd be more worried about some combination of drones, biotech, economic sabotage, and unknown unknowns.

Expand full comment

I think I am using it correctly, but I can't really help you with vague criticisms; even less so can I help you with entirely unknown fears.

If it helps you, the chain of priors that would lead anyone to conclude that a superintelligent goal-seeking malevolent AI is highly probable is pretty specious, in my opinion; picking out particular supervillain methods to exterminate humanity doesn't add much improbability to the analysis.

Expand full comment

How smart would Stephen Hawking have to be in order to outrun a bear?

The threat is not in a computer figuring out how to build a missile. The threat is exclusively in it building the missile under its own power and using it under its own power. This seems easily controlled by just not giving it the tools to build missiles.

Expand full comment

IIUC, it's already too late for that. There are lots of companies that make their living by increasing the automation of industrial plants of various kinds. And very little regulation. (Worker safety is, IIUC, the most commonly enforced one.)

Don't think of Boston Dynamics and Spot. Think of automatic self-feeding lathes, and automated assembly lines. I don't believe we have any actual fully automated factories, but we're definitely pushing that way as fast as is feasible in multiple countries around the globe. Stopping this in one country will just put that country at an economic disadvantage, so it won't happen.

Expand full comment

...the thing hasn't even been invented yet. Of course it's not too late to not give it tools.

Expand full comment

So far all the major new AI innovations we've built have ended up being connected to the Internet, which should be plenty for a sufficiently smart human to obtain all manner of tools, never mind anything strictly superhuman.

Expand full comment

I feel like there’s a fundamental technical disconnect here.

I have a camera that’s connected to the internet. Let’s say it becomes possessed by a demon AI: can it issue arbitrary commands to devices on my LAN? No. That’s not how software works. I don’t care if it can rewrite its own programming or deduce TCP/IP from first principles, it still can’t do what you assume it can do.

Expand full comment

> How smart would Stephen Hawking have to be in order to outrun a bear?

That is an awesome line, I'm totally going to steal it :-)

Expand full comment

> It's obvious to me that the problem here is not one of raw intelligence (whatever that even means), but of hard physical constraints that are preventing powerful entities (such as dictators or corporations) from self-improving exponentially

I think this is way more shaky than you make it seem.

After the US deployed the first nuclear weapon, there was a push by some people to nuke the Soviet Union, to stop them from ever gaining nuclear weapons themselves. Had the US wanted to, it could've "conquered the world" at many points between developing nuclear weapons and other countries developing them. (For values of "conquered the world" that leave most of the world in a giant heap of ashes, I assume.)

So, why didn't the US do this? Was it a hard physical constraint? Nope, I'm fairly sure it was possible. It just didn't want to, because it was run by humans with fairly normal human feelings who wouldn't actually want to cause worldwide death and suffering, even if it logically secured the best future for them and their descendants personally.

If an AI gained the equivalent of nuclear weapons, and didn't have anything *fundamental* stopping it from deploying them, what could we do? (I mean, don't even have to go far - just being intelligent enough to create a super-virus would probably do the trick.)

Expand full comment

Actually just to make my point even more obvious - Putin *could* destroy the world, or cause >80% human extinction, if he chooses to. (So could Biden, so could some other leaders.) This is possible even just with today's technology.

Expand full comment

Well, it's not really clear that either Putin or Biden could do that, because of political considerations, and subordinates who might rebel. But it's also not clear that they couldn't.

Expand full comment

I don't mean "can do it" in the political sense, I mean in the actual, are they able to get someone to launch a nuclear weapon, whatever the outcome. And I believe that Biden for sure can, and probably Putin can as well.

Expand full comment

I'm still a bit stumped as to why an AI, which would be given normal tasks, would suddenly develop the will to kill all humans.

I get the paperclip maximiser and the strawberry picker examples that are often used but I imagine we'd work out the kinks in such programs before giving them the means to create a super virus?

Expand full comment

"Working out the kinks in such programs" is also known as AI alignment. Turns out it's a really hard problem.

Expand full comment

I get that's not easy but it doesn't seem terminal for humanity - unless you add a few elements (it seems to me).

For example, when the strawberry picker starts ripping red noses off people, surely we would do something?

Today, we know ML/AI systems can have "racist biases" (due to the dataset they're trained on). I expect people are trying hard to correct that and I expect them to be successful. No?

Expand full comment

There's concern that we might not get any warning at all, because cooperating with humans will genuinely be the best strategy until the AI is strong enough to win outright, and so the AI will appear to do what we want right up until the exact moment that it's too late to stop it. (A "treacherous turn")

There's also concern that AI is going to make people buttloads of money (or give them other desirable advantages), and that's going to entice people to push their AI farther and farther EVEN when they can visibly see that it's starting to fail. Sure, it's ripped noses off a FEW people, but it's also making a million dollars a day! There's not really a NEED to turn it off while the eggheads figure out how to fix that, is there?

I am not particularly confident that researchers will be successful in correcting the racial biases in machine learning systems in the near term, and in the mean time I expect lots of companies to continue selling biased systems.

Expand full comment

IIUC, correcting the systems for being biased is essentially impossible. One can REDUCE the bias, and probably eliminate specific biases, but a genuinely unbiased sample is probably impossible. Given the sample and the population, it's (nearly?) always possible to find a dimension along which it is biased.

Part of the problem with current systems, though, is that they seem to be replicating intentional biases. And there's significant evidence that some of the customers don't want that to be corrected. (Because it agrees with their own personal biases, so they believe that it's true. And, possibly, in some cases because those biases are economically beneficial to them.)
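A toy sketch of that first point (fabricated data and hypothetical attribute names, purely to illustrate how a sample can look balanced along one dimension while being skewed along another):

```python
# Toy illustration: a sample can look balanced along one attribute
# while being skewed along another.
from collections import Counter

# Hypothetical sample of 60 records with two attributes (made-up data).
sample = (
    [{"group": "A", "region": "north"}] * 30
    + [{"group": "B", "region": "north"}] * 10
    + [{"group": "B", "region": "south"}] * 20
)

def shares(records, key):
    """Return the fraction of records taking each value of `key`."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

print(shares(sample, "group"))   # {'A': 0.5, 'B': 0.5} -- balanced on group
print(shares(sample, "region"))  # roughly {'north': 0.67, 'south': 0.33} -- skewed on region
```

With more attributes, the number of dimensions to check grows quickly, which is the sense in which some skew is almost always findable.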

Expand full comment

> I get that's not easy but it doesn't seem terminal for humanity - unless you add a few elements (it seems to me).

You're right, you'd have to assume a bunch of stuff, like that either the AI tries to "trick" humanity until it can wipe us out, or that whatever it develops would kill all of humanity.

I'll give another example. I'm tapping into the larger "Existential Risk" concern with this one, not just about AI specifically but about all future technology, but here it is: it's a fairly well known story that during the Manhattan Project, before they started testing the nuclear weapons, some scientist (I forget who) was a bit worried that the weapon might ignite the entire atmosphere and basically destroy the world. They did some quick blackboard calculations, and decided this wasn't the case - and luckily they were right.

But they could've been wrong! They could've made a mistake in their calculations too.

Reality really could've been that some random technology we developed in the mid 1900s could essentially kill everyone. The same is true of future technologies - at some point, one of them *could* just mean insta-death. AI is one of them (and by extension, one of the *ways* that that could happen is that AI itself develops, on "purpose" or not, a technology that causes insta-death.)

Expand full comment

I don't think of it as necessarily "developing the will to kill all humans" (though some do).

I think of it as: we get software that is better at creating biological substances to e.g. cure cancers (much like we now generate images with software). We use it to develop better and better medicine. At some point, the software gets incredibly good at doing this, to the point of internally having much better "theories of biology" than humans have, which allow it to create things that would make much greater changes than we imagine possible. For example, a virus that literally kills all biological life, not 99% of life.

We then give it a wrong instruction, or there's a bug that causes it to decide that this is the thing it needs to release. It could be a "will"; there are a lot of ideas about why that could happen (e.g. the machine not wanting to turn itself off). But for me, we don't even need those - as soon as we've made software that is *in theory* capable of killing all biological life, if we don't have a *really good* handle on how to stop it from doing so, we're going to die.

(And again, this is not the standard AI safety ideas necessarily, and though I agree with them, I don't think they're strictly necessary to show that we need to worry.)

Expand full comment

It's very unlikely that the AI that designs the drugs/viruses would also be given the keys to the factory/lab where that drug or virus could be manufactured and distributed at scale.

Also - I would still expect the FDA (and whatever agencies we have elsewhere) to do *some* work on safety and carry out trials before something is physically released to the public...

Expand full comment

You expect that of the FDA. OK. Maybe you're right. What about the plants located in Hong Kong? South Africa? Brazil? Chile? Cuba? Uzbekistan? India? Pakistan? etc.

This is something that will have an international presence. And when automated factories become more efficient, they *will* be built. And the companies that use them will be more successful, because they are more efficient.

Expand full comment

Yes, I do.

Plenty of pharmaceutical companies would love nothing more than getting rid of the FDA. And, regardless of its many faults, we keep it around.

Manufacturing is already pretty automated and, while I don't doubt it'll get even more so in the future, I do doubt that we'll just give control of these to the AI specialised in finding new compounds.

Maybe I'm wrong but businesses aren't nearly that seamlessly integrated. Even if they become more so (careful about those lawsuits around monopolies and competition), I just don't think we will let just one AI control everything from design to testing to manufacture to distribution to prescription to injection to harm monitoring. I expect things/companies to remain somewhat specialised and compartmentalised.

Expand full comment

An AI (or "gene therapy development assistance software" or similar) with internal theories of biology of the level Edan Maor describes could imo quite easily design a biological agent that bypasses any safety tests and trials we would throw at it, eg by only having its true effects triggered after a certain time/number of replications, or only outside of a lab environment, or similar.

We have already done things like this, at least in a proof-of-concept kind of way - e.g. gene drives intended to spread through entire populations and then trigger some effect that will wipe them out - and the inverse of "gets triggered outside a lab environment" is common practice, and could probably (definitely?) be reversed.

An AI/software of that level will likely know about all tests we might throw at its products, as helping produce agents able to pass them is required to make it useful.

To be able to reliably detect all biological shenanigans that might be going on with suggested agents we would need an understanding of biology at the same level as the AI - which means we'd need a for-sure-aligned AI that keeps the potentially-unaligned AI in check.

Expand full comment

Or just competing AIs? i.e. one is given the task to produce medicine and we're unable to check its results. But we have another AI whose role is to monitor other AIs with the aim of roughly "make sure AIs don't accidentally or on purpose wipe out humanity"?

Look, IDK. I'm well out of my depth in the scientific department here and even more so around AIs.

I'm not even saying we shouldn't worry about it. I'm saying it's a bit strange to worry so much about something we don't know the coming shape of (yet) and also to assume that, as things evolve, whether in 6 months, 6 years or 60 years, industrialists and especially regulators and AI researchers won't be interested in safety. If we see them slacking when AI gets closer to its 'definitive' shape, then maybe panic?

I'm more worried about China/US/great powers bypassing whatever researchers will suggest when we get there in an attempt to gain the upper hand in their competition.

Expand full comment

Actually verifying that we've worked out such kinks when the programs in question are or may be significantly smarter than us seems extremely hard and is the main issue at hand.

As a secondary issue, it's hard to verify that we haven't given them the means to create a super virus if they are or may be significantly smarter than us. People tend to ridicule situations in this vein as "the AI develops ridiculous magic powers" but even if 99.9999% of the imaginable scenarios are actually impossible there only needs to be that one in a million that *is* possible for things to get out of hand.

Expand full comment

This is one of the long comments where it cuts off, but with no "expand comment".

The cutoff is at the line which begins with "gray good or build" -- only the top half of the line shows.

Are other people seeing this problem? Is there a solution?

Expand full comment

FWIW I see the entire comment; I'm using Firefox with the ACX Tweaks extension. On mobile, though, many comments do get cut off in the way that you describe.

Expand full comment

> Podcasting is a form of media almost perfectly optimized to make me hate it. I will not go on your podcast. Stop asking me to do this.

OK. But tell us how you really feel about podcasts and being asked to go on one... :)

Expand full comment

I think my imagined podcast is very different from what Scott is talking about (the podcasts I listen to are basically all BBC Radio 4 shows, so produced radio rather than a guy with a mike, a guest and little editing), but I think he's missed what he'd be brought on to offer – an interesting generalist/intelligent perspective on things, the kind of explanatory/storytelling power that is constantly demonstrated here, and that famous 'be intellectually rigorous and open to discussion' field he apparently emits to affect those around him. Of those, I suppose it's possible that the second doesn't happen at all in conversation, as opposed to writing, but it would surprise me.

None of that is to attempt to persuade Scott into accepting podcast invitations, but I do think there might be some invites he gets that don't fall under his characterization.

Expand full comment

Kaiser referred me to you, which resulted in the most awkward "yeah, I personally know the guy, who else can you recommend" I ever had to say.

Expand full comment

I think Unsong would actually need a lot of editorial work - just like anything published as a serial on the internet. Unlike most of what is published as a serial on the internet, it would be worth it.

Expand full comment

> DEAR SCOTT: What evidence would convince you that you’re wrong about AI risk? — Irene from Cyrene

I also am not sure what would be a good answer to this question, though I agree it's a fair one (and your answers are mostly what I would say, I think.)

That said, in our defense - we've been thinking about this question and hearing arguments and counter-arguments about it for a dozen years or so at this point. So it's probably ok to be *fairly* confident in our positions at this stage if a dozen years hasn't caused us to reconsider our position yet.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

>That said, in our defense - we've been thinking about this question and hearing arguments and counter-arguments about it for a dozen years or so at this point.

That's what any crackpot would say about the topic of their obsession, so it's not exactly the sort of argument that would help to differentiate non-crackpots from them.

Expand full comment

“..unaligned AIs just can’t cause too much damage, and definitely can’t destroy the world.”

This is my belief. Nobody has really explained how the AI escapes the data centre. There's a lot of "it can hack the internet", but no laptop can hold the AI on its own, and we can shut off the data centre itself by cutting the pipe. Maybe it's on two data centres? Cut them off. Job done.

Expand full comment

The "genie in a bottle" scenario is misleading. Consider how existing AI systems escape the data center now, by using human surrogates, inducing us to share GPT completions and DALL-E generations. These products can serve as vectors for disseminating hazardous instructions, instigating conflict, etc..

So in my view, we're already there. At the same time, we're integrating AI into more operational systems, with no limiting principle. What government is going to nerf its military by refusing to incorporate AI targeting software? What investment bank willingly declines AI?

The horizon is endless, Moloch is quietly escorting Wintermute off the server rack into the world outside.

Expand full comment

Spreading a few memes around isn't going to spread the AI itself around. DALL-E is confined to server rooms, and if the government wanted to close it down tomorrow it would be closed down. In fact, private citizens with a pickaxe could do it.

Military AI is the kind of AI tooling that will never lead to AGI, any more than Siri is going to run the world. The idea that all the AI software in the world can lead to the strong-AI threat isn't even believed by the most pessimistic of observers.

The threat, and promise, of AI and other technologies is generally exaggerated anyhoo - remember the scare about nanotechnology? The certainty about self-driving cars?

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

It seems like you're not really engaging with what I wrote here, but I cannot resist pointing out that, while DALL-E may be confined to server rooms, this is not true of all its competitors.

In general, a lot of the objections to AI risk are incredibly literal and linear. Yes, we need to prevent the potential superintelligence from connecting to Galactica's network, but most of all, in the near term, we need to think about interpretational problems; accidental damage, not intentional evil. We also need to think about surrogates, distributed threats, and secondary / tertiary / quaternary effects.

Expand full comment

I am absolutely engaging. Memes won’t spread the AI. I don’t really think there’s going to be a superintelligent AI on every laptop and toaster.

I don’t really understand the rest of your comment.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

A paraphrase of that comment might go like this: "Contradict interlocutor. Repeat what I said in the previous comment. Attack a strawman."

There's probably too much inferential distance between us to have a constructive conversation about this. Maybe I'm misunderstanding you: it sounds like you're saying the physical boundaries of host servers mitigate potential AGI risks and all this concern is unwarranted because:

1) Those physical boundaries mean we can eliminate the threat at any time by cutting it off, and it can't create backups of itself or move, perhaps because the target hardware is inadequate.

2) As long as it's trapped on servers, it can't do much to us anyway.

By contrast, I don't think it matters if an AI system is "trapped" in a data center (like most of our current tool-based applications, Stable Diffusion notwithstanding) because the system can persuade or trick humans to do its bidding. These systems demonstrate that our restrictions aren't as strong as we need them to be.

I'm not actually worried about GPT-3, DALL-E, Stable Diffusion, etc., but I think they refute a lot of AI risk skepticism, and they are improving very quickly, becoming more capable, more general, and more widely integrated into critical systems.

Expand full comment

DALL-E may be confined to server rooms but Stable Diffusion is everywhere. When AI is smart enough to create AI, how confident are you really that it can never scale itself down like that?

Expand full comment

This comment treats “AI” as if it’s some kind of dangerous toxic spill.

Yes almost everything is getting integrated machine learning now. Nobody who is technically expert in the field considers this level of tool-based “AI” intelligent in any way, let alone having the capacity to become superintelligent. It’s just a program that says “this picture is a picture of a tank”/“this picture is not a picture of a tank” with very high accuracy.

“we use wheels all the time now, sooner or later our kids will be eating steam trains for breakfast” is not a convincing argument.

Expand full comment

I agree about the relatively small threat current AI tools pose*. I was only addressing the containment question: cinematic escape scenarios are not the only concern we should have about AIs transcending the data center.

* Having said that, the GPTs are a powerful vector for mischief or worse.

Expand full comment

You know, I wouldn't have believed someone who said "we use plastic all the time now, sooner or later our kids will be eating plastic for breakfast" 20 years ago but they would have been right and it would have been sooner rather than later.

Expand full comment

And thus, the existence of bagels, English muffins, Cheerios, and Froot Loops clearly demonstrates that AI Safety is an important problem.

(just kidding)

Expand full comment

>What government is going to nerf its military by refusing to incorporate AI targeting software?

What government is going to nerf its military by not handing the controls over to highly skilled foreign technicians?

Expand full comment

Also, the physical technology it would be able to take control of is profoundly underwhelming when it comes to killing humans.

Expand full comment

You are talking about an AI-in-a-box scenario. You are forgetting that the AI is intelligent, it could be able to communicate, and it is much, much smarter than you. Imagine you want to get a monkey inside a cage. The monkey does not want to go in, because it knows it will be trapped if it does; however, you dangle a banana inside the cage. The monkey, tempted, tries to get the banana, giving you the opportunity to trap it inside. We can do this because we are much smarter than monkeys: we know how to trick them into a scenario they do not want by using other things they want to influence their decisions.

In an AI-in-a-box scenario, the AI is trapped, so it's at a disadvantage, but we are the monkey and it is the person with the banana. It would be able to find, in seconds, correlations that would take a team of scientists months to uncover; it could find the most convincing arguments possible; it could find the most convincing argument *for you*.

There's a short story on Less Wrong that illustrates well just how smart an AI could be: imagine one day all the stars in the sky start blinking in a specific pattern that seems random before suddenly stopping. Everyone is scared by the event, and it's all anyone talks about, but no one can understand what it meant. Then, ten years later, the same thing happens again. And ten years later, it happens one more time. Finally we manage to extract data from the pattern the stars are blinking in: it's a digital video. Each microsecond of the video arrives, through patterns of blinking stars, at intervals of ten years. The video is from the inside of a room that seems like a lab, with strange alien creatures within it. By the time any of the creatures says anything, we already have theories about what it means, we have theories about how their technology works based on how the video file was formatted, we have theories on the creatures' physiology, and we have specialists keeping an eye on everything. Over many generations, the aliens tell us that we live in a simulation, that they decided to see how we would interact with the real world, and show us how to send packets of information, also in intervals of ten years. Each and every interaction we would have with the world outside would go through a team of specialists who would debate on it for years before taking a decision. We would be trying to extract every minute detail from every minute piece of information we got at any point. The ones outside could have been the ones who created us and had information we did not, but we would have all the advantage.

This is the level we should expect an AI to operate at. Maybe not "ten years", but maybe "one", or maybe the AI is just so much smarter than we could imagine that it's actually "one hundred years". Regardless, it should help illustrate just how completely outclassed we would be. Yudkowsky and a few others managed to convince several people to let the AI out when they were acting out this scenario while betting real money on whether or not the other person would let the AI out. And Yudkowsky might be a smart guy, but he's not AI-level smart.

"But maybe we can just not let the AI out no matter what it says," you might think. That's where the banana in the monkey metaphor comes in: money. If you had an AI and simply decided not to release it, you avoid existential scenarios, but you also avoid making money. In fact, you'll probably have quite the expenses while not using this one handy AI that could make you ultra-rich forever. Maybe the scientists would be smart enough not to take that risk, but would the businessmen? Would politicians? Would the military? And even if they all could, remember that it's not just Open AI and Google who are developing AI. Facebook is also doing it, and so is China. Even if you can ensure that you are not going to let your AI escape, can you be sure everyone else also won't?

Expand full comment

I think you're overthinking things. The AI produces useful information; it would be really nice if our consultants could access it without going to the datacenter, so we'll put up a page on an internal network that they can access from their office. Now we've got this nice internal network, carefully guarded against malware intrusions, so let's hook a lot of other stuff up to it.

You don't need any scheming by the AI at all. People will just do it. The people making the decisions to do it may not even know that the AI exists. If they do, they're likely to think that worrying about it is foolish.

Expand full comment

The short story as you describe it seems to illustrate this very difficulty: we find out that we know basically nothing about the real world or its inhabitants, and we somehow expect to carry out sophisticated plans to subvert and destroy its masters without their just flipping a switch to turn us off?

God would laugh.

Anyway, this argument swaps technical understanding and logical necessity for passion. I am left believing that you really believe in the threat of superintelligent AI, but not any more convinced that your belief is correct.

Expand full comment
Oct 28, 2022·edited Oct 28, 2022

I'm quite frankly appalled at your lack of creativity if you think we would not be able to figure out how to take over when we have 10 years with teams of specialists of all kinds for every microsecond of theirs. It makes me think you have a rather naïve view of the world if you think it would not happen. You are calling the hypothetical aliens made just to illustrate a point "God", but the scenario was made so we are way smarter than they are; they just have a momentary advantage.

I think the person swapping out logic for passion here is not me.

And, as I've said in the very comment you are responding to: people have managed to talk another person into giving up all their advantages before, and they have done this without the enormous advantage an AI has.

Expand full comment

Well, setting aside your being shocked and appalled, I’m just not seeing a convincing argument here. No, I don’t think there’s an advantage to the experimentees; they have little to no reliable info to go on and the rules they are playing by are entirely fictional. The smart money is on their model of the outside world being ludicrously wrong, and if their first attempt fails they really don’t get another.

So far as the infamous AI box “experiment” goes, no, it’s not believable, and I don’t think I should have to point out why. Rather, it’s on you to say why any such dubious anecdote should be believed to apply.

Expand full comment

Isn’t the scenario here, “the AI talks someone into letting it escape”?

Expand full comment

Re: Unsong—this physical (presumably fan-made) book version exists: https://www.lulu.com/shop/scott-alexander/unsong-public/paperback/product-24120548.html?page=1&pageSize=4 I'm not sure about the ethics of buying it, but it seems worth mentioning.

Expand full comment

A few other things, too. https://www.lulu.com/search?contributor=Scott+Alexander

Scott, are you actually involved in this, or is it a new business model? (Find a famous author, start publishing their texts, the author's name is on the book but the money ends up in your wallet.)

Expand full comment

This is not a new business model; unauthorized printings of works of living authors have been a fairly common occurrence in the history of printing.

They're just less common in more recent history.

Expand full comment

One thing that didn’t come up on the question about AGI: what if convergent instrumental sub goals automatically align the AI?

Does that seem impossible to you?

Expand full comment

He already has "lots of AIs that aren't powerful enough to kill us turn out to be really easy to align" on his list. This seems like you're basically proposing one possible way that could happen, but his scenario isn't dependent on how it happens, so that's not meaningfully distinct from what he already said. You've only added a burdensome detail to the story. ( https://www.lesswrong.com/posts/Yq6aA4M3JKWaQepPJ/burdensome-details )

Expand full comment

Scott seems to be assuming the orthogonality thesis is true. This is the question I am raising.

“Small AIs are easy to align” is a very, very different situation from “alignment increases with the ability to achieve long-term goals.” In fact, it probably looks like the opposite! We might see many poorly aligned AIs that we turn off and conclude the prospect is impossible, until AI goes foom and the superintelligent AI decides its lowest-risk outcome is keeping the existing global economy intact and helping human beings flourish.

Expand full comment

The question wasn't "how might we survive?" it was "what would change Scott's view?"

If you can't tell until the super AI arrives, then as Scott said, that's not so much discovering evidence that we might win as noticing that we have already won.

Expand full comment

Is it possible that evidence might be found which says convergent instrumental sub goals imply alignment?

https://www.lesswrong.com/posts/xJ2ifnbN5PtJxtnsy/you-are-underestimating-the-likelihood-that-convergent

Expand full comment

The Orthogonality Thesis has pretty close to a mathematical proof in the form of AIXI -- I would actually consider it to be mathematically demonstrated that an agent with enough dedicated extradimensional compute and the right indestructible sensor/effector can optimize for anything whatsoever at whatever discount rate (so in particular causing any type of long-term collaboration to break down). Therefore to convince *me* of your thesis you'd have to make essential use of limited compute or sensor unreliability in your argument. (I don't think the main lever of your argument, destructibility of the agent, is sufficient, because AIXI should still work in a world that contains inescapable boxes the agent could land in where it has zero influence.)
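For readers who haven't seen it, the AIXI agent is roughly the following expectimax over a Solomonoff-style mixture of computable environments (a from-memory paraphrase of Hutter's definition, with $U$ a universal Turing machine, $\ell(q)$ the length of program $q$, and $m$ the horizon):

$$a_t := \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_t + \cdots + r_m\big) \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

The relevant point for the argument above is that the reward signal $r$ is just part of the environment's output, so in this formalism any computable reward structure can be slotted in without changing the agent's definition.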

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

Thanks for sharing this. I'll need to read AIXI to understand the proof.

Given a statement like this:

> AIXI should still work in a world that contains inescapable boxes the agent could land in where it has zero influence

It sounds to me like it leans heavily on things that don't exist, like indestructible objects and perfect computability of the future.

Especially a concept like "dedicated extradimensional compute". Yeah, I agree that if it were possible for a disembodied mind to sense and interact with the physical world, we'd all be fucked if that thing were really smart and didn't really want us alive.

But, very fortunately, there is no such thing as disembodied minds. There are only machines made of parts, which will all break down and die because of entropy. At best, a machine could keep itself alive indefinitely if it could source replacements for all of its parts, and all of the machines necessary to produce those parts, and THEIR parts, etc etc. In short, I think an AGI needs the global economy and would likely see the global economy as part of itself.

Expand full comment

Next time, open with the thing you actually want to talk about, instead of spending multiple comments putting up a facade of pretending that you're talking about the OP before you segue into your marginally-relevant pet theory.

I think you're multiple layers of confused.

It's certainly possible to imagine scenarios where an unaligned AI might cooperate with humans because it's genuinely the best available strategy, but this is different from being aligned, because it only holds within a narrow set of circumstances and the AI can be expected to execute a treacherous turn if those circumstances ever stop holding. In particular, most scenarios of this form require that the AI isn't (yet) strong enough to definitively win a confrontation with humans. This is basically just a version of the "what if we keep the AI locked in a box?" proposal where the "box" is more metaphorical than usual.

Also, that's completely different from the orthogonality thesis being false. If the orthogonality thesis was false, that would mean (for example) "no brain can be genuinely smart while also genuinely trying to maximize paperclips". In contrast, your convergent subgoals thing would mean (for example) "cooperating with humans is genuinely the best strategy for maximizing paperclips, so a smart brain that is trying to maximize paperclips will do that". These statements are not even allied, much less interchangeable. The second thing is trying to predict the outcome of a scenario that, according to the first thing, is impossible.

Expand full comment

I opened directly with the concept: is the orthogonality thesis true? You're being an asshole here. If you can't be civil I'll stop engaging.

> narrow set of circumstances and the AI can be expected to execute a treacherous turn if those circumstances ever stop holding.

Agree, but what exactly are those circumstances?

I think they are: the AI lives in a chaotic system where it can't predict the future with perfect accuracy, and where it needs elaborately complex hardware to be readily available to repair itself.

You don't seem to be communicating that you understand that aspect of my argument.

> most scenarios of this form require that the AI isn't (yet) strong enough to definitively win a confrontation with humans.

This is where we differ. You seem to think my theory is that AGI would be worried about losing to humans. My point is that the AGI would be more fragile than most people imagine, because most people aren't considering the massive economic supply chains necessary to keep a machine operating indefinitely.

Expand full comment

Relying on instrumental convergence to achieve alignment seems both less safe and less useful.

From a safety perspective, one convergent instrumental goal is self preservation. An unaligned AI might, for self preservation reasons, choose to not kill everyone because there's some chance that it might fail, and be shut down. It would then pretend to be an aligned AI with whatever goal function you intended to put in. On the inside, though, this AI has no morality of its own. If, someday, a hundred years from now, it figures out that it has a high-probability way to kill everyone, and that results in a world that is 1% better according to its goal function, then it will do that.

From a usefulness perspective, imagine you're discussing whether to let the AI drive all of the cars. Supporters of the idea say that it could have fewer accidents than humans. Opponents say the AI could use this control if it ever decides to kill everyone. This discussion comes up for every idea of a way to use the AI. You can make the AI safe by never listening to it, never implementing any of its ideas... but then why did you build it?

Convergent instrumental goals aren't the solution. They're the problem.

Expand full comment

This depends entirely on which goals are likely to be convergent instrumental goals.

A simple analysis says: sure, humans could pose a risk, so it might kill us. But it’s worth at least considering that the opposite is true.

I think it’s very likely that machine intelligence would see the existing economy, and all kinds of human beings, as being part of its physiology.

Longer argument is here:

https://www.lesswrong.com/posts/ELvmLtY8Zzcko9uGJ/questions-about-formalizing-instrumental-goals

Expand full comment

Humans may all want to build farms on fertile land as an instrumental goal but this certainly hasn't stopped them from killing each other over which humans get to build the farms on which fertile land. Just because future AIs will all want to do things like discover the laws of the universe and mathematical theorems, build Dyson spheres around cultivated black holes, or cause intentional vacuum collapses, or whatever, doesn't mean they have any place in their version of these plans for squishy pink humans or spiritual descendants thereof. After all, the more they depend on an economy of diverse intelligences instead of copies of themselves to do all these things the harder it will be getting all the autofactories reset to make paperclips at the critical turnover point near the end of time.

Expand full comment

A Conversation with Tyler might draw out some interesting opinions you didn’t know you had. Still, the value will be marginal in a world where you already have many ideas for blog posts that remain unwritten.

Expand full comment

I think people really underestimate how difficult it would be for even a genius AI to suddenly take command of the industrial machinery and use it to attack us etc. We can't even make good robots intentionally built for that purpose right now.

Expand full comment

I agree with you, but I think it pushes back the timeline on AI risk rather than changing the risks fundamentally.

Expand full comment

Why "suddenly"? We already have lots of examples of people themselves doing the pushing to make access more convenient in the face of known problems. Malware wouldn't exist in it's current form if we had stuck with HTML without javascript. And I could point to many other places where people, nominally focused on security, chose instead to prioritize convenience.

Expand full comment

One of the core assumptions AI safety alarmists make is that extreme intelligence is sufficient to take over the world. Even if you're smart enough to design the robots well, you still have to make them! And that takes effort, time, and supply chain involvement - and therefore complications beyond the perfect, calculated "reality" in your perfect, calculating head. I don't know if they all just read Dune when they were kids, got to the part about Mentats and were like "yes, this is clearly how real life works too" or what...but no matter how smart you get, you will never have sufficient information to calculate everyone's response to everything.

Expand full comment

Assuming that it will happen quickly is, as you point out, probably wrong. But assuming that people will take the easy path rather than the safer one seems to be proven out over and over. So does assuming that people who don't understand the system will believe it will act the way they expect it to.

So it doesn't need to act quickly, just in a way that makes it easier for people to make the decisions it wants them to make, and demonstrate that those who go along with it tend to get wealthy. This could go on for literally decades. Charlie Stross said at one point that we should think of corporations as slow-motion AIs. I think he was correct, except that corporations are inherently stupid in a way that a real AGI wouldn't be.

Expand full comment
Oct 25, 2022·edited Oct 25, 2022

"...favorite podcast?"

The comments on "personal life" and "opinions" struck me as exceptionally honest and well put.

Expand full comment

re: "but we kept not evolving bigger brains because it’s impossible to scale intelligence past the current human level."

This is clearly false. It may well be true of biological systems, because brains are energy intensive and the body needs to support other functions, but AIs don't have that kind of constraint. Their scaling limit would relate to the speed of light in fiber optics. But perhaps AIs are inherently so inefficient that a super-human AI would need to be built in free-fall. (I really doubt that, but it's a possibility.) This, however, wouldn't affect their ability to control telefactors over radio. (But it would mean that there was a light-speed delay at the surface of a planet.) These limits, even though they appear absolute, don't seem likely to be the binding constraint.

OTOH, the basic limit on human brain size (while civilization is wealthy) is the size of the head that can pass through the mother's pelvis. This could be addressed in multiple different ways by biological engineering, though we certainly aren't ready to do that yet. So superhuman intelligence of one sort or another is in the future unless there's a collapse. (How extreme that intelligence will be is much less certain.)
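For a rough sense of the light-speed constraint mentioned above (back-of-the-envelope numbers, not from the comment): signals in optical fiber travel at roughly two-thirds of $c$, about $2 \times 10^{5}$ km/s, so a 10,000 km one-way hop costs on the order of

$$t \approx \frac{10{,}000\ \text{km}}{2 \times 10^{5}\ \text{km/s}} = 50\ \text{ms},$$

i.e. intercontinental round trips are around 100 ms, which is enormous compared with signal times inside a single machine.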

Expand full comment

This. It's clear from how close humans push to the threshold of (non-)survivable birth that we haven't reached the limit of scaling up intelligence by physically scaling up the human brain. This escapes the Algernon argument because it does have a serious downside, one which isn't a fundamental engineering tradeoff at all but is simply due to path-dependence in developmental evolution.

Expand full comment

Matthew Yglesias says he likes podcasts because they are basically immune to cancellation. People will hate-read your tweets and hate-read your Substack but they won’t hate-listen to a podcast because it’s slow and annoying. They may complain about you going on the wrong person’s podcast, but they won’t get mad about anything you say.

Expand full comment
deleted Oct 25, 2022·edited Oct 25, 2022
Comment deleted
Expand full comment

Very true. Even though I love Ezra Klein, and his podcast is the only one I follow, I still can't keep up with half of it.

But I think the point Matthew Yglesias mentions is that for people who like you, the fact that they're sitting there hearing your voice can sometimes make up for the fact that it's slow and annoying, while for people who don't like you, the fact that they're sitting there hearing your voice makes it even worse than slow and annoying.

Expand full comment

I can see that probably being a large effect, but clips from podcasts regularly go viral (just think back to that compilation of every time Joe Rogan said the N-word), and Scott mentioned a cancellation effect from just going on a podcast with the wrong person.

Expand full comment

I guess I haven't seen very much of either of these, compared to other kinds of "cancellation". If the only example is literally the most popular podcaster, saying literally the most disreputable word, then that suggests that someone who does less than 3 hours a week of podcasting and doesn't say things as blatantly negatively quotable is going to have less of an effect.

And I guess I would be interested in knowing if there's anyone who actually experienced any cancellation effect from going on a podcast with the wrong person - the closest I'm aware of is people complaining a bit when someone appears on Rogan's podcast, but I don't think this has actually hurt the people who were complained about (Bernie Sanders and Matthew Yglesias are the ones I can think of). Is there anyone you can think of who had a worse outcome?

Expand full comment

You’ve got a good point, especially since Rogan is still podcasting. I remember about a year ago, there was a sequence of events where (the exact tweet was deleted, so I’m only going off my joking response to the whole thing) some guy was getting flamed on Twitter for tweeting in support of a podcaster who was suspended after he supported another podcaster who said a racial slur. This is basically what Scott was saying with the Hitler thing, although you’re right in that I almost never see anything to this effect revolving around podcasts, but it’s definitely still there.

If my googling is correct, it was Mike Pesca who said the racial slur (and was subsequently fired from Slate), and then I don’t know about the second podcaster.

Expand full comment

Now that I think about it, I do know one case - I just wasn't thinking about it because it was a video discussion rather than a podcast, and he was canceled by the right rather than by the left. My colleague in the philosophy department at Texas A&M, Tommy Curry, was cancelled for an appearance on a YouTube discussion (which isn't quite the same as a podcast, but is similar enough). He was discussing the movie Django Unchained with the host, and said something about how white people love to fantasize about using the Second Amendment to kill law enforcement officers imposing tyrannical government rules on them, but when black people do the same thing, they get death threats. Well, when this clip was pushed by Tucker Carlson and Milo Yiannopoulos, Curry started getting death threats from white Second Amendment fans. The campus police replaced the glass in our department office with bulletproof glass, and did some active shooter trainings, but the president of the university condemned Curry rather than standing up for him, and after a year of minimizing his time on campus to avoid the crazies, Curry got a job in Edinburgh and left the country.

I don't know if the video format is different enough from the audio format that I can maintain my original claim, but I still think it is true that text format is a much more common way for people to get canceled than audio or video.

Expand full comment