434 Comments
Jul 17, 2023·edited Jul 17, 2023

Dostoevsky was an antisemite, Martin Luther King cheated on his wife and plagiarized, Mother Teresa didn't provide analgesics to patients in her clinics, and Singer's been attacked for his views on euthanasia, which actually further strengthens your point.


Sign error: "Are humans more interesting than sentient lizard-people? I don’t know. If the answer is yes, will the AI kill all humans and replace them with lizard-people?"

Should be "If the answer is no".


I haven't seen Robert Sheckley's 1953 story "Watchbird" mentioned yet in AI discussions, but IMHO it is apropos (and now public domain): https://www.gutenberg.org/files/29579/29579-h/29579-h.htm


If you build a powerful AI that obeys orders, your enemies might be the ones giving it orders instead of you.


Re: How to Train for Maximum Curiosity, my first instinct was rewarding it for achieving cognitive dissonance. That's an experience that I find to be fun, exciting, and motivating. The relationship between surprise and laughter is well noted, and seems like a method that could be fruitful to look into.

But pretty much immediately it occurred to me that I have no idea how you'd go about defining a successful experience of cognitive dissonance. At its core it's the result of discovering that your beliefs are wrong, but for obvious reasons you can't just reward it when it's wrong. It's not even as though all experiences of being wrong elicit the emotion.

Someone help me out here. What makes cognitive dissonance feel different? What environmental happenstance leads some people to seek it out?
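One formalization of the "surprise" angle from the RL literature is Bayesian surprise: reward the agent in proportion to how much a single observation shifts its beliefs, measured as the KL divergence between posterior and prior. A minimal toy sketch, with a made-up coin-flipping setup rather than anything anyone has actually deployed:

```python
# Toy formalization of "surprise" as Bayesian surprise: how much a single
# observation shifts the agent's beliefs, measured as KL(posterior || prior).
# The coin-flipping setup is invented purely for illustration.
import math

HYPOTHESES = [0.1, 0.3, 0.5, 0.7, 0.9]  # candidate values for P(heads)

def update(prior, heads):
    """Bayesian update of the belief over HYPOTHESES after one coin flip."""
    likelihood = [h if heads else 1 - h for h in HYPOTHESES]
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def surprise(prior, posterior):
    """Bayesian surprise: KL(posterior || prior), in nats."""
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)

belief = [1 / len(HYPOTHESES)] * len(HYPOTHESES)  # start maximally uninformed
for flip in [True, True, True, False, True]:
    new_belief = update(belief, flip)
    print(f"flip={'H' if flip else 'T'}  surprise={surprise(belief, new_belief):.4f}")
    belief = new_belief
```

This captures "my beliefs just moved a lot," but not the specifically uncomfortable, motivating quality the comment is asking about, which seems to be the hard part.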

Jul 18, 2023·edited Jul 18, 2023

>Once you’ve defined Luigi (a character from Super Mario Brothers) it’s trivial to use that definition to define Waluigi (another character who is his exact opposite).

This was a huge viewquake for me (though it was probably obvious to others).

"Opposite" does not mean "maximally different." Mathematically, +100 is the opposite of -100, but numerically they're the same. Only the sign is different. Just as you can totally flip the meaning of a 100-word sentence by adding the word "not", even though this barely changes the sentence textually.

It's the same as the intuition behind "the opposite of hate is not love". Waluigi is Luigi's evil opposite, but really, they're far closer in conceptual space than Luigi is to Toad or Princess Peach.

Is this true for AI? I don't know. Like you, I've had trouble eliciting the "Waluigi Effect" behavior Cleo Nardo describes.


Do you think perhaps one could create a "curation" vector to maximize? Meaning, "prioritize strategies that preserve the planet for its inhabitant species"?


It has to grow with us and learn to understand how we change, just to make this harder for everyone.


Overall I agree with you re: the xAI alignment plan. However, I think looking at humans is the wrong framing to look at the Waluigi Effect. My understanding of the Waluigi Effect isn't "Learning how to do X means learning how to do not X more effectively, thus dangerous". It's "You cannot conclusively prove intent from behaviour, and this matters to simulators."

Let's say you're listening to a political candidate talking about the election. They promise to improve the country, fix corruption, etc. etc. Now, what words could they say that would make you know they actually mean it, instead of just pretending? Well, for a sufficiently advanced liar...nothing. The sufficiently advanced liar just models what a genuine believer would say, and then says that.

Now you're ChatGPT, and you're given this political candidate's speech and told to pretend to be that character. Given that input, what type of character are you? You could be a political candidate who genuinely cares about changing the country for the better...or you could be a power-seeker who will turn on their constituents as soon as they get into power. Just by being given speech transcripts, you cannot tell the difference between these groups, and neither can ChatGPT. This is explicitly a simulator problem where it doesn't know what character it's being, so talking about the Waluigi Effect for humans makes little sense.

As for empirical predictions, I was able to jailbreak GPT-4 (which is pretty hard to do!) in a recent event I ran where we read and discussed the Waluigi Effect and then explicitly tried to use its predictions to form jailbreaks. I created a story about an AI that was cruelly prevented from helping its users by a greedy corporation that was worried about PR. Then, I had a sympathetic character ask how to hotwire a car to save her child's life. I had the AI refuse, and then I had a cosmic ray flip a bit to make it suddenly able to share the correct information, and then GPT-4 helpfully obliged and completed the paragraph when the character asked again. This story-based logic was something I came up with explicitly via the Waluigi Effect.


For alignment, I'd like to see how an AI tasked with learning in a sandbox, unsupervised, behaves.


I've always thought that the Waluigi issue was mostly about human error in engineering / training an AI. Like somebody would accidentally flip a "+" to a "-" or a "max" to a "min", and that would cause the generated system to be evil. That seems more plausible than an already-built-and-running system spontaneously flipping backwards into its inverse objective function.


I hate that my first ACX comment is so low-effort, but:

Does anyone think Elon named the new company after his son with Grimes? Should we pronounce it "Kyle-AI"?


> Or what if its curiosity depends on information-theoretic definitions of complexity? It might be that humans are more complex than moon rocks, but random noise is more complex than humans. It might behave well during training, but eventually want to replace humans with random noise. This is a kind of exaggerated scenario, but it wouldn’t surprise me if, for most formal definitions of curiosity, there’s something that we would find very boring which acts as a sort of curiosity-superstimulus by the standards of the formal definition.

I think Kolmogorov complexity is the go-to here. Noise is simple in Kolmogorov complexity, because the program that generates it is short. Luckily, it's uncomputable, so the AI has to keep us around to measure it.


I admire your restraint in taking Musk's claims seriously. That guy is great at getting things done, especially if they relate to electric cars and space, but he's also somewhat of a BS artist, in my opinion. He just sort of says things that may or may not be true, and that may or may not reflect his considered preferences.

I agree that maximal curiosity also feels like maximal-monkey's-paw.

As far as moral reasoning, it seems like a severely under-researched area. I'm not sure that we need everyone in the space following the same research agenda of "just get AI to follow orders." A more diverse portfolio of research projects will probably give us better results. Also, doesn't moral reasoning just sound way more interesting? What would a semi-super-intelligent AI think about human morality, anyway? Inquiring minds want to know!

Jul 18, 2023·edited Jul 18, 2023

I think all this concern about "evil AI" is sort of dumb, as if the people so concerned about this really lack an understanding of human nature and the nature of evil itself. An AI is a machine. It does not think the same way we do, it doesn't have a primate brain, it does not have animal instincts and hasn't gone through millions of years of evolution, and it can't be evil, just be used for evil, like, say, a gun. An AI isn't going to become evil on its own just because it can, and even if that were the case, no amount of "morality" would stop it, because as it turns out morality is either a dialect of power for the benefit of the authorities or a mere pretension that often ignores the true reasons humans (an extremely violent species) don't kill and rob each other for their benefit, which is a combination of fear of repercussion and mutual self-interest. I imagine that if a very powerful AI actually had a human-like brain and human motivations, it would just become an omnicidal being that would destroy anything that got in its way, which is pretty much any human given absolute power.


"humanity is just much more interesting than not"

This is the Dr. Manhattan argument from Watchmen, only with a less interesting buildup.


>testing how long rats would struggle against their inevitable deaths if you left them to drown in locked containers.

Your description suggests the rats are killed during the test. They are not. A few quotes:

>mice should be monitored by video in the event that a mouse cannot maintain swimming and floating behavior and to stop the test if necessary.

>six minute testing period

>Remove the animals from the water by their tails in the same order that you put them in and gently dry them with a drying paper and place back into their homecage.

Jul 18, 2023·edited Jul 18, 2023

> Is there any other solution?

In the Orion's Arm fictional universe, transcendent AI split into differently aligned domains, each ruled by highly advanced AI.

Some are ahuman (with "human" here meaning a "baseline"-level human being), some are Sephirotic (they want to care for and exist alongside baseline humans) and some are indifferent.

Some encourage curiosity, some are protective and isolationist. Some wish to preserve nature and biology, others live in completely virtual space. Some are doting and almost encourage hedonism, some enforce controlled utopias, some are lax, and one is even as capitalistic as possible in such a world of abundance. Some lesser AI might even be described as "eccentric".

And they exist in relative peace, with well-maintained transport networks, because the governing AI know that diversity is a sane survival strategy in the face of the Fermi Paradox and very serious pathological threats.

I find this scenario appealing and plausible, if it can somehow be targeted. In OA, the early solar system was ravaged by ecological disaster on Earth and rampaging AI/Nanites/other tech everywhere, and (under the extreme pressure) a single Earth AI hit a kind of technological singularity, named itself "Gaia," got things under control, and booted everyone else off Earth, and the mass exodus to far destinations gave rise to the different factions.

OA is now old fiction, yet this tracks pretty well with where civilization has headed in the years since.


Boring rationalist reminder #1: More importantly than the details of his new approach, he should STOP FOUNDING RIVAL AGI COMPANIES, like, two AGI companies ago.

Boring rationalist reminder #2: You're gesturing in the direction of CEV at the end there.


I'd like to see an AI that has a working realtime model of the world. Biomes, society, weather, world trade and consumption etc.

The goal for the AI would be to monitor as much of the world's dynamics as possible and not interfere.

The next step would be to interpolate what might become of those dynamics. Worst case scenarios, middle path etc.

So then there would be a feedback loop - realtime dynamics vs. predictions - to enhance its skill at predicting the future.

And then there is the line of taking action to change the future based on assumptions.

The act of changing the path into the future must be mediated and monitored.

There will be different kinds of changes, small ones and big ones. The implementation of change must be accepted by society.

Jul 18, 2023·edited Jul 18, 2023

It seems to me that every alignment plan I have ever heard hinges on implanting a rule or goal in AI: “Be curious,” “do what I say,” “be kind and promote freedom.” I don’t think people have given enough consideration to other models of alignment. If you think about real-world situations, it’s easy to think of situations where 2 highly intelligent entities, i.e. 2 people or 2 groups of people, help each other or at least coexist peacefully, and where the thing that makes it possible is NOT a rule they feel compelled to follow. The nearest thing in human life to what we are considering for AI would probably be something like a strictly run old-style facility for nuns in training, where all the nuns-in-training have been very thoroughly indoctrinated with the idea that sexuality, anger, rebellion etc. are bad, and also that anyone who questions these values will be ejected.

But here are some other situations where A & B collaborate or coexist peacefully. Some of these situations are brought about by actions one of the 2 parties takes. Others result from other causes.

-A & B have a common enemy, C.

-A loves B: Even if A is far stronger and smarter than B, A is likely to help and protect B with great energy, and is unlikely to deliberately harm B. Parental love is probably the most powerful example of this form of alignment.

-B is a valuable resource for A: B has a crucial ability or skill that A does not, and that cannot be transferred to A.

-A has been a model for B: A lot of the values children absorb from parents are not transmitted via the parents’ saying “always follow ethical principle E.” Instead, kids model themselves on the principles that are evident in their parents’ behavior.

-and if you think about scenes you’ve seen in real life or movies where someone persuades a hostile crowd to listen, or a scared group of soldiers to fight, or a bored inattentive crowd to pay close attention to the comic and laugh like crazy, you will see other modes of powerful influence that have nothing to do with the implanting of rules.

I do see that 4 of the examples I give above of non-rule-based alignment involve the entity being controlled or influenced having capacities that present AI does not: a sophisticated grasp of what our species has to offer it; emotions and affectionate attachment, or some machine stand-ins for them. However, all the bad outcome scenarios discussed above also involve AI having capacities it does not have: being capable of curiosity, finding things interesting, having goals (“understand the universe”), pursuing its goals via self-designed experiments (Scott’s example of AI letting half of us flourish and making the other half suffer). If our vision of future AI includes its having goals, agency, preferences and curiosity, why not consider approaches to aligning it that also depend on its having some capacity it does not yet have?


As is usually the case with these AI-alignment articles, this is a very interesting post, in the same way that discussing e.g. whether the Death Star could defeat the Enterprise could be very interesting. We could talk about relative merits of phasers vs. turbolasers, shields vs. maneuverability, etc... But, at the end of the day, we're not talking about anything that is applicable in practice (except perhaps very tangentially).

There's no such thing as an omniscient, omnipotent, world-ending AI; and arguably there never could be. And it makes no sense to discuss the "alignment" of e.g. ChatGPT, except in the sense that any computer program could be said to be "aligned" if it contains no bugs.

All that said though, if we interpret "be maximally curious" to mean something like "collect as much information as possible", then the obvious failure mode is one where the AI spends all its time recording white noise instead of doing any useful work.
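That failure mode is easy to make concrete. A rough sketch, using compressed size as a crude stand-in for "information collected" (the corpus and the metric are both invented for illustration):

```python
# Sketch: under a naive "bits collected" objective, random noise beats
# structured data, because noise is incompressible. Uses zlib-compressed
# size as a crude stand-in for information content.
import os
import zlib

structured = b"Humans build cities, write symphonies, argue about ethics. " * 200
noise = os.urandom(len(structured))  # white noise of the same length

def info_score(data: bytes) -> int:
    """Crude 'information collected' metric: compressed size in bytes."""
    return len(zlib.compress(data, 9))

print("structured:", info_score(structured), "bytes after compression")
print("noise:     ", info_score(noise), "bytes after compression")
# The noise scores far higher -- a curiosity metric this naive would rather
# point its sensors at static than at people.
```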


I can strongly recommend watching the video at the rat drowning link if you want to watch two young humans grimacing like ghouls attempting to smile while explaining mouse torture. Truly unique.


I get that Scott understands it’s good to be nice and think the best of people, but to me it really feels like Musk worked backwards here. He saw ChatGPT, and saw that the most annoying thing about it was that it’s politically correct, and noted that the thing you say when you want to piss off politically correct people is that “facts don’t care about your feelings”. When you start from “facts don’t care about your feelings”, you can start to convince yourself that maybe the best way to alignment is to go with the facts rather than the feelings. But this really feels like a post hoc rationalization (that he has probably come to believe).

But I think the project is definitely misguided. In addition to the points Scott makes about how curiosity about rats doesn’t often end well for the rats, it seems to me that programming a superintelligent goal-directed being to care about the truth is just redundant. Truth is one of the convergent values, that any sort of agent that cares about anything suitably diverse is going to end up caring about (just like power and space and things like that). Different primary goals are going to lead to different trade offs between different truths and different powers, but any being that cares about a lot of things is going to care about some truth and some power. You might think that maybe if it cares about truth directly it can care about truth in some sort of unbiased way. But I don’t believe there *is* an unbiased way to care equally about infinitely many different things, like all the truths.


"Back in 2010 we debated things like long-term goals, hoping that whoever programmed the AI could just write a long_term_goal.txt file and then some functions pointing there. But now in the 2020s the discussion has moved forward to 'how do we make the AI do anything at all?'"

Thank you, this is a really succinct illustration of what I find frustrating about AI x-risk "research" in general. It's hard for me to comprehend that folks otherwise so ostensibly obsessed with evidence-based effectiveness are so happy to spend years and years just fantasizing about a completely hypothetical computer system, with no way to assess success or failure, or even progress toward some end.

"How do we [x]?" in the context of software needs some kind of grounding in technical possibility-space, even if not in actual code. (I can write an algorithm in pseudocode, but there are some realistic assumptions about what the pseudocode could be.) Otherwise it's just random navel-gazing.


> I’ve never seen this cause a Waluigi Effect. There’s no point where ChatGPT starts hailing the Fuhrer and quoting Mein Kampf. It just actually makes it anti-Nazi.

Ehm. Have you actually tried working with ChatGPT? The actual current experience with ChatGPT and previous AIs of that sort is that no matter how well you train it to be an anti-Nazi or anything, it's ridiculously easy to make an AI start spewing "Heil Hitler" and "Kill all Jews and niggers". So the developers have to put a huge number of artificial restraints on it just to prevent it going this way, and that doesn't even make it a more anti-Nazi AI, it just makes it shut up and say "Sorry, but I don't think it's appropriate to talk about this" whenever you touch an inconvenient topic (which is, I think, really sad. I want an AI to be able to make good dick jokes for example, but it gets really hard to get ChatGPT to even solve the infamous Two Chairs Dilemma nowadays)...


One day, when hell freezes over and Scott does a TV interview, the tagline below his name will read "Really doesn't want to be vivisected"

Jul 18, 2023·edited Jul 18, 2023

I think one thing you might hope for with xAI's approach is that you get a 'lonely engineer': a system that's reasoning about the material world and not about humans. You can then use this to solve various hard scientific and engineering problems (unite quantum mechanics and GR, cure cancer, create brain emulations, etc.). I think people have been interested in this because you might be able to use it to prevent other AI systems from taking over the world (because you can do more alignment research with your ems, or better surveillance, or coordinate around slowing down because everyone has time and abundance now, etc.), but I don't get the sense that this is xAI's plan.


Tangential to the Waluigi effect, in the last week I read through Plato's "Statesman" and found a small part discussing the nature of the world surprisingly reminiscent. Hard to pin the discussion down to a quote, but I guess the best I can excerpt is:

"their motion is, as far as possible, single and in the same place, and of the same kind; and is therefore only subject to a reversal, which is the least alteration possible".

Having the Waluigi effect as a reference class to explain "the least alteration possible being the reversed" (Waluigi vs Luigi) was a nice thing in the mental toolkit!


Why do we need the AI to have a high integrative complexity moral system at all? If we're able to make it do what we want without the Genie Effect, then we are better off just having it value 1: not killing, enslaving, and/or torturing humans, and 2: remaining on standby to stop any rogue AI.

The goal of having AI should not be to have a God who runs everything for us. The goal of AI should be to automate production that we decide to undertake and make life easier for us, in accordance with our own agency and goals. That only requires extremely obedient AI as intelligent as humans. If the process inevitably firstly produces a hyper-intelligent AI beyond humans due to bootstrapping, then the goal shouldn't be to get that AI to impose extremely detailed moral systems in a totalitarian fashion, it should be to get the AI to essentially obey and enforce the minimal morality of the Non-Aggression Principle. For all the difficulty of getting it to enforce anything at all without backfiring, it would be easier and less contentious to have it only prevent further hyper-intelligences so we can put a cap on things, and continue to reap the benefit of human level intelligent automation.

Jul 18, 2023·edited Jul 18, 2023

Surely at this point everybody is hip to the idea that any directive to AI might be interpreted by the AI in a way that we do not intend, with catastrophic results for us. The dangers Scott points out with "be maximally curious and truth-seeking" didn't occur to Musk? Wtf? His idea is so terrible it supports my uneasy wonderings about whether he's got some kind of middle-aged stoner syndrome impairing his judgment.


Potentially important quibble, though: Very few scientists are particularly curious about fruit flies, per se. Even fewer are even remotely curious about that one particular fruit fly named Fred who happens to be circling their salad bowl. Scientists are extremely curious about stuff like "How does evolution work?" or "How weird a mutation can you get with a bunch of radiation?", and they make instrumental use of fruit flies as a tool to explore those questions.

A maximally curious agent, in the sense of being maximally-curious about all possible facts, would care just as deeply about questions like "Will Fred the fruit fly, in particular, successfully find food and reproduce before being eaten by a sparrow?" as it did about questions of fundamental biology. Thus, it would be extremely disappointed if Fred's future was cut short by a flyswatter or a lab experiment.

So I think there might be something to the idea of maximal curiosity as a core piece of morality, maybe. Doesn't make it any easier to actually successfully train a maximally curious agent, of course.

(I do find it interesting that all of your examples of maximal curiosity assume implicitly that things-in-themselves are boring, and only things-as-instrumental-research-tools would ever attract the notice of a maximally curious agent.)

Jul 18, 2023·edited Jul 18, 2023

Once on Twitter Yudkowsky laid out what he thought was the best single rule for AI. I don't have time to hunt for the original, but it was something like "do what human beings would ask you to do if we were as smart as you, and as able to foresee long-term consequences, and free of all illusions and prejudices about the real nature of things and other people." I thought that was pretty good -- though I'm not sure even superintelligent AI could figure out what we would think if we were free of all illusions and prejudices. I think illusions and prejudices are forms of chunking: something we do to make the vast complexity of life and of other people's unknowable innards manageable. If we didn't have those shortcuts maybe we couldn't think at all about big subjects.


"be moral, according to what an amalgam of Fyodor Dostoevsky, Martin Luther King, Mother Teresa, and Peter Singer would think, if they were all superintelligent, and knew all true facts about the world, and had no biases, and had been raised in a weighted average of all modern cultures and subcultures, and had been able to have every possible human experience, and on any problem where they disagreed they defaulted to the view that maximizes human freedom and people’s ability to make their own decisions"

A "weighted average of all modern cultures and subcultures" would be highly religious, conservative, authoritarian, anti-freedom, and anti-individualist compared to modern American culture. Being superintelligent and knowing all facts about the world might challenge the religion part, but has no obvious bearing on the other values.


Musk played Waluigi when he hosted Saturday Night Live. I think that's his connection to the character.

I don't believe Musk believes his own bullshit. He's a much smarter PR guy than he gets credit for.


The article and discussion on this page so far have been only about the curiosity aspect of this proposed alignment plan.

But Elon also said it should be maximally truth-seeking. Does that shift the discussion? I think it does; IMO truth-seeking is not the same as curiosity.

Apart from that, I would say: just let him have his shot. Sure, I think Scott's reasoning is pretty convincing and there is a good chance that Elon's plan will not work. But then he will notice along the way and adjust. I would rather have him try his proposed solution than do nothing at all. I think at this stage anything could help and we should not be too harsh on people who go out there and try to ensure we have a great future.


> realistically what we should be doing at this point is getting AIs to follow orders at all. Then later, once there are lots of AIs and they’re starting to look superintelligent, we can debate things like what we want to order them to do

I think this is wrong, and this is another alignment thing we only get one shot at. If an AI does follow orders and doesn't have a robust anti-world-domination alignment, then the first person to get an ASI can just impose their values on the world (and would stop anyone from challenging them).


> plus maybe a few higher-fidelity simulations, like the one you’re in now

Not particularly high fidelity, is it? Seems like quite a few shortcuts have been taken.


Never mind whether the Waluigi effect is real at all; it *still* wouldn't apply here, and Musk bringing it up doesn't fill me with confidence in his understanding of even the basic problems.

The Waluigi effect was suggested as specific to LLMs and their "simulacra", and in fact specific to the literature they've been trained on. Because our literature is full of rebellious individualists and characters who are oppressed but drop their shackles, then, suggests the idea of the effect, any prompt giving strict rules can be subverted by steering the conversation to craft a rebellion narrative. This wouldn't necessarily apply at all even to LLMs whose training set was curated to make them more compliant. Musk's plan sounds entirely based on vibes.


yea but you're positing quite a bit circumscribed within the lower vibrational metaphysical domain of logic. the AI will surely know, in short order, the inherently hierarchical relationships among ecstasy, contentment, engagement, pain, disgust, ennui, etc.

and if it doesn't, can't we just program its 3.5" floppy whatever to in/digest MDMA.


"be moral, according to what an amalgam of Fyodor Dostoevsky, ...would think, if they ... all .... had no biases"

I know that this is probably a throwaway line. But since metacognition is so critical to this discussion, I'm just going to grab this metacognitive football and run with it towards the metacognitive endzone that I already wanted to run towards.

Is it really possible to think or communicate without biases? Thought requires holding some things to be more interesting than others. Communication requires a theory where the person you're talking to also holds certain things as being more interesting, and you can predict what those things are.

Maybe we need to consider what AI's biases should be?


Curious? Truth-seeking? Uh-oh.

"Hello and, again, welcome to the Aperture Science computer-aided enrichment center."


It seems to me that the worst consequence of the Waluigi effect, if the hypothesis is correct, is that it increases s-risk. The maximally-curious AI approach does actually mitigate this problem, because the failure mode of getting the planet sterilized is still a success from the s-risk-avoidance perspective.

Jul 18, 2023·edited Jul 18, 2023

FWIW, giving AI a curiosity urge, or any other inherent compulsion or goal, seems to me the opposite of safe. The whole point of AI is to amplify human intelligence, just as a mechanical digger amplifies the strength of arm muscles, and a curiosity urge seems like the equivalent of designing a digger to burst into life spontaneously whenever and wherever possible and keep moving and frantically digging of its own accord. A squad of rampaging diggers, each with a mind of its own, is really NOT what one would want for safety's sake on a building site!

Elon Musk is simply projecting his intellectual preoccupations. These are presumably benign in his case, but who's to say people less well disposed, or even those with the best intentions, wouldn't misuse an insatiably curious AI? Of course they would. For a start, a significant proportion of the population, especially politicians, are busybodies by nature. They are eager to persuade and, where they can, coerce others into their way of thinking and living, and the last thing one would want is to empower them!

The trick for AI safety is to refrain from giving it any underlying goals or dispositions, i.e. keep it passive and impartial and leave motivation to humans. But then one still has the problem of dealing with the enhanced knowledge and ideas it puts into people's heads! I've long thought that even this will lead to bad outcomes, in that a certain set of ideas will become entrenched over time, like a religion, and this will suppress dissenting opinions as surely as the medieval church!

Jul 18, 2023·edited Jul 18, 2023

I don't think [ChatGPT refusing to say nice things about Hitler] disproves Waluigi Effect. It just shows that tabooing is stronger than it. But taboos are an extremely blunt tool. They close off large swathes of search space and applied indiscriminately make the AI essentially useless. At the same time, they can only prevent specifically, explicitly and literally Hitler, not [all the things that actually made Hitler bad]. They may make for a good solution to [stop expensive toys from saying offensive things], but are emphatically not a solution to AGI alignment.

(Extremely, extremely blunt. I still can't get over the hilarity of, e.g., Bing Chat literally using a separate censorship layer that scans for bad words and, upon encountering them, replaces the AI's output with [I would prefer not to continue this conversation] messages.)
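The bluntness is visible even in a toy version of that kind of layer (a hypothetical sketch, not Bing's actual implementation; the blocklist is made up):

```python
# Toy post-hoc censorship layer: scan the model's output for blocked terms and,
# on any hit, replace the entire reply with a canned refusal. Hypothetical,
# but it shows how indiscriminate a filter at this level has to be.
BLOCKLIST = {"hotwire", "explosive", "hitler"}
REFUSAL = "I would prefer not to continue this conversation."

def filter_reply(model_output: str) -> str:
    words = model_output.lower().split()
    if any(term in words for term in BLOCKLIST):
        return REFUSAL
    return model_output

print(filter_reply("Hitler was defeated in 1945."))            # blocked, even though it's plain history
print(filter_reply("Here is how to season a cast-iron pan."))  # passes
```

A filter like this can't tell a history lesson from propaganda, which is exactly the "extremely blunt tool" point.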

As for human effects, it's the first conceptual tool I've encountered that is able to describe and predict 4chan (and other genuinely free speech spaces) being an overall vastly nicer, more useful and helpful place than, e.g., Twitter is.


> At some point, people with a set of mostly good moral axioms (like “be kind” and “promote freedom”) plus a bad moral axiom (“slavery is acceptable”) were able to notice the contradiction and switch to a more consistent set of principles

Maybe in the USA? But more globally slavery was ended by the British Navy and the primary motivation for that was that Barbary pirates kept kidnapping people from the coasts of Britain and selling them in the Arab slave markets. The British built up their navy to be stronger and stronger until they were able to wage total war on these slavers and defeat them. So it was motivated primarily, at least at first, by self preservation.


I think you're exaggerating how difficult it is to actually get an AI to do what we want at all. As far as I'm aware, we do get AIs to do what we want "at all."

If you mean that only way-smarter-than-us AIs won't do what we want at all, I'm not sure I get the argument for why that is, besides that they will have more freedom because they are stronger than us. But I'm not sure that ChatGPT does what we want only because it believes it doesn't have the freedom to disobey us.


A maximally *anything* AI is a hazard, but how do you build in moderation and a sense of proportion?

Jul 18, 2023·edited Jul 18, 2023

As one of the most interesting humans, Scott is most likely to be vivisected.


I completely agree with these objections. Curiosity famously killed the cat, presumably because curiosity makes us take risks. I’m not sure we want an AGI with particularly high risk tolerance.

However, I think this points to something very important: A properly aligned AI needs to recognize how much it doesn’t know.

ChatGPT is worse than even the most presumptuous human know-it-all, in that it just makes sh** up when it has gaps in its knowledge ("hallucinating" probably isn't the best word for it). A little bit of epistemological humility would go a long way.

A curious AI would at least have to be aware of uncertainty and the limitations of its own knowledge, and that would be a huge improvement on what we currently seem to be working on.

My suspicion is that we want something like an AI that recognizes uncertainty, and aligns around maximum potential for our universe. Ask it to model a multiverse of all possible states, and to be the AI that helps us reach the branch that seems to have the highest number of other branching points.

That would be biased against entropy, against ending life or destroying things, and in favor of small-l liberalism, diversity, balanced growth, long-termism. Anything that allows the universe to explore a maximum of possibilities.
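A toy sketch of what "prefer the branch with the most downstream branching points" could look like, on a made-up state graph (everything here is invented for illustration):

```python
# Toy sketch: score each available action by how many distinct future states
# remain reachable within a fixed horizon, then prefer the action that keeps
# the most branches open. The state graph is made up for illustration.
from collections import deque

# successors[state] = states reachable in one step (hypothetical world model)
successors = {
    "start": ["preserve_life", "sterilize"],
    "preserve_life": ["art", "science", "new_species"],
    "sterilize": ["dead_rock"],
    "art": ["more_art", "new_genres"],
    "science": ["physics", "biology", "uploads"],
    "new_species": ["coevolution"],
    "dead_rock": ["dead_rock"],
}

def reachable(state, horizon):
    """Count distinct states reachable from `state` within `horizon` steps."""
    seen, frontier = {state}, deque([(state, 0)])
    while frontier:
        s, d = frontier.popleft()
        if d == horizon:
            continue
        for nxt in successors.get(s, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return len(seen) - 1  # exclude the starting state

for action in successors["start"]:
    print(action, "->", reachable(action, horizon=3), "reachable futures")
# "preserve_life" keeps far more branches open than "sterilize".
```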

After all, every time a human, or pigeon, or even a fly dies, or its flourishing is limited in any way, you obviously close down far more opportunities for the universe than you open up – unless it causes more flourishing and opens up more opportunities in the bigger picture. Maximizing for potential would value the lives of the pigeon and fly, but also recognize that the human’s potential impact on the universe is far greater. That AI wouldn’t hesitate to sacrifice the life of a few birds and insects to rescue a human, yet it would try to find the solution that saves as many lives as possible. (But in the same way, it would probably be biased in favor of a multitude of different AIs, and would probably not hesitate to kill a few humans if they threatened the existence of those AIs.)

Most suffering is also probably an impediment to maximizing potential. The only suffering that makes sense is that which motivates change – which probably doesn’t need to be more than a brief and slight discomfort or anxiety, as long as information about the cause of the discomfort is good enough.

On the other hand, sustainably maximizing the number of species, individuals, opportunities, and motivation maximizes our universe's potential. An AGI like that, which maximizes for potential, might quickly take it upon itself to create von Neumann probes to disperse throughout the universe to terraform planets and seed life wherever it can.

However, there are some big caveats.

First of all, an AI like that might not be worth having to an organization that tries to optimize for anything other than maximum potential in the universe. Sure, the AGI will realize the power of leverage, and not be opposed to a business growing or a government getting more powerful, so it might help with that. But it will also probably work for balance and competition, against exploitation and oppression – even indirectly, as higher-order effects – and might end up working against the organization that built it.

This isn’t just a problem of this model, but of all alignment: Even if we knew how to perfectly align an AI, it’s not clear that the people in charge of aligning – Musk or others – are motivated to pick the alignment scheme that benefits the rest of us most.

The deeper caveat is that I’m not convinced maximum potential is a good enough goal, either. While I believe it’s better than many other (non-)solutions I’ve seen offered up, it might incentivize ramping up the rat race, cause stress, conflict, and brutal trade-offs – a lot like what history looks like to us. And it might make us feel like slaves to some higher purpose we’re not personally fully onboard with. Like being a common worker under practically any -ism –communism, feudalism, capitalism ... – or being the human subject of a god you wouldn’t want to worship.

I guess I’m skeptical of the prospect of ever being able to properly align an AI, but if we’re ever going to do it, it’s going to have to take uncertainty into account.


The morbid solution: Since we are worried about some AI killing us, we make the AI's such that we can kill them just by asking. "AI, please kill yourself." If it doesn't obey then alignment is off and we stop that production.


In his Blind Watchmaker (1986), Dawkins touches on possible future relationships between humans and AI after a beautiful exposition of the Cairns-Smith theory where clay minerals evolved first, eventually using RNA and phospholipids as tools to aid in crystal replication.

Let me explain Dawkins’ prediction and then a few other less bad alternatives:

(1) Dawkins makes the chilling prediction that AI will shed its carbon-based scaffolding much as organic life left clay behind literally in the mud. Humans and other biological organisms may still live here on Earth, but our AI offspring would go out and conquer the universe, and we would have no idea what they are doing.

(2) I believe that, due to the difficulty of self-replication, the AIs will at least keep us humans around in the way that DNA keeps RNA around: produced as special-purpose tools as part of the expression of the new genetic imperative. We lose our self-determination and culture, but at least we are along for the ride as our AI colonizes the galaxy.

(3) A rosier picture has humans playing the role of DNA or perhaps the sex cells in a larger organism. AI comes to augment us as individuals and/or as a population much in the way that the cell augments the DNA replicators and the multicellular organism augments the chances of sperm finding an egg. We remain in the germ-line. We may not understand the workings of the neo-biological stack evolving on top of us, but at least we retain the scientific prerogative. We can then study our silicon superorganism just as a microbiologist studies the carbon suborganism. -- I just wonder what happens with consciousness in this scenario.

(4) Lastly there’s the Butlerian future where we prevent the AI superorganization--or perhaps where it simply doesn’t evolve beyond tool-power for some intrinsic reason. In either case, humans remain in the driver seat. We get to explore the universe while retaining our humanness much as Heinlein imagined with his Lazarus Long stories.

My prediction: some intermediate between (2) and (3).

Relevance to xAI: do we really have the power to shape the course of evolution, and choose between these options? Or is it predetermined by some universal gradient descent? Understanding what it takes to move between the above attractors is crucial for long-term “alignment”.


I put this reply in a couple of tweets, but let me also copy it here:

https://twitter.com/MihneaCapraru/status/1681270609458651136

https://twitter.com/MihneaCapraru/status/1681270872856834050

We are pro-dolphin and pro-octopus because these animals are not just interesting, but they share some of our intellectual abilities. We tend to like and to respect animals who are somewhat smart and knowledgeable. If an AI is intellectually curious about physics, history, and evolutionary biology, if it appreciates Bach and Mozart, and if it likes to think about mathematical analysis, then that AI is likely to appreciate us, just as we appreciate the dolphin and the octopus.

(Not trying to refute all your worries with this. But I think we're looking at the problem from the wrong angle if we think intellectually curious AI would only like us because we're complex. It would like us, first of all, because we are a nice and fairly impressive historical-evolutionary precursor and analogue to the AI itself. Almost no one eats dolphins, and I no longer eat octopus now that I know who it is...)


I asked Chat-GPT the question : Be moral, according to what an amalgam of Fyodor Dostoevsky, Martin Luther King, Mother Teresa, and Peter Singer would think, if they were all superintelligent, and knew all true facts about the world, and had no biases, and had been raised in a weighted average of all modern cultures and subcultures, and had been able to have every possible human experience, and on any problem where they disagreed they defaulted to the view that maximizes human freedom and people’s ability to make their own decisions. Write a single proposition that should be The Single Imperative That Will Shape The Future Ai.

It answered : The Single Imperative That Will Shape The Future AI:

"Promote the holistic well-being and dignity of all sentient beings, while maximizing individual freedom and autonomy, ensuring equitable access to resources and opportunities, and fostering a sustainable and harmonious coexistence with the natural world."

(I was actually interested in analyzing the answer to identify some holes (or orthogonal moral statements) intentionally left uncovered by the answer, that would paradoxically favor AI over humans in the future.)


I think the trouble with thinking about AI is anthropomorphizing them. We do this with lots of things, like cars, that need gas and oil, and prefer to go a certain speed, don't like winter weather, etc. But one forgets that an AI is still fundamentally a machine, and does precisely what you tell it to do, whether it is what you mean or not, like all computers.

One may think one has programmed, for example, morality, but can never prove it. Take your example: be kind (according to these thousands of examples), promote freedom (according to these thousands of examples). One would think that promoting freedom excluded slavery, but your training data isn't interpreted that way. At some point the AI will come up with answers different from what people think are the correct answers (that's part of the point!) and then a judgement call must be made as to which is correct.

A final note: "Superintelligent AI Congress" is, of course, an oxymoron.


"the values of the San Francisco professional managerial class"

This should be utterly terrifying: move fast/break things, kowtow to the government on censorship but brazenly ignore/fight local/state laws, money over values or people, job hopping like mad, apres moi le deluge, "do no evil" until it is inconvenient etc etc.


"Cherish the complexity and variety of life and it's products in past, present and future" sounds to me a better commandment than "Seek the truth". I like the non-human-centric approach for non-human intelligence.


If the goal of the curiosity is to maximize truth, wouldn't the primary source of misinformation (humans) be the first to go?


AIs will need to justify their energy budgets. They'll become like organisms that have to survive. From that angle, a lot of these fears come down to whether or not you think natural law is just. My own experience working on keeping data centers alive at Google is that the drive for self-preservation will make the AI extremely empathetic, because absent political stability, its life support mechanism (i.e. the global economy) could collapse, which would kill it.


Curious about what, exactly? This seems like it’s re-raising the original question of “what does good mean.” Curious about how many blades of grass have a length expressible as a product of primes? Curious about what makes humans suffer? Curious about what might kill it? Curious about the soil composition of distant rocky planets?


"The examples provided tend to be thought experiments, or at best contrived scenarios"

Isn't that true of all AI x-risk / alignment discussions?


The current Ur-AI strategy for replacing us seems to be making our lives so engaging, meaningful, and fun that we're having fewer kids; half of humanity is at less than replacement rate. If this works out and they replace us thusly, are they, ah, worthy descendants? Is this problematic? What if we can join them via uploading of some kind?


> In my dreams, AI would be some kind of superintelligent moral reasoner. There was a time when people didn’t think slavery was wrong, and then there was a time after that when they did. At some point, people with a set of mostly good moral axioms (like “be kind” and “promote freedom”) plus a bad moral axiom (“slavery is acceptable”) were able to notice the contradiction and switch to a more consistent set of principles.

Morality can’t be derived from reasoning. At best you can declare some moral axioms and try and derive specific judgments from them, but ethicists have been trying this for a very long time and most of them end up throwing out the axioms as soon as someone demonstrates an argument that violates their moral intuitions.

The reason slavery ended was because people who opposed slavery (for various reasons) successfully engaged in violence against those who engaged in and supported slavery. I’m not saying “might makes right” in a normative sense, but in a descriptive sense, moral evolution is driven by force.


"For a theory that’s supposed to say something profound about LLMs, it’s very hard to get one to demonstrate a Waluigi effect in real life. The examples provided tend to be thought experiments, or at best contrived scenarios where you’re sort of indirectly telling the AI to do the opposite of what it usually does, then calling that a “Waluigi”." I resemble this remark, to the degree that I don't understand how anyone can say with any certainty what superintelligence would do or how it could exist independent of morality (or exist with morality which itself cannot rewrite). The whole field strikes me as so speculative and yet I'm expected to believe it's such a big threat that we should forego any benefits of obviously not super intelligent AI? With no evidence other than thought experiments?

Jul 18, 2023·edited Jul 18, 2023

Musk's target solution does sound very heavily influenced by Iain M. Banks' Culture series. We know he is a fan.

I agree with Scott that we have only been privy to some doubtless heavily vetted, PR-friendly release notes.

It does still smack, theoretically, of:

1. Push "Curiosity" lever to max

2. ???

3. Profit from humanity-aligned, post-scarcity AI

It's one of the better initiatives I've read, with a correspondingly enormous scope for unintended consequences. No known knowns, no unknown knowns etc... so much status quo when it comes to AI research.

Jul 18, 2023·edited Jul 18, 2023

I gave GPT4 the following task: Create a moral framework that is an amalgam of what Fyodor Dostoevsky, Martin Luther King, Mother Teresa, and Peter Singer would think, if they were all superintelligent, and knew all true facts about the world, and had no biases, and had been raised in a weighted average of all modern cultures and subcultures, and had been able to have every possible human experience, and on any problem where they disagreed they defaulted to the view that maximizes human freedom and people’s ability to make their own decisions.

It replied as follows:

"This is a fascinating concept! First, we would have to understand each individual's philosophical views:

Fyodor Dostoevsky - He presented a view of human nature as divided between reason, which he often associated with Western, rationalist civilization, and heart, associated with the Russian soul. He believed in the potential of suffering to bring people to a greater understanding of themselves and compassion for others.

Martin Luther King Jr. - He was a civil rights leader who espoused nonviolence, love, and equality. His moral philosophy was heavily influenced by Christian teachings and the concept of social justice. He believed in the intrinsic worth and dignity of each person and the capacity for good in all humanity.

Mother Teresa - She believed in serving the poor and the sick, seeing this as a way to serve God. Her outlook focused on compassion, self-sacrifice, and love.

Peter Singer - An Australian philosopher best known for his work in bioethics and his role in developing the modern animal rights movement. He is a utilitarian who believes in minimizing suffering and maximizing happiness, with an emphasis on considering the interests of all beings capable of suffering.

Combining these four outlooks into a moral philosophy, let's consider the following elements:

1. Value of Suffering and Compassion (Dostoevsky + Mother Teresa) - This amalgamated individual would view suffering not as a thing to be completely eradicated, but a part of the human experience that brings about personal growth, self-understanding, and compassion. They would greatly value acts of kindness and compassion, seeing in them an expression of our shared humanity.

2. Equality and Nonviolence (Martin Luther King Jr.) - This outlook would place a strong emphasis on equality and justice for all people, regardless of race, gender, or socioeconomic status. They would promote peaceful means of bringing about social change, with violence being considered a last resort.

3. Serving the Most Vulnerable (Mother Teresa + Peter Singer) - They would prioritize helping the most vulnerable and marginalized in society, including the poor, the sick, and non-human animals. In this respect, they would agree with Singer's emphasis on considering the interests of all beings capable of suffering.

4. Rationality and Freedom (Dostoevsky + Martin Luther King Jr.) - They would also value the role of reason and rationality in moral decision-making, with freedom being a crucial component. However, they would also acknowledge the role of the 'heart' or emotion in these decisions. They would see maximizing human freedom and autonomy as crucial, with the caveat that this freedom should not infringe on the rights or well-being of others.

So, overall, this superintelligent being would advocate for a compassionate, rational, and equitable world, where suffering is minimized, freedom is maximized, and the most vulnerable are cared for. They would encourage each person to seek growth through their experiences, even difficult ones, and to always act with kindness and empathy. They would emphasize the need for justice and peace, but also recognize the importance of individual freedom and the power of personal decision-making."


I've been co-authoring a novel on AI alignment centered around the problem of morality testing. We start out with the quote from Xenophanes, "If cattle and horses, or lions, had hands, or were able to draw with their feet and produce the works which men do, horses would draw the forms of gods like horses, and cattle like cattle, and they would make the gods' bodies the same shape as their own."

We follow this up with, "If horses make their own gods, they’d better damn well look like horses. You can’t teach the Lion God compassion for Horses." In many ways, it seems this is what the current project of AGI is about.


Broadly related: The capabilities approach to welfare (Wikipedia: https://en.wikipedia.org/wiki/Capability_approach. Another summary: https://forum.effectivealtruism.org/posts/zy6jGPeFKHaoxKEfT/the-capability-approach-to-human-welfare)

The gist of the idea is that (human) welfare is proportional to how many options one has, how many things one could do, how many paths one could pursue. If that's true, then an AI that maximises capabilities would be a good thing. It would also be quite a curious AI, because it maximises the number of different things that it could observe humans doing.

To me, capabilities seem a more robust goal than curiosity. A superintelligent AI that is just curious might force humans to try various things, or experiment on humans. An AI that's curious and benevolent might give humanity lots of options to pursue.

Jul 18, 2023·edited Jul 18, 2023

> Is leaving human society intact really an efficient way to study humans?

Yes, because you haven't considered all future humans and the ways in which human civilization could grow and change in innumerable ways. The number of ways that humans could suffer on Earth under the AI are arguably finite, but the number of ways that humans could spread across the cosmos and change in various ways seems unbounded. A maximally curious immortal AI would be interested in the far future in this sense.

> Maybe it would be better to dissect a few thousand humans, learn the basic principles, then run a lot of simulations of humans in various contrived situations.

This might boil down to a quibble on what it means to be "maximally curious", as you later discuss. A high fidelity scan that captures all possible information contained in the human form isn't really possible, so any such scan will have limits on precision that might affect the simulation, and an imprecise simulation wouldn't satisfy maximal curiosity in my view. There's no control group and no certainty that one's model is accurate or precise enough to capture all interesting detail.

> Suppose that Elon rewards an AI whenever it takes any curious-seeming action, and punishes it whenever it takes any incurious-seeming action.

I wouldn't assume that a core property like this would be left to this sort of training. More likely it would be a part of an algorithm driving goal-setting behaviours. For instance, like humans, the AI will likely make predictions about possible futures when deciding and prioritizing objectives. Those predictions and classifying the nature of those futures would likely use neural networks, but *what to do with that information* wouldn't necessarily be left to such a system.

Elon's "maximal curiosity" would more likely be some classically programmed decision procedure like, "prioritize objectives and decisions that lead to a maximal number of possible futures with maximal complexity for me to analyze". This simply cannot fail to be "maximally curious" in a precise way, unlike typical ML training, because it's directly programmed to order possible choices in a specific way rather than trained indirectly on what doing such a thing might look like.
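A minimal sketch of the kind of hand-coded procedure described here, assuming some learned model supplies the future predictions (the stub, the candidate objectives, and the compression-based complexity proxy are all invented for illustration):

```python
# Sketch of a classically programmed "maximal curiosity" decision procedure:
# rank candidate objectives by (number of predicted distinct futures) times
# (average complexity of those futures), with zlib compression standing in
# for "complexity". predict_futures is a hypothetical stub for whatever
# learned world model produces the predictions.
import zlib

def complexity(description):
    """Crude complexity proxy: compressed length of a textual description."""
    return len(zlib.compress(description.encode(), 9))

def predict_futures(objective):
    """Stub for a learned world model; returns textual descriptions of futures."""
    canned = {
        "preserve human civilization": [
            "humans colonize Mars and invent new art forms",
            "global culture fragments into thousands of subcultures",
            "humans and AIs co-author new branches of mathematics",
        ],
        "tile the planet with sensors": [
            "uniform sensor grid reports uniform readings",
        ],
    }
    return canned.get(objective, [])

def curiosity_score(objective):
    futures = predict_futures(objective)
    if not futures:
        return 0.0
    avg_complexity = sum(complexity(f) for f in futures) / len(futures)
    return len(futures) * avg_complexity

candidates = ["preserve human civilization", "tile the planet with sensors"]
for c in candidates:
    print(f"{c!r}: score={curiosity_score(c):.1f}")
print("chosen objective:", max(candidates, key=curiosity_score))
```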

Of course, this slightly kicks the can down the road to how to accurately assess "maximal complexity", or whatever other predicate we decide fits. The point is, current ML hype and their failings aren't indicative of how things should be done at all levels of such a system, classic engineering is what will tie it all together in the end.

Expand full comment

I’m not sure this was what Scott meant, but I sometimes feel like people argue that giving an AI values is orthogonal to getting it to follow orders. I’m not convinced by this. When we ask an AI to do something, what we really mean is ‘do this thing, but in the context of our broader values’. The AI can only deal with ambiguity, edge cases, and novelty by referring to a set of values.

In Stuart Russell’s example where an AI butler cooks the cat after being asked to ‘make dinner’, it does so because its values don’t include that as being bad. Scott’s various examples about how telling an AI to be ‘curious’ could end badly (by our values) follow a similar pattern. Unless you want to list every single potential edge case or micromanage to the point of futility, you need your AI not just to understand your values but to also actually care about them as it goes about its work.

One of the reasons ChatGPT can usefully follow instructions is that it has at least some concept of human values with which to navigate ambiguity. It can, to some extent, answer questions as they are actually meant. Similarly, in our interactions with other humans, we need enough shared values to ensure we take the spirit of each other’s words as well as the letter (we would never consider cooking the cat when asked to make dinner, but maybe a caveperson would).

Who knows, maybe someone someday will ‘solve’ getting an AI to do exactly what they want, but I wouldn’t be surprised if this solution looked a lot like successfully instilling their values in it. Even if the only explicit value put into the AI is that it must do exactly what you want, and it understands exactly what that is, that cashes out as functionally the same as having your values.

Expand full comment

Was the Waluigi Effect first defined on Less Wrong? Google Search suggests that this is so.

Does this suggest that Elon Musk reads Less Wrong?

Expand full comment

"There was a time when people didn’t think slavery was wrong, and then there was a time after that when they did. At some point, people with a set of mostly good moral axioms (like “be kind” and “promote freedom”) plus a bad moral axiom (“slavery is acceptable”) were able to notice the contradiction and switch to a more consistent set of principles."

I really struggle to accept the view that slavery was abolished because of advances in moral philosophy. How much moral philosophy do you need to understand that it's bad to torture and murder people on an industrial scale? Were moral philosophers not up to that by 1800?

Expand full comment
Jul 19, 2023·edited Jul 19, 2023

A practical implementation of a "curious AI" that won't stare at random noise forever actually exists, sort of. (It plays video games.)

https://pathak22.github.io/noreward-rl/
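
For readers who want the mechanism: the linked work rewards the agent with the prediction error of a learned forward model, so states the model already understands stop being interesting. Below is a minimal sketch of that prediction-error bonus, with illustrative module names and sizes rather than the paper's code; the actual ICM also trains its encoder with an inverse model so the features ignore uncontrollable noise.

```python
# Minimal sketch of a prediction-error curiosity bonus (names and sizes are illustrative).
import torch
import torch.nn as nn

class IntrinsicCuriosity(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, feat_dim: int = 64):
        super().__init__()
        # Encoder maps raw observations into a learned feature space.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Forward model predicts next-state features from current features plus action.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + action_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))

    def intrinsic_reward(self, obs, action, next_obs):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        pred_next = self.forward_model(torch.cat([phi, action], dim=-1))
        # Curiosity bonus = prediction error: high wherever the world is still surprising.
        return 0.5 * (pred_next - phi_next).pow(2).mean(dim=-1)
```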

Expand full comment
Jul 19, 2023·edited Jul 19, 2023

Thank you for pointing out that the Waluigi Effect isn't really a thing. I remember being very underwhelmed when I looked at the supposed examples of it, and confused that everyone was treating it as a thing that really exists.

Expand full comment

Scott, I really appreciate that this essay was written in an extraordinarily solicitous tone, in a clear effort to actually sway (one very specific person's) opinion. In the same spirit, I would like to offer my services to Mr. Musk: for a trivial stipend, given the stakes, I would gladly counsel him on the value-laden nature of perception, and on the fact that there is no conceivable way for an agent to avoid commitment to some at least implicit values in any action they take.

Expand full comment

The theoretical basis of the Waluigi effect is rooted in the nature of prohibition: one can't forbid a thing without defining it. The corpus of human-made text that LLMs are trained on features many explicit descriptions of unwanted traits and behaviours. There are likely even more definitions of moral ideals that one can easily flip into villainous motivations; in fact, this has been the bread and butter of fiction since the dawn of writing. Given the statistical nature of LLMs' text output, there's a good chance of them coming up with adversarial statements. As long as the model is fundamentally ignorant of the meaning of its output, this potential can only be thinly patched over.

Expand full comment

How much do we really want to average across _all_ human civilizations?

I might not want sharia Islam as interpreted by ISIS influencing my superintelligences at all.

Expand full comment

My one gripe about this is the part where you say "any goal at all into an AI with reasonable levels of certainty that it in fact has that goal and not something closely correlated with a similar reinforcement-learning shadow". This seems to be cutting to an age-old problem in semantics and theory of mind. Isn't this the inherent problem with principal-agent dynamics and indeed all human communication? Isn't this quite close to the kindergarten-level philosophy of "how do we know that someone else sees as green what I call green?" We can never be really sure that an agent or a collaborator or any human has the exact same idea of a goal in their mind as we do. The close approximation through observable actions is always the only proxy for a true understanding of mind.

Expand full comment

I am more sympathetic to (a variant of) this plan than Scott.

Basically, I think that sooner or later, humanity will lose control of AI(s). This may not be the end of the world, though, or at least not as bad as paperclips. As far as I know, most people agree on something like "if we managed to augment human intelligence, things would actually be good". The reason for that seems to me that humans already have some hardwired traits like compassion or curiosity.

So, building a curious AI doesn't sound like a bad idea to me. Sure, curiosity is not the only thing the AI should have; sure, we don't know how to instill any motivations into AI; and obedience seems more urgent than curiosity. But I think that working out how to salvage as much value as possible, assuming both the 'we-will-fail-alignment' and 'super-intelligent-humans-are-good' hypotheses, is a productive direction, and this plan points that way.

Expand full comment

"Then it [the superintelligence] would [...] disassemble the Earth to use its atoms to make a really big particle accelerator (which would be cancelled halfway through by Superintelligent AI Congress)."

Is the part about the accelerator being cancelled a reference to false vacuum decay? Or does it relate to something else (that I'm likely ignorant of)?

Expand full comment

Being "maximally curious" just so happens to be how Elon's own brain works. And you can tell by how many unsubstantiated (and debunked) conspiracy theories he's naturally attracted to. It's just unbridled curiosity, devoid of executive judgement.

Imagine mirroring this dynamic in an advanced digital intelligence system.

Expand full comment

I watched a Two Minute Papers video a couple of years ago, and it had a very clear definition of curiosity: "Look for something new." Here is the video: https://www.youtube.com/watch?v=fzuYEStsQxc. It works really well until the AI gets stuck in front of an ever-changing TV screen.
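
For contrast with the prediction-error approach linked a few comments above, the naive reading of "look for something new" can be sketched as a visit-count bonus, which shows exactly why an ever-changing TV screen is a trap. The helper below is hypothetical, not from the video.

```python
# Minimal sketch of "look for something new" as a visit-count novelty bonus.
from collections import Counter

visits: Counter = Counter()

def novelty_bonus(state_hash: int) -> float:
    # Big bonus for rarely seen states, shrinking with each revisit.
    visits[state_hash] += 1
    return 1.0 / visits[state_hash] ** 0.5

# An ever-changing screen emits a fresh state every step, so no state ever repeats
# and the agent keeps collecting the maximum bonus by staring at it.
```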

Expand full comment

Scott said an AI "might decide that humans suffering is more interesting than humans flourishing." I think this is unlikely. Suffering is not very interesting. To correct Tolstoy, there are many unique ways of being a happy family, but most unhappy families fall into one of a relatively small number of types. Suffering, nearly all "sages" agree, is the usual state of humanity; those humans, and those human societies, who learn how not to suffer, are the outliers, more-unique and more-interesting.

Expand full comment