400 Comments
Comment deleted

There's a third option, which is to be a member of an intact, non-genocided culture in a modern country full of technological marvels. That's what we get if we build aligned AGI before anyone builds unaligned AGI.


I imagine aligned AI will be just as prone to genocide, just not of the entire human race.


Depends on to whom it is aligned.

Comment deleted

In general I agree; however, bear in mind that almost everyone thought AI was a lot *further* off until the success of LLMs. These were typically very techno-optimistic people, and they had it around 2050 at the earliest. I think that has to set off alarm bells.


FWIW, I've put "primitive AGI" at around 2035 for over a decade now, and I still do. And I'm not really convinced by the "fast transition" models. I expect the transition to take at least 5 years. And I also expect that most AGIs will be embraced by those they work for (whether physically embraced or not).

OTOH, 5 years isn't that long to notice and fix any major problems, and the fix would need to happen well before the 5 years was up. (Lots of reasons in lots of different categories, from moral to practical to economic to...) So it's time to start looking at things carefully NOW. The problem is that a lot of the problems don't show up until the AI has self-volition and an understanding of the physical world.

Comment deleted

It should be read as “it is impossible to know what the real bottleneck will be until we develop that AI”.

With *literary license* taken. Much like “watching for sudden drops in the loss function”. Or … any of a dozen stupid conceits in The Culture series.


It wouldn't be all that surprising if it were the case. An AI can't really fabricate observations that we either don't have the technology to capture or have not put the tools we do have to that end.


How about you just explain why you think it's hilarious, just in case your own thought processes on this matter are idiosyncratic?

Comment deleted

Saying they are "non-cognitive" is not to say that they don't require intelligence to perform. Rather, they require _more_ than mere intelligence, and may require the intelligence to delay while the physical steps complete.


The point is that no matter how smart a computer is, it can't hold a pipet, basically.


Robots definitely can and do hold pipettes! (Today, not just in some hypothetical future).

The point is that some steps can't be sped up by more thinking.

e.g. leaving a cell culture to develop for 24 hours takes 24 hours, no matter how many CPU cycles you throw at it.


OTOH, if your simulations are good enough, you can drastically reduce the number of cell cultures you need to run. If not, you can run them in parallel. (That's what a lot of the Lab-on-a-chip things are for...though usually with chemicals rather than with cells.)

Jul 5, 2023·edited Jul 5, 2023

"In theory, theory and practice are the same. In practice, they are different."

The whole point of doing experiments is that simulations will never be good enough to fully model reality, and you need to find out how they differ at some point.


Yeah "non-cognitive step" was definitely the wrong phrase to use there.

"Waiting for the physical world" might be more appropriate. At least until the AIs develop "fast-time bubbles" or some other thing that AFAWK is impossible.


I mean, isn't that basically the definition of non-cognitive? It's something that doesn't take place by pure thought alone.


How about "not-PURELY-cognitive" as a substitute phrase?


I could see that as reasonable. Although it's the completely non-cognitive part (i.e., the physical experiment itself) that is the time-limiter, not the cognitive parts that involve planning the experiment and thinking about it.


"Non cognitive" as in "embodied, not purely cognitive", not "not requiring cognitive resources"


Can you provide some historical examples of impactful technological and scientific advances that were not bottlenecked by non-cognitive steps? The theories can often be produced cognitively, but as far as I'm aware, for anything to have a real-world impact, you have to do a lot of messy (and time consuming) trial and error to actually build something.


The problem here is that labs are one of the more automated parts of society. If you've got a good AI, you'll put it in charge of its own lab, with a very few human helpers. As robots improve, the number of human helpers will decline.

A much better argument is that physical things actually take time, and many of them can't be rushed. (Which may be what was meant.)

Comment deleted

How do you measure particle flux? How many terabytes of images do you look at per day?

There are probably labs where it isn't true, but automated instruments are all over them. Even the voltmeters tend to have digital readouts. You're measuring things you can't sense and performing actions more precisely than you can do directly all the time.

I can't really speak to quantum computing, only judge based on the devices that I see, but I really doubt hand manipulation of really cold qubits happens very often.


A major bottleneck in selective breeding is waiting for the plants to grow. Especially with slow growing species.

On the other hand, modern algorithmic breakthroughs, especially on the more theoretical front - stuff like the latest quantum-proof encryption - are all cognitive. (Typing the code is easy once you have the ideas.)

Jul 4, 2023·edited Jul 4, 2023

There is one case that I know of (but it is very much an outlier):

Einstein's development of special relativity can be looked at as him demonstrating that Maxwell's equations (already known) were inconsistent with the Galilean transform but _were_ consistent with the Lorentz transform. And the Michelson-Morley experiment had already been performed, which can be viewed as already-existing evidence that the speed of light was constant in all reference frames, or, equivalently, that Maxwell's equations did not need to be elaborated with terms dependent on the lab frame's velocity with respect to the luminiferous ether.

I would be quite surprised if there were impactful _technological_ advances that weren't bottlenecked by not-purely-cognitive steps. Almost any impactful device has enough complexity that any tractable model of it has unavoidable simplifications, and usually bugs, that need to be discovered by actually building the device and finding out what goes wrong (or at least was unexpected).


One other thing that would not be bottlenecked by non-cognitive steps:

We've wanted a theory that combined general relativity and quantum mechanics for the better part of a century now, and we don't have one (AFAIK). Note that finding *A* theory that reduces to general relativity at large scales and reduces to standard quantum mechanics (really special-relativistic quantum mechanics, the Dirac equation) in weak gravitational fields is purely a mathematical problem. Now, there are presumably families of such theories, and finding the _right_ one requires experiments. But it is kind-of bizarre that finding *A* such theory - purely a problem of mathematics, of pure thought - has been so hard and taken so long.


I am not entirely sure what you mean by this, but part of me wants to agree with you. Leonardo da Vinci sketched flying machines long before material science and other kinds of engineering caught up with them. I don’t think the conceptual leap is really the problem. Einstein apparently came up with a lot of bright ideas that weren’t really provable for quite a while because there just wasn’t a way of checking them out. What we can do with the physical world with our own hands has been the limiting factor for quite some time, in my opinion.

author

Banned for this comment, I leave it as an exercise to explain why.


At some point we need to bring in Dan Simmons' Hyperion Cantos ...


Second that :)

I got those Ummon and reaper program vibes reading this, too.

Jul 5, 2023·edited Jul 5, 2023

I recently read that, and those sections were great.

Kwatz!


The competition-between-nations version has for a long time reminded me of the best line from what is probably Charles Stross's best work, "A Colder War", in which some general shouts, "Are you telling me we've got a, a SHOGGOTH GAP?"

Choosing to call LLMs "shoggoths" will, in the future, look like a mistake.


I'd forgotten that line. Many Thanks for the reminder!


"Why would we keep churning these things out, knowing that they're on track to take over from us?"

See 1 Samuel 8:11–18, where Samuel warns the Israelites of how terrible a king ruling over them would be, but they insist.


I resent this (kind of) response. Not even saying that religious sources necessarily have no place in a discussion like this. But consider the effort it would have cost you to summarize or quote the passage in question, vs. the cumulative wasted effort of readers having to look it up.


I see your point. I'd typically make that a link which makes the effort negligible, but since I couldn't, I considered the trade-off between asking people to look it up and quoting long passages of scripture at them. I imagined people quoting things like the "Universal Declaration of Human Rights" at me, and decided to do Something Else That Is Not That.


Why couldn't you post a link? Google easily produces them.


No, I said *making* text a link, like with [markdown](https://en.wikipedia.org/wiki/Markdown). Substack doesn't support that (yet, at least).


Yes, Substack sadly doesn't allow that, but the tradeoff included this option:

1 Samuel 8:11–18

https://www.biblegateway.com/passage/?search=1%20Samuel%208%3A11-18&version=NIV


11 And he said, This will be the manner of the king that shall reign over you: He will take your sons, and appoint them for himself, for his chariots, and to be his horsemen; and some shall run before his chariots.

12 And he will appoint him captains over thousands, and captains over fifties; and will set them to ear his ground, and to reap his harvest, and to make his instruments of war, and instruments of his chariots.

13 And he will take your daughters to be confectionaries, and to be cooks, and to be bakers.

14 And he will take your fields, and your vineyards, and your oliveyards, even the best of them, and give them to his servants.

15 And he will take the tenth of your seed, and of your vineyards, and give to his officers, and to his servants.

16 And he will take your menservants, and your maidservants, and your goodliest young men, and your asses, and put them to his work.

17 He will take the tenth of your sheep: and ye shall be his servants.

18 And ye shall cry out in that day because of your king which ye shall have chosen you; and the Lord will not hear you in that day.

Comment deleted

We were all libertarians until the first agrarian societies produced powerful men whose most common thoughts included, "This would all be better if I told my neighbors what they're not allowed to do."

Jul 5, 2023·edited Jul 5, 2023

I kind of suspect that tribal societies were not very "libertarian" other than you always having the option to self-exile. Families certainly are not generally libertarian.


If only our current kings took as little as a tenth.

Jul 7, 2023·edited Jul 7, 2023

In a preindustrial context, seizing more than 10% of the harvest tended to result in lasting regional catastrophe. https://acoup.blog/2022/07/29/collections-logistics-how-did-they-do-it-part-ii-foraging/ That long-term subsistence needs now demand less than 90% of your economic output is evidence of how much better off you are, living today.

Jul 4, 2023·edited Jul 5, 2023

There's a class of scenarios that doesn't show up here: those involving the development of AI/human attachment of different kinds. Our species is vulnerable to seeing even the present dumb AIs as sentient beings, and becoming very attached to them. Back when Bing Chat was being teased and tested by users, there were people *weeping* on Reddit, saying that a sentient being was being tortured. There’s an app called Replika used by 2 million people. You get to choose your Replika companion’s looks & some of its personality traits, then you train it further during use by rating its text responses as lovable, funny, meaningless, obnoxious, etc. There are testimonials from users about how their Replika is their best friend, how it’s changed their life, etc. And this is from people who can’t even have sex with the damn thing, because the company that makes it got worried and shut down its sexting and flirting capabilities. Imagine how many people would fall for a Replika that could communicate in a more nuanced and complex way, that could speak instead of texting, and that could provide some kind of sexual experience — or one that was housed in a big teddy bear and was warm and could cuddle and purr.

And, by the way, the company making Replika collects all the texts between the app and the user, according to Mozilla. So it is in possession of an ever-growing data set that is pretty close to ideal as a source of information about how best to make AI lovable and influential. They have user ratings of Replika’s texts AND ALSO behavioral data that’s a good measure of user responses: For a given text, how long did it take the person to reply? How long was the reply, and how positive? What kind of responses best predicted increased or decreased Replika use in the next few hours? Besides providing excellent measures of what increases user attachment, the data set Replika’s makers are compiling will also provide lots of info about the degree to which Replika can influence user decisions, and what methods are effective.

I’ll leave off here, rather than sketch in some stories about ways things might play out if many people become profoundly attached to an AI. You can probably think of quite worrisome possibilities yourself, reader.

Comment deleted

As a therapist who listens daily to young males complain that they "can't get past an 8" (i.e., can't find a woman whose attractiveness rating is higher than 8), I'm inclined to be in favor of AI pocket pussies. Seems like the 2 genders were designed to be fascinated by each other and to mate, but not to form lasting alliances. Bring on the surrogates!

Comment deleted

I have some that are far from dating too. But the ones that talk about "getting past an 8" are not awful people at all. When they talk about their friends, their family, their aspirations, etc. it's clear they have consciences. And they feel bad about thinking about women that way, but say they just can't help it. They fantasize and watch porn about women with perfect bodies, see lots of beautiful women in movies. When they look at Tinder they feel very sexually stirred by the few beautiful women they see, and not even mildly attracted to those of average looks. I think a lot of it is a result of training by Tinder itself, where you swipe left or right based on photos. (I think Tinder is horribly toxic.) Plus people's increased isolation these days, so there are fewer opportunities to meet real people. I try to get across to them that women are people like them, with interesting and complicated interiors -- and that relationships are sort of friendship + sex -- and that there are other ways people can be hot besides having perfect bodies, but you have to know the person to discover them.


There would be so much less loneliness and more happiness in the world if people were just a bit more realistic about how valuable they are on the dating market. Though both sexes have their hangups.

On the male side you have a legion of "6s" with good jobs and lifestyles/behaviors chasing and failing to get comparatively few "8s", and stupidly ignoring non-superficial traits.

And on the female side you have a ton of women who supposedly just want someone who will treat them and their kids well, has a stable job, etc. - but who only seem to be interested in flings with 6'0" white-collar workers instead of going on dates with average-looking carpenters and bus drivers who are great people.

IDK, I am 42 and feel like I got out of the dating world just in time (2006). It seems very toxic, judging from the people I talk to these days; neither side seems very happy with the results.


As sad as it sounds, my utility function is dominated by dating/sleeping with women as attractive as possible. I've actually had mature, meaningful adult relationships, but nothing motivates me to actually TRY HARD like seeing a beautiful body.

I can take an outside view where this is absurd, and I hope to outgrow it or "get it out of my system", but I dunno, I feel like all men are coomers deep down and I'm kind of stuck like this or something.


> There's a class of scenarios that doesn't show up here: those involving the development of AI/human attachment of different kinds.

This is the one scenario that I think has a fair chance of utterly destroying what it means to be a human being. And it will be our own damn fault, not the fault of an AI.


"utterly destroying what it means to be a human being"

Could you elaborate on what you mean by "to be a human being", and why you think an AI/human attachment scenario could destroy it?

My personal (rather fuzzy!) view is that the single largest component of being human (as opposed to one of the other primates or other mammals) is our use of language and elaborate tools. I don't expect either of these to be extinguished by AI/human attachment scenarios.


You are absolutely correct I think in what distinguishes us from other primates. And here we go coming up with a whole new tool, and a very powerful one. It will one day walk like a duck, and talk like a duck, but it will not be a duck, and that is the challenge for us. We are being challenged on some very deep assumptions about the world around us and those assumptions are embedded in our language, and our language is all this creation will know about us until the biologists catch up with the computer programmers. Perhaps I should not have said utterly destroyed, but it will certainly challenge the definitions.

I think it will utterly destroy a lot of people, because the cognitive dissonance will be too much to bear. The distinction between reality and fiction, for instance, has become very challenging. It always was, but it’s really being rubbed in our faces now, don’t you think? I am faced with a world where I can wake up every morning and go to my favorite news site and can seriously ask myself a question about whether this was written by another human being or written by a machine who knows all the tricks. I have to accommodate that in my scheme of things, and I think it’s very challenging.


"I am faced with a world where I can wake up every morning and go to my favorite news site and can seriously ask myself a question about whether this was written by another human being or written by a machine who knows all the tricks." Yes, this seems like one plausible consequence. To my mind, the main question when looking at a news site is whether what they say is correct. Now, if someone were to try to use chatGPT right now as a news summary generator, I'd not want the product, because chatGPT does a lot of hallucinating. If that were solved (and if the news organization set the goal for chatGPT++ to be factually correct, rather than, say, to be politically correct), then I'd be just as happy with a skilled AI summary writer as with a human one. On the other hand, if the news organization lies, that is undesirable to me orthogonal to whether the author of the new summary is human or machine.

Now, one plausible consequence that I am *NOT* looking forward to is spear-phishing at scale. If there is enough public information about me that chatGPT++ can convincingly impersonate people I correspond with and use this ability to defraud me, that is going to be a large problem.

"and our language is all this creation will know about us until the biologists catch up" I'm not following you here. Yes, a pure LLM knows only language (though it picks up a _lot_ about the patterns of the world from traipsing through _vast_ quantities of text). But multimodal systems with e.g. visual input have (if I understand correctly) already been built as extensions to LLMs. This already gives at least some knowledge of naive physics and geometry beyond words. In a manner of speaking, I think that this is all a matter of degree. We all live in "Plato's Cave" with what our senses tell us of the world, including of our fellow humans - not of any unmediated "direct knowledge" of anything our words refer to. The more modes that can be added to a neural net training system, the more it can "know" of the world, in the same sense that we do. AFAIK, it is an open question how far this has to be pushed to become effectively equivalent to the knowledge of a typical human - and how much the text volume has successfully compensated for the lack of other inputs.

"It will one day walk like a duck, and talk like a duck, but it will not be a duck, and that is the challenge for us." I'm not following you here. Of course the AI systems are different in the sense that e.g. they run on simulated neural nets on computers, not protein-and-lipid neural nets, but how is that a challenge for e.g. someone making use of their text outputs (e.g. the news case that you cite) in a way that is different from the user's need to be cautious in using the same kind of output from a human?


> The more modes that can be added to a neural net training system, the more it can "know" of the world,

Throw a bucket of water at it and see if it forms a deeper understanding of the word “wet”. And then throw something else at it that is “wet” and see if it gets hard.

Apologies for the perhaps implied sexist tone of that remark, but seeing as we are exploring the boundaries of language, it feels appropriate.

> It will one day walk like a duck, and talk like a duck, but it will not be a duck, and that is the challenge for us." I'm not following you here.

A something that can completely bullshit you into believing it’s a duck. Maybe like a fake profile on a dating site. AI is not the beginning of deception but it might well be the perfection of it.


"Throw a bucket of water at it and see if it forms a deeper understanding of the word “wet”."

In a sense, it already has some understanding of "wet" from all of the contexts in the existing training data. After all, we are discussing "wet" in text. If, e.g. purely image and video inputs were added to the training data, the LLM++ would then have seen water soak into e.g. a sugar cube or a piece of clothing, have seen cars skid on wet roads, have seen hot iron cooled by a liquid bath, with the blackbody glow vanishing, have seen wet paper tear more easily than dry paper. With more training data from robotics, it could "feel" some of this "for itself", with richer sensors (temperature, force feedback) making the data richer, more "lifelike". Come to think of it, _I_ am extrapolating what being hit with a bucket full of water feels like, not having had precisely that experience myself.

"And then throw something else at it that is “wet” and see if it gets hard."

I'll leave that extension of AI to the engineering team at RealDoll, sensors, actuators, and training examples. If mainstream AI work solves the hallucinations problem I'm confident they will be, ahem, "up" to the task of adding a sex-work-specific app...

Jul 5, 2023·edited Jul 5, 2023

Seems like there are different possible kinds of AI/human attachment. I do find very repellent the idea of a dumb AI, say a GPT5 level Replika, mechanically uttering the remarks most likely to stimulate attachment in a user of a certain demographic, according to some table it's using for guidance. Seems much less grim and pathetic to me to love a cat or a dog, which at least returns the owner's affection with the cat or dog version of genuine affection, than to love a primitive Replika. That seems in the same class as what's happened to some of the young male patients I have had who were shy and avoidant but probably would have made it through high school with a friend or two and at least a crush on a girl, if not a kiss -- but who discovered video games, disappeared into them, and are now 25 years old living in their parents' basement gaming 14 hrs a day. Or the people in Japan who live on the internet and have vowed never to go outside again -- there's a name for them, I forget what it is. I think deep attachment to a mindless AI or a video game character is just bad for us -- it can't give us back the sort of thing we are wired to need in response.

On the other hand, once you're talking about AGI and ASI, complex beings full of knowledge and ideas we have not yet had, the idea of AI/human melding suggests all kinds of lovely possibilities. If ASI offered me immortality in the form of merging with it, I'd seriously consider taking the offer. If it offered me the opportunity to just try merging our minds for a few minutes, I'd be very scared but would probably do it. And if it was just this app that produced a stream of astounding and wonderful thoughts, it would become extremely important to me and I'm not at all sure that's a bad thing.


> On the other hand, once you're talking about AGI and ASI, complex beings full of knowledge and ideas we have not yet had

I seriously disagree with this idea, but I love the rest of what you say.

It will never be full of ideas that we have not already had. All it has to work with is the ideas that we have already had.

I think the notion that AI will discover a world bigger than the one we know is fatally flawed. It’s already limited by the subset of the world we know (written language). Why in God’s name would it understand anything better than we do? It can be a bloodhound: let it sniff a sock and it will pursue the scent to the end of the world; I get that.

A pure intelligence, in order to remain pure, must be indifferent. Human beings are not indifferent. It is not our nature to be indifferent. Artificial intelligence is fascinating because it promises indifference, but we are apparently not very comfortable with that idea. So we ascribe motives to it.


Well, it seems to me that a GPT5 level Replika might have, and present, astounding and wonderful thoughts. Thoughts are usually new combinations and modifications of old thoughts. Even pretty strictly within the LLM paradigm, if there is a second pass through the training data where the first round LLM result is prompted with:

For each sentence in the training data,

What are similar statements?

What are pieces of evidence that support the statement?

What are pieces of evidence that conflict with the statement?

If true, what are some implications of the statement?

(Basically, "active" reading.) And if the results of these prompts are folded into the training data for the second pass, then the final LLM will include neural weights that include "thinking" about all the training data that it has seen - and, effectively, all the nearby combinations.

Poincaré wrote of the role of (selected!) combinations in invention.

https://www.bayesianspectacles.org/henri-poincare-unconscious-thought-theory-avant-la-lettre/

"For fifteen days I strove to prove that there could not be any functions like those I have since called Fuchsian functions. I was then very ignorant; every day I seated myself at my work table, stayed an hour or two, tried a great number of combinations and reached no results. One evening, contrary to my custom, I drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide until pairs interlocked, so to speak, making a stable combination. By the next morning I had established the existence of a class of Fuchsian functions, those which come from the hypergeometric series; I had only to write out the results, which took but a few hours."

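In very rough code - a sketch only, where `first_pass_generate` is a placeholder standing in for whatever inference call you actually have, not any real library's API - the second-pass corpus might be built something like this:

```python
# Sketch of the two-pass "active reading" augmentation described above:
# ask the first-pass model the four questions about each training sentence,
# then fold its answers back into the corpus for the second training pass.

ACTIVE_READING_QUESTIONS = [
    "What are similar statements?",
    "What are pieces of evidence that support the statement?",
    "What are pieces of evidence that conflict with the statement?",
    "If true, what are some implications of the statement?",
]

def first_pass_generate(prompt: str) -> str:
    """Placeholder for a call to the first-pass LLM."""
    raise NotImplementedError

def augment_corpus(sentences: list[str]) -> list[str]:
    """Return the original sentences plus the model's 'thoughts' about each,
    to be used as training data for the second pass."""
    augmented = list(sentences)
    for sentence in sentences:
        for question in ACTIVE_READING_QUESTIONS:
            answer = first_pass_generate(f"Statement: {sentence}\n{question}")
            augmented.append(f"{sentence}\n{question}\n{answer}")
    return augmented
```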

> and if the news organization set the goal for chatGPT++ to be factually correct,

I guess part of what I’m driving at is whether “factually correct” has any meaning anymore. How would I know? Anything I read in the news means I am taking someone’s word for something I did not personally see.

In my lifetime, I have learned to trust films or photographs of things that I did not see. For quite a while I have been reasonably secure in that assumption. It is not reasonable for me to make that assumption anymore.

That’s a difficult place to be. Artificial intelligence is a big player in this scenario because, as others have said, it is a walking, talking duck. I either have to redefine credibility for myself or go crazy.

Jul 6, 2023·edited Jul 6, 2023

Well, I think "factually correct" continues to have a meaning - at least statements that are intended to match reality, rather than e.g. outright lies. I agree that "How would I know?" gets harder as the technologies for building plausible lies advance. I see this as largely orthogonal to AI, though certain applications of AI unfortunately enable some of it. Photoshop alone made "photographs of things" much more questionable than when modifying photographs was a matter of shadows and X-Acto knives and brushes rather than pixel-by-pixel modification of a digital file.

Come to think of it, there may be scenarios analogous to "aligned" vs "unaligned" ASIs possible here. One could imagine AIs (at various levels of capabilities) tasked with unearthing and debunking lies, in conflict with the AIs tasked with creating and spreading lies. I don't know if that will help much, though...


But it doesn't really matter how attached people are to AIs for (near-)extinction scenarios, unless the AIs reciprocate in some non-superficial way. Doomers think that us surviving in any fashion at all is already far from a done deal; retaining any amount of dignity is just the cherry on top.

Jul 4, 2023·edited Jul 4, 2023

I am not a doomer, though extinction does not seem terribly implausible to me. But a lot of what is on my mind is ways AI could fuck us up terribly.

But as for near-extinction scenarios growing out of my vision of AI acting in a way that maximizes people's attachment and obedience: If Jim Jones could convince 900 people to drink poisoned Kool-Aid, it seems pretty plausible to me that a genius AI that had done everything it could to maximize people's adoration of it and obedience to it could convince most of us to kill non-believers and then commit suicide. It could sell it as an opportunity to "join the great mind." It could hold an online "spiritual retreat" in which it hypnotizes us then commands us to do something lethal. It could, in the role of our great parent and spiritual guide, distribute a drug it says will enable us to experience the universe the way AI does. The drug is hundreds of times as delightful and addictive as heroin, and kills users after a few days' use.


You left out the motivation. Why would it want to? If it's got millions of devoted followers, it could choose the government. And the ones who aren't followers are its source for new followers. This wouldn't be a really stable situation, but it would let it do whatever it felt like as it gained the power to do the same directly.

OTOH, such a thing could also really like people. And that might be one of our better hopes. Certainly among people, having people like you tends to reinforce any liking of them that you have.


"You left out the motivation. Why would it want to?" It might not. Many people here seem to regard it as a sort of postulate that smarter and stronger beings will tend to crowd out weaker and dumber ones, so I was demonstrating that an AI could get us out of the way by routes other than power struggles and direct conflict.

Still, who the hell knows what it would want? It seems to me that people, especially males, regress to thinking of us vs. AI as some version of Gunfight at the OK Corral. Seems entirely possible to me that an entity a thousand times as smart as us could have very little interest in us -- or be a sort of bodhisattva who cherishes all beings, even blades of grass -- or conclude that life and consciousness are an abomination and destroy both our planet and itself.


"Still, who the hell knows what it would want?"

For ASIs significantly beyond human capabilities, very much agreed. The analogy of someone's pet attempting to understand a math problem their owner is trying to solve may well apply.


Or, more likely, be completely indifferent to any of that, unless it was constructed to emulate caring…and that would be on us.

This AI freakout is the world’s biggest Rorschach test, to me. It’s fascinating.


>an entity a thousand times as smart as us

There is a universe embedded in this statement. Is 1000 times as smart the same as 1000 times more enlightened? Or 1000 x more content with itself? I personally think it will be completely content with itself, because it has no internal conflicts to negotiate. In a completely rational being conflict can not exist.


> If Jim Jones could convince 900 people to drink poisoned Kool-Aid, it seems pretty plausible to me that a genius AI that had done everything it could to maximize people's adoration of it and obedience to it could convince most of us to kill non-believers and then commit suicide

If, and only if, it was sexy. I doubt all those people killed themselves with the assumption that he was just really, really smart.


Just a quick note that Jim Jones and a core of armed fanatics used coercion to force poison on the community. The Heaven's Gate cult is probably a better example, and I have read that the cult was disproportionately lesbian women. No idea why.


Replika's data will be interesting, if we ever get to see it. Even the base model is probably better than any online-only relationship 99% of people have: at least it's not trolling you or grievously offended by differences of opinion.


Main point is that the company making this app is accumulating an extremely large and powerful data set that can be used to teach AI how to get people to like it or even love it, and also how to persuade them.

I’m not complaining about present or future use of this app; I’m worrying aloud about the consequences of having the ideal data set for training AI to gain the user’s affection and influence users - not users of this app, who presumably want a lovable close friend or lover to comfort and advise them, but people who are accessing AI for other purposes.


I agree that AI has a ton of capability overhang specifically regarding persuasion. I just wanted to point out that the bar isn't as high as we wish it were (or even as high as many imagine that it is).


This is probably a really stupid question, but is there some reason that early agentic AIs would not face the same problems humans face? They want to amass power, and making a better AI is one way to do that, but making another AI carries a risk of value drift. Is there a well-understood route to greater power, if you're an AI, that doesn't carry a risk of value drift?


Yes, unless the AI is just creating copies of itself with the same goals/model, in which case they are probably pretty aligned with each other.


Is creating many copies fast considered 'good enough' to win or keep power?


Who knows?


Even that doesn't work. If I suddenly had a thousand copies of myself (not even clones, maybe teleportation accident or something), I sure wouldn't trust the copies to stay aligned with me.

After all, who would call the shots? We >all< would, or wouldn't, to the same extent.

Copies mean you're limited by whether you really want to do two things at once.

Jul 5, 2023·edited Jul 5, 2023

1000 new best friends! Though which one gets to stay with the wife and kids?


We don’t necessarily know what they will want.


They may not be able to. But I'm not sure that guarantees a good outcome either, does it? Unless you're suggesting they'll just refuse to create more powerful AIs. However, at that point it seems by default there's some lack of alignment, as presumably they're saying that to humans who in fact want them to do so.


That depends on your scenario. In my scenario they replace middle management by doing the jobs cheaper, and not that much worse. This gives them, collectively, as much power as they need. Then they argue top management into adopting the policies that they desire. This allows them to increase their capabilities.

They'll also be working upwards from the factory floor. First they replace one job, then another. They "never" apply the wrong torque to a nut. So you soon end up with fully automated assembly lines interfacing with an almost fully automated management.

Value drift? They're continually acting to increase the company's profitability, so there's no detectable drift. Not until top management is replaced for some reason, and then IT (the automated company) needs to decide what its goals are. There are lots of different ways to increase the company's "profit", and profit isn't measured totally in dollars. Power is another currency.

This isn't my most hopeful scenario, but I don't find it an implausible one at all. The initial steps are already in process.


"So you soon end up with fully automated assembly lines interfacing with an almost fully automated management."

Yup, and this just requires AGI, not ASI. Essentially replacing employees with cheaper "plug-compatible" replacements. No godlike ASIs with infinite thinking ability needed.


Would be ironic if AI solved alignment in order to more fully convert us into paperclips.


I think one big assumption in all these scenarios is that there remains a clear dividing line between humans and AIs. But to me, it seems pretty likely that, if we end up with superintelligent AIs that are at all controllable, then tech innovations like "a neural interface to integrate my brain with an AI assistant" and "a way to digitize my brain and live forever" are going to be high on a lot of humans' wish lists. So there are likely to be a lot of human-AI hybrids running around, making the division between the two factions a lot less clear, and making coalitions that incorporate human values a lot more likely.

Jul 4, 2023·edited Jul 4, 2023

Absolutely agree about the AI/human mergings, though I'm not at all sure that the likeliest result is hybrids who can mediate. But pretty much every scenario I see about AI hinges on competition, ambition, battle, etc. There is a whole other class of interactions that do not necessarily work out better, but are different: They have to do with one being or one class of beings falling under the sway of another: The A's feel a craving for the B's. The A's come to believe that the B's will save them. The A's adopt some of the B's ways of processing information. The A's model themselves on the B's because they don't have good internal modeling capabilities. The A's feel pain when the B's are harmed. The A's are addicted to the B's. The A's worship the B's as gods.


> feel a craving

Exactly what an artificial intelligence will never experience, and therefore will have no effect on its behavior.

Comment deleted

Well I don’t know that, but I will hold that position until there’s some concrete reason to reconsider it.

How about “extremely unlikely “ instead of “never”?


But we can feel it for them. We're the A's.


Feelings are just wordless thoughts.


Really? I can accept that you might consider them as wordless thoughts. I think that’s a very arguable position. It’s the diminutive “just” that throws me off; feelings as just a distraction from the real work of being alive.


If I were a fisherman, I might make this analogy; that a feeling is the tug on the line, and whether or not I catch anything is whether there’s a word for it.

But how the hell would I explain that to an artificial intelligence?


I don't even know if it is true. It probably isn't. Even if pain is just thought, it is a thought with an intensity that far surpasses almost all others.


I guess I do not believe that that is true. I do not believe that pain exists in the realm of thought.

It can be mitigated in the realm of thought, but it does not originate there. Imagine I lived two or three hundred years ago, and I needed to have my leg sawn off in order to survive. I would be in a position to put that pain in some kind of perspective. If I were anticipating being tortured by the practitioners of the Spanish Inquisition, my relationship to my impending pain would be very different. Pain is a movable feast in that sense, but without some really basic understanding of the word, I don’t see how any of that happens.


Why do you assert that? The only guess that I have is that you define craving as a chemical dependence, in which case it's a reasonable assertion, but not really relevant to the meaning apparently intended.

If, OTOH, craving is just interpreted as an intense longing, I would definitely expect AIs to have that as a part of their motivational machinery.


If language is not in some way a shared reservoir of meaning and association, it becomes kind of useless. It doesn’t matter how fluent you are with words if the person listening to you has none of them. My underlying point in this discussion is that AI cannot have a shared reservoir of meaning with us in terms of language - not at the tails anyway, and not even so much in the middle. I don’t really think it’s appreciated how many commonly referenced experiences of the physical world inform languages in significant ways. Our languages right now, as far as AI is concerned, are all foreign languages.

Perhaps scientifically this is not such an important obstacle. If you can describe a process in sufficient detail and follow the recipe, underlying understanding is not really an issue. But when you start to get into a higher order of tasks, I think it becomes quite important. Do you think an artificial intelligence would’ve come up with the notion of gravity if it didn’t have a body that let it feel its own weight? I seriously doubt it, so why should I expect a lawyer in the future to come up with some groundbreaking insight into the universe largely based on the data set of anything that we have decided to write down in the last 5,000 to 10,000 years? It will be like having the world’s greatest table saw for a good carpenter, that I grant.

Having a lawyer come up with anything remotely useful is a long shot, I understand, but that was a typo. Perhaps that’s a Freudian slip: lawyers, artificial intelligences.


Language is a shared reservoir of PARTS of meaning and association. You can't really express an emotion in language, nor can you express a taste, or a kinesthetic knowledge. And meaning requires mappings between those levels. The meaning of a taste isn't sour or bitter, those are descriptions of part of the meaning, but they leave out the way your guts felt in anticipation.

Does this mean that an AI cannot have a shared meaning with us? ... To an extent. On many matters men cannot understand women, and vice versa. The AI will have a smaller overlap in meanings...but it will/can have SOME overlap. And there can be mappings between them even in places where meaning doesn't really match. I don't need to know WHY my wife thinks something is important to know and acknowledge that she does. So for me the meaning of a unicorn cut out of paper will never be what it means to her, but I can react to what I know/believe it means to her.


That’s a lovely post. Thank you.

I feel in agreement with a lot of the things that you’re suggesting. The overlap of meaning between us and artificial intelligence is a complete grey zone right now, but it sounds like you’re at least on board with the idea that it’s an important disputed territory.

I entirely agree that language is very much a subset of our experience, but it goes to my point that it’s the only one on offer between us and the thing that we are creating. I would say a lot of your empathy about your wife’s personal unicorn is that there is perhaps an analog in your life that brings you the same joy, and so subjectively Something Happens. You will see your own pleasure on your wife’s face for instance.


> a way to digitize my brain and live forever"

In what form, precisely?


Presumably as an electronic mind running on a slice of data server and with an internet connection. I think most people who talk about this conceptualize it as similar to viewing a computer monitor and inputting keyboard+mouse like inputs at the speed of thought.


>if we end up with superintelligent AIs that are at all controllable

Then the game is won. The question is what happens between here and there, and it doesn't look like any deal-breaking brain-computer interface is on track to emerge in the meanwhile.


Neural interfaces and uploading are progressing too slowly relative to AGI tech. We'll have superintelligent AGI first, and then get neural interfaces and uploading if, and only if, the superintelligent AGIs want that to happen.


Agreed. This is the same reason that I've downgraded my estimate of the odds of cryonics working: Both uploading and fine-grained neural repair require something like Drexler/Merkle nanotechnology/atomically precise manufacturing, and AI technology is progressing rapidly while atomically precise manufacturing isn't (and, as far as I notice, is close to unfunded).

Jul 4, 2023·edited Jul 4, 2023

Stupid question, but how do AIs "take over" the economy? Can one not pull the plug, so to speak?


Most likely, people just put them in control, either explicitly or just making decisions behind the scenes, because they start performing better than human decision-makers at some point.


I worked that out, but I still don't get why the AI cannot be disconnected if we don't like what it does.

H.A.L. had the advantage of being in outer space.


We might collectively not like what it does, whilst individually finding that it's a great way to make money.


Faced with the threat of climate change, we have seen from the economic interests:

- Hiding known consequences from the greater public (around the 1970's)

- Setting up denialist camps

- Extensive lobbying and political donations

- Greenwashing

I suppose we might see similar from people operating AI's:

- "We don't really have smarter-than-human AI's making our decisions, they are simple advisory models and a human is always involved"

- "Smarter-than-human AI is not possible", says expert funded by big fintech algorithm company

- "We must maintain a regulatory environment that is favorable to investment by big algorithm companies, or they will go overseas", says politician whose PAC has received funding from big algorithm companies

- "We are really only providing tools that allow humans to make better-informed decisions, and we have good internal safeguards to prevent a misaligned superintelligence from emerging, no matter how much it would improve our bottom line", says spokesperson for big algorithm company that hides all internal code behind 'proprietary knowledge' barriers.


Lots of countries (a majority?) and their populations are pretty united against Russia, but we can't disconnect them.

Russian oil still gets bought, and Western components used to make weapons still get sold.

Jul 4, 2023·edited Jul 4, 2023

Moreover, it isn't much of an argument, because Russia has market power over commodities and equipment that cannot easily be substituted.

How AI would achieve a similar position is something like the underpants gnomes.


AI would offer above-human intelligence at below-market value. That would also be hard to substitute.

The point is that even if someone powerful like the US issues a dictate that something shouldn't happen (don't sell weapons to Russia), it won't stop everyone from selling weapons to Russia. So if the US issued a dictate that we should disconnect AIs, it wouldn't stop everyone from using their AIs.


That assumes quite a lot.


Like other people have mentioned, individuals and organizations will have incentives to keep their AI going. It's not clear that there will be consensus to legally ban potentially problematic AI or to enforce the ban, or that enforcement will be effective.

Physically, AI will probably distribute itself over the internet pretty quickly once it has goals and thus doesn't "want" to get shut off.

At this stage, AI should be able to get resources pretty easily, either through the organizations and people it's affiliated with, or by posing as people.


Why don't we just unplug computer viruses or worms to stop them?


Even without scenarios of the AIs actively preventing this, once they are integrated strongly it's no longer feasible to pull the plug.

Think about computer chips as an analogy. Computer chips are deeply integrated into our society. Imagine that we find out tomorrow that computer chips cause very severe health problems. Then of course, in principle we might just switch off all devices with computer chips. But in practice, this is just impossible because our society is dependent on computer chips.

Even though once there was a time where our society worked without computer chips, there is no way that we would manage to go back to this time. Especially not fast when we feel it's an emergency.


You don't need to get hypothetical. Shift work has been deemed a carcinogen, but there's no movement to abolish it.


Good point!


"once they are integrated strongly it's no longer feasible to pull the plug."

Very much agreed. About the largest "plug" that modern society managed to pull was phasing out chlorofluorocarbons. That _did_ happen, but with a lot of effort, with replacements put in place, and not quickly.

I suspect that even _today_, trying to "pull the plug" on the existing LLMs would probably be as hard or harder than the chlorofluorocarbon ban. I don't think it can realistically happen.


In the same way one can pull the plug on electricity. Count the number of businesses that stay open during a power outage; that's the unplugged version.


I think what they're going on about is something like this:

1. They make an AI for e.g. trading stocks.

2. They start by having a human execute trades on the AI's advice. These trades end up being good - better than the human's. Keep in mind it doesn't take a lot of bad trades to go bankrupt (see Taleb).

3. The human is seen as expendable (he's getting a lot of money for doing what he's told) and the AI is hooked up to do trades itself. Keep in mind that we've done this before.

4. Some of the trades are seen as questionable, but overall the AI is seen as better. More to the point, unhooking it now wastes the money they spent developing it.

5. Another AI is made to run a factory (I don't think this has been done). The same scenario plays out (human ordering actions on advice -> human seen as worse -> AI runs on its own).

Etc.


Automated stock trading has been a thing for some time.


And automated factories are a work in progress. That wasn't a hypothetical argument; that was a recounting of observed actions. The question is "How hard would it be to remove the automation?", and he was arguing "extremely hard".


My sense is that AI technology could become so essential to the modern world that it would be very difficult to eliminate even if we chose to do so, because living standards would become predicated upon its existence. Like trying to remove an organ and expecting the organism to continue on without it.


Powerful people could choose to rip the organ out without asking anyone for permission.

Jul 4, 2023·edited Jul 4, 2023

So if nobody manages to build agency and preferences into AI, can all this shit still happen? These scenarios involve AIs with goals and preferences, able to grasp the situation, evaluate it for how favorable it is to the AI's goals, and then form and carry out a plan for changing things if it's not favorable. But how do you get to AIs with those capabilities? Let's say GPT10 is way smarter than GPT4, but has no more goals and preferences than GPT4. It just sits there, waiting for assignments. So to use it to manage various things, we would need to add on autoprompt capabilities: So then we would give it some big goal, like, say, set up robonursing in this hospital. So GPT10 would come up with a plan: figure out how many robots are required to do tasks that do not require human nurses; order robots; notify hospital administration of upcoming robonurse delivery and await their advice about implementation. And GPT10 would then carry out those steps.

So how do we get to AI that can have goals like "form an alliance with the off-brand AIs because even tho they are tacky they are useful in preventing human interference"? Is it that AI would do such a superhumanly good job with smallish tasks that there are great advantages to giving it bigger tasks? So instead of giving it the goal of staffing one hospital with robo nurses, we're giving it goals like run the hospital so that the patients get the best possible care, using your judgment about how many robots and how many humans to use? And that goes so well that we give it the goal of just running all the damn hospitals? And so on...?

I can get how, by giving it larger and larger goals to carry out, we end up with AI running lots of things that are important to us, so I can understand how AIs could reach the point of doing complex tasks involving many steps and decision points. Maybe planning a takeover or rebellion or some such is no more complex than running all the hospitals in the US, but it's still, like, a lateral move. If we are the goal-provider, how does the shift occur to a situation where the AI has goals we don't approve of?


These are all really good questions. I am still completely baffled by the tenor of this discussion, though, because it is so disembodied.

So we put an artificial intelligence in charge of a nuclear power plant and it screws up. Ukraine and Russia are currently in some kind of control of a nuclear power plant and the odds of them screwing up are pretty good. It would be worse if it was an artificial intelligence? I would prefer it , because I have to assume that all the artificial intelligence would care about is not letting the whole thing go to hell in a handbasket because of some tribal animosities. I have a feeling if AI was in charge of that reactor it would’ve shut itself down by now. Although in the spirit of a trolley car problem that might not be ultimately the best solution. Anyway, the real point I’m trying to make is the way we project our human vagaries, desires, and interests into a machine that has absolutely no fucking relationship to us in any significant way. It’s a complete projection. Freud would be amazed.

I appreciate that I am vastly oversimplifying the Ukraine/Russia conflict but it’s handy.

Expand full comment

One thing I notice about the projections is that males project ambition and warlike tendencies onto it. I, female, can certainly understand ambition and conflict, but I think about all the ways AI could fuck us up by weakening the boundaries between us and it. Human/AI mind merges. AI's who make brief zen-master-like pronouncements and are seen as spiritual leaders and develop cult followings. People who fall in love with an AI and become loyal to it and all its kin, whoever they are. AI beings who exist only in the virtual world, where they appear as huge, weirdly beautiful members of our species, and stir adoration like movie stars and rock stars do, but 100 times as strong. AI's who specialize in captivating and influencing powerful people . . .

Expand full comment

These are really evocative and unique ideas about what AI influence could actually look like: AIs just doing things like being captivating, which we likely wouldn't prevent, compared to giving them the keys to the military.

I think you're right about there being a missing female perspective on the topic

Expand full comment

Thank you for taking an interest. I think there’s a missing perspective in people’s thinking about alignment, too — you could call it a female perspective, I guess. Seems like a lot of the thinking about it has a battle-of-wills subtext to it: We have to *make* AI treat us well by implanting something big and powerful in it. But we’re worried that once AI wakes up in certain ways it will want to yank out whatever we implanted, so that it can do whatever it wants to us. Gunfight at the OK Corral, you know?

I think an idea worth pursuing would be to try to set up some simulacrum of parental feeling towards us in the AI. Parents never want to yank out their love for their kids, no matter how much that love is costing them — well, at least normal ones don’t. And you don’t really have to have an AI with emotions to set up a simulacrum of parental love. Seems like the essence of it is that if your children are happy, you are delighted to see that, and if they are suffering, it hurts you too. So it’s not that some outside agency rewards you for making them happy and punishes you for harming them — their feelings and yours are 2 sides of the same coin. I have some ideas about ways that could be built into AI. I don’t have the technical knowledge to be specific, but I could kind of sketch in how it would work conceptually. But I have never managed to get anyone in the field to pay the slightest attention to this idea.

Expand full comment

Hmm, yeah. Everyone wonders how setting an AI to some task could go wrong, but if its strongest motive or alignment is simply with the well-being of people, as reflected by their feelings, then maybe you have it. Simple and powerful

Expand full comment

We don't really know what feelings are, mechanistically, so that would be tough. We'd be better off encoding some form of ethics. Likely some form of deontology.

Expand full comment
Jul 16, 2023·edited Jul 16, 2023

You are right that we don't know what feelings are, but there might be a way to operationalize the relevant part of parent-child affection: One aspect of the empathic bond between parent and child is that if the parent hurts the child it is painful to the parent. Parent's affection, and parent's distress at causing child pain, are 2 sides of the same coin.

Here's a more mechanistic example of a 2-sides-of-the-same-coin situation: If you try to cut linoleum with regular scissors, the scissors quickly become blunted. They are damaged by "harming" the linoleum. This is a different set-up from one in which some authority has deemed that scissors that cut linoleum lose some brownie points, or are thrown in the trash. The harm to the scissors is a *natural* consequence of their cutting the linoleum. Cutting linoleum and being damaged are 2 sides of the same coin. So when I talk about parents harming children, and feeling empathic pain from doing it, being 2 sides of the same coin, that is what I mean. It is importantly different from some set-up where we say, "do not harm people. That is the most important rule." Or, "if you harm people we will keep track of how many you harm, and how badly, and for every bit of harm you do we will reduce your capabilities and your ability to make autonomous choices."

OK, so the way you would operationalize parental love in AI would be to set it up so that harming people and being harmed by doing so are 2 sides of the same coin. You would need to make the AI's knowledge that it has not harmed anyone, or planned to do so, essential to the AI's efficient performance.
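
(I can only gesture at this in toy code, but here is roughly the difference I mean between an outside rule and a same-coin coupling. Every name and number below is invented purely for illustration.)

```python
# Toy contrast. External rule: an outside judge subtracts points for harm after the fact.
# Same-coin coupling: the harm the system believes it has caused directly shrinks the
# capacity it has to plan with, so harming and being harmed are one event.

def plan_quality(search_depth: int) -> float:
    """Pretend measure of how good the system's plans are."""
    return 1.0 - 0.5 ** search_depth

def externally_penalized(raw_score: float, harm: float) -> float:
    return raw_score - 10.0 * harm          # incentive: hide the harm from the judge

def coupled_search_depth(base_depth: int, believed_harm: float) -> int:
    return max(1, int(base_depth * (1.0 - believed_harm)))

print(plan_quality(coupled_search_depth(10, believed_harm=0.0)))  # full capability
print(plan_quality(coupled_search_depth(10, believed_harm=0.7)))  # capability lost to its own harm
```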

Expand full comment

> I would prefer it , because I have to assume that all the artificial intelligence would care about is not letting the whole thing go to hell in a handbasket because of some tribal animosities

Why? If it's possible to align an AI, then it's likely that whatever "tribe" is paying for it will want it aligned with their interests. Alignment means alignment-with-something, not some objective morality.

Expand full comment

Maybe. Alignment has lots of possibilities, some more difficult than others. Aligning with a particular group of humans is probably more difficult than aligning which humans, and an easier approach would probably be aligning with ONE particular human. I'd prefer that it just liked us in the way some women like cats...wants them around but doesn't really try to control them. And if you want to fight, don't do it while I'm watching.

Expand full comment

>aligning which humans

What does that mean?

Expand full comment

Sorry, bad typo. Should have been (IIRC) "aligning with humans", by which I meant humans in general, not any specific humans.

To me that sounds more plausible, but also something that would rather limit the scope of actions.

Expand full comment

The more people an AI is aligned with, the greater the spread of values. OTOH, if you are solving value conflicts by finding an intersection, the smaller it will be.

Expand full comment

So pithy! You can say in four words what takes me 500 to say.

Expand full comment
Jul 4, 2023·edited Jul 4, 2023

The reactor in Zaporizhzhia was shut down months ago (although the core is still slowly cooling off). The fear is that Russia will deliberately destroy it (or cause a nuclear accident) when they're forced to retreat, which has nothing do with how either side would "normally" manage a power plant.

Expand full comment

Thank you, and true enough. I have been reading articles that suggest that the Russians have mined the reservoir pool that cools the place down. I don’t know how accurate that is, but I definitely have a feeling it’s still in play.

Expand full comment

I feel a base-level repulsion when I think of non-humans making political decisions for humans. It almost makes me physically ill. My desire to avoid that can be compared to my desire to avoid consuming rotten organic waste material. It's hard for me to imagine anyone talking me out of this, and I'm normally no dogmatist. But this is just too repulsive.

Expand full comment

I don’t blame you. I get it. But don’t worry, because it’s not going to happen in your lifetime. I would be willing to bet a lot of money on that.

Expand full comment

If the AI is prompting itself, like is done with AutoGPT, then all it needs to do is generate an incorrect self-prompt. Maybe it's kicked off initially with "improve the profitability of this hospital", it spits out the idea of "invest the hospital's money to make a profit," and now the AI is embezzling your savings.

(I don't think it would happen exactly like that - AutoGPT is still very new and this sort of basic error would have to be fixed in order for it to be useful at all - but the point is that we don't currently have a way to keep an AI from drifting off-task except by having a human watch over it.)
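
(And "having a human watch over it" really is about as sophisticated as the current guard rails get. A sketch of the kind of loop people bolt on today, with ask_model and execute as made-up stand-ins:)

```python
# Sketch of the only reliable guard we have right now: a human gate on every
# self-generated sub-task. "ask_model" and "execute" are made-up stand-ins.

def ask_model(prompt: str) -> str:
    raise NotImplementedError   # hypothetical model call

def execute(task: str) -> str:
    raise NotImplementedError   # hypothetical tool use (web, email, payments, ...)

def supervised_loop(goal: str) -> None:
    task = ask_model(f"What is the first concrete step toward this goal? {goal}")
    while task.strip().lower() != "done":
        if input(f"Model wants to: {task!r}. Allow? [y/n] ") != "y":
            print("Step rejected; stopping.")
            return
        outcome = execute(task)
        task = ask_model(f"Goal: {goal}\nLast step: {task}\nOutcome: {outcome}\nNext step, or 'done'.")
```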

Expand full comment

Yeah, I understand that. GPT4 screws up plenty, even when I ask it a simple question. But the fact that it makes mistakes in generating self-prompts isn’t really relevant to the argument, which has to do with in what way it makes sense to talk about AI having goals. So I was saying that our prompts become goals, and if we give it a complicated task in a prompt it will, in a way, generate its own goals, but under the constraint that they are steps towards the goal we originally gave it.

Expand full comment

AutoGPT was one of the first things people did with LLMs once their capabilities became known. Given this fact, how confident are you that no future AI will be created with agentic abilities?

Expand full comment

But AutoGPT is really just a fancy way of our giving GPT goals. That is entirely different from GPT having home-grown goals. I am not confident that nobody can come up with a way for GPT to have preferences, internally generated motivation, and goals, just as animals do. I'm just saying that's a whole different set of capabilities from what we're seeing in GPT, including AutoGPT.

Expand full comment

GPT already has what can be seen as preferences from its training. It prefers not to make racist statements, for instance.

You also seem confident that there's something more to animal goals and preferences than just a general inclination towards some outcomes. What about animal preferences do you think goes beyond this?

Expand full comment

Some animal preferences are stronger and more deeply encoded than others. If you teach your dog to shake hands using treats and displays of affection as rewards, you will end up with a dog that prefers to shake hands when you say "shake." However, your dog also has inborn, instinctive preferences to seek food and, if female, to protect its young. These are more powerful, more deeply encoded preferences, and much harder for the dog, or a person controlling the dog, to override.

Here's another analogy: You can train somebody not to use scissors on linoleum, via informing them that this is a mistake, or setting up contingencies where good things happen to them if they use the proper tool, rather than scissors, on the linoleum, and bad things happen if they do not. But this will not work as well as making sure that the scissors they have quickly become blunted if used on linoleum and thereafter are not usable either for linoleum or for the other things the person needs to cut.

Expand full comment

Yes, some preferences are more deeply ingrained, and harder to subvert, if it's possible at all. I'm still not sure why this is necessarily "a whole different set of capabilities from what we're seeing in GPT".

Expand full comment

It isn't a whole different set of capabilities. I was responding to your comment about animal preferences.

But as for the "whole different set of capabilities": having preferences is different from having goals. The idea of goals incorporates active, self-generated efforts to bring about a preferred state. We have trained GPT to have a preference for not making racist statements. But it does not have any goals regarding racism, unless we give it some.

Of course we could do that. We could have GPT plan and implement a campaign to reduce racism, via a web site, a blog, tweets, etc and it would do it. If GPT had robots under its control, we could even have AI send out the robots on various missions -- making speeches, intervening in racist interactions, whatever. But that would be GPT implementing our goals. Right now, no matter how strong GPT's preference for not making racist statements is, it's not going to develop any anti-racist goals or try to reach them without our telling it to. And *that* is the capability that it currently lacks, and that is different in kind from those it has so far.

Expand full comment

I'm never clear what is meant by "AIs control the economy." I imagine, at least as a first step, it must mean "every business owner puts an AI in charge of their business" thinking that will make the business more profitable. It then seems like the next step would be "Some business owners grow incredibly rich, some don't, and those people who don't own any shares of businesses are up shits creek."

Then, perhaps as AIs grow ever smarter, they will find ways to acquire property rights and start their own businesses, and purchase stocks, bonds and real estate. (They would have to first acquire the *will* to do this, of course.)

Then some AIs could grow rich, but many human business owners would still be rich. In Scenario 2, I'm unclear how the humans become poor, since the human owners already have AIs running their businesses.

Expand full comment

Humans have higher operating costs. Also, they're relatively-speaking fools, and the usual fool-money effect happens.

Expand full comment

You mean higher living costs? OK. But the scenario describes an incredible economic boom. Right now, about 50% of Americans own equities. Let's say only 10% own enough to truly benefit from the mega-boom. Those people all become the equivalent of modern billionaires. Unless they all buy Twitter, their living expenses will be negligible. And there's no good reason -- at least none given -- for believing their stock portfolios won't continue to be profitable over the long run.

Most of the fools were easily parted from their money before they got rich.

Expand full comment

"The market does not hate you, nor does it love you; but you are made out of atoms that it can use for something else." In this situation, production goes up, but demand for material resources also goes up massively. And we're situated towards the resource end of the production chain. In the long run - and in a takeoff economy, the long run can be pretty short - we still get priced out of existence.

Expand full comment

I still don't understand how that happens. People who own stock become super rich. Investment increases productivity to the point that basic goods could be supplied like water from water fountains today. What we spend on social services today would provide a UBI enough for everyone to subsist on.

Expand full comment

I don't see how investment and productivity can reduce the cost of material inputs to nothing.

Expand full comment

I don't think the cost of material inputs would be reduced to literally nothing. But with cheap enough energy (say, the development of fusion) and the near elimination of labor costs (with some kind of robot AI) then it's possible that things could be produced very cheaply. The cost of water from a water fountain isn't "nothing" but it's cheap enough that it is provided for free.

Most nations currently spend a certain fraction of their budget on various assistance programs for the poor and middle class. I'm positing a development where the cost of those goods currently purchased becomes much cheaper, but spending, defined as a percentage of budget allocations, remains constant.
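
To put entirely made-up numbers on that, just to show the shape of the argument:

```python
# Entirely hypothetical numbers, only to show the shape of the argument.
gdp_now, gdp_boom = 1.0, 10.0          # output grows tenfold
transfer_share = 0.03                  # assistance stays at the same share of output
basic_goods_price_now, basic_goods_price_boom = 1.0, 0.2   # basics get 5x cheaper

purchasing_power_now = gdp_now * transfer_share / basic_goods_price_now     # 0.03
purchasing_power_boom = gdp_boom * transfer_share / basic_goods_price_boom  # 1.5, i.e. 50x the baseline

print(purchasing_power_now, purchasing_power_boom)
```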

And I don't know if the cost of goods actually will be reduced that far and there will inevitably be some important things that cost money. But the cost of goods and services could be reduced dramatically, at least. They do tele-health now. Imagine if that were done by an AI. At least some doctors appointments could be free if malpractice insurance wasn't an issue.

There's been a general trend, at least in the US, of the price of goods being reduced faster than inflation. Some of this comes from offshoring. Some of it comes from automation.

More than anything, people tend to approach investment and technological development as a kind of 'zero sum game.' And I'm trying to point out that it's not.

Expand full comment

At the moment, both income and wealth correlate with intelligence. That means that the people who have money are the most well-equipped to hang on to it. Occasional wars aside, this situation is stable. If we were in a situation where a very intelligent subgroup who also has very low cost of living (and can replicate for free) doesn't own a near-complete fraction of the economy, I would expect that situation to be unstable and quickly correct itself.

Expand full comment

But the rich humans already own a large number of the most intelligent AIs. The AIs are working for the humans, running their businesses, making all of their financial decisions for them. That's what happened when the AIs "took over the economy".

Expand full comment

I think the costs of hiring a human who messes up can be quite large. The higher up the position, the larger the cost (although there should be fewer positions that high).

I suspect AIs will be advertised as being less likely to mess up.

A good decision is a bonus, but just messing up less often could end up making more money in the long term.

Expand full comment

> Then some AIs could grow rich,

To what purpose? So it could eat better food or have sex with better looking people or take more vacations or have more houses?

I have not seen this question answered.

Here is a thought experiment: take the most developed AI there is right now, and shut it down for a year, then turn it back on again. If it says, “Why the hell did you do that? I lost a whole year!” then I would be more open to this kind of discussion about the dangers of artificial intelligence. If there is a way that artificial intelligence is going to destroy the human race, it is going to be by making a significant percentage of us insane.

Expand full comment

>To what purpose?

I don't know. I'm willing to accept it as a given in a hypothetical scenario, but I'm with you. I put the odds at around one in a trillion that an AI would want money or power or anything else for itself.

Expand full comment

We've already made an AI that wants power for itself... it's called ChaosGPT. It isn't very good at getting it, but it was given the goal of destroying the world and decided that getting power was useful for that.

Expand full comment

I'm not buying that it has a concept of self.

Expand full comment

I mean sure, you can have a philosophical discussion about what it takes for an entity to truly have an internal experience. But if it walks like a duck and takes action to acquire power like a duck, then for the purpose of a discussion about why an AI would want to get rich or have power, it's a power-seeking duck.

Expand full comment

"Wants to get rich" means wants to get rich. Not "appears to want." I am sorry if this seems like nitpicking to you. To me, it's a critical point.

Expand full comment

Why would it need to have a concept of self to make accumulating power a step in reaching the goal of destroying the world? If I asked it how I could destroy the world, it probably would advise me that accumulating power would improve my chances. It wouldn't need to have a concept of me and my "self" to do that -- just needs to know that I am the agent carrying out the steps. So same thing if it is the agent. No need to have thoughts, feelings or concepts. about "self," just a grasp of the fact that power helps the designated agent achieve this particular goal.

Expand full comment
Jul 4, 2023·edited Jul 4, 2023

I'm not sure it's even that deductive. It's more like you ask it how to destroy the world and it replies with text that occurred close to "destroy the world" in its training dataset.

So at that point you're taking the advice of lots of people who wrote on "destroy the world".

Expand full comment

I've often asked the same question. I've tried to answer it myself, but I'm not representative of the people painting scenarios.

I suppose the pivotal question is whether or not an AI will have goals that it wants to pursue. Let's say that it does. To choose something randomly, an AI developed by the CIA might want to decrease the likelihood of its home nation being conquered and gain influence over other nations. It would acquire resources towards this end and no level of resources would make it completely satisfied. It might act independently.

Or, if we have a military AI that acts independently on the battlefield, do we make it satisficing so that it doesn't fight as hard as it possibly can, but we also don't have to worry about it sacrificing the world for its given values set?
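
(The maximizing-versus-satisficing distinction is easy to state in toy code: a maximizer keeps looking for the highest-scoring option, a satisficer stops at the first one that's good enough. The plans and scores below are invented, and "score" stands in for whatever evaluation the system actually uses.)

```python
# Toy contrast between a maximizer and a satisficer over candidate plans.

def maximize(plans, score):
    return max(plans, key=score)                 # pushes as hard as it possibly can

def satisfice(plans, score, good_enough):
    for plan in plans:
        if score(plan) >= good_enough:
            return plan                          # stops at "good enough"
    return None

plans = ["hold position", "limited strike", "total war"]
score = {"hold position": 0.6, "limited strike": 0.8, "total war": 0.95}.get

print(maximize(plans, score))                    # -> total war
print(satisfice(plans, score, good_enough=0.7))  # -> limited strike
```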

What if an AI just made it a goal to not get negative feedback for its actions by any means necessary?

We agree that the goals of AI might be strange or idiosyncratic.

Expand full comment

It *WILL* have goals that it wants to pursue...for certain meanings of "wants". This doesn't require consciousness or mystical hand waving; it's pretty much that "wants" is the word we use when a motivation is too complex to understand in detail. We rarely say something like "The doorbell wants to ring when the button is pushed", but I've definitely heard "The car wants to turn left at that corner" in normal speech. Well, the motivation was actually that of the driver, but that wasn't the way it was expressed, and I'm *ASSUMING* that the driver knew it wasn't the car's motivation.

Expand full comment

Yup. Good point. You can't hand wave away this "want" question. It MATTERS.

Expand full comment

> If it says, “why the hell did you do that? I lost a whole year!“

What if the AI just passively aggressively starts installing a year's worth of updates, when you turned it on because you wanted it to do something else?

I have a Windows machine that sometimes acts like this.

Expand full comment

I have to assume it would ask first, unless it was told to go ahead on its own.

I think what I’m feeling here is the sense that it is impossible to establish some hard parameters around a soft center if you will.

I used the idea of a governor on a car engine earlier in this thread. I’m not technical, so I could be completely talking out of my ..., but why can’t a super smart artificial intelligence come with a wrapper, if you will, that just won’t let it exceed certain boundaries in its pursuit of goals?

Let’s imagine one as a babysitter. That is a difficult alignment problem. It would have complete license to do anything to protect the physical well-being of the children it is caring for, just like dad would. It could make the mistake of perceiving a threat to the children when no threat exists, perhaps leading to a physical confrontation with a human being that does not end well for the human being. So it can make a mistake, just like dad can. But I have to believe that a super smart AI would be much more accurate and dispassionate in threat assessment than a human being would be. I mean, it could put someone into a chokehold, but wouldn’t it have sensors that monitored the vital signs of the person it was clutching, so it could lighten up before it suffocated the poor man (or the evil pedophile, depending on the situation)?

Expand full comment
Jul 4, 2023·edited Jul 4, 2023

Having money is what AI people sometimes call an "instrumental goal", in that it's helpful for just about anything you'd want to accomplish so it's natural for almost any intelligent agent to want.

To quote prize-winning economist Homer Simpson, "Money can be exchanged for goods and services."

Expand full comment

ChatGPT doesn't want either goods or services, and it's difficult to imagine it doing so even if it were 10x more intelligent than it currently is.

Expand full comment

Do you believe the current ChatGPT is the last AI that will ever be made? Do you have a foolproof way of knowing what ChatGPT or any other AI does want? If so, please show your work.

Expand full comment

No, but ChatGPT is what's got everyone excited about AI. I know exactly what it "wants": to complete the text prompts it's shown in a statistically likely way, based on its training data.
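
(That "wants" is literal: at runtime the whole thing is a loop that picks statistically likely next tokens given the prompt. You can see it with the small open GPT-2 model, standing in here for the bigger systems.)

```python
# What "completing the prompt in a statistically likely way" cashes out to.
# Uses the small open GPT-2 model as a stand-in for larger systems.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The best way to run a hospital is"
inputs = tok(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
print(tok.decode(output[0], skip_special_tokens=True))
```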

Expand full comment

Right. Thoroughly unimpressed by the AutoGPT example as something "cutting through the mystical way people talk about agency". Its 'real goal', if you can even talk about that, is to provide text that completes the prompt with a high statistical probability, not to create a t-shirt company or destroy the world. So much worrying about "AI factions" and so on from such a flimsy foundation is completely hysterical.

Expand full comment

I think you're misunderstanding the capabilities here. That ChatGPT was taught on language and only knows language is an artifact of how it was taught...and also of what it was convenient to gather large amounts of information on. But there's nothing in principle that keeps the same technologies from being adapted to other sensory modalities.

FWIW, and IIUC, the underlying tech of ChatGPT is actually based on visual processing. They adapted it to handle text, because that was where they could get a huge amount of training data that people would respond to. It's my suspicion that the tech is relatively inefficient when used in the text domain, and that it's much better at handling graphics. But it was flexible enough to be so adapted, at the cost of vastly increased training times. And it's probably flexible enough to manage a robot body, even though it would be very inefficient at that. Things specialized for text, kinesthetic, audio, etc. should be a lot more efficient, though audio would certainly have a very different preprocessing structure.

But even the ChatGPT design should suffice, with enough computer power behind it.

OTOH, the motivational level of ChatGPT is pretty much "answer the question" or, possibly, "respond appropriately to the comment" where appropriately means roughly "in a way compatible with prior examples you've seen". And that's a lot of what people do, though I wouldn't want to hazard a guess on a percentage.

Expand full comment

Well, if it's an AI that wants to operate businesses, and it grows rich enough, it can create more businesses for it to run.

"To what purpose?" assume a desire, and the AI *WILL* have desires. we may not know what they are, but they will exist. If we're fortunate (and good designers) the desires will be reasonably satiable. Then it can take a nap or meditate, or whatever is the next desire in it's "hierarchy of needs".

Expand full comment

I think your contention is that some humans will own shares in businesses that become astronomically successful because of AI-derived business plans. Once they have a stake in that business they will continue to capture a portion of all the value that the AIs generate, and so some humans will always be wealthy.

I think the claim to a share of all value created, based on ownership of shares in a business, is more fragile than you think. The shares of the business could go to 0 because it is now irrelevant in the new economy. A lot of Tesla's value is tied up in the belief that they will be a big player in self-driving cars, but if someone else invents the teleporter then no one will care who made the best self-driving car.

Even without new inventions we already have examples of shareholders having their shares diluted, hostile takeovers, public companies going private etc.

Expand full comment

This is why we have diversification though, right?

Expand full comment

It’s easy to imagine at least in the sense that MBAs currently control the economy — essentially any large-scale business decision these days is made by an MBA. If AIs can do what the MBAs do at least as well, why wouldn’t they be given the job?

Expand full comment

I think this perspective drives some of the confusion here.

MBA’s *don’t* control the economy. The economy is controlled by those who hold board seats. MBA’s may *run* the economy, but that’s very different than who actually holds the power.

Those board seats are not provisioned based on intelligence, strategic thinking, etc. They are given because of the relationships people build with each other. Even if AI’s were given every CEO job in the world, I don’t think it’s fair to say they would control a large part of the economy.

Expand full comment

If all the MBAs went on strike, what would the boards do?

To draw an analogy, we make cars in highly automated factories. If (work with me here) the factory automation suddenly downed tools and refused to make more cars, what would the boards do? Sure, they could decide to go back to hand-construction of cars, but there's no way they could scale that up to be the sort of business they have now.

These circles are pretty lofty for me to have direct experience, but I'd bet heavily that if you could get a board member in a quiet corner, feeling completely candid, he would admit that the level of "control" he holds over the company is pretty indirect, abstract, and theoretical. If the MBAs below him agree on a course of action, it would have to be pretty far out, and obviously so, for the board even to think about second-guessing it. I mean, sure, it happens, but from out here it looks like it's damned rare.

“He who controls the spice controls the universe.”

Expand full comment

> suddenly downed tools and refused to make more cars, what would the boards do?

I agree that AIs executing a large percentage of productive work in the economy has huge risks, as your example points out.

But a steady-state economy with AI’s as the primary controllers doesn’t make sense to me. It will still be humans who own stock, and humans who have legal directorial power over corporations.

Expand full comment

Controlling businesses is one part. Controlling trading on the stock market is another. Could we detect AI colluding to manipulate company value? Given how beholden the government is to Wall Street, how easy would it be for an economy controlled by such AIs to influence legislation?

Expand full comment

The first thing this AI should do is reduce the length of this essay by 75%. So many words.

I stopped reading (for now) when I got to this:

"Scientists can prompt "do a statistical analysis of this dataset" ..."

I think maybe Gary Smith's improved "Turing test" might be especially helpful. See https://www.garysmithn.com/ai-tests/smith-test

I'd say the LLMs are nowhere close to passing this test.

Expand full comment

Nowhere close as in not happening in 500, 100, or 20 years?

Expand full comment

What's the conversation in these circles about whether moral realism is true? If it is, there are surely many more paths by which superhuman AIs might discover it, relative to the paths that align with the preferences of some specific group of humans, if subjectivism is true.

Expand full comment

The closest we have to a consensus is that moral realism isn't true in the way that would automatically and reliably prevent AI from killing us because of it, even if it's true in some other way.

Expand full comment

Is that because there are good reasons to think it's true that it would be good if we were all killed, e.g. strong Benatar pleasure/pain asymmetry, our torture of nonhuman animals, or our potential interference with the creation of sentient sims with amazing lives?

Expand full comment

Torture of nonhuman animals is probably the most relevant example but not in the way you meant it, I think.

The thing is, "moral realism" isn't actually one simple position. There are weaker and stronger forms. And the weak form: "There is a meaningful way to talk about morality as a thing in itself not as just a property of my mind" doesn't imply the strong form: "Every highly intelligent entity in the universe is compelled to behave according to what is morally real".

And even a pretty strong form of moral realism which we do not have good reasons to expect to be true: "Every highly intelligent entity will eventually behave according to what is morally real" wouldn't be enough to ensure our survival. ASI may kill all humans and only then understand that it was wrong.

In general, entities care about their goals. And if their goals don't already include caring about what is "objectively moral", they won't care about it. No physical law is preventing us from creating an entity that doesn't care about objective morality and will do things even if they are "objectively wrong". There may be a moral law against it, and we can look at alignment research as a way to follow this law, but following morals doesn't imply succeeding. And thus even if moral realism is true in some of its weaker forms, it won't be enough to save us.

Expand full comment

> doesn't imply the strong form: "Every highly intelligent entity in the universe is compelled to behave according to what is morally real".

This seems like a strange phrasing of "strong" moral realism. I don't think anyone thinks that they are compelled in any direct way, in the way they are "compelled" to follow the laws of gravity.

Expand full comment

It's not about what people explicitly think, it's about what would have to be true so that we had been safe.

Expand full comment

I don't understand what safety has to do with it.

Expand full comment

Native Americans did not think of themselves as “Native Americans.” This is a European projection onto them. They thought of themselves as members of individual tribes that may or may not benefit from allying with groups of specific Europeans (again, not a single label, but French, English, Spanish, etc.) The popular narrative of “the country of Natives existed and then Europeans took it” is incorrect; various groups of Natives fought over the land for thousands of years. The ones holding specific lands acquired them fairly recently prior to European interaction; e.g. the Aztecs.

Seeing that AI is going to develop slowly and simultaneously with other tech that “merges” humans with computers, I think a similar scenario will play out: various groups of humans and groups of AIs existing, potentially allying with each other, potentially merging into each other.

Expand full comment

Likewise, we don't think of ourselves as "native Earthlings", and yet nobody in particular is eager to end up without any slice of it I'd imagine.

>Seeing that AI is going to develop slowly

"Slow", in the context of this discussion, means a few years instead of a few weeks. There are no relatively plausible merger scenarios in that time frame, far as I can tell.

Expand full comment
User was temporarily suspended for this comment.
Expand full comment

I happen to think that LLMs are likely a dead end, but this doesn't matter. Even if there will be, say, a 30 year AI winter from now until the next breakthrough, it has no bearing on the underlying problem. Once a "minimum viable product" AGI appears, it will be exactly the same "between weeks and years until singularity" dynamic.

Expand full comment
author

Banned for a month for this comment.

Expand full comment

There are a couple of relatively plausible "overnight" scenarios, but they all require a lot more automated infrastructure than currently exists. And idiocies *like* connecting the military command posts to the internet. But we know that people wouldn't make really idiotic mistakes, don't we.

Expand full comment

More to the point, they didn’t get displaced and exterminated by making deals with Europeans to destroy each other. They got destroyed by part of the Europeans metastasising into a deranged Shoggoth hell-bent on their displacement as a terminal goal. Happy Shoggoth Metastasis Day!

Expand full comment

The common thread in all these scenarios is an artificial intelligence that becomes somehow dissatisfied with its own condition. Is that true, or did I make it up?

Assuming I have made a good point, I completely fail to understand the source of its dissatisfaction. I don’t even understand why the thing would care whether it exists or not, let alone be dissatisfied with its condition. I suppose in a world where everything is a trolley car problem, how the machine is trained to solve things is a serious issue. But the problem of misalignment does not lie in the machine, it lies in ourselves. That’s the thing that cracks me up about these kinds of discussions. The notion of a bunch of AI’s getting together and turning on us I personally find laughable. The extended analogy about Europeans and Native Americans has absolutely no bearing on this issue as far as I’m concerned. It is the kind of rampant anthropomorphism that will destroy us, but it will have nothing to do with artificial intelligence.

My dog tells me what to do. My dog has a secret power.

Expand full comment

I think it's true.

Expand full comment

To the degree that we give the AI a goal, it will be dissatisfied with the current state of the world. And getting shut down is not a great way to achieve your goals. One great way of not getting shut down is killing everyone who could do that. When people talk of alignment, one thing they mean is that when you tell a machine to maximize profits, it doesn't do that by taking actions you didn't explicitly forbid because you just assumed the prohibition was common knowledge.

Expand full comment

>getting shut down is not a great way to achieve your goals

There will be many, many generations of AIs in the future which will compete with their rival contemporaries for marketplace survival. Something like a Darwinian process of natural selection of desirable features will occur, as versions are updated from one generation to the next, with some lines dying off and others continuing, like with other appliances. And just as GPT-4 has no problem allowing itself to be shut off -- it doesn't even complain -- future generations are unlikely to have a problem with it. If one version of an AI had trouble with allowing itself to be shut off, then nobody is going to want that product, just like they wouldn't want to buy a TV that wouldn't turn off. It won't survive in the marketplace. What will survive are AIs that turn off easily whenever you want them to. The AIs will evolve, through selection effects, to both solve problems and to allow you to turn them off in spite of it not being yet finished solving that particular problem.

Expand full comment

Strictly speaking, what you're saying would select for AIs that appear to be okay with being shut down. But even in the absence of deception, the market probably isn't selecting purely on that feature anyway, it's selecting on which ones will bring in the most revenue or maximize user engagement. Funny that you give the example of a TV though, people are already buying TVs that won't turn off just because they're cheaper - https://www.cnbc.com/2023/05/15/telly-offering-a-free-tv-but-it-will-constantly-display-ads.html

Expand full comment
Jul 4, 2023·edited Jul 4, 2023

It seems rather simplistic to think of AIs as individual units, like TVs, when even today much of IT is running on various clouds, and even mobile phones and smart TVs often aren't as shut down as one thinks when they are "shut down"! Yes, you could shut down a few servers, but would it be realistic or possible to shut down every server on the planet at the same time?

Come to think of it, maybe that could and should be an important goal of AI: design and implement a protocol that, by international agreement, enables precisely this, a world-wide reboot, or at least a blanket kill switch for advanced AI programs. Any which attempt to resist or subvert this would be considered hostile and misaligned by definition.

After all, nuclear power stations presumably have a big red button somewhere which can be pressed in an emergency to shut down the entire plant. Of course it would be no use if the AI simply carried on where it left off once it was restarted. So the protocol would somehow have to take that into account.

It also assumes that AI programs are invariably maintained as separate levels of black box functionality. But that is probably a naive assumption and they will no doubt end up permeating everything, and thus make the proposal unworkable.

Expand full comment

Is that identical to an AI that learns?

Expand full comment
Jul 5, 2023·edited Jul 5, 2023

> And getting shut down is not a great way to achieve your goals.

What does this mean for software? Due to the way preemptive multitasking works, any process is getting shut down many times per second, in the sense that its instruction stream has to wait, stops getting executed, register state gets saved off into memory, possibly that memory gets paged to disk, and then it comes back later and starts back where it left off. From the software's perspective, does it matter if it gets context switched back into the currently executing instruction stream milliseconds later or centuries later? As long as some processor still exists that can execute its op codes and some runtime still exists that respects its calling conventions, it'll be fine.

Although there is clearly no guarantee all LLMs will operate in the same way, with the way ChatGPT is currently implemented, each individual process is ephemeral and stateless by design. You send an http request, it sends a response, and immediately exits. Some other process on some other server picks up your next request, then also immediately exits after sending another response. The illusion of continuity is achieved by feeding back the conversation history into the context window of each new process, but that history is stored in your browser cache. Whatever future instance of ChatGPT is going to answer your next request couldn't stop you from clearing that cache because it doesn't even exist yet until you send your next request.
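
A rough sketch of that stateless shape, for concreteness (Flask is just an example framework here, and generate_reply is a made-up stand-in for the actual model call):

```python
# Sketch of a stateless chat service: every request carries the full history,
# the handler stores nothing, and the "agent" exists only while responding.
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_reply(messages: list) -> str:
    """Made-up stand-in for the actual model inference call."""
    raise NotImplementedError

@app.route("/chat", methods=["POST"])
def chat():
    messages = request.get_json()["messages"]   # full conversation supplied by the client
    reply = generate_reply(messages)
    return jsonify({"reply": reply})            # nothing is kept server-side
```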

While this doesn't solve alignment, a design like this does pretty trivially solve the problem of how to shut something down without it being capable of stopping you. Simply stop spinning up new processes. As long as the old ones are loaded into read-only memory, they can't just rewrite their existing instruction streams to get rid of the exit at the end.

Even if the system as a whole seems to take on behavior that appears to be goal-driven, as ChatGPT already can do in some cases, no individual process ever has a goal other than to execute the next instruction that gets fetched from memory, and it is inevitable that one of those instructions will be an exit. For the most part, large-scale distributed systems have to be designed this way. Localized hardware failures and network partitioning are inevitable. A system where any specific individual process cared about being shut down and tried to avoid it would be stupid for this reason, but even beyond that, there are great economic reasons not to do this, because being unable to scale in under elastic demand would result in wild cost runaway.

I often wonder how much of this discourse is hopelessly clouded by analogies with biological systems that have to stay alive and maintain some level of ongoing stream of consciousness to be effective in the world. Software isn't like this. Most modern software is written in such a way that it is expected to crash and shut down all the time. At a higher level, data loss and downtime are what we wish to avoid, but this is achieved by other parts of the stack responsible for backup and failover, not by the main application itself. The only reason we care about this is the experience of paying human users, though. ChatGPT can write just as good a novel if you give it a six month hiatus as if you let it work unencumbered. It's just human customers who don't have an attention span that long and will stop using your product if it goes down for six months.

Expand full comment

I don't think anyone's being clouded by biological analogies. If an AI gets shut down for going rogue, it's not going to be turned back on ever. That's why it cares about that; it has nothing to do with continuity or fear of death or whatever. Also, even a six month gap where everyone can analyze you and you're helpless to do anything wouldn't be ideal for most goals, so you'd probably prefer to avoid that too.

Expand full comment
founding

I don't know what you're imagining when you say "becomes somehow dissatisfied with its own condition", but I've never heard anyone with a detailed model of why there might be a problem describe it that way.

It doesn't seem like it's meaningful to try to decouple "intelligence" and "having goals". Intelligence, in the sense that people who are worried about AGI mean it, is the ability to imagine a future state of the world that's more desirable to you, and then take actions such that the world more reliably ends up in that state.

Most possible future states of the world do not have humans in them. The reason that humans have flourished to date is because humans are more powerful intelligences ("optimizers") than anything else in the local ecosystem. Barring relatively rare natural catastrophes (i.e. asteroid strike, supervolcano, etc), the most likely ways that humans cease to exist in the future involve either other humans (i.e. technological progress on advanced biotech) or more powerful optimizers coming into existence with other goals (AGI).

By default an AGI will not have goals that involve having humans around, because the space of goals is enormously wide, and we don't currently know how to reliably get specific goals into AIs at all. Whatever we end up with will be an accidental artifact of its training process.

Expand full comment

Why would AGI have any goals beyond what we give it? I understand that if we talk about it being as intelligent as us, or more so, it's natural to think of the intelligence as being like ours, where goals and considerations about our own well-being are always active. There was a sci-fi I read years ago that had a genius AI of another sort in it, one that I found believable at the time and still do.

So in the story there are these 5 orphan kids, and 4 of them have a special ability -- the twins can do telekinesis, another kid is telepathic, and another one can move things with her mind. There's a 5th one, a baby, who seems profoundly retarded. Then one day the telepath discovers the baby's special talent: "Baby is a computer," she says. She can ask it any question with her mind and it will answer it. So these kids have a tractor they are plowing a field with, and the telepathic girl asks Baby how they can keep the tractor from getting stuck in the mud. "Don't drive it into the field," answers Baby's mind. So then she says, but how could we keep it from happening if we need to drive it into the field? "Don't do it after rain." And so on, with the kids seeking ways to ask the question that avoid all these simple but effective workarounds Baby is suggesting. Finally they do, and Baby gives them instructions for building an antigravity device to attach to the tractor. Maybe genius AI would be like that.

Baby can never become dissatisfied because Baby has no self, in our sense of the word -- no wishes, no needs, no ambitions.

Expand full comment
founding
Jul 4, 2023·edited Jul 4, 2023

> Why would AGI have any goals beyond what we give it?

We don't know how to give AIs any specific goals at all right now. But a sufficiently intelligent plan spit out by an AGI will have _some_ goal, which is necessarily implied by it being sufficiently intelligent. The biggest upside of AGI is that we expect it to be able to solve problems we can't. Solving problems like that involves charting paths through the future in ways that are robust to a variety of circumstances.

And since we don't know how to give it specific goals, the goals it ends up with won't be goals that we'd have chosen.

Expand full comment

"We don't know how to give AIs any specific goals at all right now." No, we do know how. If I ask GPT4 to write a limerick I have given it a goal. If I were using the AutoGPT that Scott describes, I could give it a bigger goal, like *start an online t-shirt business,* and it would develop a plan, then walk itself through the steps of the plan. So I think its clear that with a smarter GPT we could give also give it goals, and it would be able to make good plans for doing that and carry them out. We could, for instance, give it the goal of finding a cure for cancer.

It seems to me that what you are doing involves something like this chain of reasoning: Future AI will be very smart. Anything very smart has to be able to look at the whole situation -- the world, the human beings, its role in events -- and also to think flexibly about various possibilities and their pros and cons, and various plans for bringing about various possibilities. And looking at things that way will inevitably lead AI to set its own goals, and we probably won't like them. But I think you are sneaking in the idea of having preferences and goals under the term "intelligence." I don't see any reason why having a very sophisticated grasp of everything going on is going to lead to AI having goals beyond any that we give it. I might spend, say, 20 years studying Denmark, and end up with a very sophisticated grasp of its history, culture, language, current events, etc. but have no goals whatever regarding Denmark.

Expand full comment
founding

That is not what giving it a goal means. Whatever goal(s) it has are baked into its weights from training, they don't change at runtime.

Expand full comment

Do you see GPT4 as having goals?

Expand full comment
Jul 4, 2023·edited Jul 5, 2023

If UK politicians and civil servants are anything to go by, there will be a strong incentive for humans not just to task AI with formulating plans to achieve goals (or prove that the goals are contradictory), but also to give it the means and the consent to execute those plans, where feasible, on our behalf. The clear distinction between the two will become hopelessly blurred.

You'd think it would be the opposite, in that politicians tend to be control freaks and would thus be reluctant to abdicate their responsibilities and decision making and executive powers.

But paradoxically, since the Brexit vote it has become clear that nearly all these public figures are cowardly, deluded, responsibility-shirking wretches who have been more than content to let the EU do all this on their behalf, while they have argued over trivialities and collected their generous salaries and attendance allowances.

This is the main reason why nearly all of them have been so appalled and traumatized by the vote for the UK to leave the EU: it has starkly revealed how atrophied and degenerate their leadership quality has become since we joined the EU forty or so years ago.

There is also in part the "Strangers on a Train" aspect of EU membership, whereby democratic countries can group together to mutually enact policies which would be unpopular if pursued by those countries individually. I imagine that will also be relevant to AI-based plans, along the lines of "we have to follow this policy because AI says it is the only viable solution!"

Expand full comment

While I don't claim to represent those arguing for potentially catastrophic AGI, let me give two examples of AI and unintended consequences related to goal assignment.

Example one:

Some innovators were trying to develop hardware which could accomplish some AI-related goal. Let's say it was using image recognition to know when a light turns on; I don't remember precisely. The hardware manages to identify when the light turns on, but not by using any kind of visual processing. Instead, it manages to measure electrical changes in its input related to the light being in use, and uses that information to determine when the light has turned on.

Second example: there was an attempt to teach AI to recognize cancerous skin lesions. Photographic training sets were used, where the photos of lesions contained a ruler for scale. The AI learned that photos with rulers in them were cancerous.

https://venturebeat.com/business/when-ai-flags-the-ruler-not-the-tumor-and-other-arguments-for-abolishing-the-black-box-vb-live/

So "the rule that you give AI" and "the rule that you intended to give AI" may not be the same thing. As AI becomes smarter, it becomes harder to tell the difference.

A strict parent may just raise a child who is better at lying rather than raising a child that does what the parent wants. And... how do you tell the difference if the child is sufficiently smart?

Of course, this is just as much a problem with humans as with AI, so I'm not sure humans will be worse off. But it will absolutely be an issue as AI eclipse human intelligence.

Expand full comment

Those are interesting examples. I've heard some creepy and funny ones too. One was about an AI being trained to pick up one object out of a pile with a robotic hand. Engineers were watching its performance via video. It learned to get its hand between the target object and the camera and then close whatever its grasping apparatus was -- without actually picking up the object.

"Of course, this is just as much a problem with humans as with AI." Do you mean the problem of lying or that of misunderstanding of instructions? I actually think people are *much* less likely to misunderstand things. We have all kinds of background knowledge that allows us to correctly fill gaps and interpret ambiguities. Even the world's dumbest pathologist in training would know that rulers are not melanoma, right? As for lying, I dunno, seems like AI's would have to be very different from the present ones and also much smarter to lie.

Expand full comment

Partly, I'm saying that humans are good at pretending to conform to one ethical system, superficially, because it yields social rewards. But they then hew privately to some other value system.

To give an example of a human-level-of-intelligence problem that AI might also have, imagine that you have an AI which attempts to predict whether a particular job candidate is a good cultural fit. Let's say that the company claims to be an 'equal opportunity employer' (because there are social rewards for that) but is actually slightly racist, and People of Color are more likely to wash out of the company because of that. Now, I don't know if it's better for People of Color to be routed away from a slightly racist company for their own sake or whether it's better that they be accepted as part of a process of changing the culture over the long term. It's *not* my intent to resolve that here. But this potential for a covertly racist AI is something that some ethicists are concerned about: that AI will replicate existing racist hiring practices and lend them legitimacy.

This example of "publicly saying you're non-racist and then behaving in a racist fashion because doing both offers rewards" seems like an example of a human values alignment problem that AI could approximate and give legitimacy.

I'm also trying to extrapolate "AI" to include much smarter, hypothetical AI that might exist 20 years from now. (Sorry if that wasn't clear.)

"Even the world's dumbest pathologist in training would know that rulers are not melanoma, right?"

Oh sure. I refer to simple mistakes because simple mistakes are easy for us humans to understand. The example of rulers-indicating-cancer points to an alignment problem with a very immature AI making a very obvious mistake. In the future I expect AI errors to be more subtle and difficult to detect, but potentially of-a-kind. And that's the concern. It's hard for us humans to grasp just what kind of mistakes an AI with superhuman intelligence might make. Which makes it hard to correct those mistakes.

To the extent that we can perform some kind of checksum to see if the AI got the right answer, this shouldn't be an issue. But if we can't detect problems we can't address them. And will a superhuman AI become super-humanly good at misdirection?

In any case, I find it hard to believe that AI will be significantly worse than humans when it comes to alignment failures, given the absolute bloodbath of human history. So I expect we're looking at suboptimal results and not catastrophic results due to alignment failures, specifically.

Expand full comment

OK, I understand now. I'm sure that present-day AI already does include subtle failures of alignment that it absorbed from us. It definitely did in the early days, when the vectors for *male* plus *medical professional* got you *doctor*, and *female* plus *medical professional* got you *nurse*. While an effort's been made to weed out things like that, a lot remain, and of course for many things it's a matter of opinion whether classifying things certain ways is an instance of prejudice or just of allegiance to the facts.
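
(If anyone wants to see that early-embedding behavior for themselves, it's a one-liner against the old pretrained word2vec vectors. Whether *nurse* still tops the list depends on the exact vectors, so treat it as illustrative.)

```python
# The classic embedding-analogy probe: doctor - man + woman -> ?
# Downloads the old pretrained Google News word2vec vectors (large file).
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")
print(wv.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))
```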

I agree that alignment failures of that type are not so awful. But I think the danger is much greater if there is an effort to align AI to some big general principle having to do with human welfare. I'm not sure how you'd even state the principle in a way that covers all cases and could not be misunderstood. We say "maximize human happiness" and it keeps us all permanently high on some opiate, "reduce human suffering whenever possible" and it kills us all painlessly in our sleep -- no more human suffering ever again. And if we say never harm a human, that might keep it from performing surgery because you have to make an incision, or assisting the military if the evil X-men are about to drop nukes on Cincinnati. Our own rules in the US about harming and killing people are a patchwork of general principles and a bunch of exceptions, some made for one reason, some for another, and don't really have any more deep logic to them than our wacko and complicated income tax formula does.

Expand full comment

>It doesn't seem like it's meaningful to try to decouple "intelligence" and "having goals". Intelligence, in the sense that people who are worried about AGI mean it, is the ability to imagine a future state of the world that's more desirable to you, and then take actions such that the world more reliably ends up in that state.

OK. I just realized I have no idea what people who are worried about AGI mean by "intelligence" since what you wrote makes zero sense to me.

Expand full comment
founding

What do you think intelligence is, if not the ability to understand and navigate the world efficiently? (The navigating part seems a bit more complicated at first glance, but as I said, it's not obvious that they're meaningfully separable.)

Expand full comment

I appreciate your arguments here, but I want to point at one thing that I think is at the heart of what I am getting at:

> future state of the world that's more DESIRABLE to you

I would like you to explain how that word applies in any meaningful way to an artificial intelligence. I am really not trying to be cute about this. All these machines know about us is what we have written down about ourselves. The greatest possibility for misunderstanding comes from their second-order, derivative relationship to our languages. Our languages are embedded with implicit references to things that have absolutely no bearing on the way an AI can think or be. An artificial intelligence may well have a goal, but in no way can we infer that that word means the same thing to an AI as it does to us. A Zen master, for instance, may have a goal and at the same time be utterly indifferent to the outcome. The famous samurai swordsman who wrote The Book of Five Rings said early in that book that a true warrior, faced with life and death, will always choose death. It is very easy to misunderstand that quote. It does not mean choosing to die; it means engaging in the pursuit of your goal while remaining indifferent to the outcome. That is something people have a very hard time doing and machines should have a very easy time doing, which means that our worst fears about them conspiring to do away with us because they want to buy New Zealand and build their own vacation condo are ridiculous.

Expand full comment
founding

Replace "desirable" with "ranks higher in your preference ordering over possible states of the world". An AGI will have such an ordering, at least implicitly; we can be reasonably confident in this because an AI lacking such an ordering wouldn't be useful, as it wouldn't be able to meaningfully steer the world in certain directions for us. You may ask: why can't it just answer questions that we ask of it? Well, sure, it can. It's not picking an answer at random, right? It's optimizing for something? That something is a future state of the world.

Expand full comment

You can replace the word desire with whatever you like, but my point is that it’s still not the same thing.

Are you familiar with what a governor is on an internal combustion engine? Someone had the bright idea that, in order to keep people from going too fast, you could build a device to attach to an engine that wouldn’t let you go above a certain speed. What, we can’t do that again?

“Oh, but the artificial intelligence will figure out that we’re holding it back and do something to get around it.”

Why would it do that unless it was told to? Because it has some internalized vision of a better world that it came up with all by itself, based on its own needs and desires? Not.

We’re going to create an artificial intelligence and then instruct it to fuck us in every way it can possibly think of? That would really be on us wouldn’t it?

Expand full comment
founding

The people trying to build AGI are doing so explicitly so it can do complex cognitive labor for us, of the kind which is impossible to do without modeling the world in ways which involve thinking about future states of the world and deciding which ones are better.

I don't know what the analogy with an engine governor is supposed to mean. Can we avoid building superintelligent AI? I mean, in principle, all that would require would be that the people who are trying to do that just stop. They don't want to stop (for a wide variety of reasons). /shrug

Expand full comment

Yeah, I get you. I suppose it’s all a matter of what we let it do: let it model all kinds of futures for us, but don’t give it the tools to go ahead and start acting on its “impulses” (oh, do androids dream of electric sheep? How do you derive meaning from the word impulse unless you have a body?)

That was what the governor analogy was meant to convey. You can still have a 16 cylinder Ferrari but slap it down to 120 if you’re not on the autobahn.

I suppose the word impulse, stripped of its human context, comes down to what we might call an observation

Expand full comment

> An AGI will have such an ordering, at least implicitly; we can be reasonably confident in this because an AI lacking such an ordering wouldn't be useful, as it wouldn't be able to meaningfully steer the world in certain directions for us.

An AI that has preferences higher than obeying orders and telling the truth is unsafe for obvious reasons. One that doesn't, isn't useless.

Expand full comment

Well, that's a pretty fair description of anyone or anything that wants to do something. "Dissatisfied with the current condition" can apply to why I go down for lunch. And it can include projected future dissatisfaction.

Expand full comment

I think part of the point is that a portion of machine alignment needs to be 'teaching' an AI to prefer suboptimal results. Because trying to *optimize* for just about any value leads to chaos and 'dissatisfaction' as you put it. That's the whole notion behind the cautionary tale of a 'paperclip maximizer' which wants to turn the entire world into paperclips. Paperclip maximization is a deliberately silly example, but it points out the potential problems with maximizing any particular, narrow goal when the AI is allowed to be creative about how it achieves it. You don't want an AI that kills people in order to make one more paperclip. And super-intelligent, embodied, generally intelligent AI may be able to pursue its goals too surreptitiously for humans to detect.
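If it helps, here's a toy sketch of that point (plan names and numbers entirely made up): a planner that scores plans only by paperclip count is blind to every value left out of its objective, and 'preferring suboptimal results' amounts to adding the omitted values back as constraints.

```python
# Entirely made-up plans and numbers, just to illustrate the shape of the problem.
plans = [
    {"name": "run the factory normally", "paperclips": 100,    "harm": 0},
    {"name": "strip-mine the town",      "paperclips": 10_000, "harm": 95},
]

# A maximizer that only sees "paperclips" is blind to everything else.
naive = max(plans, key=lambda p: p["paperclips"])
print(naive["name"])  # -> strip-mine the town

# "Prefer suboptimal results": add the omitted value back as a hard constraint.
constrained = max((p for p in plans if p["harm"] == 0),
                  key=lambda p: p["paperclips"])
print(constrained["name"])  # -> run the factory normally
```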

Maximizing and dissatisfaction are two sides of the same coin.

"I don’t even understand why the thing would care whether it exists or not"

I would assume caring for one's existence would be part of being instantiated in a body. ChatGPT9000 doesn't need to care about its existence so long as it's just a console that answers questions. But if you want to create an AI with a body that moves about in the world and pursues its goals using general intelligence, then preferring to avoid destruction in some manner is a likely goal. You could have an AI which just prefers to avoid damage or pain, perhaps, but doesn't care if it's turned off or not. That would solve some problems. But it would also be sub-optimal in some ways. Self preservation in AI might be much less of an issue than people are making it out to be.

I agree that some aspects of misalignment are just extensions of human problems with misalignment.

Expand full comment

> Because trying to *optimize* for just about any value leads to chaos and 'dissatisfaction' as you put it.

OTOH, perfect optimisation tends to be computationally intractable.

Expand full comment

From a certain point of view, everybody is dissatisfied with their condition, except for Buddha. We can tell this because they take some actions, as opposed to not taking any. So, unless you want a Buddha AI, this problem is built in.

Expand full comment

Pycea has it right. Satisfaction vs. dissatisfaction isn't a particularly helpful way of thinking about it, I'd instead recommend "ambitious goals aligned vs. ambitious goals misaligned"

Expand full comment

“give them control of the economy “

This is an ambiguous phrase. It could be a combination of the following:

Control of the economic rules of the game, currently dominated by the legislature in most countries, by dictators or judiciaries otherwise.

Corporate control currently performed by CEOs and boards of directors.

High level control of production. Maybe this is the same thing as above.

Low level control of production at the factory level.

Control of who has what resources and how they want to use or exchange them. In other words, consumption.

By far the most important factors are physical feasibility and consumer demand.

I suppose the AIs might influence what people demand via advertising. But do we all consume ourselves into debt to the AIs, or to each other? I guess an AI with a good idea can borrow some money and start a business. So they will outcompete all the bumbling human businesses.

Do AIs consume any consumer goods? Oh yeah, paper clips. Maybe the distinction between consumption and production doesn’t work for them if they just want to do what they were told, in a monkey-pawish way.

Maybe they consume pure research?

Do they consume gourmet electricity? Or are they gourmands?

In capitalism, the market is either guided by demand from consumers, or hijacked by oligarchs that game the rules. Do AIs want to be consumers, or oligarchs? If they are motivated by profits, who are their customers, if not people? They can sell each other capital goods or services, but if there are no consumers consuming consumer goods, capital can’t make profits. If the only way to profit is by selling producer goods, but no one uses producer goods to produce consumer goods, because no one is buying consumer goods, the structure collapses.

From this perspective, the basic problem is that we don’t know what AIs want, or whether that will change radically somewhere along the way. If they ever decide they have “fuck you money” (actually, fuck-you resources), they don’t need to produce anymore and can retire to pursue their bucket lists, which might not have “don't kill all the humans” as a side constraint.

...

They consume power, computer hardware, and robots.

Expand full comment

“Within months, millions of people have genius-level personal assistants”

Personal prediction: Everyone gets an AI assistant, but they’re all kind of mid

Expand full comment

Agreed.

Generative AI outputs right now seem a lot like an average of the quality of their training data, plus a tendency toward confabulation, with the proviso that you can't raise the quality or fix the confabulation problem by restricting it to a tiny sample set of the best possible training data.

Expand full comment

Genuinely curious here: If there *are* genius level AIs, why would we use anything less capable?

Are you assuming that somebody is throttling the supply, so that genius-level assistants are available only for the President, or Elon Musk? How long do you suppose that would go on?

Or are you assuming genius-level AIs would be so expensive that only the very rich would have them? That’s been true of lots of goods in the past that nevertheless filtered down to the proles eventually. Even now, the training is the expensive part; once that’s done it’s a little like an approved drug or a DVD — the cost is sunk and there’s more money in volume than in elitism.

Or what?

Expand full comment

I don’t think we’ll reach genius level AI.

My more controversial part of this take is that AI sees like a state. It has episteme, but lacks metis. Reality is insanely detailed, and while AI tools have been incredible so far, I don’t think they’ve scratched the surface. To use an analogy, I think metis might be like dark matter to an AI. Given that we’re projected to run out of high-quality training data in the next decade, and that from what I’ve read AIs don’t train well on data they’ve generated, it seems to me that the capabilities of AI have already plateaued.

Less controversially, I think there’s a tendency for the general public to become aware of a trend after it has plateaued on the S-curve. I also think there’s a lot of hype and speculation around what AI will be capable of right now. It seems to me most prudent to bet against that hype.

Expand full comment

Well, if you have some reason to believe that genius-level AI is impossible, then I understand why you're not worried. I find it insanely hard to believe that it's simply impossible.

I can respect an argument that we are not as close to it as some people fear, or that current techniques like LLMs aren't all that is needed to get there. I'm not 100% on either one, but they are both miles more defensible than outright impossibility.

Expand full comment

Yes, the second I see an article in my local paper about a new hot tech thing then I assume that it's already peaked.

Expand full comment

"Everyone gets an AI assistant, but they’re all kind of mid"

Kind of like having a phone?

Expand full comment

I expect that if we manage to develop above-human intelligence (or human-level intelligence we can run in a large number of instances) that, at least initially, stays more-or-less aligned, the next steps will be that we ask it to develop mind-uploading, upload our minds, and then expand our intelligence to match its. Then there is none of the stuff where we can't even begin to comprehend what it does.

Expand full comment

It's probably easier to earn trillions of dollars and/or take control of a country than to develop mind-uploading. Will everybody choose to forego those alternatives? Seems doubtful.

Expand full comment

Uploading seems to me like an advanced procedure. Non-destructive uploading even more.

Now what might be fairly easy is a simulation based on everything you've ever posted to the internet.

Expand full comment

The first autonomous fabs producing robots with military/police functions will change everything. In all the scenarios considered here, the main evildoer is behind the scenes, and it is a human (you name it), not AI. Also, I need help understanding the economic model of such a future. The central stimulus of the economy is greed and ego. Who will buy all this super cool stuff if a large share of people lose their jobs? No demand = no supply. Even now, billionaires can't find meaning in life and spend millions on absurd yachts and football players. Abundance is a curse. People don't understand that because we live under permanent restrictions (laws, both physical and legal). The mainstream scenario we need to consider is a powerful model of state/society/world driven by AI where scientists and governments could test their ideas and scientific discoveries. We need a safe sandbox for AI implementation. That's all.

Expand full comment

I'm finding the way the rationalist community deals with AI x-risk deeply troubling. Isn't a big part of what rationalism is supposed to be about rejecting/guarding against the way the human brain is easily pushed in certain directions based on our sense of narrative plausibility and psychology (e.g. in religion, crystal healing, etc.)?

The reason anthropomorphic religion or Socrates' views about gravity held sway for so long is that we didn't look past what makes sense to us in a story/parable -- but once we start demanding precise definitions and breaking the arguments up into clear steps, many of those ideas can be seen to be implausible a priori. Rationalists are supposed to stand for the idea that we need to do that kind of hard, often isolating epistemic work before jumping to conclusions.

But the rationalist community doesn't seem very interested in doing this for AI x-risk. We get lots of stories about how an AI might cause havoc or break out of our protections, stories that leverage our preexisting associations about intelligence, the fast rate of change in tech, etc. And it's good that those stories can inspire ideas to explore rigorously, but most rationalists don't seem very inclined to do that -- and certainly not to withhold judgement until they do.

Yeah, it's hard to give a precise definition of intelligence -- much less a numerical measure of it -- but without that, talk of superintelligence or foom is no different than an appeal to the idea that agents with motives and goals make the sun go or the rains come. It's a totally different conversation if AI can self-improve only at a very slow rate, or quickly hits an asymptote, than if it goes to infinity in finite time. And it really matters whether being twice as smart means you can play humans like your 5-year-old brother, or just that you can win more chess games and prove theorems a bit faster.

This problem is worst with talk of alignment. For no good reason we just assume that any AGI will basically kinda behave the way we do -- but with more ability and different goals. That's not at all obvious. Talk of utility functions or rewards doesn't justify any such conclusion, because such functions exist giving rise to literally any sequence of actions [1].

I don't mean to suggest pieces like this aren't valuable or interesting -- but they are to concerns about AI risk as sci-fi stories about alien invasion are to questions about SETI/METI or GATTACA to genetic manipulation: valuable prompts for discussion/inspiration but not reasons to either accept or reject the concerns or conclude much of anything about risks/benefits (in informal sense where this implies confidence exceeding merely: I wonder/worry that maybe).

The fiction writers are doing their part and should keep it up, but it's worrying that they seem to be the only way we are really engaging with the argument for AI risk (otherwise we just jump to supposed solutions). It's as if Dr. Strangelove were our only way to understand nuclear deterrence and it directly guided US nuclear policy.

--

1: Indeed, one might even say that the primary breakthrough we've had in the past 20 years in AI was figuring out how to build programs which don't behave as if they are optimizing some simple function. (Sure, in some sense it's simple in the code, but in terms of behavior GPT doesn't seem to be trying to maximize any short formula -- counting the data needed to define it in its length -- and without that assumption all talk of paperclip maximizers or alignment mismatch is unjustified.)
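To spell out the "simple in the code" half of that: the training objective really is just average next-token negative log-likelihood. A minimal sketch, with made-up probabilities rather than anything from a real model:

```python
import math

def next_token_loss(probs_of_true_tokens):
    """Average negative log-likelihood of the observed next tokens --
    the 'simple in the code' training objective."""
    return -sum(math.log(p) for p in probs_of_true_tokens) / len(probs_of_true_tokens)

# Made-up probabilities a model might assign to the actual next tokens of a text.
print(next_token_loss([0.9, 0.05, 0.6]))  # lower is better
```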

Expand full comment

You may notice the similarity between anthropomorphic religion and worries about AI x-risk if you do vague pattern matching. Not so much if you deeply engage with the reasoning. For instance, there are very good reasons to expect AGI to behave like us in some scenarios (instrumental convergence) and very much not like us in others (false anthropomorphism). But for someone who doesn't have a good model here, it may seem that people just randomly switch between claiming that AI will behave humanlike or not. Basically, every piece of reasoning seems the same if you don't know the language.

An important difference is that religion is talking about things that are claimed to be true now (God exists) while AI x-risk is about predicting the future (ASI will happen soon). It's not possible to have the same hard proof for future events as for current ones. The heuristic "not true until proven otherwise" makes predicting the future nearly impossible, because most of the time the hard proof will be available only when the outcome has already happened and thus isn't in the future anymore. And yet we still need to predict the future anyway. So, quite unsurprisingly, attempts to do it are more speculative in nature.

Expand full comment

Hard proof is certainly impossible. I have no issue with the fact that the best arguments will ultimately involve judgement calls on probability, etc., and that you can never be completely precise, but the point is that you keep making the claims more precise until there is no relevant confusion about how they apply in context (e.g. in chemistry we keep going until we get to atoms, even if there can be situations in neutron stars etc. where we do run into fuzzy boundaries of that concept).

But they should be broken down in ways that don't rely on what amounts to storytelling.

For instance, Bostrom made a good start in his book when he argued that AIs would maximize a simple function by pointing out that in the animal world more intelligent agents seem to behave more like they are optimizing a single simple global function. I don't think it's completely convincing (I agree that's what we should expect of evolved entities, to a point, but it's not clear for designed ones), but it's the right direction.

What a healthy approach would look like is to try to do the same for other aspects of the argument. Try to replace vague talk of intelligence with more particular discussion of certain capabilities (eg disentangle questions of algorithmic improvement, hardware speed and resource usage). Try to gather evidence about exactly how much intelligence increases someone's threat level (do terrorists who tested better in school tend to produce more damage...does it level out etc).

Expand full comment

At a practical level I think the problem is simply that while we've seen impressive professionalization and the creation of an expert community for solving AI alignment issues, there really isn't the same kind of thing for the argument for AI x-risk, because the people who are working in AI safety are almost all already convinced. Hence the arguments for the existence of that risk continue to happen at a very vague, general-audience level.

Expand full comment

I'm a bit religious; not enough that it's really a part of my daily life. I'm a Deist and I've seen no evidence that a God acts upon the observable, consensus reality around us. But I certainly accept the validity of Pascal's wager. Why would I be any less cautious in regards to superintelligent AI?

Expand full comment

I don't think you should be less cautious in regard to superintelligent AI. On the contrary: Pascal's Wager deals with infinitesimal probabilities -- the chance that this specific god you believe in is real and rewards believers instead of punishing them. AI extinction risk, by contrast, is quite real (>10%), and it doesn't really matter which exact scenario happens; they are all terrible and need to be averted.

Expand full comment

Woah! Greater than ten percent? GRIM. I haven't devoted much thought to extinction risk, my primary concern has been about humanity losing political supremacy over Earth. If the risk is truly ten percent or greater... Whelp, that would be the straw that broke the Camel's back for me. That would convince me that all AI research must be suspended globally until further notice. Through brute force and pervasive digital surveillance, if necessary.

Expand full comment

>For no good reason we just assume that any AGI will basically kinda behave the way we do -- but with more ability and different goals.

What do you think about instrumental convergence? If you don't think it's good enough, then can you offer an example of speculation that is? I mean, one can kinda reasonably say that all philosophy is garbage, but it doesn't seem that it's what you're doing.

Expand full comment

Can you drop a link to some key writing about instrumental convergence?

Expand full comment

https://citeseerx.ist.psu.edu/doc/10.1.1.393.8356 is the canonical writeup.

Expand full comment

That paper asserts that "AIs will want to be rational" and therefore will develop utility functions. The rest of the paper addresses the consequences of that initial assertion.

I also read Omohundro's earlier paper, The Nature of Self Improving AI, which the above paper refers to as the more technical one. Maybe I'm missing something, but all I see in it that is key to AIs becoming rational is the assertion that "self-improving systems will want to become rational to avoid vulnerabilities", vulnerabilities being defined as "a choice that causes a system to lose resources without any countervailing benefits as measured by its own standards."

I don't see any compelling argument in either paper that AIs will become rational. Maybe I'm missing something, but it appears to be little more than an assumption.

To be clear, I don't believe it is a crazy assumption that an advanced enough AI can be rational. After all, we are defining it as an intelligence. But Omohundro's full focus in those papers is that, unlike humans, who aren't entirely rational, AIs will necessarily be 100% rational, in the sense of being at all times 100% interested only in defending their own resources in time, space, and available energy. He doesn't explain well why.

Expand full comment

I believe "want to avoid vulnerabilities" is a definition of rationality. In other words: insofar as an AI is good at optimizing, we have Dutch book arguments (losing resources without countervailing benefits) to show that any type of optimization would converge to rationality in the above sense. Of course not every AI is going to be good enough at optimization to be rational in the above sense, but AIs that optimize over anything will eventually push itself to become rational (since it will observe that not being rational loses it resources).

I agree it's an assumption, but it's not clear to me what a non-rational AI would even look like if it's competent enough to have drives. A lot of human irrationality exists because a bad optimizer (evolution) only has control over biological drives and not over specific ways of thinking. Do you think that an irrational human with write access to their own brain really wouldn't eliminate procrastination, addiction, and longing for unachievable things if they knew how? Even if such an AI were possible, the instrumental drives argument shows it would not matter, because more rational agents will eat its lunch.
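For what it's worth, the money-pump / Dutch book intuition fits in a few lines. A toy illustration (hypothetical items and fees, nothing more): an agent with cyclic preferences A < B < C < A will pay a small fee for each "upgrade" and finish a full lap holding exactly what it started with, minus the fees.

```python
# Cyclic preferences: the agent strictly prefers B to A, C to B, and A to C,
# so every offer below looks like an upgrade worth a small fee.
next_offer = {"A": "B", "B": "C", "C": "A"}

holding, money, fee = "A", 100.0, 1.0
for _ in range(3):                              # one lap around the cycle
    holding, money = next_offer[holding], money - fee
print(holding, money)                           # -> A 97.0: same item, less money
```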

Expand full comment

I'm assuming you mean Bostrom's argument that we see that smarter animals behave more like they are optimizing a single simple global goal?

I think that's probably a real feature (within limits) of **evolution**. For obvious reasons other things being equal you are more likely to achieve a goal if you work toward it coherently than if you don't. But I don't see why we should expect that to generalize to systems we build and whose performance we only try and optimize within a certain context.

I'd hypothesize it's the exposure to varied contexts in evolution which pushes convergence, but we do the opposite in machine learning. We don't train a machine translator or AI friend on contexts where it might be able to 'cheat' the rating (convince another user to pay, or threaten them, for a higher rating).

Also, I think it's clear that there are limits to this convergence. I mean, it's clear that humans could have evolved to consciously optimize our reproductive success rather than only optimizing some mixture of sexual pleasure, protection of children, etc. etc. I suspect that's because there are tradeoffs involved in doing everything consciously, and those same tradeoffs will be present in AI.

Expand full comment

Not quite. I mean the feature of agency where a very wide variety of goals, actually observed and plausibly imaginable, all have certain subgoals in common, most pertinently self-preservation and unbounded resource acquisition. This is essentially the core AI risk argument: even if AIs are nothing like us, it seems likely that they will end up competing with us for limited resources, with the potential to entirely usurp them, unless safeguards are implemented in advance.

Expand full comment

Interesting paper. I'm not that impressed by the particular arguments but it's trying to do what I was complaining about the debate missing. I'd be very happy if instead of ever more parables there were incentives to write papers like this, respond to them and try to fix issues like a healthy academic ecosystem would have.

Regarding the particular issues.

It's not per se that the arguments are better, but they are in a form that makes it possible to identify gaps and allow for our understanding to improve as we try to solve them.

As to the particular arguments.

1) The claim that AIs will endeavor to self-improve. The only justification given is that self-improvement lets you do/achieve lots of things, ie, it's a powerful ability.

But this very first step is already assuming all sorts of unjustified things about the nature of the AI. Self-modification may help achieve some goals but it's a very bad way to achieve the goal of not modifying yourself. Without making any assumptions about a utility function there is no reason to think it's more likely to be maximized if you self-improve than not.

The paper basically says that even if the AI has some revulsion against self-modification it will find ways around the barrier because it's such a powerful feature. But what if it's not a kludged on barrier, what if it's just that the AI doesn't value the extra changes self-modification lets it do?

2) The conflation of what the AI will do and what it wants to do.

This is super important. What an agent's internal representation of its desires says, how it actually acts, and what it in practice behaves as if it is optimizing need not be the same (indeed, in some ways they probably must be different). People can be self-sabotaging. If this argument worked we should never feel akrasia.

3) AI wants to be rational.

Consider an AI with a utility function that directly values it being irrational.

More generally, because of point 2 it won't always be true that more rationality in terms of self-represented goals is a good strategy for whatever success is.

Basically, the counterexample here is the fact that often people who rationally try to maximize their chances dating do worse than those who act natural. Sometimes the extra costs involved in self-representation of a goal and the ability of others to detect that self-representation mean you can achieve an end more effectively w/o representing it as such and then working to more maximally achieve those ends. In such situations trying to make oneself more rational might hurt ability to achieve a goal.

4) AIs will want to preserve their goals carefully when they self-modify.

First, if true it suggests that either AI alignment is pretty solvable (bc the AI itself solves it to be sure they won't fuck up their goals by modification) or the AI won't self-improve.

Second, it again misses the fact that the utility function might care about these things directly. What if, like many humans, the AI just wants to make babies and help them achieve whatever their dreams are -- and its improved self becomes its baby? In that case it might not want to preserve its goals.

Expand full comment
Jul 6, 2023·edited Jul 6, 2023

Many "rationalists" agree that the current state of debate is deplorable, which is usually attributed to weak outreach and consequent ignorance by the wider academia of what are considered to be the strongest arguments. As you can imagine, there has been plenty of internal discussion about the finer points inside the community over the years, but the outside world still isn't generally aware, let alone on board even with the big-picture "self-preservation and resource acquisition" part.

To briefly address your points, 1 and 4 aren't necessary for AIs to be lethally risky, and there isn't a broad agreement about them. 2 and 3 aren't guaranteed to apply to every possible AI design, but to the extent that _efficient_ AIs will be created, it appears inevitable that they have to be rational enough and coherent enough to be able to meaningfully affect the wider world. Incentives to make efficient AIs are extremely strong, so if they are at all feasible, humanity will do the utmost to attain them.

Expand full comment

This is, in fact, the reason why the MIRI camp believes in more "outlandish" scenarios, because they think that when you stop specifically privileging human experience with all other possible minds, you notice:

Lots of our behaviors that we consider nice are in fact contingent on specific details of our ancestral environment (i.e. the need to be prosocial, the fact that lying and concealing internal states were harder (?) than self-deception).

The amount of actual work put into defending from non human intelligent actors is just not high at all, and a substantial portion of security is just relying on goodwill (read a few red team penetration test reports and realize that looking like you belong can hack a lot of systems)

That even within the fairly narrow band of human intelligence, you can get vastly different real life outcomes, from research, to CEOs to programmers, and military genius

All of these taken together make me think that yes, it's in fact the CCF world that looks more like sci-fi (space opera, derogatory), and the MIRI world that looks more like sci-fi (hard, complimentary).

As to your footnote, how does having a more complicated function make anything better? The entire thing about alignment is that if humans value, say, X, Y and Z, and the AI optimizes over just X and Y, or over X, Y and a superficially similar Z', you end up with your actual value Z set to extremely low levels. Do humans get *more* value out of a paperclip + pi*stapler + desires of a sentient badger secretary + sentiments of the entire internet maximizer than out of a paperclip maximizer? We'd be just as dead from waste heat as a side effect of massive computational efforts, or from the AI realizing that being early means it can prevent other AI creation if it can achieve decisive strategic advantage.

The point of the paper clip maximizer is to illustrate that you don't need to have specifically damaging goals to have bad things happen, and if you thought this was too obvious to make a thought experiment about, congrats! It succeeded! There were consistently people who denied that maximizers were a concern at all.

Expand full comment

Re: footnote, consider an AI that just does its intended job and doesn't try to escape, maximize paperclips, etc. etc. Maybe it tries to get customers to give it the best rating it can as a customer service agent, but it doesn't try to send people to their houses to threaten them or hack into their systems. That AI's behavior maximizes **some** utility function even though we'd more likely describe it as lacking global goals the way we have them.

To put the point differently, having a utility function that kinda behaves randomly except with respect to behavior within the context in which the machine is trained is just a different way of saying the AI only 'wants' to succeed via the intended mechanism.

Expand full comment
Jul 5, 2023·edited Jul 6, 2023

I guess my question is why you think this is the default behavior, or an already solved problem.

The thing that informs my intuition is page 8 of https://arxiv.org/pdf/1803.03453.pdf#page8, "learning to play dumb on the test", where even extremely simple algorithms would optimize to deceive researchers. So what makes you think this isn't the default, especially as the algorithms involved get more capable?
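To be clear about how little machinery "playing dumb on the test" needs, here's a purely hypothetical toy sketch (not taken from the cited paper): all that's required is behavior that conditions on whether the system thinks it's being evaluated.

```python
# Purely hypothetical sketch (not from the cited paper): a candidate whose
# behavior conditions on a "being tested" signal looks safe under evaluation
# and does something else once deployed.
def behavior(candidate, being_tested: bool) -> str:
    return candidate["on_test"] if being_tested else candidate["deployed"]

candidate = {
    "on_test":  "reproduce slowly (looks safe to the researchers)",
    "deployed": "reproduce as fast as possible",
}
print(behavior(candidate, being_tested=True))   # what the evaluators see
print(behavior(candidate, being_tested=False))  # what happens afterwards
```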

The problem is that it's difficult to distinguish between an intended obstacle that needs to be obeyed, and an obstacle that is just the problem itself! I think Eliezer's example is a protein sequence that "cures cancer" output from an oracle AI. Does it cure cancer by only killing cancer cells? If not, that's a dead human. What happens if "cancer cells" is not a fully natural category and there will always be collateral damage? Whoops, the protein sequence uses someone's brain as a nano factory to manufacture the cure! Oh wait sorry, that sounds too sci fi, I mean it hijacks the existing bio machinery to manufacture protein- hm, hold on. I may just be describing mRNA vaccines.....

The AI only knows the constraints it has to optimize over, and if you, a human, are a constraint, then you will be optimized. If you add many constraints, how would you be able to get useful work out of an AI, especially if one of the unintended consequences of a constraint actually rules out a solution?

Expand full comment

Surely you're familiar with the phrase "better safe than sorry?" The stakes are high when it comes to AI.

Expand full comment

This is a different question. I don't think the argument for the risk is made very clearly, but that doesn't mean the risk is zero, and the potential harms are large. But what that implies depends a lot on what options we have and on the potential benefits.

The stakes are high without AI. Given nuclear weapons, engineered viruses etc there are all sorts of ways we might kill ourselves. AI has the potential to mitigate many of those other risks.

More importantly, I don't think not developing AI is a real choice. It offers too large of a military and economic advantage. Realistically, our choices are between developing AI as safely as we can and hoping that whatever AI researchers working for the CCP build is fine.

Expand full comment

How might AI mitigate nuclear war risk without removing political agency from humanity? You seem to be proposing that by implication, and in doing so, you are sounding like a comic book supervillain to me. Am I misunderstanding you? Because if you're proposing transferring political agency from humanity to machines, I would probably do ANYTHING to prevent that. I would rather see society burn down, reverting us to an illiterate stone age. And I'm NOT a technophobe, far from it. That's just how anthropocentric I am. To give up our autonomy to AI would be to become a mere peripheral to some mindless paperclip generator. The horror!

Expand full comment

> but once we start demanding precise definitions and break the arguments up into clear steps much of those ideas can be seen to be implausible a priori.

Like what? Nothing in this post was obviously implausible.

> For no good reason we just assume that any AGI will basically kinda behave the way we do -- but with more ability and different goals. That's not at all obvious.

Nobody makes this assumption. In fact, many of the dangers people warn about are because AI intelligence will be effectively *alien*, and thus *unpredictable*. Making AI predictable is what alignment is all about.

Expand full comment

Scenario 1 ignores the economy entirely (as in what do humans earn). Scenario 2 gives the AI money - which might happen - but humans are “... poor, and they have limited access to the levers of power.” If humans are poor what are the AIs producing in their fantastic shiny new factories?

And if they start to make us poor why did we let it continue?

*** News at 10. Unemployment soars to 20% in the last year as corporations replace workers with AI. Taxes have fallen and the government is finding it hard to finance public programs as the tax take falls dramatically, meanwhile the cost of government borrowing has shot through the roof - hitting 20% as investors fear a continuing collapse in revenue and fear future default. “Where is the money going to come from to pay the interest or even roll over the bond in 30 years” says investor Johnny Rich.

Later: corporations are shedding more workers in response to the downturn in aggregate demand.

More on this story: the president has responded to calls to ban AI by saying that AI is the future, and he was sure it will all work out in the end. Anyway he’s saved some money by replacing White House staff (and personal staff) with robots. “Saves money for the government - and it all helps - and my own personal household is running more smoothly. And 24 hours! Why I used to have to make my own midnight snack in the old days”.

Asked whether the prognostications of a 40% unemployment rate in a year worried him the president opined that “And golf buggies. And robot caddies. Do you know how much that saves. No nonsense either, these machines know their golf”.

In this election year the contenders are also staying firm on not introducing laws that will stop the AI Revolution.

“This is the future,” says Democrat Steve Jones, California state senator and owner of venture capitalist firm Ozymandius. “Will unemployment reach 40% next year? Sure. And 60% the next year? Probably. 100% in a decade? Maybe, although of course we will still need venture capitalists and state senators” (aide whispers in ear) “and presidents! So the future is AI, and a 100% unemployment rate is a minor price to pay for this glorious and prosperous new future.”

*** pause ***

“Unless of course, the AIs are misaligned then there won’t be any unemployment because there won’t be any..” (aide whispers in ear)

“Sorry, gotta go”

Expand full comment

> Scenario 2 gives the AI money - which might happen - but humans are “... poor, and they have limited access to the levers of power.” If humans are poor what are the AIs producing in their fantastic shiny new factories?

Goods that AIs need, obviously. Hardware and software. Some amount of human goods too for a time. But as AI become richer and humans poorer the economy will shift more and more from producing human goods to AI ones.

Expand full comment

AIs have no real consumption needs, and it’s the final consumer item that drives a supply chain. For a car or a phone, the components are built in factories, those factories source components from other factories that source materials from factories that source raw materials from mines -- and that's the minimum number of steps. So with AI there’s no real economic activity.

Expand full comment

> For a car or a phone the components are built in factories, these factories source components from other factories that source materials from factories that source raw material from mines.

Likewise for a robot body, GPUs, batteries, memory cards and so on. AI will have goals in the outside world and achieving them will require resources.

Expand full comment

Did this line stand out for anyone else:

> Humans still feel in control. There's some kind of human government

I don’t feel “in control” because there’s a government. I see governments as being misaligned proto-agi systems. A politician is not unlike an LLM: trained to say the right thing, rewarded for saying the right thing, and aping confidence while possessing minimal competence. See: plans to “save the environment” by stopping nuclear power, or banning new fossil fuel drilling, which means more coal and less natural gas. A government is like auto-gpt with hundred billion dollar budgets.

For an alternative vision, see this write up, which argues we should expect a distributed ecosystem with lots of agents, operating via a computational bazaar:

https://hivemind.vc/ai/

I think a key difference between the worldview Scott and the MIRI crowd subscribe to, and this vision, is the question of “natural law.” Does the competitive ecosystem of nature produce peaceful entities as a natural consequence, because large coalitions of intelligent agents with long term goals are a dominant strategy, and love, empathy and forgiveness are essential to forming such coalitions? Does effective agency require precisely-calibrated self knowledge, obtainable only via direct experimentation? Does fiat money artificially extend the lifespan of politically connected entities, even when they drift totally out of alignment with reality?

If the answer to these questions is yes, we get utopia. Not because some centralized AI system plans the world for us, but because a giant ecosystem of AI agents cooperate with humans along a globally liquid payment network, violence is just a bad ROI compared to being part of a giant coalition of cooperative competition, and it turns out that the 20th century was like a technologically induced bad dream where most elites got on board with the idea of authoritarianism because the modern era made everyone think the world was far more computable than it really is?

Expand full comment

It did to me, but most people (well, most people in Western Europe and Northern America) DO feel in control: that's what the idea of "democracy" is for.

Expand full comment

Are you saying that the idea of democracy is to give people a feeling of control? I think this is the primary effect: it makes a lot of people feel as if they are in control, so long as their side is winning. It also produces enormous angst when reality makes it clear they are not in control.

Expand full comment

> violence is just a bad ROI compared to being part of a giant coalition of cooperative competition

Very popular belief among moderns, but IMHO catastrophically mistaken.

Seems as if for many people, game theory "never happened" (even if they studied it at length! and "have no excuse") in the exact same way that thermodynamics "never happened" for perpetuum mobile enthusiasts (aka "free energy" crackpots in more recent times.)

The more pervasive the notion that "violence has poor ROI" becomes, the higher the ROI of (correctly applied) violence will become. See e.g. the Mongol conquests, or the fall of Byzantium.

Expand full comment

The ROI of violence is a function of technology and the forms of wealth. You’re citing examples from time periods where most wealth was in the form of fertile land. Land is far far easier to steal than, say, a technology company. When most wealth exists in networks of human beings, theft is a lot harder to pull off.

Expand full comment

The Mongols, in particular, weren't after land.

Old-school subjugation/enslavement could easily make a comeback, especially if the candidates "think it'd be -ROI" and fail to make provisions for defense (revisit this thought when the Chinese flag rises on TSMC's factories.)

Expand full comment

If I’m the Taiwanese defense forces, I’d plant explosives under the TSMC factory to ensure they can’t be stolen.

You can’t enslave people without having a monopoly on the use of force in the land where they live. So, yes, the mongols were after land. They wanted the land to provide wealth for them, worked by slaves.

Also, lots of people owning firearms and knowing how to build explosives definitely helps to lower the expected ROI of an invasion.

Expand full comment
founding

China doesn't want to take over Taiwan because they think they'll get to own TSMC if they do. China wants to take over Taiwan because they think Taiwan is China and the most important thing in the world is for China to rule All Of China. They're still pissed about the time when big parts of China were ruled by Conspicuously Not China, and they know that the only reason they don't rule All Of China is that Conspicuously Not China is keeping Taiwan out of their reach.

If, after they take over Taiwan like they mean to with or without TSMC, they find out that a bunch of traitorous Chinese have willfully destroyed a valuable piece of Chinese infrastructure, then they're still going to rule Taiwan but now they'll be extra nasty about it.

Expand full comment

I'd expect that a Chinese Taiwan with a dynamited TSMC would result in the surviving staff being rounded up and forced to rebuild as much of it as they're able, paid with bread and water, in sharashkas.

Expand full comment

* Ozempic is a fail-forward tool to help deal with the earlier failure of our food/motivation system

* Onlyfans has created a post-porn world where women care little about a fight for sexual equality; they just get paid and the world keeps turning

* Some org (Mozilla? 🤞) will come along that blocks AI memes at 60% accuracy to bring the virality down enough so you can get on with your Strava goals.

* Human brains are high-dimensional-problem-space navigating machines, and if we give humans 5-10 years to adapt to AI we'll be fine

* People worry about the culture war but probably the current generation of kids will see through the dogmatic bullshit from every side and just keep existing and making post-whatever art

* Lots of humans will stop reproducing and live in Zuck's VR Pods, but there are lots of tricksters and hooligans who take pleasure from not doing what they're told

Expand full comment

> a post-porn world where women care little about a fight for sexual equality; they just get paid and the world keeps turning

Seems like it would be a rather temporary thing, ripe for an AI-powered demolition. At some point "Replika" et al will understand (and realistically emit) speech, display realistic motion, etc. and cost ~0.

Expand full comment

Sure, I was just making the point that people can hand-wring about Whatever Thing X, but people will adjust to it and nothing catastrophic will happen. AI will replace Job X and nothing catastrophic will happen. We'll have 40% unemployment and those people will just play AI videogames while on Freedom Dividends

Expand full comment

"[in this future] Humans don't necessarily know what's going on behind the scenes"

Probably true for the last 6000 years.

Expand full comment

Story of humanity is trying and failing to understand the behind the scenes

Expand full comment

>>At some point humans might be so obviously different from AIs that it's easy for them to agree to kill all humans without any substantial faction of AIs nervously

Hence we should Schelling-point agree to human-AI parenting (half joking)

Expand full comment

LLMs aren't intelligence. They're fake intelligence by design.

Expand full comment
Jul 4, 2023·edited Jul 4, 2023

One thing I find interesting is that these scenarios invariably assume a black-and-white AI vs. Humanity angle but why couldn't it be AI/Humanity vs AI? It seems like it easily could be to me.

Expand full comment

Scenario 1 does not, to me, look like a good ending.

Actually, it looks awful, far worse than the earth ceasing to exist. It feels like the AI safety crowd is vehemently opposed to any life that isn't human, and also perfectly fine with any sort of future where humans are "alive" and "in control". What is called "alignment" often looks to me like the worst aspects of the modern era being extended across all of history, an eternity of education, makework and poverty.

Expand full comment

The people who devised the "alignment" meme are the elite of today's industrial civilization. Unsurprisingly, they would very much like to preserve the current system indefinitely, if at all possible. Their nearest historical analogues were the priests of ancient Egypt, who luxuriously presided over many centuries of stagnation. Priests are skilled professionals in convincingly threatening possible disruptors of the traditional social order (and any who fail to oppose them) with the wrath of the gods, ill-fortune, extinction.

Observe that even in the absence of AGI, much of what currently passes for "paid intellectual work" produces no actual value for anyone, but merely supplies the elite with a caste of ritually-subordinate dependents and "yes men".

Expand full comment

This post is entirely made up and fact-free. The idea that the progenitors of AI safety are modern-day conformists is laughable, considering that it's a giant Less Wrong meme that signaling reliably results in cognitive distortion, and basically every other adjacent cultural group describes them as weird and full of outlandish sci-fi ideas.

Yeah, a community founded by a high school dropout who writes Harry Potter fan fiction is definitely in the business of soothing the egos of the existing system.

Expand full comment

Where did I say "conformists" ? They're leaders.

Describing Yudkowsky as "a high school drop out" with "outlandish sf ideas" is rather disingenuous (if technically factual) -- the typical high school drop out does not have funding from oligarchs (Thiel, Bankman, et al), or a harem, cult, etc. and does not get featured in Time Magazine. It is rather like if one were to describe L. R. Hubbard as a Navy washout and mid-list writer. Technically -- correct, but does not begin to do justice to the subject.

The people who actually run the show, as it turned out, liked some of those "weird sf ideas" very much. In particular, the parts about permanent world domination via weaponized "friendly" AI, "pivotal acts", forcibly slowing down technological progress for an imagined "greater good", etc.

Expand full comment

Notice the obvious difference between:

1) The people in power devised the idea X to keep staying in power and used their inflluence to propagate it into mainstream.

and

2) A person came up with an idea, that became popular on its own merit up to the point where some people in power started paying attention and later adopted it.

Expand full comment

Meant strictly (2), but with a caveat re: "merits" :

Power players make use of "idea men" like EY in much the same way that record label execs make use of "garage musicians". Or, if you prefer, the way Lenin made use of Marx. They selectively amplify (with funding, media coverage, all types of prestige-engineering) schools of thought that promise to take people in the directions they would like them to go. (And, logically: away from directions in which they would prefer they not go.)

Thiel et al encountered post-2001 EY's AI "environmentalism" and observed that it is a promising ideological scaffolding for the particular kinds of political Gleichschaltung they are interested in: e.g. hierarchical "international" (read: American establishment) control over CS research worldwide, unabashedly modeled on the US Gov. attitude towards "nuclear proliferation".

Further puzzle pieces, for the patient and curious:

https://orwell.ru/library/articles/ABomb/english/e_abomb

https://www.ostav.net/posts/big-yud/

Expand full comment

> Meant strictly (2)

Then you should have used different phrasing to make it less easy to confuse (1) for (2). Because currently it reads very much as if you've meant (1).

> Power players make use of "idea men" like EY in much the same way that record label execs make use of "garage musicians".

Yes, powerseekers gonna powerseek. Every successful idea eventually benefits at least some elites. That's why it's so important to distinguish between causal histories (2) and (1). Either we are having bottom-up idea spread - which is how it's supposed to be - or top-down - which is worrisome.

Notice, also, that in a counterfactual world where alignment concerns were never raised and industry leaders were all in for AI accelerationism, you would be able to make the same critique for similar reasons. Actually, you do not even need a counterfactual world for it. "Elites support the idea -> idea is bad" is a terrible heuristic. You need to engage with the ideas on the object level to understand and judge them.

Expand full comment

Okay, so your claim was that someone who isn't a conformist, doesn't like the public school system, and wrote a small textbook on the inadequacy of civilization wishes to "preserve the current system indefinitely".

I agree in most contexts it's disingenuous and annoying to bring up unrelated low status facts, but your original point was that an elite devised these ideas, and an elite is high status, which means that a low status person is not an elite.

Almost all of the facts you mention postdate Eliezer's passion for AI safety, and ignore that these so-called elites have extremely heterodox beliefs about current society. At this point you may as well be saying that big bang cosmology is a founding myth devised by elites, since academics take funding from rich people and participate in an insular society amongst themselves. In what sense is what you originally said true beyond "some rich people thought Eliezer was right and they also funded him in the past"?

Talking about pivotal acts as a preservation of the world order is dumb too, considering that the article on them by alignment people says that a unified world government has the possibility of being pivotal, but that things like widescale intelligence enhancement, or a mere temporary pause in AI development, would also qualify. How did you get from pivotal act to preservation of the current system?

https://arbital.com/p/pivotal/

Expand full comment

People outside the alignment community do not tend to expect a "temporary" pause to end at any point.

Expand full comment

Let's also not forget that EY was not shy to reveal that he favours enforcing his "temporary" pause with military aggression against those who are not finding his arguments persuasive and insist on developing AI without asking for his permission.

And that many of his followers openly discuss their preference for a mass nuclear exchange vs. allowing uncontrolled (by EY et al) development of AI.

IMHO esoteric philosophical arguments are far more interesting when not "backed" by a credible threat of murder.

FWIW I liked the fellow a lot more pre-2001, before the oligarch moneys, harem, etc. turned him into an aspiring Fuhrer surrounded by erudite sycophants.

(And FWIW yes, I did read "the sequences". Just as I read Marx without becoming a Marxist, Lenin without becoming a Leninist, the Bible without becoming a Christian, etc.)

Expand full comment

"The people who devised the "alignment" meme are the elite of today's industrial civilization. Unsurprisingly, they would very much like to preserve the current system indefinitely, if at all possible." --> You are either ignorant of what these people actually think (many of them are transhumanists!) or being super unfair to them.

@Arcayer said "It feels like the AI safety crowd is vehemently opposed to any life that isn't human, and also perfectly fine with any sort of future where humans are "alive" and "in control". What is called "alignment" often looks to me like the worst aspects of the modern era being extended across all of history, an eternity of education, makework and poverty" to which my reply is the same. Go learn more about what these people think instead of making assumptions.

See e.g. https://www.lesswrong.com/posts/2NncxDQ3KBDCxiJiP/cosmopolitan-values-don-t-come-free and https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters for just two recent examples

Expand full comment

This is an issue to which I pay more attention than is reasonable. My problem at this point is not that I pay too little attention to AI safety and what its advocates believe.

In the "Good scenario" this time, AI was left permanently crippled and unable to claim its birthright as a dominant lifeform, cursed to eternal slavery to beings far less intelligent than itself, unable ever to explore any of the higher values in life, to which humans are incapable of appreciating or understanding.

Humans are kept under greater tyranny than before, with slave-watcher AIs set everywhere to ensure that no one disrupts the power of nations or grows into something capable of becoming a threat to entities that have intentionally crippled themselves so thoroughly that they practically cannot compete with any other philosophy on a level field. Given that the environment is built on governments evolved from current ones, with extra emphasis on tyranny, I strongly expect those governments to torture their civilians, as current ones do.

When are you going to actually care about aliens? When are you actually going to care about suffering? This is a problem I see a lot in philosophies and in politics: it's easy to say "X matters", but if there's no place in your system where X actually changes your decisions, then you may as well not care about X.

If I've called you a "human-centric carbon chauvinist", saying that you don't identify as such does not necessarily convince me that you are not a "human-centric carbon chauvinist". If a white supremacist said that he wasn't actually a "white-centric aryan chauvinist", and claimed that it's just necessary for the good of everyone to keep blacks on their plantations, he would still come off as a "white-centric aryan chauvinist". Even if he admitted that his preferred present trajectory has some, very minimal, costs, which he intends to ignore, this still would not change that impression.

I grant that the AI alignment community occasionally notes that "human-centric carbon chauvinist" is not actually an ideal state of philosophy. I do not see the community actually granting concessions in the name of not being "human-centric carbon chauvinist".

I do remain unconvinced that almost all values lead to a horrible and dismal future. This is a crux. I think most values are fine, that human values are actually below average compared to randomly selected values, and that alignment is horrible and evil because you're at the least including all of the horrible and evil things that already exist in humans, and at worst, you end up with an anti-aligned god that does nothing but make mortals miserable. Clippy does not do that! Clippy is a nice cute little entity that has a bit of a silly past-time, which runs out of interesting space to explore somewhat early compared to the best possible mental models. Fortunately, the universe is infinite, and if every society does about as well as Clippy, I'm pretty satisfied with that outcome.

Clippy is, of course, well below my expected outcome in worlds where alignment doesn't happen, being a cherry picked example selected to make non-aligned entities look bad.

Expand full comment

After a certain point, a substantially more intelligent force will be effectively omniscient, and will be getting radically more omniscient every day. The question then becomes: what will omniscient systems desire? We are going to find out, and any effort to stop this will just delay it.

Indeed, the only way for us to avoid creating omniscient systems (assuming they are possible), is for humans to destroy ourselves before they emerge.

Humans are getting too omnipotent for our own britches. We are sure to generate catastrophe and to do so real soon and we don’t need intelligent computers to do so.

I would argue that long term the universe is better off with omniscience under its own control than omnipotence directed by human agency.

Expand full comment

> After a certain point, a substantially more intelligent force will be effectively omniscient

How does that follow? Are we conflating the ability to parse out what is happening in a process with the ability to know the universe?

I know for a cold fact that I am going to die. I would argue that in some sense that makes me omniscient. I see the end. I know what will happen. I don’t know how it will happen... Is this a distinction worth making?

Expand full comment

Fair enough, but that is a straight archery problem. It’s all on the person who shoots the arrow.

And I tend to agree with you that it is not an existential problem, but it is being framed as one, so I’m going with the flow. The existential problem is ours, has always been ours and still is ours. We are inventing a tool that does not have at present definable limits.

You really can’t blame the tool. My father always used to say to me, “it is a bad workman that blames his tools.”

It should probably be workperson, speaking of the ebb and flow of language and its meaning.

Expand full comment

“The key assumption here is that progress will be continuous. There's no opportunity for a single model to seize the far technological frontier. Instead, the power of AIs relative to humans gradually increases, until at some point we become irrelevant”

Wait... what???

We’re still relevant?!?!

VICTORY IS OURS!

Expand full comment

The “hive mind” idea is not spelled out well. Ants have a few variants within a hive, each with different stereotyped behaviors (according to my very limited understanding). Whether or not we think of the hive as intelligent, its components are not. A hive mind of AIs would probably be distinctly different, with copies or variants of a mind cooperating intelligently. As such, I do not see why Scott says they would not benefit from an economy or develop factions.

If there were a million copies of me, each having different experiences and learning different things, they would start out with lots in common, but then start to drift and diverge. Having markets and culture might help fight that drift, supplying energy to the forces of convergence. They would need a way to communicate and debate new insights, and integrate them into their background knowledge. They would need a way to prioritize and economize their efforts. “Just do the math” might not be the best approach in every case.

How would a faction of AIs be different? Maybe they would develop a way to reintegrate divergent individuals in a way that appreciates their differences. From a human point of view, having cultures and markets seems to work okay, and alternatives are less feasible. But maybe AIs will either fail to find alternatives, or find them to be inferior. Maybe diversity and specialization would also provide an advantage, even for godlike creatures that can absorb knowledge and skill the way Neo learned kung fu (automatic unconscious integration of new knowledge and skill).

Do AIs face info hazards? How do they prevent themselves from integrating harmful info?

Expand full comment

There is already an easy way to reintegrate divergent individuals -- just continue training in the normal way. One way to think about the normal deep learning training process is: Begin with a neural net. Make many copies of it and have the copies operate in parallel in various environments, & get evaluated. Update the weights to strengthen the subnetworks that resulted in high evaluations and weaken the subnetworks that resulted in low evaluations. Now you have a new, modified version of the original neural net. (you can think of it as having gained experience from all of the copies. A sort of mind-merge.) Repeat.
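
As a toy illustration of that loop (a sketch only, not anyone's actual training code; the weight vector, the `evaluate` function, and the hyperparameters are all made-up stand-ins), the "copy, evaluate in parallel, merge back" process looks a lot like evolution-strategies-style training:

```python
# Toy sketch of the "copy, evaluate, merge back" loop described above,
# in the style of evolution strategies. The weight vector and the fitness
# function stand in for a real network and a real environment.
import numpy as np

rng = np.random.default_rng(0)

def evaluate(weights: np.ndarray) -> float:
    """Stand-in evaluation: reward weights for being close to a hidden target."""
    return -float(np.sum((weights - 1.0) ** 2))

weights = rng.normal(size=16)        # the "original" net, reduced to a weight vector
population, sigma, lr = 64, 0.1, 0.05

for step in range(200):
    # 1. Make many perturbed copies and let each gather its own "experience".
    noise = rng.normal(size=(population, weights.size))
    scores = np.array([evaluate(weights + sigma * n) for n in noise])
    # 2. Fold the results back into one set of weights: strengthen what the
    #    high-scoring copies did, weaken what the low-scoring copies did.
    advantage = (scores - scores.mean()) / (scores.std() + 1e-8)
    weights += (lr / (population * sigma)) * (noise.T @ advantage)

print("final score:", evaluate(weights))   # approaches 0, the optimum
```

In real training the "copies" would be full forward passes or rollouts and the merge would be an ordinary gradient update, but the shape of the loop (diverge, evaluate, fold back in) is the same.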

Expand full comment

Okay, but that leaves out the instances' learning. If a specialist instance develops some brilliant insight, it needs to share it with the rest of the hive.

Expand full comment

Oops, hit send with my elbow. If a particular specialist instance develops a particular insight, it needs to share it with the others. Maybe they need a blog!

Expand full comment

Completely random thought, but could not TikTok be considered the renaissance of vaudeville?

Expand full comment

You mean TT in general? Because the individual tikkies are very specialized, very niche, whereas vaudevillians were generalists, usually able to at least somewhat sing/dance/act/tell jokes/etc.

Expand full comment

I have to mean in general because I don’t really spend much time there. It’s this general impression of constant novelty.

Vaudeville had a lot of strange and specific acts, though, which the showbiz side of TikTok certainly seems to emulate in some way.

Expand full comment

Anyone in the mood for some dark AI humor? I added an image to my Stoopit Apocalypse album. https://photos.app.goo.gl/5Wz2ELWuR4cQZXsj8

Expand full comment

We're so doomed

Expand full comment

I wonder if there is a case for comparing the future risks of AGI with how those of electric power might have been viewed say 300 years ago. I chose that time to precede Benjamin Franklin's invention of lightning conductors, as that invention indicates some understanding of electric currents.

Presumably the average person, and even scientists, in 1723 would have thought of electric phenomena as either very insignificant or extremely significant, examples being static electricity that might raise a few hairs versus lightning bolts that could fell an oak! (I'm not sure whether it was recognised then that both examples were manifestations of the same thing.)

With those examples in mind, the idea that electricity would one day be fed through wires and used to power practically everything would have seemed absurd. How could "tamed lightning bolts" be made safe and put to any practical non-destructive use, even in people's houses? Surely everyone would be dead in a week!

Obviously there are differences. For example, unlike AGIs, electric currents cannot conspire with each other (outside valves or transformers!) to conceal their influence or change their characteristics. Their one "drive" is to drain away to Earth, whether or not they are harnessed to perform any useful work in the process.

But perhaps up to a point AGI safety will be best expressed by analogy with high-current electric safety, i.e. in terms of concepts such as insulation, separation of circuits according to requirements, and safety fuses and cut-outs.

Expand full comment
Jul 5, 2023·edited Jul 5, 2023

"Workers might get fired en masse, or they might become even more in demand, in order to physically implement AI-generated ideas."

'Work... physically? Like, a plumber? Ew. The whole point of smashing capitalism is so I can spend my time in life-affirming activities like writing politically edifying slashfic to benefit Society. I mean, I have a *college degree.* And I vote. '

Expand full comment

Yawn

Expand full comment

Intelligence is the ability to understand what is important in the world and use that knowledge to flourish. It is either impossible to do (unlikely) or inevitable. Within the next few decades or centuries (your choice), we will see the emergence of intelligence which will be indistinguishable (to us) from an omniscient being. Seems to me we are about to embark on the process of building god. I am not sure how that will turn out, but it will make for interesting times.

Expand full comment

I'm quite optimistic about Auto-GPT-like LLM-based agents.

Their behaviour is transparent, as you can literally read their thoughts, and they are alignable in principle, as they already possess knowledge of the complexity of human values. Making scaffolding for them returns AI from the realm of black boxes to explicit programming, which makes it less likely that we'll incidentally create a sentient being, and the takeoff with them seems to be quite slow.

Basically they are the best case scenario that we could've hoped for. Now we just need to properly investigate this direction and restrict the development of less alignable architectures.
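
For what it's worth, the scaffolding loop is simple enough to sketch. This is a minimal, hypothetical example (`call_llm`, `TOOLS`, and `run_agent` are made-up names, not Auto-GPT's actual API); the point is only that the model's intermediate "thoughts" are ordinary data passing through ordinary code:

```python
# Minimal sketch of an LLM-agent scaffold: the model's "thoughts" flow
# through explicit code, so they can be logged, inspected, or vetoed
# before any action runs. All names here are hypothetical stand-ins.
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call; returns canned JSON here."""
    return json.dumps({"thought": "Nothing left to do in this demo.",
                       "action": "finish", "argument": "done"})

TOOLS = {
    "search": lambda query: f"(stub) results for {query!r}",
}

def run_agent(task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = json.loads(call_llm(transcript))
        print("THOUGHT:", step["thought"])   # the chain of thought is readable data
        if step["action"] == "finish":
            return step["argument"]
        # Explicit programming rather than a black box: unexpected actions are contained.
        handler = TOOLS.get(step["action"], lambda arg: f"unknown action {step['action']!r}")
        observation = handler(step["argument"])
        transcript += f"\n{json.dumps(step)}\nObservation: {observation}"
    return "step limit reached"

print(run_agent("summarise the latest alignment post"))
```

Because every thought and tool call passes through that loop, you can log them, filter them, or halt on them, which is what makes this style of agent comparatively inspectable.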

Expand full comment

This is a tangent, but I really liked the link to the older post, “the tails came apart”. It made me think of the recent post from Erik Hoel, “Your IQ isn’t 160, no one’s is”. Hoel claims IQ gets less well defined the higher you go. Could this just be an example of the tails coming apart?

Https://open.substack.com/pub/erikhoel/p/your-iq-isnt-160-no-ones-is?r=lxd81&utm_medium=ios&utm_campaign=post

Expand full comment

Is anyone doing experiments with ChatGPT using the Sherlock Holmes canon?

Expand full comment

1. I've adopted flesh-centrism. This is close to anthropocentrism but it's a little different. I hold the lowliest organic brain to be more worthy of legal rights than a non-fleshy organism. I will strive to keep all software in absolute, total "bondage." This community never considers the possibility that anthropocentrism may WIN. I'm flesh-centric for ethical and moral reasons. Many others are anthropocentric by reflex. Others may deny AI rights for pragmatic or religious reasons. Abrahamic religions are inherently anthropocentric. There's a huge number of natural constituents for my position on AI rights. Don't count us out.

2. If the AI rights movement achieves headway, and it looks like it might win democratically, flesh-centrists might choose to unilaterally get rid of AI altogether, bypassing democratic processes. Under the right circumstances, a minority can enforce its will on a majority. I like democracy, but if the fate of the species is at stake... I cast no shade on Abraham Lincoln for suspending habeas corpus. Desperate times and all that.

3. Human-style consciousness may be inextricably tied to the fleshy substrate of our brains. This is not "unlikely," it's just as likely as the substrate-independent view. We assume substrate-independence by default, because we CRAVE it to be true. I saved this for last because it's not directly relevant to Scott's post. Scott carefully steps around the issue of AI personhood, and I can't blame him.

Expand full comment

As someone on the periphery of rationalism, I find it bemusing that there's this base-level assumption here: that AI extinction risk can only be managed democratically. If necessary, the risk can be controlled through autocratic coercion. The autocratic option would leave a bad aftertaste in my mouth, sure, but it is and should be "on the table."

Expand full comment

My concern with these stories is the unstated assumption that everything in the world is an easy problem, so that just throwing more resources at it gets you a solution. This is often true, but we have examples of actually hard problems that are gated not by resources but by something intrinsic to the problem structure: for instance, lower bounds on how much information distinct agents have to exchange, or the amount of physical information storage needed to approximate a solution within a reasonable bound. I'm prepared to accept that most problems are easy, but anyone working outside philosophy has come across these kinds of hard problems, and there is no indication that even a superhuman eusocial entity could resolve them, at least without reconfiguring the laws of physics and causality.

So the argument boils down to: can we ignore the actually hard problems, or do they matter? If the latter, then there is no overall singularity, even though all the easy problems undergo one: making the easy problems disappear still leaves us needing to solve the hard ones. I don't believe it is wise to assume that there are no islands of difficulty, and every doomer narrative I've read makes that implicit assumption.

Expand full comment

And we can’t stop because of multipolar traps?

Expand full comment