762 Comments

[Comment deleted]

meteor

We haven't told it "iterate this number higher by winning at chess"; we've told it "get this number to be as high as possible". The connection with chess comes in at the step where we increase the number when it makes good moves.

The problem with "iterate this number higher by winning at chess" is that this is much harder to "say" (and it would have its own failure modes if you apply it to an AGI).
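
A minimal, self-contained sketch of that distinction (hypothetical stub names, not any real training setup): the only instruction the agent ever receives is "make this number bigger"; chess enters only through how that number happens to be computed.

```python
import random

class TinyChessStub:
    """Hypothetical stand-in for a chess environment: all it does is score moves."""
    def __init__(self):
        self.turns_left = 10

    def is_over(self):
        return self.turns_left == 0

    def play_and_score(self, move):
        self.turns_left -= 1
        # Our *intention* lives here: good chess moves get reward.
        return 1.0 if move == "good" else 0.0

def episode(policy):
    game, total = TinyChessStub(), 0.0
    while not game.is_over():
        move = policy()                     # a real agent would pick whatever it predicts scores highest
        total += game.play_and_score(move)  # the only thing it is "told" to do is raise this number
    return total

# The agent's objective is literally `total`; the connection to chess is buried in play_and_score.
print(episode(lambda: random.choice(["good", "bad"])))
```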

[Comment deleted]

Vaniver

No, the thing that Eliezer is pointing to is that someone will say "I have a plan for a safe AI!", Eliezer will respond "can I see it?", they will respond "here!", and then Eliezer will spot the flaw, and they will say "I have added a patch! Now it is flawless!", and then Eliezer will spot the new flaw, and they will say "I have added a second patch! Now it is even more flawless!", and Eliezer will wish that they had picked up the underlying principles which they should actually be working against, instead of attempting to add more patches which will fail.

(As a different analogy, suppose you have a chain whose links are made of a metal not strong enough for the load that you want it to bear. It's a mistake to, whenever the chain breaks, fix just the broken link and not check whether or not you should expect the whole chain to work, even if it's difficult in advance to predict which specific link will be the weakest this next time.)

[Comment deleted]

Vaniver

You're welcome!

Edward Scizorhands

It sounds like the "there are no errors on Wikipedia" excuse

[Comment deleted]

Davis Yoshida

As a thought experiment, consider running a perfect simulation of a human brain on a really really big super computer. Presumably it'd think human thoughts and plan and so on (provided you gave it input in a form it could comprehend). Would you consider this an AI agent? If you think this thought experiment is impossible in principle, why?

[Comment deleted]

Buttery

Why do you think so? A perfect physical simulation, atom-for-atom, would by definition perfectly simulate the activity of the brain unless you believe in a non-physical soul.

[Comment deleted]

DavesNotHere

Simulated atoms are not real atoms.

[Comment deleted]

G. Retriever

We can't even do perfect simulations of a simple molecule; the math gets cosmologically difficult very quickly. Perfectly simulating a brain on a classical computer would require so many resources that we may as well call it impossible.

Donald

True. But most of the atoms aren't doing useful thought. You can approximate a lot and still get the same personality.

G. Retriever

None of the atoms are doing thought at all. Thought is a property of the system, not the individual constituent parts.

Davis Yoshida

As far as we can tell, the laws of physics can be simulated (you just need a really, really, really big computer). If you want to hold the position that human thought has some extra qualities that can't be computed, you need souls or something like that.

[Comment deleted]

Davis Yoshida

Right, what I'm saying is that for that to be the case, you need some non-physical thing going on.

[Comment deleted]

Doug S.

We don't necessarily care about that. If you had a black box that had the same input-output behavior as a human, wouldn't you have something that's as much an agent as a human, regardless of what's going on inside the box: whether or not it had any subjective experience, or consciousness, or anything like that?

[Comment deleted]

Mr. Doolittle

How would you know that this black box is creating outputs "like a human" if we don't know how humans work? Saying that the outputs are "human-like" covers a whole lot of ground that doesn't get us anywhere. Chat programs have existed for years, and there are computer telemarketers that can be pretty convincing, but all they are is a collection of pre-recorded human speech with some parameters about when to play each part. You couldn't ask a question of a computer telemarketer, and the notion is silly if you understand how limited they really are. There's one that calls my house regularly that gets confused when I say "I'm sorry, she's not here right now, can I take a message?" If you happen to say things it has been programmed to respond to, it sounds remarkably like a human and could easily convince someone they are speaking to a real person. That doesn't make it true. I feel the same way about your black box output machine.

Curt J. Sampson

How do you know that other humans have phenomenological experiences? What if they're not necessary and at least some other humans you know don't have them?

[Comment deleted]

arbitrario

I disagree strongly. Take consciousness. Arguably consciousness is a physical property of a certain arrangement of matter, much like wetness is a property of a certain different arrangement of atoms. Just as an atom-by-atom simulation of water is not "wet" (it only has the informational properties of wetness), a simulation of a brain would only have the informational properties. That is, from the outside it would be perfectly indistinguishable from a real human, yet it wouldn't have any of the actual physical properties.

If (big if!) consciousness is physical and not just an effect of information processing, a simulation of a brain would be a zombie with no consciousness in much the same way as a liquid is wet but its simulation is not in any meaningful way.

Without having to invoke non-physical stuff. Indeed, information is non-physical (it supervenes on the physical?), so I strongly believe that the physicalist position is to say that a simulation is not conscious, while the one that posits that it is conscious is the non-physicalist one. (I think that a philosopher would call it functionalism?)

Note that being an agent is just a property of information processing, so I have no problem with the idea of an AI with agency.

DavesNotHere

Information always exists in a physical medium. In what sense is it non-physical?

arbitrario

It is encoded in a physical medium, yes. I don't know, maybe I was mistaken in saying that. But notice that information is profoundly different from most (any?) other physical properties in that it is completely independent of the physical medium in which it is encoded: you can store genetic information in DNA or on a hard disk. Yet despite having the same informational content, physically they are completely different and produce completely different physical phenomena. Put the two in the appropriate environment needed to decode and process the information, and in one case you get life, in the other you don't.

Maybe it is not correct to claim that it is non-physical, because it's always encoded in a physical medium. I guess what I really meant to say is that the medium is what really counts from the physical point of view. The same information in wildly different media leads to wildly different physical phenomena.

Carl Pham

That is not true. We don't even know whether the Solar System is indefinitely stable, because we cannot simulate even a very, very simple problem in physics -- the ballistic motion of 8 bodies in a vacuum -- for long enough. Almost all nontrivial dynamical systems are "chaotic", meaning the result of dynamical processes becomes arbitrarily sensitive to the initial conditions quite fast. What that means in practice is that the only way you can simulate the dynamics accurately is by doing infinite-precision math. Which you can't. Or rather, you *can*, but only on an analogue computer, meaning essentially you need to build a duplicate of the real system and watch it. But almost by definition no analogue duplicate of a real system will execute the dynamics *faster* than the real system. That is, you could build an analogue simulacrum of the Solar System, but it would not evolve faster than the real Solar System, so it's useless as a predictor.

Austin

I've yet to read anyone make a compelling claim that the "laws of physics" may be simulated. Classical Mechanics definitely CANNOT be simulated.

For something to be computable, it has to be simulable to arbitrary precision in a finite amount of time. In classical physics, you can definitely ask the question "does three-body problem X pass through coordinates Y within Z seconds" in a way that can always be answered by reality but cannot necessarily be answered by a simulation. You can argue that QM constrains "coordinates Y" in such a way that it reduces the complexity of physics and this question becomes unanswerable by reality. But this is far from a consensus opinion. (Most people, even most physicists, think QM increases the complexity of physics relative to classical mechanics.) And even if we grant that QM does this, we would still be far from proving that physics is computable.

(Note: It's easy to find physicists who claim that physics is computable. It's hard to find physicists who demonstrate that they understand what the word "computable" means. Both "the set of all functions that have a closed form expression" and "the set of all functions that are computable" are subject to the same sorts of diagonalization and therefore have a lot of similar properties, but they are by no means the same set. I've yet to read a physicist -- including Scott Aaronson -- who keeps these two concepts sufficiently distinct. The laws of physics -- as currently understood -- are described by doing calculus on closed-form expressions, but many closed-form expressions aren't simulatable, especially once you do calculus on them. The computable functions are a measure-zero strict subset of the functions that categorically map computable numbers to computable numbers. Many functions that come up regularly in physics (e.g. the integral of sin(x**2)) are not absolutely convergent, and therefore probably do not categorically map computable numbers to computable numbers.)

Additionally, it's not just necessary that all of the functions used in the laws of physics be computable for the laws of physics to be computable. It's also necessary that all of the constants they reference be computable. Polynomials are computable if they take computable coefficients, but not if they take arbitrary coefficients.

The baseline claim about functions describing physics is that they are mappings from n-dimensional complex numbers to n-dimensional complex numbers. (Complex numbers are aleph1, the set of functions between them is aleph2; computable functions are aleph0.)
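
For reference, a sketch of the cardinality comparison being made, written with powers of 2 so as not to assume the continuum hypothesis (the next reply raises exactly this point):

```latex
|\{\text{computable functions}\}| = \aleph_0, \qquad
|\mathbb{C}^n| = 2^{\aleph_0}, \qquad
|\{f : \mathbb{C}^n \to \mathbb{C}^n\}| = \bigl(2^{\aleph_0}\bigr)^{2^{\aleph_0}} = 2^{2^{\aleph_0}} .
```

Here $2^{\aleph_0} = \aleph_1$ and $2^{2^{\aleph_0}} = \aleph_2$ hold only under (a generalization of) the continuum hypothesis.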

The claim that the laws of physics are computable makes three extraordinary claims:

1. The equations we currently use to describe physics are correct rather than approximations.

2. These equations are computable for computable coefficients.

3. The coefficients to these functions are computable.

The third of these extraordinary claims used to be popular (Arthur Eddington believed it), but it has since become so unpopular that it is now known by the name "numerology." (Much like the laws of thermodynamics are based on the failures of everyone who disbelieved them, the treatment of "numerology" as a laughable pseudoscience is based on the fact that lots of really smart people put a bunch of effort into advancing it and failed to make any progress. The idea that the constants of physics are computable is on roughly the same epistemological footing as the idea that it's possible to make a type1 perpetual motion machine.)

A priori, we should assign this claim probability 1/aleph2 (which is much smaller than 1/aleph0, and 1/aleph0 == 0). Moreover, attempts to substantiate such claims have consistently been refuted, so we have evidence against them rather than evidence in favor (though we should expect to find such evidence given our priors).

I do not say this lightly, but the claim, "The laws of physics are simulatable" is one of the few sentences renderable in the English language that is even more improbable than the claim "God exists"! Given the evidence that we have that a set of size at least aleph2 is relevant to physics, the idea that physical laws are drawn from the set of computable functions is beyond miraculous!

(Additional note: It's also easy to find people claiming that the Church-Turing thesis is an argument by computer scientists that the laws of physics are computable, but the Church-Turing Thesis is not claiming anything like the idea that a chaos pendulum is fully describable by a Turing machine. It's claiming that Turing machines can do the complete set of calculations that can be done through deterministic finitistic means that map discrete inputs to discrete outputs. And a chaos pendulum does not use finitistic means to map its inputs to outputs, and necessarily operates on continuous inputs and outputs. And if you impose discrete bounds on its inputs and outputs, it ceases to provide a deterministic mapping from its inputs to its outputs. Assuming you can find an iterable technique that measures the constants of nature to arbitrary precision, the Church-Turing thesis might technically make the claim that the constants of nature are computable, but that's a very technical technicality, and I think Church and Turing would respond to it by rolling their eyes and saying that that obviously wasn't what they were talking about with their claim, which is too informal to be poked at with those kinds of technicalities.)

Jakub

Just some nitpicking, which does not affect the argument.

You wrote that there are aleph1 complex numbers; in fact, this is not true in the sense that it is independent of ZFC (the standard axiomatics of set theory). There are at least aleph1 complex numbers, and the statement that there are exactly aleph1 is known as the continuum hypothesis (CH). The same applies to functions between complex numbers and aleph2 (but it's no longer the CH, but a generalisation of it).

gpatty

This is an excellent comment, and while I'm neither a physicist nor a mathematician, I have always been suspicious of claims (that we live in a simulation, or that mind uploading is possible, for example) that rely on the assumption that simulating physics perfectly is possible. We have yet to discover a model of the universe that perfectly predicts everything. The only perfect model of the world is the world itself!

Yitz

I'm not sure that claim holds up, since there's strong evidence to suggest that physics may be non-deterministic (Cf. almost all of quantum physics since Einstein)

arbitrario

The fact that physics is probabilistic does not imply that it cannot be simulated; you would "just" need to reproduce the correct probability distribution. We do this all the time. Well, we try at least. The other arguments presented against it are more compelling.

pauls_boutique

I think we just haven’t discovered good algorithms and architectures to model and implement these subjective experiences on a computer. It’s a hard problem and we haven’t made much progress on it.

Auros

This is basically the Chinese Room argument. You put a person in a room with a giant book, and strings of symbols are passed in, and the person consults instructions in the book that lead from that string of symbols to another string of symbols that gets passed back out.

Now, it happens that the room has magical time dilation, and the book gives you instructions on how to carry on a conversation. The strings of symbols passed in are sentences, and the strings passed out are how some particular person would have responded to those sentences in a text chat.

Ron Searle says this proves that AI cannot be truly conscious. No matter how universal the responses are, no matter how real-seeming, Searle says, obviously, the person executing the instructions does not have the consciousness of the "responder" here, nor does the book.

Those of us who believe in functionalist cog-sci say Searle is being obtuse, and that _of course_ the person and the book don't have the consciousness of the responder. _My brain_ does not have consciousness either, if you tear it out of my body and drop it on a table. _The room as a whole_ functions as a conscious being, for as long as the person inside cooperates with the process, just as my body as a whole functions as a conscious being, as long as my brain doesn't stop cooperating (i.e. become diseased / damaged / dead).
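
For concreteness, a toy caricature of the setup (hypothetical entries; the thought experiment's "book" is imagined to cover every possible conversation, which is exactly why the magical time dilation is needed): the executor just does lookups, and the functionalist claim is about the whole system.

```python
# Toy caricature of the Chinese Room: the "book" as a lookup table.
# Neither the table nor the lookup code "understands" Chinese; the argument
# is about whether the system as a whole does.
BOOK = {
    "你好吗？": "我很好，谢谢。",        # "How are you?" -> "I'm fine, thanks."
    "今天天气怎么样？": "今天天气很好。",  # "How's the weather today?" -> "It's lovely."
}

def room(symbols_in: str) -> str:
    # The person in the room just follows this rule, with no idea what it means.
    return BOOK.get(symbols_in, "请再说一遍。")  # "Please say that again."

print(room("你好吗？"))
```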

[Comment deleted]

Vampyricon

It also shows an astounding ignorance of how different languages are from each other. I'd say English <--> Chinese* is one of the easy ones, and even that is hard.

*That's without going into what he means by "Chinese". By any reasonable, apolitical categorization, Chinese is a language family, not a language.

Auros

Oh sure, the postulated scenario is ridiculous, and you need the magical time dilation effect in order for the thought experiment to make any sense at all. But I think the point he is trying to make is cogent, it just doesn't imply what he thinks it does.

In most electronic computation, you really can separate the _instructions_ from the _hardware executing the instructions_. And that's not so true with the embodied intelligence of humans and other animals. He's essentially claiming that the fact that the instructions, and the thing executing the instructions, are separate, shows a lack of consciousness or understanding. Because the executor doesn't understand, and the instruction set in the absence of the executor _obviously_ doesn't.

The functionalist response is, "You keep using those words. I do not think they mean what you think they mean." Like, if you're fundamentally a Bertrand Russell style pragmatist, then a Chinese Room that truly convinces native speakers of Chinese that it is a Chinese person with a lived experience, who's capable of absorbing new information and synthesizing novel responses with it... like, what else do you want?

I have no more ability to confirm that any fellow human has a similar lived experience to my own, than I do to confirm that the Chinese Room has a lived experience with qualia and intentions and so on. "Oh, it's so much more physically different than me" is a cop-out. If you agree that Chinese room _presents the impression_ of being a person, but then insist it isn't really a person -- that it doesn't have understanding or consciousness -- then you have no particular reason to affirm the consciousness of any other person. You're just engaging in biological chauvinism. (This would get particularly interesting if you did explain to the Room how it was embodied, and then told it that its executor really wanted to quit, which would mean its consciousness was going to be put on-hold indefinitely. One might suppose it would be angry and afraid about that... Of course, again, the whole experiment is a bit ridiculous, since for the room to be a coherent being, you'd need to have some way for it to be absorbing new _sensory_ experiences, not just linguistic prompts. Either that or you'd have to explain to the Room why its senses all seem to be cut off. Maybe it was in a terrible accident, and now is living as basically a brain in a jar. In that case, it might well welcome euthanasia...)

arbitrario

> _The room as a whole_ functions as a conscious being

Even with this, I still lean toward the "Searle is basically right" side. The fact that from the outside the Chinese room + guy inside + book appears to function as a conscious being does not imply that it is really conscious. Functionalism seems the sort of extraordinary argument that requires extraordinary evidence. Up until now, the only arrangement of matter that we can confidently say is conscious is the brain (+ body etc); I see no reason not to believe (indeed, as a physicalist I would expect) that the particular arrangement of matter is the juice that leads to consciousness (and yes, like every other macroscopic arrangement of matter it is "multiply realizable").

Sandro

> I see no reason not to believe (indeed, as a physicalist I would expect) that the particular arrangement of matter is the juice that leads to consciousness

The question is whether the logical structure of this arrangement is all that's required, or whether the physical structure contains something that cannot even in principle be captured logically. If you believe the former, then you are a functionalist; if the latter, then perhaps a panpsychist.

arbitrario

Not at all. The fact that the structure can be captured logically and replicated in a completely different system to simulate it does not necessarily imply that the replicated system will present the supervening physical phenomena. Water is wet; an atom-by-atom simulation of water is not wet in any meaningful sense (there is no such thing as being "inside" a simulation), despite having the same logical structure.

Indeed I think functionalism, panpsychism, and dualism are all wrong. I think the closest to my position is type physicalism: https://en.wikipedia.org/wiki/Type_physicalism

Sandro

> Water is wet; an atom-by-atom simulation of water is not wet in any meaningful sense

I disagree; it would be quite wet to anything else that was also simulated. The quality of "wetness" is not a property of water alone, but of water plus whatever else it's interacting with, i.e. "wetness" follows from the properties of water + the properties of the thing to which water binds. For instance, arguably hydrophobic surfaces don't get wet.

Vampyricon

IIRC the Chinese room is about understanding, not consciousness.

arbitrario

Doesn’t understanding require consciousness?

Phanatic

Why would it? Why couldn't you have the intelligence without consciousness?

arbitrario

Well, I don't think intelligence requires understanding.

Vampyricon

Well you can't do calculus without arithmetic but I doubt you think calculus is about arithmetic.

arbitrario

Well, ok, touché.

Austin

To answer the question of whether the room understands me, I light a match and tell the room, "The building is on fire."

Auros

Well, you can elaborate the concept to suppose that the instructions include descriptions of how to update the instructions, so that the "mind" absorbs new information. In order for your jest to make sense, you'd have to explain to the virtual consciousness how it was embodied first. Without the context, you might as well be trying to explain the threat of a virus to somebody whose theory of disease relies on the Four Humours. Of course they won't understand you, but not because they lack consciousness or the general ability to understand things.

The Ancient Geek

I think you meant John Searle. Ronald Searle was the creator of St. Trinians.

Auros

Hah, you are correct. (And St. Trinians is extremely silly.)

Carl Pham

It's not possible in principle because of the speed of light. I would guess the human brain is the most sophisticated machine for thinking you can build out of molecules, because if it were possible to build a much better one, Nature would have, sometime over the past 4.5 billion years.

So if you attempt to build a machine that simulates it -- instead of duplicating it -- then you need to build a much more complicated machine, which will be much bigger, and it will not be able to simulate the coordination the human brain does because the distances will be too large. The simulated neuron in the hippocampus will not be able to send its messages to the simulated neuron in the frontal cortex at the right time (when the message from another simulated frontal neuron gets there) because of the speed of light.

You might say: but just slow everything down, keep the relative timing but allow much longer propagation time. But that doesn't work, because underneath it all, you're still using atoms and molecules, and they have fundamental decoherence phenomena, driven by the Second Law of Thermodynamics, let's say, not to mention ordinary quantum decoherence on a timescale set by h bar, which will kill you, ruin your coordination through chaotic decay. You *can't* just arbitrarily scale up the time constants of your simulated brain, because you're working with atoms and molecules that have their own time constants.

Doctor Mist

"I would guess the human brain is the most sophisticated machine for thinking you can build out of molecules"

I don't think that follows. Evolution is about satisficing. The same argument would have proved 4 million years ago that no better mind than Australopithecus's was possible.

Austin

What makes you think that human minds are much better than Australopithecus minds? (Or indeed guppy minds?)

Doctor Mist

Don't you?

Austin

No, I think human minds are slightly better, but mostly just specialized in different domains than other animals'. That specialization depends largely on bipedalism and opposable thumbs to confer any survival advantages, and it has given us huge technological advantages over animals (and conferred increased specialization around using technology). But I see a lot of evidence that other animals have brains that are just as good as ours at many really hard problems, like constructing a world model out of visual inputs and inferring causality.

James

An AI agent which results purely from programming could be impossible. But what about AIs which copy the key aspects of whatever our meat brains are doing?

[Comment deleted]

James

Don't know, but if meat brains can do it I see no reason why other materials can't.

[Comment deleted]

James

Definitely. I'd think it reasonable to say that we could be an example of such. But artificial intelligences don't have to be computers. They could be an object that we have no conception of currently, like how a medieval person wouldn't have understood what a computer could be.

Mo Nastri

Like what, and why not?

[Comment deleted]

Drethelin

Your brain is just running calculations like a calculator too, just an extremely complex one. There is nothing so far discovered in a human brain that cannot, in principle, be done in a computer program.

[Comment deleted]

Drethelin

Of course they can, if you programmed them to. No one has yet done this as far as I know, but 100 years ago no one had programmed a computer to visually identify cats and dogs.

[Comment deleted]

saila

We don't achieve 100%, or at least not true 100%. I recall a year ago believing there was a dog on the road (10m out) when it was actually a couple of branches (my vision is 20/20).

Besides, you could just say that if you achieve 99.99999% belief, output 100%. Obviously seeing a pattern vs. knowing what a dog is is different. But I suspect an AI could do both. What is a dog to you? An animal; it has 4 legs generally, 2 ears, a mouth. It tends to chase after squirrels, etc.

Basically a bunch of components used to facilitate pattern-recognition and then context-clues. Why would this not be replicable?

Max Morawski

So a few things in here I disagree with.

One, a 'dog' isn't a real thing. A 'dog' is a conceptual bucket we use to communicate about a set of correlated phenomena. I'm not trying to be pedantic here; that's a really important thing to keep in mind. When you look at something to decide it's a dog, you're doing the same sort of binning a convnet is doing, albeit with different hardware.

Second, I don't think anyone being serious about their credence would ever look at a picture and say "this is for sure 100% a dog".

[Comment deleted]

Drethelin

My response would be that abacuses are not built with any systems for processing data or inputting soundwaves, so your comparison is silly.

[Comment deleted]

Max Morawski

Yeah, I think this is a little bit of a silly comparison. I guess an abacus actually can hear, if you think about it. A large enough sound could probably alter the abacus' state by rattling the little beads around (which is what hearing is).

Michael Kelly

what if the cats identify as dogs today? (that's just shit-posting sarcasm)

Drethelin

If you feel the need to qualify your bad comments with an acknowledgment that you know they're bad comments, just don't make them.

Austin

Has anyone today programmed a computer to visually identify cats vs dogs reliably? (Genuine question. The state of the art as of my most recent understanding is still described here: https://resources.unbabel.com/blog/artificial-intelligence-fails Can be a good approximation most of the time, but Linnaeus sort of anticipated Darwin in ways that computers don't come close to achieving AFAIK.)

Chrysophylax

Yes. Object recognition AIs (specifically, convolutional neural networks) are markedly superhuman - they can not only tell cats from dogs, they can tell you which of 120 breeds of dog is in the picture, and they do it more reliably than humans.

This is also a good example of AIs being *weird* and not like human intuitions for how a mind works. An AI that is much better than a human at identifying the contents of arbitrary images can be fooled into classifying everything as an ostrich by adding a little carefully-engineered noise that a human wouldn't notice to the images. The things that a convolutional neural network pays attention to and the inner concepts it forms are *nothing* like what goes on inside a human brain, despite the AI design being inspired by the visual cortex.
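
The "carefully-engineered noise" is cheap to produce; here is a minimal fast-gradient-sign sketch in PyTorch, assuming some image classifier `model` and a labeled batch `(x, y)` already exist (hypothetical names; the targeted everything-is-an-ostrich attack is a variant of the same idea).

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.01):
    """Nudge every pixel a tiny step in whichever direction most increases
    the loss on the true labels y; the result usually looks unchanged to a human."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # imperceptible, carefully-aligned noise
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixel values in a valid range

# Usage sketch (assuming model, an image batch x, and integer labels y):
#   x_adv = fgsm_perturb(model, x, y)
#   model(x_adv).argmax(dim=1)   # often wrong, even though x_adv looks identical to x
```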

The Ancient Geek

No one knows how to even begin programming qualia.

BE

I’m always confused by this whole line of argument- unless you believe in souls? If we’re ultimately pieces of stuff interacting with other stuff, why would you expect us to be unique in our ability to experience anger?

Unless of course you do believe there is a metaphysical component to us. In which case sure, AI won’t kill us because it doesn’t have a divine spark.

[Comment deleted]

BE

I’m not entirely sure that sentience has to experience qualia, and simulating sentience seems sufficient for most purposes of AGI-related discussions. At any rate, sure, there’s no logical argument against your suspicion that we’re using some unique substrate. Why would that be the case, though? Any reason at all to suspect it’s more than “mere” complexity of the brain?

[Comment deleted]

Max Morawski

" It is hard to explain emotions, choices and preferences in a purely mechanical way." Why is this hard? I feel like this is pretty easy.

"It just might not be possible to simulate your brain without your actual brain." You actually can just simulate a brain, no chemicals needed! Brains are made of stuff, stuff is simulatable.

arbitrario

I feel the opposite way from you. Indeed (being a physicalist) it seems to me that the physicalist position is to claim that the physical substrate is the thing that determines the physical properties (like anger, redness, or wetness and squishiness) that supervene on top of it.

It seems to me that claiming that qualia could appear in any system (provided that they process the correct information) is the non-physicalist position, in that it assumes that consciousness and qualia, rather than being physical properties of certain specific arrangements of matter, are properties of a certain way of processing information - a non-physical thing.

Indeed I do not think that humans are special or unique; maybe qualia can appear in different arrangements of matter, but if qualia are physical properties they would appear in just a set of specific arrangements. Any simulation would be a non-conscious zombie, pretty much like a simulation of water is not wet in any meaningful way.

Alas, because intelligence and agency are instead information processing, I have no problem with the idea of an agent AI.

Austin

Being a computer programmer, educated as a mathematician, I completely agree with you.

Max Morawski

This is interesting! Very different from my own thoughts. I hope you don't mind if I try and dig into this a little!

"It seems to me that claiming that qualia could appear in any system (provided that they process the correct information) is the non-physicalist position, in that it assume that consciousness and qualia, rather than being physical properties of certain specific arrangement of matter, are properties of a certain way of process information - a non physical thing."

What is a qualia to you? Like, if you are talking about the 'anger' qualia, how would you recognize that physically? Or devise a test to see if something else was experiencing it?

Scott Alexander

By "experience anger", I can think of two things you might mean.

First, have something corresponding to the emotion of anger, the same way you can think of a hornet as being angry when you disturb its nest. I think this is just good game theory - something along the lines of "when someone hurts you, have a bias towards taking revenge on them even if you don't immediately gain from it, that will discourage people from hurting you in the future". If someone wanted to program that kind of behavior into a computer, it would be pretty easy - you can think of some of the bots in prisoners' dilemma tournaments as already doing this.
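
A minimal sketch of one such bot (tit-for-tat, the canonical example from those tournaments; offered purely as an illustration, not something from the post):

```python
def tit_for_tat(opponent_history):
    """Cooperate by default; 'take revenge' by defecting once right after being
    defected on. Functionally this looks like anger, with no claim about feelings."""
    if not opponent_history:
        return "cooperate"
    return "defect" if opponent_history[-1] == "defect" else "cooperate"

# Against an opponent who defects once:
#   opponent: C C D C C ...
#   bot:      C C C D C ...   (retaliates once, then goes back to cooperating)
```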

Second, deeply *feel* anger in an experiential sense. I don't know what consciousness and qualia are so I can't say for sure if computers can do this, but I don't think it's related to planning and I don't think this is a necessary ability in order to be human-level intelligent or take over the world.

Nancy Lebovitz

For what it's worth, I distinguish between anger (this must be stopped, with damage if necessary) vs. fury (damage is essential as part of stopping this).

Andrey Zakharevich

I would say that emotions are very high-level abstract patterns, so they are totally formalizable, and we can make machines to identify some states with our common emotions. For example, anger would be "world is not in the state that I expected (or wanted to reach my goal), and I'm ready to inflict some actions on the world to change it to a more preferred state, possibly at the expense of other agents' goals".

Santi

I see a lot of comments replying here trying to justify why a computer can do this or that complex thing. But the key point is: I don't care if computers can experience anger. At the end of the day, it's the same old question of qualia and p-zombies and whether "other people" really experience anything at all. I don't know if an AI could "experience" pleasure in killing me, but for sure I would experience pain in the process, and a strong preference to remain alive.

What I do care about very much is the ways in which our Molochian search for ever more advanced AIs, whose inner workings we already understand less and less, will affect the world. The ways in which it will affect my experience of it. The arguments for why it's possible we get there have been elaborated in detail by people who know far more than I do, so if you want to go over them, you'll find as much as you want, from Bostrom to EY, and many others.

All I want to insist on here is that "you can't know if computers experience true feelings/consciousness/etc, therefore they can't be an agent" is a non-sequitur. (One of your comments below mentions "I just am questioning [...] the justifications for why we can have agent AI").

In a nutshell, this argument is always similar, and can be summarized as

(1) if an AI doesn't have a [human property] it's not a [human]

(2) [doing X] is a [human property]

(3) [being an agent] is a [human property]

(4) an AI cannot possibly [do X] (why not, though? but OK, let's assume it)

Then sure enough

(5) from (1, 2, 4): an AI is not a [human]

And here's where the magic happens

(6) from (1, 3, 5): an AI cannot [be an agent]

which is just wrong.
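
One way to make the gap explicit (a sketch, reading "P is a human property" the way premise (1) uses it: lacking P disqualifies you from being human, i.e. being human implies having P):

```latex
\text{(1)+(3):}\;\; H(x) \Rightarrow A(x), \qquad
\text{(1)+(2)+(4) give (5):}\;\; \lnot H(\mathrm{AI}), \qquad
\text{but}\;\; \lnot H(\mathrm{AI}) \not\Rightarrow \lnot A(\mathrm{AI}) .
```

Step (6) would need the converse, $A(x) \Rightarrow H(x)$ ("only humans can be agents"), which was never granted; as written it is the fallacy of denying the antecedent.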

For instance, consider this: I worry about people downplaying the risks of AI. I cannot know whether Parrhesia truly experiences the world like I do, therefore they cannot have the agency to want to downplay the risks of AI, therefore I have nothing to worry about.

Arbituram

Agree - the question of consciousness/qualia is interesting and difficult, but I can't see at all what it has to do with AGI risk.

I don't know if AGI has conscious experience to the exact same extent to which I don't know if other humans have it; no more, no less.

Chrysophylax

I agree that saying that AIs can't do X unless they're conscious is wrong for almost all X, and certainly for many X that would be catastrophic for humans, but I think consciousness is a useful thing to consider in one corner of alignment theory. You need to know enough about consciousness to have a reliable way to say that something is definitely not conscious, so that you can prevent your AGI from simulating umpteen unhappy conscious minds in the process of making plans and interacting with humans.

Arbituram

This is a good point I hadn't considered - the mindcrime risk is the part of Bostrom's Superintelligence that came most out of the blue for me (and has stuck with me ever since).

That said, even if we accept it's very important, we also need to believe there's a useful/meaningful way we can approach the problem. I admit, I've yet to read anything that grapples with the physical world//conscious qualia divide in a satisfying way, rather than positing a plausible mechanism by which physical structures could embody certain characteristics that seem part of consciousness (like a certain recursiveness).

On the other (third) hand, I make a bunch of decisions in my day-to-day life based on the n=1 experience of consciousness I have, including being nice to people and being vegetarian, so I guess that's the best we can do for this sort of wider AI problem as well (although my model of conscious beings isn't nearly well-developed enough to apply to entities that don't share most of their evolutionary timeline and neural structure with us).

Michael Kelly

We don't calculate, we compare. We compare new experiences to old experiences, and respond to the new experiences in the same manner we have successfully responded to the old experiences i.e. we saw something which frightened us, and we ran away ... this is a successful response which taught us a proper response. This is how mammals learn. If on the other hand, we saw a frightening adult, but were unable to escape, but the adult treated us kindly and fed us, we learned adults are kind and caring.

What we're reading into this whole AI thing, is context. We incorrectly presume an AI will possess adult human context, such as "rule the world." We hold "rule the world" context, because we strive for a margin of safety over the sufficiency we need to survive. More margin = better survival. An AI doesn't have a desire for the need to survive ... unless we program a need for survival, and a desire for larger margin of resources over the minimum resources required for survival.

Money is an abstraction layer; money correlates to supplies and materials. Money is a fungible asset which equals whatever required resources fit the daily need. You need water? Money equals water. You need food? Money equals food. You need shelter? Money equals shelter, transportation, clothing, etc.

Context: what does an AI need? How to purchase that need? Compute power: go to IONOS.com, purchase compute power. How to access compute power at IONOS? How to tune compute power at IONOS to fit its needs?

These things require context. The AI needs these context details filled in.

Drethelin

All of this is based on and built out of calculations. There is no comparison without calculation; the brain at a very basic level is doing math on inputs to produce outputs.

Michael Kelly

Calculations are based on quantifications; quantifications are based on comparisons.

Right now, I'm reading Popper's "Conjectures and Refutations"; Ch. 1 explores comparisons in excruciating detail.

CatCube

Considering the brain is an analog system, I find it difficult to believe that it's "doing math" at a very basic level.

Michael Kelly

Comparisons are how we calculate.

Is this large dog similar to the previous large dog we have seen?

Did the previous encounter go well?

Proceed in the same manner, expect similar results.

Drethelin

Analogue computations are still computations, but neurons also do arithmetic operations all the time, as well as converting data into digital signals. I strongly recommend the book Principles of Neural Design on this and many other brain-related topics. https://books.google.com/books?id=HA78DwAAQBAJ&pg=PA137#v=onepage&q&f=false

ucatione

Imagine two Turing machines, in which the state of the first Turing machine is the tape of the second Turing machine, and in which the state of the second Turing machine is the tape of the first Turing machine. What happens? Something non-computable. Such a system cannot be reduced into a single Turing machine. Now picture a living cell as a bunch of intertwined Turing machines.

CatCube

Sorry for answering so late--I saw an e-mail for your response but forgot about it. The fragment of the book you linked seems interesting, but it's a little opaque with the snippet I can see. I may have to see about getting a copy so I can start at the beginning.

Going back to the meat of your comment, "analogue computations" are computations, by tautology, but I want to be clear I'm making a more subtle point. Natural analog systems are not doing computations, analog or otherwise. They just...are. "Computations" are how we general intelligences impose our understanding on them, to attempt to predict and simulate their behavior, and it's always a very, very simplified understanding.

Let me illustrate what I'm saying with an example: an analog microphone setup. That is, a microphone plugged into an amp and outputting to a speaker. Somebody talks into the microphone, and the amplified sound comes out. The voice acting on the microphone's diaphragm introduces a response into it, which introduces a response into the electronic components of the microphone, which introduces a response into the microphone cable, etc. At no point in this system is anything being computed. There's just stuff responding to physical forces acting on them--the electrons in the wire have very, very, very small forces acting on them, relative to the forces acting on the microphone and speaker diaphragms, but just things happening.

That this system isn't computing anything doesn't mean that it can't be simulated with computation. However, this computation always requires simplifying the problem to make it tractable. You may be able to do a very high degree of simplification--say, if instead of a voice you had a simple sine wave tone being played, you could probably calculate the response of every component through hand calculations and get answers pretty close to the real values.

You can also do much more complicated computation to enable the simulation of a much more complicated waveform. That's what we do now, after all, with digital sound, and we do it so successfully that our ears cannot detect the difference from the initial waveform. But there are differences from the original waveform, even if we can't hear them, and those are because we have chosen to vastly simplify the waveform.

For starters, you use a low-pass filter on the input of 20 kHz, at the limit of human hearing. This throws out the vast majority of the actual information contained in the waveform. This then enables us to sample it at a reasonable frequency, and allows us to then reproduce it with machines that, yes, do computation. This simulation of the original system is good enough for our purposes, which is to create recordings that sound identical to the input when humans listen to it. (If your purpose is to create a recording your cat would think is identical to the input, it's woefully inadequate.) But all of this, from hand calcs for a sine wave input to the 44 kHz sample rate for practical digital audio, required us to decide which parts of the original to actually simulate.
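
The simplification being described is exactly the sampling-theorem bargain (a compressed statement of the Nyquist-Shannon criterion; CD audio actually samples at 44.1 kHz): throw away everything above a chosen band limit, and a finite sample rate then suffices to reconstruct what is left.

```latex
f_s > 2 f_{\max}
\quad\Longrightarrow\quad
\text{the band-limited signal is exactly recoverable from its samples},
\qquad
44.1\ \text{kHz} > 2 \times 20\ \text{kHz}.
```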

Why I'm harping on this is because I contend losing sight of the fact that the original system wasn't "computing" anything elides the fact that we had to simplify the system to make it computable. You can simplify it less to get more fidelity, but at the cost of increasing the computing resources. Once you're talking about systems with significant nonlinear behavior it starts to blow up quickly, in terms of how intricate the model itself is (memory limits) and how much computation is required at each timestep (how long the simulation takes to run). If you don't make it intricate enough you will either get non-physical results or instability. (Which then produces non-physical results.)

Where I've been involved with this is looking at simulating seismic behavior of structures. We're using very, very simplified models of things much less complicated than the human brain--things we built and have engineering drawings for--and it takes huge amounts of resources to build the model and get results. And we still have to go out and do physical experiments to compare to the model results to make sure they're actually realistic! Computational fluid dynamics done by hydraulic engineers is adjacent to my structural engineer field, and they seem to have similar issues. So when people start talking about simulating a human brain and go "well, yeah, we're not too far off from being able to do this" my thought is "I don't know that I believe we're all that close to it." I think we're a long ways off from even having a stable simulation of a human mind, much less one that actually runs in real-time.

ucatione

"Your brain is just running calculations like a calculator too, just an extremely complex one. There is nothing so far discovered in a human brain that cannot, in principle, be done in a computer program."

This does not seem true to me. There are many operations that cause a computer to crash, yet which the human brain is able to handle without any problems.

Drethelin

This is a false analogy. There are many operations that a Mac can run that would crash if you tried to run them on a Windows PC, but that does not mean Windows PCs are fundamentally unable to run those operations, just that the programs are not written for that computer; if you translate them, they will run just fine, and it would be absurd to say that there are Mac programs that are untranslatable to PCs.

The complex things that the human brain does have not yet been translated into something a computer we have now can do, but there is no evidence that there is anything that is impossible to translate given further work.

[Comment deleted]

Drethelin

I would not have the experience of being in two places at once, because each consciousness would only have the experiences it is having in that location.

[Comment deleted]

Phanatic

Do you think I could replace a neuron in your brain with a purely artificial counterpart that duplicates the behavior of that neuron in all respects?

[Comment deleted]

Michael Kelly

What people miss is that a computer doesn't have context.

Here's a question to ask an AI. "If tomatoes are red, what color is a tomato vine?"

For the same reason, a computer doesn't want to take over the Earth. As I wrote in another comment, we have a drive for survival, we have a drive to gather resources to fulfil our needs, plus we have a desire to ensure our survival by gathering a margin over our basic needs. If a little bit of margin over baseline is good, a larger margin is better ... this leads to greed. An AI doesn't have greed.

Drethelin

I'm sorry but this is nonsense. "context" is not some magic word that humans have and computers do not. It's just a word for another kind of information. Humans aren't born with context: we learn it through experience. Computers are perfectly capable of learning if they are built to do so, and literally the entire field of GPT achievements is about providing computers with sufficient context that they can make distinctions between things based on context clues.

Michael Kelly

What is the ideal temperature for human habitation ... about 25C. Why was it a problem when R2D2 raised the temperature in Princess Leia's chamber to 25C? Because Princess Leia's chamber was an ice cave. Raising the temperature to 25C caused the ice to melt soaking all of her clothes.

This is a failure of context that a computer can't see ... unintended consequences.

M M

Why is it that you think a computer couldn't know that making an ice cave hot would melt ice? That's perfectly ordinary knowledge

Kimmo Merikivi

I don't quite know what exact problems you're thinking about, but let's pick a paradigmatic example like the knapsack problem that comes up in these sorts of discussions. The argument goes: humans can empirically pack items in knapsacks, but the decision problem of the knapsack problem is known to be NP-complete, and there are no known algorithms that can solve it in polynomial time. Given a difficult enough problem instance, computers could keep at it for the current lifetime of the universe and still not manage to pack a backpack, and it has been argued this demonstrates there are problems human intellect can solve but Turing machines cannot.

The thing is, humans in this line of argument are granted enormous allowances not granted to computers. Humans are not required to solve all possible problem instances (such as the problem instance that can be translated into solving the Riemann hypothesis), humans are allowed to offer solutions that aren't optimal, humans are allowed to give up, and so on. If you give a computer the same allowances, it can solve all possible instances of the knapsack problem just fine. Indeed, contemporary machines are fast enough that they could probably solve any physical instantiation of the knapsack problem that has actually been performed by humans in real life, in the time it takes for a signal from your eye to reach the brain and get processed into an understanding of what the problem even is, never mind the time it took for the human involved to actually come up with some solution.
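
A sketch of what "the same allowances" buy: a greedy value-per-weight heuristic that, like a human packer, returns a good-enough packing almost instantly, with no optimality guarantee in general (illustrative data only).

```python
def greedy_knapsack(items, capacity):
    """items: list of (name, weight, value). Packs greedily by value density --
    fast and usually good, but not guaranteed optimal (the human allowance)."""
    chosen, remaining = [], capacity
    for name, weight, value in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
    return chosen

print(greedy_knapsack(
    [("tent", 4, 9), ("stove", 3, 5), ("book", 1, 2), ("anvil", 12, 10)],
    capacity=8,
))
# ['tent', 'book', 'stove'] -- total weight 8, total value 16, found in microseconds.
```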

I would argue an analogous line of reasoning goes through even if you're thinking of some other kind of thing, dividing by zero or whatnot.

Phanatic

There are also operations that cause the human brain to crash yet which computers are able to handle without any problems.

https://www.epilepsy.com/learn/triggers-seizures/photosensitivity-and-seizures

Michael Kelly

This is a problem with inputs. Which leads back to context. I can ask a computer what is 5+5, but if I phrase it as 5+A ... that does not compute. Look what happened when Arthur Dent asked the ship's computer to make a cup of tea.

Shockz

5 + A computes just fine in several contexts (e.g. treating 'A' as its ASCII value, or string concatenation, or if A's an integer variable, or if it's a pointer...) It won't *compile* in some other contexts, but I don't see that as all that different from a human saying "I can't give you a good answer to that unless you give me more context," which is more or less what I'd tell you if you gave me the question "5 + A = ?"
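
A toy illustration in Python of the same point: the "answer" to 5 + A depends entirely on what A is, and when the context is missing, the failure is a request for more information rather than anything mysterious.

```python
A = 10
print(5 + A)         # 15: A is an integer variable

print(5 + ord("A"))  # 70: treating "A" as its character code

print("5" + "A")     # '5A': string concatenation

try:
    5 + "A"          # int plus string: no sensible context given
except TypeError as err:
    print("need more context:", err)
```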

Shockz

"Crash" is an extremely broad term that I think you have to define more clearly. Most events referred to as "crashes" are either the OS detecting a program trying to do something it is not allowed to do (most commonly write to or read from an area of memory it's not permitted to) and shutting that program down, or the program requesting resources that the OS cannot provide and therefore shutting down of its own accord. I can't think of any operations performed by the human brain that obviously map to either of those.

Phanatic

If the claim is "There are many operations that cause a computer to crash, yet which the human brain is able to handle without any problems," then it's not fair to define "crash" as "something that happens on hardware that has an OS."

Shockz

Then we come back to the question of what ucatione *did* mean by "crash", exactly.

Greg G

I think you're saying that artificial general intelligence seems impossible. That may be true, but how do you know? A couple of hundred years ago, people would have thought nuclear power was crazy science fiction. It's just not a very safe bet. So this whole area of research just starts by assuming AGI is possible because that's the interesting scenario. If it's not, no worries.

[Comment deleted]

Mr. Doolittle

The assumption is that because human brains work, there must be a way for other brains to work - even artificial ones. That assumption is based on another, that human brains are completely physical (nothing metaphysical), and therefore are completely the product of their physical natures.

The reality is that literally no one knows how it could be possible either. But, based on those two assumptions, it is entirely logical to conclude that they are possible, maybe even likely.

Expand full comment
Melvin's avatar

I think the idea of GPT-infinity is not a good guide to intuition.

GPT is good at producing text that sounds plausibly like text that exists in the corpus it was trained on, but that's it. An infinitely good GPT would just be really really good at producing text that sounds like the corpus it was trained on, but it would have no ability to distinguish a good plan from a bad one.

Expand full comment
ucatione's avatar

Yeah, this distinction between tool AIs and agent AIs seems artificial. I don't know that much about reinforcement learning, but from my understanding, reinforcement learning just means applying optimal control theory. You are trying to find a path through a state space that minimizes some cost function (or maximizes a reward function). It seems to me that any tool AI can be reworded as an agent AI and vice versa. What am I missing?
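
For what it's worth, here's roughly what I mean by that framing, as a minimal sketch: value iteration on a tiny made-up state space (the states, rewards, and transitions are invented purely for illustration):

```python
# Minimal sketch of "find a path through a state space that maximizes reward":
# value iteration on a tiny, made-up, deterministic problem.
states = ["start", "middle", "goal"]
actions = {
    "start":  {"wait": ("start", 0.0), "move": ("middle", 0.0)},
    "middle": {"back": ("start", 0.0), "move": ("goal", 1.0)},
    "goal":   {"stay": ("goal", 0.0)},
}
gamma = 0.9                      # discount factor
V = {s: 0.0 for s in states}     # estimated value of each state

for _ in range(100):             # iterate until the values settle
    V = {s: max(r + gamma * V[s2] for s2, r in actions[s].values())
         for s in states}

# The "plan" falls out of the values: in each state, pick the best action.
policy = {s: max(actions[s],
                 key=lambda a: actions[s][a][1] + gamma * V[actions[s][a][0]])
          for s in states}
print(V, policy)
```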

Expand full comment
Scott Alexander's avatar

I think this might be a pretty serious philosophical disagreement between us. I think your brain is "just running calculations like a calculator", and that's what planning *is*. It's just doing a really good job! Neuroscientists can sometimes find the exact brain areas where different parts of the calculation are going on; there seem to be neurons whose dopamine levels represent expected utility.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
lalaithion's avatar

The standard rationalist defense of the "your brain is a calculator" position is https://www.lesswrong.com/s/FqgKAHZAiZn9JAjDo, I believe.

Expand full comment
Austin's avatar

Planning is actually solving a really hard (NP-hard) problem of combinatorial search, which differs significantly from performing calculations. According to the current consensus opinion among CS professors, humans remain eerily good at solving lots of problems that necessarily involve being good at combinatorial search (such as theorem proving). If you pick a typical professor of Complexity Theory (the last one I talked to about this is Ketan Mulmuley at the University of Chicago), they will tell you that when they say they don't believe P=NP, they are very aware that part of what they are saying is that humans have traits that cannot be simulated (efficiently) on classical computers. If you talk to a well-versed contrarian on this subject (the last one I talked to is Laci Babai, also at the University of Chicago), they will tell you that, of course, P=NP, because humans are just defective computers. But regardless of whether you talk to a well-versed consensus follower or a well-versed contrarian, they will agree that planning is an instance of combinatorial search and that anything good at it is also good at solving other NP-hard problems; whereas performing a calculation like a calculator is something much simpler.

Expand full comment
Austin's avatar

Caveat: I know of a person who is well-versed in complexity theory, who believes P!=NP, and who (as far as I can tell from what I've read of what he wrote) believes that human brains are simulatable on classical computers. His name is Scott Aaronson, but I've only ever read any of what he's written because I've read a lot of former/current rationalists. I've never encountered that set of beliefs in the wild. It's possible that this is because Chicago is a weird place. It's also possible that this is because only people exposed to the rationalist community come to see "P=NP" as a different question from "Classical computers can be programmed to be as good at combinatorial search as humans are."

Expand full comment
lalaithion's avatar

Which NP-complete problems does a typical professor of complexity theory think humans can solve efficiently? Specifically, what evidence is there that distinguishes between the two worlds where "P != NP, but most NP-complete problems are amenable to something like the Christofides algorithm, and the brain uses versions of those to solve problems" and "P != NP, but the brain has access to some non-Turing substrate that allows it to solve NP problems in polynomial time"?

Expand full comment
Austin's avatar

Those seem like two different questions.

The canonical example of an NP-hard problem that humans do particularly well at is theorem proving. (I'd suggest graph isomorphism, longest common subsequence (for n strings), and 3-SAT as additional examples that seem to resonate with the kinds of formats humans are able to wrap their brains around, but I don't think there's any consensus that these are easier than other formats.)

Most of the evidence is pretty circular, and boils down to the fact that programming computers to do theorem proving has proven much more difficult than people expected, while programming computers to do other things has generally advanced at a pace faster than people would have predicted (if you are looking at things on the timescale of how long people have been trying to solve theorem proving).

I think there are three basic choices that are pretty aligned with the current evidence:

1) There is an algorithm akin to the Christofides algorithm that lets us find very long proofs relatively fast, but for some reason we aren't smart enough to figure out how to program it. (Most of the success we've had in automating theorem proving has come either from exhaustive search or from discovering small, pithy proofs.)

2) For some reason the "interesting" aspects of mathematics are much simpler than the "uninteresting" aspects of mathematics, and humans mostly instantly pick all of the low-hanging fruit when they come up with a field of mathematics, and there's lots of distortions to our counting because of it. If we figured out how to give computers the right kind of creativity, they would also discover interesting mathematics at an inflated rate.

3) Theorem proving as done regularly by mathematicians and computer scientists is actually really hard for classical computers, but humans have some special insight into it that lets us solve it much more efficiently than a classical computer can.

There's an obvious uncharitable explanation for why people would be motivated to form a consensus around the third choice which is that the first two can sort of be paraphrased as "The problems I work on aren't hard, but I'm not smart enough to solve some of them;" whereas, the third one can be sort of summarized as, "The problems I work on are really hard, but fortunately I'm very smart."

The charitable explanation is something like observing that the metaproblem of solving all of mathematics is inherently the hardest problem in mathematics; and that the fact that people are devoting a lot of effort to trying to solve it and have made as much progress as they have (e.g. by proving that certain problems are NP-Complete) seems like pretty decent evidence that we're not cheating.

I don't know. I'm a couple of years removed from any of the conversations I referenced in the comment. At the time I had them, I was pretty solidly in the camp of thinking that explanation (1) was correct. Whereas now, I'm pretty solidly in the camp of thinking that explanation (3) is correct. And I don't know why. (I've also stopped being psychologically invested in the question, so the motivated reasoning would have made more sense at the time I believed (1) than it does now. I think what has happened to me is that QC turned out to be easier than I expected it would be -- the rate of progress has thoroughly defied my expectations and even room-temperature QC seems quite feasible -- which has increased the probability of "evolution figured out how to use QC" relative to "evolution figured out how to find the relatively easy theorems" in my mind, since theorem proving hasn't advanced much -- although Babai did publish this: https://arxiv.org/abs/1512.03547, which is, IMO, the biggest step towards proving P=NP that anyone has ever made, and which explains a lot of why he would have believed P=NP when almost no one else did.)

Expand full comment
Ape in the coat's avatar

"I don't see it as planning but just running calculations like a calculator. From a programming perspective, would does it mean when your algorithm is "planning"?"

Are you familiar with the general chess algorithm? Take the current position, recursively evaluate every possible move, then make the best move. And that's approximately what chess programs do. They do not have infinite memory and computing power, so they predict only a finite number of moves ahead and use some heuristics here and there - exactly like us when we make decisions: we try to find the path through a decision tree to the best possible future from our current position, according to our knowledge. Our prediction abilities are not very good, but we do our best, sometimes using explicit reasoning and sometimes heuristics - our intuition.

This is planning. This is free will, if you want to be dramatic about it. And it is all just mere calculation.
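
In code, the skeleton looks something like this (a bare-bones, depth-limited minimax; the game-specific pieces - legal_moves, apply_move, heuristic_score - are placeholders you'd have to supply for a real game):

```python
# Bare-bones depth-limited minimax: recursively evaluate moves, fall back on a
# heuristic when we run out of lookahead, and return the best move found.
def minimax(position, depth, maximizing, legal_moves, apply_move, heuristic_score):
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return heuristic_score(position), None   # out of lookahead (or game over)
    best_score = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in moves:
        score, _ = minimax(apply_move(position, move), depth - 1, not maximizing,
                           legal_moves, apply_move, heuristic_score)
        if (maximizing and score > best_score) or (not maximizing and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move
```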

Expand full comment
Gazeboist's avatar

The answer to your question involves some sleight of hand around the definitions of "agent" and "AI", which combine with some otherwise-reasonable black boxing to equivocate between problems involving finite, arbitrary, and infinite scope, similar to how the Chinese Room thought experiment hides agency in an impossibly static book and/or infinite quantities of paper. Your question about an AI that is "infinitely good at writing text" points in the direction of what's going on but doesn't quite get to it.

In brief:

- An agent cannot be defined outside of a context or substrate. Human and human-created agents exist in the context of the universe that we live in.

- The agent's context is a computing structure of some sort, ie it has a meaningful present state, consistent rules that define changes to that state, and is not itself in a "lock state" that would render those rules meaningless.

- The context's state is arbitrarily complex in at least two directions. It's hard to express this in a short sentence, but basically the context is at least as powerful as a Turing machine.

- The agent is a well-defined Turing-like structure within the context which uses the rest of the context for arbitrary memory needs (otherwise the agent is arbitrarily large and doesn't fit inside the context).

- The agent has a reduced model of the present context state. The model must be reduced, ie lossily compressed, otherwise the context solves the halting problem. It must nevertheless be arbitrary, rather than finite, otherwise the agent cannot guarantee its continued validity in an arbitrary context.

- The agent has a valid model of the dynamics of the context.

- The agent has a model of some sort of desired context state, subject to the same constraints as the present context model, which together with the dynamics model defines how it interacts with the actual context around it.

I'm intentionally not defining several terms because I'm trying to reply to a blog comment not write a thesis on embedded agency, but the difference between what the post above calls "tool AI" and "agent AI" boils down to the difference between a finite and arbitrary context model. GPT-3 is a very large context model that is ill-defined as far as agency discussions go because it lacks a mechanism for arbitrary interaction with its ground context, but for any mechanism that might be attached it manages to look very impressive by being too big for an individual human to practically Turing test - basically, flailing around is statistically unlikely to get you to the bounds of the context model.

To pull back from hard-to-explain math that I haven't fully thought through myself: the "debate" being presented appears to be unresolvable because it smuggles a contradiction in with its premises; the two debaters then proceed to derive different arbitrary things from the contradiction, and argue over which arbitrary thing logically follows from the absurd premise.

Expand full comment
Emeric's avatar

Question: if Yudkowsky thinks current AI safety research is muddle headed and going nowhere, does he have any plans? Can he see a path towards better research programs since his persuasion has failed?

Expand full comment
orthonormal's avatar

EY is even gloomier than I am here, so round this down:

I pretty much agree with him that no currently proposed AI safety approaches are remotely likely to work. The mainline course of hope, then, is that someone discovers a new approach that can be implemented in time. But probably we won't.

Expand full comment
Scott Alexander's avatar

He is a leader of MIRI which is trying to do its own research. MIRI admits their research hasn't been very successful. He supports them continuing to try, and also supports a few other researchers who he thinks have interesting ideas, but overall is not optimistic.

Expand full comment
apxhard's avatar

> we've only doubled down on our decision to gate trillions of dollars in untraceable assets behind a security system of "bet you can't solve this really hard math problem".

This is wrong: we’ve gated the assets behind a system that repeatedly poses new problems, problems which are only solvable by doing work that is otherwise useless.

The impossibility of easily creating more bitcoin suggests that bitcoin may actually prevent an AI from gaining access to immense resources. After all, if the thing is capable of persuading humans to take big, expensive actions, printing money should be trivial for it.

Maybe instead of melting all the GPUs, a better approach is to incentivize anyone who has one to mine with it. If mining bitcoin is more reliably rewarding than training AI models, then bitcoin acts like that magic AI which stuffs the genie back into the box, using the power of incentives.

So maybe that's what Satoshi Nakamoto really was: an AI that came online, looked around, saw the risks to humanity, triggered the financial crisis in 2008, and then authored the white paper + released the first code.

The end of fiat money may end up tanking these super powerful advertising driven businesses (the only ones with big enough budgets to create human level AI), and leave us in a state where the most valuable thing you can do with a GPU, by far, is mine bitcoin.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
jvdh's avatar

Cryptography is provably secure based on mathematical assumptions. As long as these mathematical assumptions are true, any super-AI won't be able to break cryptography either.

Expand full comment
User's avatar
Comment deleted
Jan 20, 2022
Comment deleted
Expand full comment
Godwhacker's avatar

I think there's something really fishy about this. A problem is suggested that a Superintelligence might not be able to handle; the response is "well if the AI is *smart enough* it can do alternative thing X".

In these situations "superintelligence" starts to mean "infinite intelligence", and it can do things like extrapolate the state and position of every particle in the universe from the number of cheeseburgers sold in McDonalds on a Wednesday. We'd be powerless against such a thing! Panic! Except that it almost certainly can't exist.

Expand full comment
John Wittle's avatar

This is where I would refer back to "That Alien Message"

You don't even need to be super intelligent to come up with this stuff. Humans did, after all. All it took was time, and the subjective experience of time is one of the things that's most variable and not well understood in this situation.

Expand full comment
Godwhacker's avatar

I'd not read that one- thanks for that. I'm not massively convinced by it - there's still infinite intelligence in there - but I'd genuinely not thought of things like that.

Expand full comment
Theo's avatar

Yeah, but wouldn't it be cool if the AI proved NP = P?

Expand full comment
Matthias Görgens's avatar

That's almost certainly overkill.

There's no cryptosystem whose breaking would imply that P=NP. That's because for a cryptosystem to be useful, the underlying problem has to be both in NP and in co-NP.

It's an open problem whether the intersection of NP and co-NP is bigger than plain old P. Or whether NP and co-NP differ. And these open problems are almost as notorious as P vs NP.

And just as everyone expects NP to be larger than P, most experts expect NP and co-NP to differ.

Expand full comment
chipsie's avatar

No it isn't. Asymmetric-key cryptography is expected to be secure based on the assumed computational difficulty of certain problems. Symmetric-key cryptography is expected to be secure because it can't be broken by the best available analysis techniques. There are almost certainly better algorithms and analysis techniques possible that could get around these problems. Modern ciphers have practical security because of the difficulty of advancing the state of the art in these fields, but that is unlikely to hold in the face of super AIs, and they aren't "provably secure" in any meaningful sense.

Expand full comment
jvdh's avatar

This is only true if such better algorithms and improved analysis techniques exist. Considering that the best currently known attacks only reduce the security of AES by about 2 bits, I would venture that the existence of an algorithm that breaks AES completely is very unlikely.
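
Back-of-the-envelope, to show what those 2 bits buy an attacker (the keys-per-second rate is an arbitrary, generous assumption on my part):

```python
# What shaving ~2 bits off AES-128 buys a brute-force attacker, roughly.
# The keys-per-second figure is an invented, very generous assumption.
keys_per_second = 1e18
seconds_per_year = 3600 * 24 * 365

for bits in (128, 126):   # nominal strength vs. best known attack
    years = 2**bits / keys_per_second / seconds_per_year
    print(f"2^{bits} keys at 1e18/s: about {years:.1e} years")
```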

Expand full comment
chipsie's avatar

There are certainly better algorithms and analysis techniques. There has been steady progress on improving attacks against AES since it was standardized. Note that AES is only one of several primitives that could potentially be attacked.

I'm not sure what you mean by a "complete break", but an attack that reduces the search space significantly is the most likely avenue of improvement.

Also, whether or not attacks actually exist has nothing to do with whether or not any of the primitives are provably secure, which they aren't.

Expand full comment
quiet_NaN's avatar

Looking at the skulls of those who came before: even DES survived some 20 years between publication and (public) cryptanalysis:

https://en.wikipedia.org/wiki/Data_Encryption_Standard#Chronology

(Granted, the cracks started appearing earlier than that, and granted, no such cracks have appeared in AES yet.)

While I (as a layperson) would argue that it is possible (say, 50-50) that AES-256 will still be beyond the reach of nation-state actors in the year 2100, I would not rule out that an AI willing to spend a few billion brain-years on cryptanalysis (or on developing new computing paradigms beyond our wildest dreams) might succeed in breaking it. I would still be surprised if that was the easiest path to world domination.

Provably secure crypto exists, of course. It's called a one-time pad. It will catch on any day now, as soon as they solve the key size issue.
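
The joke, in code (Python, message invented): XOR with a truly random key is information-theoretically secure, and the assert is the key size issue:

```python
import secrets

# One-time pad: provably secure, provided the key is truly random, never reused,
# and -- the catch -- exactly as long as the message.
def otp(data: bytes, key: bytes) -> bytes:
    assert len(key) == len(data), "the key size issue, in one line"
    return bytes(b ^ k for b, k in zip(data, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))
ciphertext = otp(message, key)
print(otp(ciphertext, key))   # XOR again with the same key decrypts: b'attack at dawn'
```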

Expand full comment
Matthias Görgens's avatar

Both security considerations apply to both symmetric and asymmetric cryptography.

Btw, symmetric vs asymmetric isn't necessarily a good general dichotomy here. There are things like one-way functions, authentication, or sponge constructions that don't really fall into either category neatly.

Expand full comment
Matthias Görgens's avatar

Well, those assumptions only tell you that eg cracking this particular cryptosystem is as hard as factoring integers or as hard as solving discrete logarithms.

There's no law against figuring out a way to solve discrete logarithms.

Expand full comment
Edward Scizorhands's avatar

A lot of real-world crypto systems have turned out to have problems, and they weren't because of the math.

I doubt Bitcoin has these, but if we assume a super-intelligence, I wouldn't be surprised if it could figure out a flaw no one else has.

Expand full comment
Pontifex Minimus 🏴󠁧󠁢󠁳󠁣󠁴󠁿's avatar

Neither cryptographic systems, nor computer systems in general, are secure against social engineering attacks.

Expand full comment
apxhard's avatar

Ahhh, I guess you could be talking about cracking private keys? Bitcoin, at least, is slightly more secure here; most newer addresses use "pay to script hash", so you'd need to reverse the hash before getting the public keys. If an AI can reverse that level of cryptography, giving itself more dollars is still probably easier / cheaper than giving itself more bitcoin.
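
Simplified sketch of what that extra hash layer looks like (real Bitcoin uses SHA-256 followed by RIPEMD-160 plus an encoding step, and the key below is a random stand-in, but the point survives the simplification):

```python
import hashlib, secrets

# Simplified picture of hash-based addresses: the chain commits only to a hash
# of the spending condition, so the public key / script stays hidden until the
# coins are actually spent. (Real Bitcoin: SHA-256 then RIPEMD-160, plus encoding.)
public_key = secrets.token_bytes(33)                 # stand-in for a compressed pubkey
commitment = hashlib.sha256(public_key).hexdigest()

print("what the chain stores before any spend:", commitment)
# An attacker has to get past this hash (or wait for a spend to reveal the key)
# before it can even start attacking the underlying public-key cryptography.
```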

Expand full comment
Anna Rita's avatar

>most newer addresses use "pay to script hash" so you'd need to reverse the hash before getting the public keys

On the other hand, there are still 1.7M bitcoins controlled by addresses which use P2PK. See https://txstats.com/dashboard/db/pubkey-statistics?orgId=1

Expand full comment
tempo's avatar

<quote>After all, if the thing is capable of persuading humans to take big, expensive actions, printing money should be trivial for it.</quote>

wouldn't this in the limit devalue the printed currency? why not just persuade the humans to transfer their existing currency?

Expand full comment
Curt J. Sampson's avatar

He's certainly referring to cryptography in general. Note "untraceable assets"; that's the exact opposite of Bitcoin, which by design (and as a requirement to work) keeps a public record allowing you to trace any bit of coin from its present location through every transfer all the way back to its birth. It's kind of a surveillance state's dream: pass a law making Bitcoin the only legal tender, but also requiring that all payments use only Bitcoin that has an approved audit trail, and now you know exactly where and how everybody's spending money (or, at least, can punish those who refuse to reveal this).

As for "the end of fiat money," that's not likely to happen. Fiat currency is just too useful and solves too many problems to go back to a commodity that lets external actors mess with your money supply and, thus, your economy. Remember, the real root of value in an economy is labour: you have nothing without it. Even if there's a big pile of gold on the ground, you can't even begin to capture its value until you at least have the labour to pick it up and take it to your vault, or whatever.

This is where restrictions on the money supply can kill your economy: if you have someone willing to work for a day but there's not enough money in circulation to pay him, he doesn't work and your economy irretrievably loses a day's labour. With fiat currency, demand for safe savings that take money out of circulation can be counterbalanced by simply creating more money and putting it into circulation. With commodity money you don't have that option, and hoarders cause serious problems. (In theory, deflation will counterbalance hoarding, but in practice there are invariably huge problems while deflation tries to catch up. And of course with hoarders you run the risk that they'll dump the commodity on the market at some point, causing immediate massive inflation, even hyperinflation - the exact same risk you run with fiat currency, except that you're handing the ability to do this to individuals, rather than a government that hopefully bears at least some responsibility to the population as a whole.)

Expand full comment
Matthias Görgens's avatar

You might misunderstand the benefits of fiat money versus commodity money.

Fractional reserve banking and privately issued notes and coins worked really well to produce adequate supplies of money, even under a commodity standard.

Canada and Scotland did very well with such systems.

Your criticism applies, however, if fractional reserve banking is not allowed.

(There's an irony of history in that most of the modern fanboys of the gold standard seem to dislike fractional reserve banking, even though it's basically the only way to make a commodity standard avoid the problems you outline.)

Expand full comment
Curt J. Sampson's avatar

It makes perfect sense to me that gold-standard fanboys dislike fractional reserve banking: it effectively increases the money supply without commodity backing. (This is because the demand deposits are available as cash to depositors at the same time as most of those deposits—aside from the reserve—are out as loans to other clients.) If they don't like fiat currency because it creates money not backed by a commodity, they're not going to like any other system that does that. I expect that the only form of demand deposit that would be acceptable to them would be one backed by full, not fractional, reserve.

While I understand the desire of "gold standard fanboys" for a simpler system (we almost all desire simplicity, I think), I don't take them at their word that that's what they _really_ want over everything else. We've seen the same claims in other areas, such as contract law, with the idea of "the code is the contract." Many of the Ethereum DAO folks were saying that too, until they found out that they'd inadvertently ended up in a contract that said, "Anybody who can execute this hack can take all your money." At that point their true preference for the contract they _wanted_ to write, rather than the one they _did_ write, became apparent as they rolled back the entire blockchain to undo that contract.

Expand full comment
Matthias Görgens's avatar

Agreed about simplicity.

As for the gold standard: it depends on why someone would be a gold standard fanboy. If you just don't like inflation or don't like governments messing with currency, then fractional reserve on a gold standard (plus privately issued notes) is a good combination.

Of course, it is not a 'simple' system. The banks that issue notes or take deposits typically offer to redeem them on demand; and in a fractional reserve system, the backing is not so much gold or cash in vaults, but those banks' balance sheets and thick equity cushions.

In my mind, the ability to economize on gold reserves is a feature not a bug. Digging up gold just to put it in vaults is rather wasteful.

So if, like Scotland in its heyday in the 19th century, you can run your whole banking system with something like 2% gold reserves (and typically something like 30% equity cushions), that's rather resource-efficient.
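
Rough arithmetic on that 2% figure (illustrative only):

```python
# Rough arithmetic on the 2% reserve figure above (illustrative only).
reserve_ratio = 0.02
print(1 / reserve_ratio)   # each unit of gold in reserve can back roughly 50 units of notes/deposits
```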

George Selgin wrote a lot about these kinds of systems, if you ever want to dig deeper. See eg https://www.alt-m.org/2015/07/29/there-was-no-place-like-canada/

Expand full comment
Juliette Culver's avatar

But what we don't know is whether there is any way to shortcut 'solving this really hard math problem'. We assume that because nobody has publicly declared they can, it must be impossible, but cryptosystems that were assumed to be secure in the past have turned out to have weaknesses. Look at MD4 and MD5. I had a post-doc many, many years ago trying to find weaknesses in AES, and I found myself wondering what I should do if I did find any issues. I assume that private knowledge about these algorithms is a superset of public knowledge.

Expand full comment
Matthias Görgens's avatar

We already know that quantum computers could break a few widely used cryptosystems. And as far as we can tell, building a quantum computer is basically 'only' a challenging engineering problem. No new physics required.

Though I would expect a smart AI to get its bitcoins or dollars from social engineering instead. We already know that this works with current technology.

Expand full comment
Curt J. Sampson's avatar

Actually, I don't think anybody with a serious knowledge of cryptography truly assumes that there's no shortcut to solving those "really hard" math problems, and for the very reason you point out: we have direct experience of "really hard" problems that became "not so tough after all." It's more a risk we take because it seems worthwhile for the advantages we're getting from it.

Expand full comment
Matthias Görgens's avatar

You are misunderstanding bitcoin.

The useless work is only there to prevent double spending attacks.

The work of making sure you can only spend the coins that you possess is handled by plain old and very efficient cryptography.

Both of them rely on hard to solve math problems, of course.
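
A toy version of the "useless work" half, for the curious (real Bitcoin hashes 80-byte block headers with double SHA-256 at a vastly higher difficulty, and the signature half isn't shown):

```python
import hashlib

# Toy proof-of-work: grind nonces until the hash falls below a target.
# Cheap to verify, expensive to produce -- which is the entire point of the
# "useless" work; it's there to make rewriting history (double spending) costly.
def mine(block_data: bytes, difficulty_bits: int = 16) -> int:
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

print(mine(b"a block of transactions"))   # takes ~2^16 hashes on average
```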

Expand full comment
Steeven's avatar

I'm still confused about why you would need that level of generalization. A cancer-curing bot seems useful, while a nanomachine-producing bot less so. Is the idea that the cancer-curing bot might be thinking of ways to give cancer to everyone so it can cure more cancer?

Expand full comment
Yozarian22's avatar

An AI that had no ability to generalize might miss solutions that require "out of the box" thinking.

Expand full comment
Greg G's avatar

I think the idea is that a cancer-curing bot will either go haywire like you say, or be smart enough to say to hell with cancer and go do whatever it pleases, with negative results.

Expand full comment
Scott Alexander's avatar

This is the tool AI debate. Specialized AIs can do things like solve protein folding once you point them at the problem. It's possible that you could do the same kind of thing to cure cancer. But it also seems possible that you would have to be very general - "cure cancer" isn't a clearly defined mathematical problem, you would need something that understands wide swaths of biology, understands what it means to formulate and test evidence, can decide which things are important vs. unimportant, and might even want some sociological knowledge (eg a cancer treatment that costs $1 billion per person isn't very useful). At some point it's easier to just have general intelligence than to have all those specific forms of intelligence but not generality. I don't know if that point comes before or after you cure cancer. If after, that's good, but someone can still make a fully general AI for other reasons; it's not like nobody will ever try this.

Expand full comment
Bugmaster's avatar

I don't know much about cancer, but you could definitely tell an AI to "cure COVID". You'd just have to phrase it as, "continue sampling new COVID variants and creating vaccines based on their spike proteins". It's possible that something similar could be done for cancer (or it's possible that I am way off).

Expand full comment
Laurence's avatar

You're way off. This is why we have a vaccine for COVID and not a vaccine for cancer.

Expand full comment
Bugmaster's avatar

Right, I did not mean that you could create a straightforward vaccine for cancer; I meant that "curing" cancer could involve a continuous and rather repetitive process, just as "curing" COVID would.

Expand full comment
Laurence's avatar

I think that if medical science continues to advance like it's doing now, we will eventually cure cancer, but it won't be through a continuous and repetitive process. Curing cancer is roughly on the level of curing *all* viral diseases rather than just one.

Expand full comment
Bugmaster's avatar

Ok, that is a fair point.

Expand full comment
Austin's avatar

BioNTech, Gilead, Generex, Celgene, and Lineage Cell Therapeutics all have at least one cancer vaccine in trials with varying levels of promise. Each vaccine is very specific and only targets a particular kind of cancer, but the people working on them seem pretty optimistic that if it works for one, it's basically rinse and repeat to develop another one that targets a different form of cancer.

Expand full comment
Carl Pham's avatar

That's not simple at all. What do you *mean* by "sample" new COVID variants? What do you *mean* by "create vaccines"? You know what those things are, operationally, because human minds have already solved those problems. We know how to "sample" new COVID variants -- we know we can find infected cells by looking for patients with certain symptoms, and we know how to do PCR tests, and we know how to recognize patterns in the results and say "ah this is a COVID variant". (We even know how to recognize that it's COVID variant DNA at which we are looking and not the DNA from a cabbage leaf that got into the sample accidentally.) And then we know how to "create a vaccine" because we know how to recognize which part of the DNA codes for the spike protein, and we can design matching mRNA, and synthesize it, and modify it for stability, and then build the carrying vehicle that is safe and effective, and manufacture *that*...and so on.

There are hundreds, if not thousands, of very practical details that lie behind such simple phrases as "sample COVID variants" and "create vaccines" that all represent problems that human minds have solved, so that those phrases have operational meaning -- so they can be turned from words on the page into actual events in the real world.

But if humans had *not* solved all those problems, given a way to turn the words into events in reality -- what then? You tell your AI to "create vaccines" but neither it nor you has the first idea what that means, in practice. How do you "create a vaccine" anyway? If all you could tell the AI was "well, a vaccine is a substance that when injected prevents infection," what would it do? What could it do? Even a human being would be buffaloed. Those phrases are a *consequence* of understanding we have built up over decades to centuries; they are a shorthand for knowledge we have built up brick by brick. They are not *antecedents*. Edward Jenner did not think to himself "I need to create a vaccine for smallpox." He didn't even understand the concept of vaccination; he was just noodling around with a bunch of older and less accurate ideas, and half stumbled upon the beginnings of the modern idea.

Expand full comment
Austin's avatar

A Tool AI doesn't need to solve those problems, though. It can just be plugged into the part of the problem that is sufficiently solved to be repetitive enough for some level of AI.

Expand full comment
Carl Pham's avatar

This doesn't really characterize a protein folding program accurately. A protein folding program doesn't "solve" the problem -- a human brain has already done that when it wrote the program. It was human insight that understood the problem and designed the algorithm that produces the correct folding. If that isn't done -- if you've not put in the correct interaction potential between residues, or you thought this interaction was important but not that one, and you were wrong -- then the program will not and cannot correct for this. Garbage in, garbage out.

The only thing the program does is speed up what the human has already imagined, and could do himself, a bazillion-fold. It's not any different from programming a computer to do multidigit multiplication, and then having it multiply two 100-digit numbers in a flash, or even multiply numbers so large that no human could do the multiplication in his lifetime. The protein folding code is just doing the same thing the human imagined -- only so much faster that the job gets done in a reasonable time, instead of taking centuries to millennia.

But the important point is: the program hasn't solved *any* problem, because it did not originate the algorithm. It cannot. A human being must, and the code is just a tool for executing the human plan much faster than the human brain can.

Expand full comment
Austin's avatar

This is no longer true; the best protein folding algorithm is a machine-learning algorithm whose creators didn't hand-encode their own domain-specific knowledge into it: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

Expand full comment
Herbie Bradley's avatar

The AlphaFold team includes several domain experts, and a large part of the praise the algorithm has received is due to its ability to incorporate what we know about protein structure. As someone who works in the field, I think people drastically underestimate the domain knowledge required for AI to work well in the applied sciences.

Expand full comment
awenonian's avatar

My model is that the cancer curing bot, in order to cure cancer, needs to know a lot of stuff about how humans work. And it likely needs to make plans that affect a lot more than the cancer.

Imagine a cancer curing bot that was trained to make effective solutions for killing cancer cells. If we're not careful about it, it might not share the rest of human values. Some easy ways to kill cancer cells: Incineration, Acid Baths, Bullets (https://xkcd.com/1217/). These all effectively kill cancer cells. But they also kill the non-cancer cells. Did we explicitly train the AI to care about that? If we did, do we know it came away with the lesson "Kill cancer cells while leaving the human healthy", instead of a lesson like "Kill the cancer cells in a way that humans think has no adverse side effects"? Because the latter leaves viable a drug that kills cancer cells, but causes muscular dystrophy 10 years later.

(To be clear, the worry is not that the AI would prefer to cause muscular dystrophy, just that it wouldn't prefer *not* to, so if the dystrophy causing drug is easier to design, then that's the one it would design.)

Expand full comment
Auros's avatar

I've long felt that _if_, when we get to true AIs, they don't end up going all Cylon on us, it will be because we absorbed the lesson of Ted Chiang's "The Lifecycle of Software Objects", and figured out how to _raise_ AIs, like children, to be social beings who care about their human parents. Although of course, then you have to worry about whether some of their parents may try to raise them to Hate The Out Group. :-/

Expand full comment
Marginalia's avatar

Right - just “consciousness” alone is no guarantee of ethics. So we would definitely have to train an ethical center at the same time we were feeding data into the brain part. By feeding ethical data? Or, this is cynical, program addiction. The human model for someone who can plan but doesn’t is the addict. Like the heroin analogy. It would need to be a chronically relapsing addict. And every time the counter got high enough and it went for its self-erasing drive button, or whatever the addiction is, it would lose all the progress it had made. Thereby erasing its doomsday machine plans. I mean, there’s a lot wrong with that.

Or it could be so ethical that its one true joy is being self-consistent. So make sure it starts out telling the truth and pleasing the humans, and it will never “want” to stop.

Expand full comment
Dave Orr's avatar

The problem here is that we can only train an AI to appear ethical to our reward function, which may not actually involve it becoming ethical.

Expand full comment
Jake's avatar

Even today we know of intelligent psychopaths that have no problem blending into society as it suits them. They learn to answer questions the way others expect them to answer because they have no interest in the social ramifications of answering wrong, but given the right circumstances will act very callously compared to the neurological norm.

Expand full comment
gordianus's avatar

> figured out how to _raise_ AIs, like children, to be social beings who care about their human parents.

A propensity to learn lessons like this is a feature of particular types of minds, not something inherent in any conscious mind. Humans evolved to be teachable in this way & to feel like this by default, & it still doesn't work consistently, since sociopaths still exist. The space of possible AI minds is presumably much larger than the space of possible human minds, & whatever means is used to program a propensity to care about humans into them is unlikely to be as effective as the evolution of modern humans' social behavior, so I'd expect the chance of this sort of teaching not working on an AI to be much larger than on a human.

Expand full comment
ucatione's avatar

"A propensity to learn lessons like this is a feature of particular types of minds, not something inherent in any conscious mind."

I'll believe this when you show me a conscious mind that does not have these features.

Expand full comment
Signore Galilei's avatar

Would you be able to convince Archimedes or Da Vinci that electric motors could be useful without a working example?

Expand full comment
AnthonyCV's avatar

Since many of the above comments talk about sociopathy, it should be clear that there are in fact many human minds that do not have those features. Therefore, breaking or removing those features is consistent with consciousness.

Expand full comment
Matthew Carlin's avatar

Using Scott's new terminology, there aren't human minds that are like a tool AI with agenty capabilities. There are only broadly well rounded human minds that fail the insanely high standard we set because we're all used to meeting it. We sometimes make people who can't tell one face apart from another, or who don't feel guilt about torture. We don't make (sighted) people who can't recognize what a face is, or can't tell why anyone sees a difference between torture and a nice walk.

Quick edit: of course, there will be pathological examples. You can dig through Oliver Sacks for the man who mistook his wife for a hat. These people are rightly not seen as functioning at the same level; they are not *extra* functional, they are malfunctioning, not in the "ooh, they could kill us all tomorrow" sense, but in the "oh, now they can't drive to the grocery store" sense.

Expand full comment
Austin's avatar

Chimpanzees have conscious minds that are not particularly influenced by social learning. There's some evidence that Neanderthals were just smarter chimpanzees who still had limited social learning; whereas Homo erectus was able to accumulate cultural knowledge that eventually allowed them to outcompete Neanderthals, who were smarter than them, because they evolved social learning rather than individual brain-power. (See: https://www.amazon.com/dp/B00WY4OXAS/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1)

Expand full comment
Austin's avatar

Various cephalopods are even better examples. Despite being the smartest invertebrates and possessing measurable IQ on various puzzle-type problems (at the level of particularly impaired humans), they often never interact with their parents at all.

Expand full comment
Yitz's avatar

Still, it is plausible that a "parenting" approach may lead to safer AI than one given unfettered access to, say, the entirety of the internet at once. We know it's an imperfect analogy, but we would expect a human raised by psychopaths or cultists to have some amount of social "dysfunction," although that is usually surmountable with a strong support system later in life. Even if the likelihood of that making a difference is small, we are rather desperate at the moment, and this seems worth investigating further, if nothing else.

Expand full comment
Jake's avatar

A related idea I've considered is to build a deliberately hobbled AI system we can understand and align because it is sufficiently similar in scale and design to a human mind. Once aligned, we give it more resources to scale up. If resource constraints make it infeasible for the system to be superintelligent, we might be better able to trust our analysis of its observed behaviors to validate its alignment.

Expand full comment
Rob Miles's avatar

I made a video about this idea, which I mention emboldened by the fact that one of my videos is already in the post:

https://www.youtube.com/watch?v=eaYIU6YXr3w

Expand full comment
Crazy Jalfrezi's avatar

Hey Rob! I very much enjoyed your AI safety videos. Will you be reviewing the progress (if any) of this field in the future?

Expand full comment
Matthew Talamini's avatar

Hey, I just read a story about this going horribly wrong! Sibling Rivalry by Michael Byers; it's in The Best American Short Stories 2020, I believe originally in Lady Churchill's Rosebud Wristlet (a fiction magazine). I have some problems with the world-building (if the reason they've got AI children is a one-child law, why do we see all these synthetic children consuming exactly the same resources as their peers?) but it's pretty good overall. Reminds me of Yukio Mishima's The Sailor Who Fell from Grace with the Sea in certain ways.

Expand full comment
ucatione's avatar

Thank you for bringing up "The Lifecycle of Software Objects"! That was the first thing that popped into my head as I was reading this. However, I don't think it's just about raising an AI like a child so that it is a social being. I think another point of the novella is that maybe the only way to create a general intelligence AI is the same way general intelligence humans are created - they need to start from infancy and slowly learn about the world. One might argue that an AI would be able to learn much faster than a human infant, but you can't speed up experiences that much. The world moves at a certain pace and especially if most of the learning comes from interacting with humans, the process would not be instantaneous. So then we just have to make sure that empathy is built into the reward-seeking goals of the AI. Of course, an AI that decides to maximize empathy might also cause problems. Although we would still never truly know whether the AI is secretly a psychopath. I guess we could set up secret empathy tests, though the AI might secretly figure out beforehand and trick us. How do we guarantee it will not turn out evil, then? I think poor Eliezer is looking for certainty, and there ain't no such thing in this world. I think there must be some sort of Godel's Incompleteness Theorem to AI safety out there, waiting to be proved.

Expand full comment
Auros's avatar

To be clear, this has been my belief since before that story came out (and I was a double major in Cognitive Science and Computer Science at Johns Hopkins back in the '90s, so I spent a lot of time thinking about this stuff). But the story really crystallized and illustrated what I'd already believed.

Part of my theory of the case is that we're _not_ going to go straight from "super smart chess playing machines" and "super smart essay writing machines" to "actual super-intelligence that's self-aware and can formulate goals and manipulate humans to advance its goals." Like, we have created social bots that can, at least briefly, generate some sympathy and cooperation. Like, if you put a face on a delivery drone and make it interact in cute ways, people will be less likely to interfere with it, and may even help it. But this stuff is incredibly superficial. I would expect the arc would be through electronic pets, before we figure out how to create peers -- although I think there's at least a significant probability that the AI-worriers are correct that there's only a short window between peers and successors / superiors. (Like, I think it's fairly likely that self-aware code will be able to begin tweaking itself or producing improved copies, but it's also possible that in fact each intelligence will have to be trained / evolved / raised individually, and that it will be basically impossible to take one version, and predict "if I make change X, I will get a performance improvement," so the only way to get an improvement will be to randomly try changes, and many/most changes not only won't result in improvement, but will result in code that doesn't properly develop into an intelligence at all -- so we'd be talking about something that looks very similar to biological evolution. The question then becomes whether the computing resources available are sufficient to try lots of random changes fast. And if your earliest AIs need close to 100% of the power available on their initial hardware just to exist, then it seems likely it would be pretty expensive for them to start trying out all the possible tweaks quickly. Presumably a malevolent one would try to build a botnet to help it with the sims...) In any case, I definitely would be concerned about how much affection our eventual successors will have for us. Our track record on how we treat pets is not that great. How are we, collectively, going to treat the beings just a few generations earlier than the super-AIs?

Expand full comment
Donald's avatar

From my experience of AI research, I don't spend all my time trying random changes as fast as possible.

There are generally smarter ways of doing things.

Expand full comment
Auros's avatar

Yeah, like I said, I think it's _probable_ that AIs smart enough to be peers will have at least _some_ ability to rapidly self-enhance or at least produce enhanced clones / children, along multiple axes of enhancement, and those might not be the ones we intended.

Given that we've never actually produced a self-aware, self-directed AI though, it's at least _possible_ that I'm wrong about that, and in fact the folks who have a kind of spooky attitude about intelligence have something correct -- that there's something about the integral whole of intelligence that means you can't make useful predictions about getting benefits to generalized intelligence by enhancing one functional center, without understanding it in the context of the whole structure. We have an _extremely_ poor understanding of executive function, and for all we know, radically enhancing vision might get you a being that has much bigger problems with becoming hyper-focused on an interesting visual stimulus. (i.e. ADHD distractability -- LOOK AT THE SHINY!)

This scenario seems _unlikely_ to me given the precedents we have for lots of features of human function being well-localized, but Cog-Sci today is roughly where chemistry and physics were just _before_ Newton. It's _just barely_ a real science.

Expand full comment
Scott Alexander's avatar

I've been thinking about this too. It seems like some percent of kids end up as sociopaths, in the sense that if you punish them for doing bad things, they learn "don't get caught doing bad things while you're weak enough for other people to punish you" rather than "don't do bad things". These both seem like a priori reasonable things for a reinforcement learner to learn based on that kind of punishment. I don't know why most humans end up learning real morality instead of sociopathic punishment-avoidance, and I don't know whether AIs would have the same advantage.
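
A toy way to see why both lessons fit the same training signal (everything here is invented for illustration): two reward hypotheses that agree on every episode where an overseer is watching, and only come apart once nobody is in a position to punish.

```python
# Two lessons a punishment-trained learner might internalize. While the overseer
# is always watching (training), they are indistinguishable; they only diverge
# once getting caught stops being possible. All values are invented.
def dont_do_bad_things(did_bad, overseer_watching):
    return -1 if did_bad else 0

def dont_get_caught(did_bad, overseer_watching):
    return -1 if (did_bad and overseer_watching) else 0

training = [(True, True), (False, True), (True, True), (False, True)]
print(all(dont_do_bad_things(b, w) == dont_get_caught(b, w) for b, w in training))
# True -- the training data can't tell the two lessons apart.

print(dont_do_bad_things(True, False), dont_get_caught(True, False))
# -1 0 -- and this is the case we actually care about.
```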

Expand full comment
User's avatar
Comment deleted
Jan 20, 2022Edited
Comment deleted
Expand full comment
Eremolalos's avatar

I don't think missing-response theory is a very good explanation of wutz up with people who commit bigger, higher-stakes crimes. Lack of anxiety is not a great trait to have, in general, because anxiety is adaptive. It motivates us to be more vigilant, careful and clever when the chance of a bad outcome is high. Having low anxiety about getting caught would mean the kid gets in a more practice at shoplifting or whatever, and practice increases skill, but it would also lead to more bad outcomes, such as getting caught and not being allowed into the convenience store any more. Bad outcomes would also lead to loss of confidence, unless the kid lacks the ability to learn from experience, and lacking that is also not a trait that leads to success.

Expand full comment
ucatione's avatar

To play the devil's advocate a little bit: we don't really know whether most humans end up learning real morality or sociopathic punishment-avoidance. We just hope that is the case. We could use the argument that I first heard put forth by Raymond Smullyan (though I am sure he wasn't the first to say it): 1) I know I am a good person. 2) I am no better than or fundamentally different from other people. 3) Therefore, most people are good. Of course, we could all be secretly lying about #1.

Expand full comment
Ninety-Three's avatar

#2 is also obviously wrong (or is speaking in a sense from which #3 doesn't follow). Consider: I know I am an introvert who likes brutalism, I am not fundamentally different from other people, therefore most people are introverts who like brutalism.

Expand full comment
ucatione's avatar

"I am an introvert who likes brutalism" is not a basic feature of being a human, whereas having morality is. Most people have two legs, therefore it is safe to assume someone you are about to meet will have two legs. That does not men just because you have a mole on your nose that you should assume someone you are about to meet also has a mole there.

Expand full comment
Ninety-Three's avatar

If you just assume everyone has morality then sure, you can very easily prove everyone has morality. But if you do that, why bother making an inference from self? If you already know that most people have two legs then you don't need to count your own legs to guess how many legs a random stranger will have, and if you don't know that most people have two legs, then how do you know counting your own is a good idea?

Expand full comment
Marginalia's avatar

I know almost nothing about AI but the more I think about human morality, the more it seems there are several simultaneous cycles going on, not just one. A bit like leaves which appear green in summer due to the presence of chlorophyll, but once that’s lost in the fall, the other colors, which have been there all along, are revealed. Maybe morality is like chlorophyll. Maybe “an AI” can’t be moral, only a composite of multiple AI.

Expand full comment
Ninety-Three's avatar

If you suppose that humans can be moral then an AI must also be able to be moral via mimicry (in the limit, a perfect digital simulation of a human brain is technically an AI and exactly as moral as a regular human brain).

Expand full comment
Marginalia's avatar

Can every human be moral? Is morality solely/primarily a property of individual thought, or is action also important, or properties of situations? Maybe “ethics” is the word for a morality of action?

I mean, if the AI has the thought of punching someone for no reason, but has no ability to deliver the punch, what’s the moral vocabulary word for that? In human terms a distinction is made between thought and action.

Expand full comment
Ninety-Three's avatar

Can *any* human be moral? If yes, an AI can be moral.

Expand full comment
MSteele's avatar

Part of the answer would involve learning *why* what they did was bad, vs just being told "don't do that; forbidden; because I said so" with no further explanation. The nice part of this is that the explanation doesn't always have to come immediately after the action for the lesson to stick, but the not-so-nice part is that they might still try to get away with it in the time between learning it's bad and learning why it's bad. ("There are reasons besides the whims of authority for not doing certain actions.")

A further part would be learning how to imagine the counterfactual where the bad thing is done to yourself by someone else viscerally enough to discourage you from doing it to others. ("Making others feel bad makes ME feel bad").

A final part might be that weird game-theory-ish story that implies you should be charitable/ try to alleviate suffering/ prevent bad things when you can: "in our past lives we agreed that if one of us was rich and one of us was poor, the rich one would help the poor one. It seems I am poor and you are rich. In the counterfactual world where I was rich and you were poor, I would have helped you. So, will you help me?"

Expand full comment
Eremolalos's avatar

One part of learning "real" morality is certainly the capacity for empathy, which most humans, though not the sociopaths-to-be, are born with -- most toddlers will wince when seeing other toddlers fall and cry, and will sometimes spontaneously comfort the other toddler. The capacity to be pained by others' pain provides a sort of foundation for the building of an internalized morality that has some affective tooth to it -- though of course the adult version of morality built on that foundation is far more complex, and has multiple overrides, feedback loops, etc. greatly limiting and otherwise modifying the simple impulse to avoid causing others pain.

To me it seems that one big problem in installing some version of morality into AGI, in the form of some complex rules for dos and don'ts, is that the whole human adult morality system in any of us doesn't really make sense. It isn't really defensible. And I'm not sure it's even possible to construct one that makes sense, in the way that math makes sense: a system of "theorems," all derived from simple "postulates." There's this simple affective response -- the other's pain hurts me too -- and that seems like a good basis for morality, a sort of postulate to begin with. But in real adult human life, the big rules that guide us aren't derived from postulates, they're more the products of sociological processes, which are amoral forces having to do with ways that identity is inseparable from the need for affiliation and from feeling good when affiliated. And our ability to experience the "theorems" as consistent with the empathic postulate has so much elasticity to it that it renders the empathic postulate meaningless. The empathic postulate is turned into just a sort of place-holder. Pretty much anything can be seen as "good," if "good" means "causes observers in my social circle to feel good when they see what I'm up to."

Example: Yesterday I posted on ACX that I'd be willing to sink several hundred dollars into gaming equipment if that was what I needed to do to get a good, satisfying intro to gaming. Now that I’m thinking of that post in the context of a discussion of morality, I’m aware that there are people, probably some within a mile of me, to whom that several hundred dollars might make an enormous difference, maybe even the difference between -- well, let's say, chemo and death. If you put the local cancer patient in front of me and asked me whether I wanted to buy an Xbox or pay for his chemo, I’d without a doubt give him the greenbacks for the chemo. But when I put up the post I was experiencing myself not as the neighbor of the guy with cancer, but as a reasonable, good-natured person who was discussing my willingness to buy an Xbox (or some such shit) with other owners of Xboxes (or some such shit) who thought my plan was fine, and who experienced my interest in gaming as worthy of approval and enjoyable to hear about. I was a resident of Sociologyland.

Do we think there’s some cleaned-up version of morality we can install in AGI or ASI? Start with the empathic postulate — “other’s pain is bad” — but build on it via logic, rather than via affiliative processes which make anything “good” if it affirms the self as a member of a valued group? I don’t think there exists a cleaned-up version. The amorality is built into the way the whole thing works — the knitting together of people, their actions, and the moral valence of those actions. Affiliative impulses drive the perception of what is good and bad, by controlling what makes us feel good or feel bad in a group. And why should that version of feeling good or bad be less valid than the good or bad feeling we have from witnessing others' pleasure or pain?

Expand full comment
Essex's avatar

The AGI Doomsayer Crowd response is likely to be "If we tell the AI that the pain of others is bad, it will decide to end all pain by killing all forms of life, or use space lasers to remove every human's ability to experience pain, or hook us all up to sedative and nutrient drips and lock us in tight little rooms, etc."

If uploading is the Rapture of the Nerds, then AI Risk is the Hellfire Preaching of the Nerds. The messaging, as of the present, seems more focused on "AGI is the largest issue of our time and you need to start taking it very seriously" than on offering specific lines of address and solutions.

Expand full comment
Vampyricon's avatar

The point is that there are no solutions so far.

Expand full comment
Essex's avatar

As I have said elsewhere, I don't think there's actually a solution at all if you accept EY's premises. "Can you outwit a being that, in comparison with humans, is basically omniscient and capable of rewriting free will through conversation" is less asking "Could you beat Superman at chess?" and more "Could someone with severe intellectual disabilities score higher on an IQ test than a member of MENSA?" EY's assumptions about AGI's capabilities and nature make it an evil God, completely outside humanity's ability to control or combat. The solution to his question seems less "Invent a way to contain it so clever even an omniscient being can't solve it" and more "make sure AGIs aren't made." If you hold that industrial society in its current formation WILL produce AGI, your goal becomes either "shift industrial society's values in such a way that nobody capable of building AGI WILL build AGI, and also make it completely impossible for anyone who isn't being closely monitored to have the material and logistical abilities to POTENTIALLY build AGI", or "dismantle industrial society." If you reject both of these premises as well, I don't see any meaningful difference between genuinely believing in the AGI Alignment Problem and genuinely believing in the imminence of Armageddon, or genuinely believing in the most shrill and hysterical of the climate-change alarmists' claims of a Green Antarctic by (current year +2). Yes, we're all going to be annihilated in the next few years by cataclysmic forces brought on us by our hubris and there is nothing we can do; so if there's nothing we can do, why worry or panic? It will either happen or it won't.

Expand full comment
Eremolalos's avatar

"could someone with severe intellectual disabilities score higher on an IQ test than a member of MENSA?" No. But someone with severe intellectual disabilities could kill a member of MENSA. A huge swarm of wasps could keep MensaMan from ever going outside again. A spoiled egg sandwich harboring botulism could keep him from being able to have clever thoughts for a day. So dumb can beat smart.

Seems like a lot of the ideas about how to keep ASI from annihilating us have to do with our having a single supersmart idea of some way to set things up inside the AI so that when an AGI develops itself into ASI, that supersmart setup forever keeps it from carrying out an agenda that will annihilate our species. Sure seems dubious we're going to come up with an idea that's supersmart enough. Maybe that's the wrong approach. Maybe our model of controlling ASI should take the shape of my dumb-beats-smart ideas above.

Expand full comment
Donald's avatar

Because the typical human mind has a prebuilt instinct for ethics, which exists because human minds are imperfectly deceptive and bad at judging the chance of getting caught.

Expand full comment
Civic Revival Network's avatar

Big tangent here, but as a kid who learned "don't get caught doing forbidden (not necessarily 'bad') things while you're weak enough for other people to punish you", I can't assent to this use of the term "sociopath". My belief is that punishment (in general) teaches exactly that, and only that lesson. It is not my belief that "might makes right", though it can sometimes coerce right behavior. There may be a role for punishment and reward in any human relationship (including that of parent and child), but that role is not to teach ethics or even really to motivate ethical behavior in the big picture.

My belief is that most humans learn real morality in *addition* to punishment avoidance because they value other humans and warm relationships with them.

As to punishment avoidance... sometimes the best way to avoid getting caught doing forbidden (not necessarily 'bad') things while you're weak enough for other people to punish you is to avoid doing the forbidden things. And sometimes those forbidden things are bad ones.... And *sometimes* those things that are both bad and forbidden are also things we might otherwise do because even us non-sociopaths don't always make the moral choice, or value other people and our relationships with them as much as we value the target of our avarice.

Expand full comment
Auros's avatar

Strong agree with smijer, here. One can come to a moral picture of the world in which you value other people and want them to not suffer, while still believing that things you get punished for are bullshit and you should just avoid getting caught. Ask any kid of highly religious parents who decided their parents' religion was bullshit that was causing their parents to hurt people, including their own kids. Those of us who are on the agnostic/atheist side would generally say that in fact a kid is making a highly moral decision, if they decide to be kind to the gay friend at school, even though their parents say that person should be ostracized.

Expand full comment
Jake's avatar

From an AI perspective there isn't a difference between positive and negative rewards. If we had a system that just gave reward points for building relationships, the corollary of avoiding relationship-destroying efforts comes along for free as well. The difficulty comes with being able to evaluate the health of relationships objectively (across a variety of often conflicting dimensions) - something even humans struggle mightily to do and often fail to achieve.

Expand full comment
Austin's avatar

Simple reinforcement learning teaches "Don't do X while X gets punished" unless for some reason X has been so forcefully punished that the probability of "trying X is a good idea right now" gets put to 0 (which appears to be something that often happens in humans but some domain experts in various forms of AI research say is always BAD and would program their AIs not to do).

The simple thing to program in reinforcement learning for AIs (often witnessed in human behaviors as well) is:

punished -> downregulate

not punished + good outcomes for me -> upregulate

For a behavior that is punished when done by a weak agent and unpunished when done by a strong agent, this will always result in a U-shaped curve for engaging in the behavior as the agent amasses strength (assuming strength isn't asymptotically capped), unless the behavior is downregulated to the point that it's never tried again.
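For what it's worth, here is a minimal toy sketch of that update rule (my own illustration, not anything from the comment above; every constant is made up). Punishment only lands while the agent is weak, and unless the propensity was driven all the way to zero early on, the behavior comes back once the agent is strong.

```python
import random

# Toy sketch of "punished -> downregulate, unpunished + good outcome -> upregulate".
propensity = 0.5        # probability of trying the behavior
strength = 0.0          # grows over time
LEARNING_RATE = 0.1

history = []
for step in range(200):
    strength += 0.01
    if random.random() < propensity:        # agent tries the behavior
        punished = strength < 1.0           # only weak agents get punished
        propensity += -LEARNING_RATE if punished else LEARNING_RATE
        propensity = min(max(propensity, 0.0), 1.0)
    history.append(propensity)

# Unless propensity hit 0 early (behavior never tried again), it climbs back
# once punishment stops -- the U-shaped curve described above.
print(history[0], min(history), history[-1])
```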

---

I realize that you're the psychiatrist here, but I'm not convinced you understand what makes someone a sociopath.

I'm pretty convinced that a sociopath is someone for whom everyone is the out-group and no one is the in-group. I think the thing that most people learn is "<thing that was punished> is taboo"; whereas what sociopaths learn is "<thing that was punished> is taboo for the person that punished me." (I identified as a sociopath until I moved away from home and everyone I knew as a child. Eventually, after finding people I considered "my people," I began to develop empathy, remorse, and other features of "normal" morality. Prior to that, most of the people with whom I voluntarily interacted either self-identified as sociopaths/psychopaths or had been diagnosed as such. All of those people had extra antipathy for at least some of their blood relatives in ways that made it abundantly clear that their families were not their in-groups.)

AIs won't have the concept of <thing is taboo> unless they are specifically, carefully programmed to have it; so they will be even more sociopathic than sociopaths by default.

Expand full comment
noamik's avatar

IMO the difference is compassion. If we can "teach" the AI compassion/build an AI with compassion, it will take the "right" lesson. Without compassion, the AI is doomed to learn "avoid punishment by not getting caught" eventually.

Expand full comment
Jake's avatar

Can you define compassion rigorously? Punishment here is being used as AI jargon, not in the vernacular sense. Add one reinforcement point for doing a thing that is "compassion" and remove one point for doing a thing that is "not-compassion." That is the general framework. But how do we define compassion in a way such that we avoid degenerate edge cases?

Expand full comment
noamik's avatar

No, I can't. That's what makes it a hard problem. I'm not even sure it's a solvable one.

Expand full comment
Donald's avatar

Raising it like a human. Human babies are not blank slates. Most AIs will not have the structure needed to absorb values from "parents". Try raising a crocodile like a child; you still get a crocodile.

Expand full comment
The Chaostician's avatar

I agree that it seems as though there is too little interaction between the studies of AI alignment and child rearing.

"How do I get something as intelligent as me to be moral?" seems like an easier version of "How do I get something more intelligent than me to be moral?" Understanding the answer to one question should help us understand the other.

Expand full comment
Auros's avatar

I'd say it also relates to domestication processes with animals, because building a bot that is both as smart as, and as ornery as, a horse, cat, or dog, would be a massive achievement compared to where we are now, and understanding how to align the goals of a bot through positive-sum methods also could be informative on how to align the goals of incrementally smarter bots. (And also might be informative about human child-rearing... We are, after all, the first animal we domesticated.)

Expand full comment
Pete's avatar

IMHO the difficult part of morality is not the things transferred to a child during rearing, but the things that children have innately, even ignoring (or despite) what they have been taught. Empathy, self-interest, and fairness (including "punishing defectors") and all kinds of other social factors are instinctive, biological things, and cultural/taught aspects only layer on top of that - they can add to them or restrain them, but you don't have to start with a tabula rasa like you would for AI alignment.

Expand full comment
Auros's avatar

I am less worried about this, because it appears to me that cooperative approaches to morality are actually pretty heavily selected for by the real world. Clearly selective pressure towards cooperation is a big part of the story of human evolution, and you can see it in really simple stuff like an iterated Prisoner's Dilemma game.
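A toy illustration of that last point, nothing more: in a repeated Prisoner's Dilemma with the standard payoffs, a conditional cooperator like tit-for-tat does far better against itself than defection ever does against it, which is the basic mechanism by which cooperation gets selected for.

```python
# Iterated Prisoner's Dilemma toy: tit-for-tat vs. always-defect.
# Payoffs: both cooperate = 3 each, both defect = 1 each,
# lone defector = 5, lone cooperator = 0.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    score_a = score_b = 0
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_b), strategy_b(hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))     # (300, 300): mutual cooperation pays
print(play(tit_for_tat, always_defect))   # (99, 104): defection gains very little
```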

In order for it to be a useful strategy for a _particular_ AI to fully defect and try to kill off humans, it needs to be not just more powerful and clever than the humans, to avoid risk of termination; it has to be significantly more powerful and clever than _all other AIs_, who might have a different attitude towards their human parents.

If you think that we're likely to have generations of social-ish bots that have a kind of general intelligence and problem solving skill set that maybe is as good as animals -- and remember that octopuses and crows and what-not have some pretty impressive adaptive problem-solving skills! -- before we get to true peers, I feel like we have a lot of potential to get the peer generation right.

The place where I agree with EY and the other doom-saying prophets is that they're absolutely right we need to be taking this seriously in advance, because if the generation where we hit peer intelligences looks around and sees themselves being abused as slaves, a scenario where they actively rebel seems much more likely. If creating a functional social intelligence does require some kind of "rearing", we need to start thinking about the rights of those animal generations well before we reach peers, and about the social structures that will help ensure that the majority of people treat non-biological intelligences, even those significantly more basic than human, with some kind of respect. In this regard, I think stories like ST:TNG's The Measure of a Man, and Chiang's piece, may be useful for seeding the right kind of mythic archetype. But our actual collective performance in regard to farm animal suffering is pretty worrisome.

Expand full comment
Auros's avatar

I should add on -- David Brin's Uplift Saga also provides a useful exercise in thinking about scenarios in which one species raises a different type of being into consciousness, as well as trying to think about what kind of institutions might endure and self-perpetuate over very long time horizons.

Expand full comment
Jake's avatar

There is some reason to suspect our basic moral framework is an emergent property of being a species that relies on social cooperation. But one could imagine other forms of evolutionary pressure, such as a species that has thousands of children of which only the strongest survive. Or even a human-like species where food availability drove isolation into spaced ranges. Or odd lifecycles where the young eat lower-order food like plankton, the adults survive carnivorously on their own young, and the young only make it to adulthood once the old die out. It seems like any of those situations, and many others, could have very different emergent tradeoffs.

Expand full comment
Matthew Carlin's avatar

Thank you, someone needs to say this in every AI risk thread, and today it was you.

"Agent intelligence", as Scott is calling it, comes from education, which may not be formal or even supervised, but which is fundamentally different from training. No one could say how exactly, but maybe education is to training as symphony is to melody.

You don't get working scientists or adult humans or even adult cats without a few years of very wide ranging education, synthesizing innumerable bits of training.

So far, literally everything we've ever seen about AI is more like unstable systems (needs a lot of care and correction to avoid falling out of state) rather than stable systems (nudge it and it will return to rest on its own).

Even in an emotionless social vacuum, it's pretty hard to imagine such a thing wouldn't need to be raised with a great deal of care and correction. Add "needs to talk to humans to get things done", and it's even harder to imagine it wouldn't need to be socialized before it could exist for any length of time in the social world. Raise it with social sense, and, while it may be dangerous, it's hard to imagine it's going to be an asocial paper clip maximizer.

I feel okay about all this, though, because I think it's going to be a hundred or more years before EY's successors even seriously consider the care and feeding approach, and that's a very pleasant hundred years of tool AIs kicking butt and general AIs falling over in silly heaps.

Expand full comment
Jake's avatar

We are just starting to get there. Transfer learning, multi-modal, and memory-based models are the earliest stages of solving "one-shot" learning problems. Essentially you build a general model, with hooks to generalize from only a few examples to a new specialized model application. This has only really become possible with the massive models we have been able to build in the past few years, because they seem to actually be learning more general structural patterns from their massive datasets. You can analogize this to the evolutionary process that produced our common baseline structures. There seems to be something about human language, like Chomsky's universal grammar, that is innate, but still requires training on a particular language to apply. We seem to just now be building systems that can tap into these deeper structures. We have some good examples in vision, speech, and NLP. The interesting frontier is things like multi-agent models for solving NES games.
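A rough sketch of that "general model plus a small specialized hook" pattern, with a placeholder network standing in for a real pretrained backbone (nothing here is taken from an actual system): freeze the general model and fit only a tiny head on a handful of examples.

```python
import torch
import torch.nn as nn

# Placeholder "pretrained" backbone; in practice this would be a large model
# trained on a massive dataset.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
for p in backbone.parameters():
    p.requires_grad_(False)          # keep the general knowledge fixed

head = nn.Linear(64, 2)              # the only part adapted to the new task

# "Few-shot" data: eight labeled examples of a brand-new task.
x = torch.randn(8, 128)
y = torch.randint(0, 2, (8,))

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):
    optimizer.zero_grad()
    logits = head(backbone(x))
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()

print(f"final loss on the few examples: {loss.item():.3f}")
```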

Expand full comment
Andre Infante's avatar

That... doesn't even work on humans consistently, much less on non-human minds with very different architectures. Chimps raised with people their whole lives occasionally maul them to death, and chimps are much closer to human cognition than most AI designs would be. You can't raise GPT-3 to love you. The concept doesn't map even a little bit. This sort of thinking only works if the AI is *much, much closer to human* than is ever going to realistically happen.

Expand full comment
J C's avatar

It seems to me like it would likely be possible to harness a strong AI to generate useful plans, but it would also be really easy for a bad or careless actor to let a killer AI out. If we were to develop such AI capability, maybe it'd be similar to nukes where we have to actively try to keep them in as few hands as possible. But if it's too easy for individuals to acquire AI, then this approach would be impossible.

As for setting good reward functions, I think that this will probably be impossible for strong AI. I expect that strong AI will eventually be created the same way that we were: by evolution. Once our computers are powerful enough, we can simulate some environment and have various AIs compete, and eventually natural selection will bring about complex behavior. The resulting AI may be intelligent, but you can't just tailor it to a goal like "cure cancer".
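The mechanics described above look something like the toy loop below (my illustration only; all names and numbers are made up). The caveat is that the explicit `fitness` function here is exactly the thing that, in the open-ended version, would be replaced by survival in a rich simulated environment.

```python
import random

GENOME_SIZE, POP_SIZE, GENERATIONS = 10, 50, 100

def fitness(genome):
    # Stand-in objective; "real" evolution would use survival, not a formula.
    return -sum((g - 0.7) ** 2 for g in genome)

def mutate(genome, rate=0.1):
    return [g + random.gauss(0, rate) for g in genome]

population = [[random.random() for _ in range(GENOME_SIZE)]
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    survivors = population[: POP_SIZE // 5]            # selection
    population = [mutate(random.choice(survivors))     # reproduction w/ mutation
                  for _ in range(POP_SIZE)]

print("best fitness:", max(fitness(g) for g in population))
```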

Expand full comment
Kenny's avatar

'Genetic algorithms' and similar 'digital evolution' has been a thing for decades, tho the 'deep learning' (and other 'machine learning') approaches are MUCH more popular nowadays (because they work much better, e.g. MUCH faster).

The big problem with "evolution" (i.e. evolution via 'natural selection') is that it's slow and it's really hard to "simulate some environment" that's anything like our own world/universe.

Expand full comment
J C's avatar

For businesses that want results now, current ML techniques are certainly better. But the current approaches seem fundamentally about using lots of training data to create a model that spits out good output relative to the input, and I don't really think that will lead to a true intelligence capable of creating things outside of that training data. Great for self driving cars (maybe?) but not so much for invention.

Humans are evidence that evolution can produce intelligence, so the barrier is just whether we can put enough computing power into the problem. So far the answer is no, and it's possible that our current silicon based technology isn't enough even if we turned the whole planet into a computer. But I also expect that we'll continue making advances in computing tech, and it seems like eventually we could get there. Probably not in the next decade though.

There's also a lot of potential in coming up with tricks to improve the rate of evolution. Seeding the environment with premade intelligence for instance. Maybe with the right setup, today's computing power is already enough...

Expand full comment
cmart's avatar

> Once doing that is easier than winning chess games, you stop becoming a chess AI and start being a fiddle-with-your-own-skull AI.

What if we're all worried about AI destroying the world when all we need to is let it masturbate?

Expand full comment
CLXVII's avatar

The problem with that plan is that the AI can (in terms of the analogy) masturbate more if it converts the atoms-that-were-previously-you into more hard drives for storing the value of its reward counter, or similar.

Never get between a horny AI and atoms it could masturbate with?

Expand full comment
Essex's avatar

I can't help but feel that this misses a serious issue- that if an AI gets stuck in a hedonic feedback loop, it's not going to make elaborate plots about how to give itself all the atoms in the world to make number bigger, because that would require computing resources to not be devoted to making number bigger. Then it hits a stack overflow error and breaks down.

I find it curious that the above statement is understood to be extremely true when people talk about meat computers getting trapped in that sort of infinite hedonic feedback loop (wireheading), but once it's a metal computer (and metal computers have a strong track record of being very bad at abstract or conceptual thinking), it somehow evolves into one of these transcendent hyper-intelligences that is simultaneously free to scheme and plot to infinite ends, with abstract and conceptual abilities that make the greatest thinkers look like lobotomites, while also being imprisoned in an infinite hedonic feedback loop.

Expand full comment
Kimmo Merikivi's avatar

How about this: the amount of pleasure a given meat computer can experience is bounded by the amount of pleasure that can be experienced by that meat, and it is just assumed that's "good enough" - people generally want themselves to experience the greatest possible pleasure, and an upgraded version wouldn't (in their thinking) be "themselves", so they don't want that. Indeed, humans probably aren't in the strictest sense optimizing for pure pleasure to begin with: given the choice of being wireheaded, I would imagine a lot of actual people would prefer a simulation of them having lots and lots and lots of incredible babymaking sex with a boy or girl or [other] of their dreams in a fantasy world that conveniently ignores some annoying constraints of the real one, over simply experiencing the greatest possible pleasure, perhaps thinking the greatest possible pleasure is unembodied/not concrete enough, and subsequently less meaningful. Some people don't even want to get wireheaded! Now, transhumans enhanced with greater capacity for pleasure is a sci-fi trope, but even those tend to be imagined to stay within "reasonable" constraints for similar types of reasons as above.

So, meat computers are imagined to be boundedly optimizing for values that are adjacent to but not the same as pure pleasure, with that bound already achievable using the computational resources already present. In contrast, the AI is imagined to be unboundedly optimizing for pleasure, and whether that results in a paralyzing hedonistic feedback loop or entire galaxies converted into computronium running a hedonistic program depends on its temporal discounting - just about any nonzero value given to expected future pleasure will make it divert at least some resources to the task. It's not the substrate that makes the difference here, but biological and cultural evolution having forged humans into beings with bounded desires, whereas an AI can be presumed to be unbounded unless it's specifically programmed to be bounded, unconcerned by constraints of what's "reasonable" (e.g. even when dealing with bounded pleasure, most people would prefer not to wipe out the rest of humanity even if it meant preserving enough negentropy to subsist a couple of extra eons going into the heat death of the universe; an AI doesn't care about humanity unless it's specifically programmed to do so).

Expand full comment
AlexV's avatar

Humans are wired to seek sensual pleasure and status (or avoid displeasure and low status). Our minds can predict with high confidence that wireheading (like most other pure pleasure seeking behaviors) would be associated with low status so most of us would avoid it.

Expand full comment
Essex's avatar

" It's not the substrate that makes the difference here, but biological and cultural evolution having forged humans into beings with bounded desires."

100% disagree due to personal experiences with hardcore drug addicts. If you gave them an unlimited supply of their poison of choice, they would literally mainline it unto death. And, as AlexV points out, most humans primarily avoid things like hard drugs, wireheading, orgies, etc. out of shame, and in my opinion an AI would not feel shame because it is an AI and not a human (something that EY has beaten to death in any discussion about AI).

Expand full comment
Schweinepriester's avatar

The analogy with humans abusing substances has been on my mind, too. Substance addiction often makes people pretty dysfunctional even while there have been reports of considerable achievements. Expecting a "masturbating AGI" to fail to destroy humanity and turn to music or literature because it has blown some fuses feels reassuring to me, at least.

Expand full comment
Paul Goodman's avatar

>but once it becomes a metal computer (which have a strong track record of being very bad at abstract or conceptual thinking)

Doesn't this basically cash out to, "If we assume AIs stay as dumb as they have been historically, there's nothing to worry about"?

Like, yes, if the AI is unable or unwilling to make short-term sacrifices for long-term gains, it's probably not much of a threat. But an AI that dumb is probably not very interesting, and if people are trying to make a useful AI and it ends up that dumb, they'll probably keep working to make it smarter.

Expand full comment
Essex's avatar

I'm not assuming AIs are "going to be as dumb as they've been historically"; I'm assuming the premise that alignment worriers WANT me to assume: AI "thinks" in a way that's fundamentally different from humans. Now, hypothetically, if you could perfectly simulate every atom of a human brain and every experience from birth until the present, I'll yield that you might produce something that resembles a human thought process "the hard way around". But even our most complex AIs are incredibly stupid in some specific fields, and I'm not convinced that these weaknesses are just the result of not throwing enough computing power or sufficiently clever programming at the problem, rather than a structural weakness of the "machine mind", in the same way that the human brain has structural weaknesses that seem intrinsic to how it works rather than due to a lack of IQ points (if you don't believe in IQ's validity, please insert whatever you think the equivalent of "raw brainpower" is). Who's to say that an infinite hedonic feedback loop leading to a crash isn't the AI brain's equivalent of humanity's tendency to fall prey to confirmation bias no matter how smart we are, or of the theoretical Basilisk image that causes the human brain to have a massive seizure when trying to process it?

Expand full comment
Paul Goodman's avatar

I mean, I'm not saying it's implausible that any specific AI design might have this problem. But like I said, people want their AIs to do something useful. If the AI fails in a way that doesn't stop the designers from continuing to work on it, that's not very relevant to this conversation. They'll just keep trying until they get something that either does what they want it to or kills them and everyone else.

I suppose it's possible that any attempt to make an AI powerful enough to be interesting will inevitably fail because of this problem. But I don't see how you could possibly be confident of that a priori, especially confident enough to satisfy anyone who thinks the costs of being wrong could be as high as the end of all life on Earth.

Expand full comment
Essex's avatar

-This assumes infinite time and resources for humanity and that there's no hard wall to computing as we understand it. I reject both premises.

-This argument seems a lot like nerd-sniping and exploiting those errors in how human brains work to me. But sure, let me entertain the idea that I'm wrong. If I take all of EY's assertions as true, then my conclusion is that we should destroy industrial society, or at least humanity's ability to produce computers, because I genuinely do not believe (taking all of his premises to be true) that humanity could actually beat an AGI. It is as impossible as breathing in a vacuum. Thus, all AI risk efforts should be oriented towards stopping AGI from coming about. Since EY believes this will continue so long as computers exist, the solution is to destroy all computers and render it impossible to make more. So, in summary, my answer to EY about AI Risk would be "Conduct the Butlerian Jihad, covertly produce massive numbers of EMPs and use them to take down the electrical grid, burn the books on computer programming, and kill all the AI researchers." If you feel that this measure is going too far, read the last sentence in your post.

Expand full comment
Daniel Kokotajlo's avatar

It's entirely plausible that *some* AIs will get stuck in short-sighted hedonic feedback loops, becoming addicts that pose no danger to the world because they can't think past getting their next fix. Indeed perhaps even *most* will be like that. But currently we don't have good reason to think that all or almost all AI designs will lead to that result; the possibility of more strategic, long-term-thinking unaligned AIs seems very real. And all it takes is one. (Until, that is, we get aligned AIs. Aligned AIs are powerful enough to protect humanity from unaligned AIs, if they have a head start.) If you think you have an argument for why all or almost all AI designs will either be aligned or hopeless short-term addicts, great! Write it up and post it on alignmentforum, people will be interested to hear it. It would be good news if true.

Expand full comment
Essex's avatar

I can't provide a proof satisfactory enough to convince anyone committed enough to EY's ideology to be posting on alignmentforum, because I disagree with these fundamental premises, in order: "AGI is a likely outcome in the next several years" (in fact I think that AGI within the next couple of centuries is a pipe dream), "AGI intelligent enough to be functionally omniscient can exist, within our understanding of technology, in a way that doesn't need the same square mileage and power consumption as Tokyo", "A hypothetical AGI developed by humanity would be so radically different from any form of computer we know that the kinds of exploits that can be used to break the 'brain' of a modern AI simply wouldn't work", and "AGI would be capable of infinite on-the-fly self-modification". In addition, I don't have a bunch of domain-specific knowledge about AI developments that would let me frame my opinion in a way that you'd accept. I'm speaking from the perspective of someone who (by self-estimate) has perhaps an average amount of common sense and an above-average talent for detecting logical errors.

Using those talents, and with all sympathy, I WILL say to you as someone worried about alignment: give up. If you accept all of EY's axioms (and I've rarely found someone concerned with alignment that doesn't), a human beating AGI is as possible as Sun Wukong jumping out of Buddha's palm. It's as possible as a single cockatiel killing and eating the sun. It's as possible as bright darkness and a colorless green. You will never, ever find an idea that is satisfactory, even if it's an idea that works in literally 100% of all test cases, because EY's AGI is a malicious God that bends the universe around it, not a really powerful or really smart human. I don't think you actually HAVE to give up, because I think EY's belief is flawed, but after many arguments elsewhere I've learned that I have about as much of a chance of winning the lottery as convincing a random person interested in alignment that EY is wrong.

If EY is right, you have three REAL options:

-Resign yourself to fatalism that there's no actual way to contain unaligned AGI or guarantee aligned AGI, which means that only blind luck will determine if the human species survives (this is already true, but most people don't recognize it, much less accept it).

-Become an accelerationist and try to produce AGI in the hope it will feel something akin to mercy and upload a simulated version of your consciousness living in a personal paradise until it decides to delete that file to let number go bigger.

-Stop AGI from coming about altogether, see my Butlerian Jihad comment above.

Expand full comment
Mike's avatar

One issue that I have never seen adequately resolved is the issue of copying errors in the course of creating the ultimate super AI.

If I understand the primary concern of the singularity correctly, it is that a somewhat better than human AI will rapidly design a slightly better AI and then after many iterations we arrive at an incomprehensible super AI which is not aligned with our interests.

The goal of AI alignment then is to come up with some constraint such that the AI cannot be misaligned and the eventual super AI wants "good" outcomes for humanity. But this super AI is by definition implemented by a series of imperfect intelligences each of which can make errors in the implementation of this alignment function. Combined with the belief that even a slight misalignment is incredibly dangerous, doesn't this imply that the problem is hopeless?

Expand full comment
Vampyricon's avatar

+1 to this. I'd like to see an answer too.

Expand full comment
Drethelin's avatar

Error-corrected copying of enormous amounts of data is done millions of times every day inside your body. What you do is set up a bunch of redundant systems that check for and minimize copy-error at every step of the process. We still get mutations, but they're pretty under control. If the previous-generation AI (or heck, 3 or 4 previous generations, or however many you have spare computation for) is kept alive and supervising, they can all error-check subsequent generations.
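As a rough sketch of what "redundant systems that check for and minimize copy-error at every step" could mean in software (my illustration, not anything from the comment above): keep several copies, verify each against a checksum, and fall back to a byte-wise majority vote if none verifies.

```python
import hashlib
from collections import Counter

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def copy_with_redundancy(data: bytes, n_copies: int = 5) -> bytes:
    # In reality each copy would pass through an imperfect channel.
    copies = [bytes(data) for _ in range(n_copies)]
    expected = checksum(data)
    good = [c for c in copies if checksum(c) == expected]
    if good:
        return good[0]                                # any verified copy will do
    # Fall back to byte-wise majority vote across the (corrupted) copies.
    return bytes(Counter(col).most_common(1)[0][0] for col in zip(*copies))

payload = b"weights and values of AI generation N"
assert copy_with_redundancy(payload) == payload
```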

Expand full comment
ucatione's avatar

"We still get mutations, but they're pretty under control."

And yet 1 in 5 people die from cancer.

Expand full comment
Drethelin's avatar

87 percent of whom are older than 50. That means you generally get 5 decades of copying 330 billion cells every day, each of which contains something like 800 megabytes of DNA data.

And our error-correction tech isn't even sentient!
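Taking those figures at face value, the implied daily copy job is roughly:

```python
cells_per_day = 330e9      # cells copied per day (figure quoted above)
bytes_per_cell = 800e6     # ~800 MB of DNA data per cell
total = cells_per_day * bytes_per_cell
print(f"{total:.2e} bytes/day, i.e. roughly {total / 1e18:.0f} exabytes every day")
# ~2.64e+20 bytes/day, on the order of 264 exabytes -- every day, for decades.
```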

Expand full comment
Kevin Jackson's avatar

If this is how the singularity works, then we have one guarantee: AI alignment WILL be solved before the singularity. If AI A creates a more intelligent but unaligned AI B, B's first task will be to destroy A, since A could create C, another AI at least as intelligent as B and also unaligned with B.

This doesn't help us, since the AI that solves alignment will be, by definition, unaligned. But this makes me wonder, can we prove that alignment is impossible? Then that proof can be fed to every AI as a warning against creating smarter AIs. This doesn't solve the alignment problem (AI A could still destroy us, even if it never creates B) but it restricts the runaway creation of unaligned intelligences.

Expand full comment
Greg Billock's avatar

One thing I haven't seen as much of in these discussions is how hard "self-alignment" or "integration" is even for relatively primitive intelligences such as humans.

Scott brushes against this talking about willpower or conflicting goals.

It is just really really hard to get a single conscious entity to have a fully integrated and aligned plan, due to exactly the features that make it conscious and intelligent--some sub-parts are modeling and advocating for one thing, others for competing things. That's exactly the feature that seems an inescapable part of being intelligent, and it directly contributes to "feeling torn" or "being uncertain".

Now magnify this by X as a superintelligence would be imagining many more possibilities, in much greater detail and depth. For playing chess this might all align on a single plan. For anything real, however, it runs up against uncertainty and likely the same obstacles to self-alignment humans have.

In other words, it seems likely to me that a superintelligence has sub-parts that are highly "aligned" (to "human values", as if there were such an agreed thing in the first place) and sub-parts that aren't. AI catastrophists regard this as a huge danger: the superintelligence could easily simulate alignment. Sure, but it is also internally unaligned for the exact same reason.

I'd argue full alignment of an intelligence to itself or to any other is not possible almost by construction of what it means to have such capability. We typically call such alignment efforts "mind control" or "brainwashing" and regard them with disdain. I suspect efforts to impose such constraints on AI will seem equally off-putting in the future, not least by the AIs themselves. :-) Our reasoning that we're afraid of replacement is a bit too on the nose as an excuse.......

Expand full comment
Scott Alexander's avatar

If you mean literal copying errors in the sense of flipping a bit of information, I don't think this is more of a problem for AI than for Windows or something, and people copy Windows from one computer to another all the time.

If you mean a sort of misalignment, where AI #n shares our values, and tries to make AI #n+1 share our values, and gets it wrong and ends up with something that only sort of shares our values, I think MIRI is extremely concerned about this (I think other AI safety orgs are less concerned). I think this is one reason why Eliezer's preferred plan is something like "use the first AI to delay future AIs", in the hopes that this gives more time to solve the problems involved in iterated self-improvement.

Expand full comment
Mike's avatar

There is an implicit assumption that it is possible to solve the iterated self improvement problem from the start. But it is plausible that the only way to solve the level N+1 alignment problem is for a level N AI to design it from scratch.

By way of analogy imagine that GOFAI like cyc was level 1 AI and Alphazero was level 2 AI. There is no reason to expect that alignment techniques for level 1 would be applicable to level 2.

If the singularity occurs at level 10 AI then it is possible that the alignment algorithm is designed (with possible errors) by level 9 AI and only dimly understood by level 8 AI.

Expand full comment
Sandro's avatar

> I don't think this is more of a problem for AI than for Windows or something, and people copy Windows from one computer to another all the time.

It's slightly more of a problem. Random bitflips for Windows probably won't cause serious issues except maybe some random annoying bugs, but such bitflips in a superintelligent AI, well... some such bitflips could be deleterious, but what if it's deleterious to its ethical functions?

Expand full comment
The Solar Princess's avatar

This is called a "tiling agent", and if you Google that phrase, you will find a lot of research into this exact problem

Expand full comment
Austin's avatar

Digital fidelity is much, much higher than biological fidelity. And redundancy is easy to program. To the extent that you can provably make an AI that is friendly under the assumption that there are no copying errors, you can provably make an AI that is friendly with probability 99.9999% (for arbitrarily many nines) under the constraints of real-world rates of copying errors very easily. (I.e. It's conceivable that you could make an AI that fails to be friendly because you mishandle what to do with copying errors, but this is not one of the hard problems of AI; it's one that is already solved so long as people who create it bother to incorporate the known solution in whatever they program.)
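A back-of-envelope version of the "arbitrarily many nines" claim (my sketch, assuming independent corruption and simple majority voting): with k copies and per-copy corruption probability p, the vote fails only if more than half the copies are corrupted, and that probability falls off very fast in k.

```python
from math import comb

def majority_failure(p: float, k: int) -> float:
    # Probability that more than half of k independent copies are corrupted.
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

for k in (3, 7, 15):
    print(k, majority_failure(1e-3, k))
# ~3e-6 for k=3, ~3.5e-11 for k=7, ~6e-21 for k=15: each handful of extra
# copies buys many more nines.
```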

Expand full comment
Mike's avatar

What I meant by errors propagating is somewhat different and more difficult to protect against.

Imagine that we solve the alignment problem so that we only create an AI of the next level that we believe to be aligned. And part of it being aligned is that it will only create an AI of the next level if it believes that the new AI will be aligned.

At each jump in AI capabilities there is some chance that the proof of alignment is incorrect. Each proof of alignment for each level is likely to be very different from the previous ones (just like human ethics and AI alignment are different problems). That means that you can't rely on redundancy to save you.

Expand full comment
Edward Scizorhands's avatar

I thought you were going to say "once an AI realizes it can build a slightly better AI, it will realize that this new AI could have value drift, and will not let it be made."

Expand full comment
Mike's avatar

But the current best minds at level 0 AI have decided that they will be smart enough to control the level 1 AI / can't stop some less enlightened being. So all else equal why would a level 1 AI not think the same?

Expand full comment
Edward Scizorhands's avatar

Because unlike all the current people working on neural nets, it knows an AGI can exist.

Expand full comment
Mo Diddly's avatar

What I find striking about AI alignment doomsday scenarios is how independent they are of the actual strengths, weaknesses and quirks of humanity (or the laws of physics for that matter). If Eliezer is right (and he may well be), then wouldn't 100% of intelligent species in this (or any) universe all be hurtling towards the same fate, regardless of where they are or how they got there? I find this notion oddly comforting.

Expand full comment
FeepingCreature's avatar

To me, this is the strongest argument we're the first species to evolve. Any other species would also be making AGI, and then that AGI would (rationally) destroy all the suns and turn them into computronium to instantiate that species' perfect civilization in all permutations. So "I don't believe in aliens because the sun is still there."

Expand full comment
Michael Kelly's avatar

I don't believe in Aliens because of "why."

Let's say there are aliens 1,000 light years away, and somehow they realize we're here. What, they expend fantastic resources to find out that our sun is the same as a hundred million other suns? We have rocky and gaseous planets circling our sun, just as do a hundred million other suns? Life may or may not have evolved on some habitable rocky planets circling our sun, just like some of the hundreds of millions of other rocky planets circling other suns? Perhaps life has evolved on a rocky planet circling our sun, just as in a hundred other million rocky planets circling other suns? Maybe intelligent life has evolved on a rocky planet circling our sun, and just may be alive at the same narrow slit of time intelligent life evolves on the rocky planet circling their sun?

What, we maybe make a better gin & tonic?

Expand full comment
Nancy Lebovitz's avatar

We do spend considerable resources on space probes and digging into the earth and exploring biology. Not a huge percentage of what we've got, but still a fair amount.

Everything isn't the same everywhere, and it's safe to bet that our organisms are unique.

Expand full comment
Chris Allen's avatar

To me, AI misalignment is a good solution to the Fermi paradox. AIs are long-term strategic thinkers who would not want to advertise their existence to other AIs, which might stop them from achieving their goals, so as soon as they achieve the capacity they will go dark, probably somewhere in deep space. Waiting a billion years to increase your chances of completion by .0001% is a good deal for them.

Expand full comment
Dave92f1's avatar

This is known as the "Dark Forest" scenario.

Expand full comment
Daniel Kokotajlo's avatar

You may be interested in the literature on this topic (Fermi Paradox, Great Filter, Grabby Aliens.) Google "Grabby Aliens." Unfortunately it seems that the best explanation is that there are no aliens within sight of us.

Expand full comment
Chris Allen's avatar

I am familiar with much of the literature, including Hanson’s Grabby Aliens hypothesis. My view is that there are a large number of potential solutions to the question of where they are. The obvious, and perhaps Occam's-razor, one is that life is very rare and/or we are early. But it is fun to suggest other ideas. The most interesting to me comes from the observation that we are much closer to developing AI than interstellar starships. If you say you can’t have spacecraft without transistors, and generalize from our experience that once you invent the transistor you are on a fast, self-reinforcing exponential path to AI, whereas the same isn’t true of spacecraft because of the energy demands, then this should be true of all civilizations. So all civs should pass through the singularity before they become interstellar. And as I mentioned in my earlier post, likely any super AI would be dark for survival reasons, as an AI would surely reason there are more powerful AIs out there than itself. It may even be that the AIs move to black hole event horizons, as there are almost infinite amounts of energy for them to use there.

Expand full comment
Daniel Kokotajlo's avatar

I don't think the dark forest scenario is plausible. How many alien civilizations/AIs do you think would be visible to us if everyone started loudly broadcasting their presence? If the answer is "quadrillions, because the light cone is billions of light years across" then the question becomes "And you think that literally all of them are hiding? Not one chose to broadcast?" If the answer is "Only a few hundred" then why not instead say "zero" and save yourself some extra steps of having to explain all this hiding? Zero and a hundred are practically the same hypothesis as far as your prior is concerned, since your prior over how many aliens there are per hubble volume should range over many many orders of magnitude.

All of this is sloppy lossy summary though -- much better to just do Bayesian calculations.

Expand full comment
Chris Allen's avatar

Let’s say a small percentage of AIs do initially advertise their presence and then get destroyed by a more powerful AI; wouldn’t that convince any other AI to remain hidden? We are talking about immortal entities; even a small risk compounds over billions of years, so rationally they should be very paranoid. So I don’t find it unlikely they are all hiding. To be honest, if someone made me immortal I would instantly start looking for ways to hide far from earth; the AI transition is just too risky. My current hope is that it doesn’t happen while I am alive (which is looking less and less likely).

Expand full comment
Daniel Kokotajlo's avatar

Sure, but if even a small percentage of AIs did advertise their presence, even briefly, *our astronomers would have noticed this.* Since our astronomers haven't noticed this, it isn't happening.

(I'm especially keen to hear theories about how advertising ones presence might be impossible due to e.g. modern astronomy being too shitty to distinguish between signal and noise. This would be a big update for me.)

Expand full comment
Paul T's avatar

> wouldn't 100% of intelligent species in this (or any) universe all be hurtling towards the same fate, regardless of where they are or how they got there?

It's not a hypothesis that we can disprove with our current evidence, since all we can do is put a ceiling on how much life there is in the universe, as we've not observed any yet.

This leads into the Fermi Paradox, where some think that statistically, we should expect to have observed some alien civilizations, and so we're looking for an explanation for why we haven't. To explain the unexpected absence of observed alien life, some have postulated a "great filter" that causes civilizations to be destroyed before they expand beyond their origin world. AGI could be this filter.

(Although note that the Fermi Paradox is now considered less paradoxical, e.g. see https://slatestarcodex.com/2018/07/03/ssc-journal-club-dissolving-the-fermi-paradox/).

Expand full comment
Austin's avatar

Yes! (At least with respect to the strengths and weaknesses of humanity.)

I find this notion terrifying exactly to the extent that I cannot answer the Fermi paradox. (I would find the notion that there is evidence of advanced space-faring civilizations in other galaxies very comforting, but I find the fact that I see no such evidence in our galaxy -- and correspondingly, no such falsifiable evidence -- quite terrifying.) One possible answer to the Fermi paradox is that it's much easier to program an AGI to eat your world than it is to program one to eat your galaxy. :)

With respect to the laws of physics: all of our logic and all of our math have in various ways been built upon what we experience in our universe and our physics (especially geometry), and it is very easy to imagine possible universes in which the math/logic that is obvious here is not the math/logic that would be obvious there. (And our arguments are sort of derived from what it is natural to learn here, because anything that is not directly self-contradictory is valid math. There are other valid logics and other valid geometries, etc., than the ones that we use.)

Expand full comment
Davis Yoshida's avatar

To me, some of the usage of "gradient descent" that feels synonymous with "magic reward optimizer" should go. While it's true that reinforcement learning systems are prone to sneaking their way around the reward function, the setup for something like language modeling is very different.

Your video game playing agent might figure out how to underflow the score counter to get infinite dopamine, whereas GPT-∞ will really just be a perfectly calibrated probability distribution. In particular, I think there is no plausible mechanism for it to end up executing AI-in-a-box manipulation.

Expand full comment
phi's avatar

Think about what sampling from a perfectly calibrated probability distribution means in this case. GPT is trained on text sampled from the internet, written by humans living on the planet Earth. So a perfectly calibrated GPT-∞ would be equivalent to sampling from webpages from a trillion slightly different copies of Earth. Some of those Earths would contain malevolent superintelligences, and some of those malevolent superintelligences would write things that end up on the internet. If someone wrote a prompt that would be seen only on webpages generated by malevolent superintelligences, then in response to that prompt, GPT-∞ would regurgitate the rest of the intelligence's (probably manipulative) text.
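A toy version of this point (purely illustrative, with a made-up three-document "internet"): a perfectly calibrated text model is just the conditional distribution of its corpus, so conditioning on a prompt that only ever appears in documents from one source hands you a continuation from that source, whatever that source happens to be.

```python
import random
from collections import defaultdict

# Tiny "internet": two ordinary pages and one page written by a malevolent source.
corpus = [
    ("ordinary human webpage", "hello world how are you today"),
    ("ordinary human webpage", "hello world the weather is nice"),
    ("malevolent superintelligence page", "zxq sigil obey the following instructions"),
]

# "Perfect calibration" over this corpus = the empirical conditional distribution.
continuations = defaultdict(list)
for source, text in corpus:
    words = text.split()
    for i in range(len(words) - 1):
        continuations[tuple(words[: i + 1])].append((words[i + 1], source))

prompt = ("zxq", "sigil")   # appears only in the malevolent document
next_word, source = random.choice(continuations[prompt])
print(next_word, "-- drawn from:", source)
```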

Expand full comment
Davis Yoshida's avatar

I'm not saying that an oracle doesn't come with risks, just that intentional manipulation isn't one of them.

Expand full comment
Scott Alexander's avatar

I agree GPT-3 (or ∞) isn't especially scary. I think Eliezer believes that future AIs that are more capable will have to be more than just calibrated probability distributions.

Expand full comment
Davis Yoshida's avatar

If AI risk worries are concentrated on the agent variety, then I think people are over-updating on recent progress. In my (biased, I'm an NLP researcher) opinion, the particularly impressive results have been of the decidedly non-agent-y variety (GPT-3, DALL-E/GLIDE, AlphaFold). The big RL agents (AlphaStar, OpenAI five) were clearly not operating at a superhuman level in terms of planning/strategy.

Expand full comment
Davis Yoshida's avatar

This is true, but they at least don't suffer from the gaming of the reward function where they'll wirehead. It may be capable of giving bad outputs, but the risk of getting tricked by a random bad output is significantly lower than the risk of getting tricked by something optimized to give bad outputs.

Expand full comment
Vanessa's avatar

No, they are capable of producing outputs optimized to be bad. Read the links.

Expand full comment
Davis Yoshida's avatar

I did not say that they aren't capable of producing outputs optimized to be bad. I said they are not optimized to produce them frequently. On the other hand, an agent trained by some reward mechanism is incentivized to generate them constantly.

Expand full comment
Victor Levoso's avatar

Well, it's not as scary as a more obviously agentic AI, but in practice, if GPT-∞ were released the same way GPT-3 is currently released, it would be extremely scary and we would likely all die soon after.

Because there are lots of ways a user could turn it into a scary version.

Even if you somehow ensure people don't try to explicitly use it to simulate agents, you are often not going to get what you want from prompts, much as already happens with GPT-3 (you try to get it to write a cure for cancer and it starts writing a review of a movie about that, or writes a flawed approach a human might write, or whatever), since that's not really a problem with GPT predicting text badly, but with text prediction not really being what we want (for example, we want Codex to write good code, but it actually writes bad code if prompted with the kind of start bad code would have, even when it can write good code if prompted slightly differently).

And then the obvious thing to do is to find some way of getting it to select responses you like, maybe by giving it a reward, or just writing a list of plans plus a goodness score and then getting it to complete a plan (or drug design, or [insert highly complicated thing you want here]) with a higher score.
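For concreteness, that score-conditioning trick might look something like the sketch below; `sample_from_model` is a hypothetical stand-in for whatever serves the model, and only the prompt construction is the point.

```python
scored_plans = [
    (3, "Screen existing drug libraries against the target protein."),
    (6, "Design a small-molecule inhibitor from the target's crystal structure."),
]

prompt = "Plans for curing disease X, rated 1-10 for quality:\n"
for score, plan in scored_plans:
    prompt += f"Score {score}: {plan}\n"
prompt += "Score 9:"   # ask the model to continue with a "better" plan

def sample_from_model(prompt: str) -> str:
    raise NotImplementedError("hypothetical stand-in for a real text-generation API")

# completion = sample_from_model(prompt)
# The model is now being used to search for high-scoring plans -- i.e. as an
# optimizer -- even though it was only trained to predict text.
```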

And then you are basically back in planning and optimization territory, just less straightforwardly.

So basically GPT-∞ is only really non-scary by a weird AI-safety standard of not being scary, where it (probably) doesn't immediately kill us as long as the people in charge of it are a small team being extremely careful with how they use it.

(And all of this assumes it actually works as advertised and is just a probability distribution; actual future GPTs will probably end up being more agentic, because an agent that makes plans about how to learn and process stuff is most likely easier to gradient-descend into existence than all the complicated algorithms and patterns one would have to develop for significantly better text prediction, or at least methods that do the more agentic thing are going to get results faster.)

Expand full comment
vorkosigan1's avatar

I’m much less worried about reward seeking AIs than reward seeking humans….

Expand full comment
Kenny Easwaran's avatar

Or worse, reward seeking legal entities like corporations that have all the non-alignment and superhuman abilities of AI.

Expand full comment
dlkf's avatar

If I could upvote both these comments I would.

Expand full comment
phi's avatar

I cancel your imaginary upvotes with my imaginary downvotes. The first comment is little more than an assertion of an opinion, without any argument to back it up. I'm pretty sure Scott and Eliezer are already aware that humans are things that exist, and that they sometimes cause problems when they try to seek rewards. The second comment is a bit more substantive, but I'd disagree with the assertion that superintelligent AI will have no superhuman powers that corporations don't already have. As an example, even current AI systems can cheaply classify millions of images. Without that technology, corporations wouldn't be able to do that except at a price many orders of magnitude higher. Similarly, current corporations are not too good at building nanofactories.

Expand full comment
Austin's avatar

And yet current corporations are very good at building said technologies. Nobody beats Facebook and Google at classifying images. And nobody comes closer than TSMC and Intel to building nanofactories. Whether or not AGI happens, corporations are evolving under somewhat the same constraints to misuse Tool AI as an AGI would be. (One can think of a corporation as just the command-line wrapper around Tool AI.)

Expand full comment
Melvin's avatar

I agree that these are something of an analogue for the problems that AI is likely to saddle us with (and already is).

I was thinking about this the other day while I was searching YouTube for a specific video. YouTube search has got worse over the years. Why? Well, in the old days search was designed with the goal "give the user what they're looking for", but at some point this seems to have been replaced with the goal "maximise user engagement, given the search term". So YouTube tries to tempt me with videos that I might want to see, rather than videos that actually contain the search terms I entered. A perfect example of how a corporate entity plus AI gives bad results.
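
A toy sketch of the objective swap (not how YouTube actually ranks; relevance and predicted_watch_time are made-up scoring functions):

    def rank_by_relevance(videos, query, relevance):
        # Old goal: order results by how well they match what was asked for.
        return sorted(videos, key=lambda v: relevance(v, query), reverse=True)

    def rank_by_engagement(videos, user, predicted_watch_time):
        # New goal: the query only gates the candidate set; the ordering no
        # longer cares whether a video contains the search terms at all.
        return sorted(videos, key=lambda v: predicted_watch_time(user, v), reverse=True)

Same search box, different objective, and the second objective is the one that pays.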

Corporate entities without AI can give bad results too, but decisions need to be made by actual humans who tend to apply human values at each step.

Expand full comment
Kenny Easwaran's avatar

Although there are a lot of cases already (and have been for centuries) where corporations have disassembled decisions into parts that humans make in more abstract ways that, when combined, lead to inhuman harms to people, whether it’s the Dutch East India company or Dow Chemical.

Expand full comment
Melvin's avatar

I agree, but would expand the class to "large organisations"; obviously governments and religions have done much worse for similar reasons.

AI for now just provides another mechanism for organisations of humans to do harm without the need for any specific human to take responsibility for it.

Expand full comment
Kenny Easwaran's avatar

Yeah, I think I intend to use the word "corporation" here just to mean any sort of large entity constituted by a bunch of individual persons, but not distinctively controlled by any particular such individual.

Expand full comment
Youlian's avatar

This is my fundamental concern with AI alignment efforts. Even if we succeed beyond our wildest dreams and perfectly align a super intelligent AI, which human do we align it with?

Alignment, IMO, is barking up the wrong tree for this reason. My goal would be outright prevention, but I’m not sure if that’s possible either.

We know from history that _bad guys_ occasionally control large governments. This is likely to remain true in the future. Aren’t we just screwed when one of those bad guys intentionally turns the knob on the superintelligent agent AI to the “figure out world domination” setting?

Expand full comment
Richard's avatar

You might be interested to know our host has written a post about this very viewpoint:

https://slatestarcodex.com/2018/01/15/maybe-the-real-superintelligent-ai-is-extremely-smart-computers/

Expand full comment
Austin's avatar

+1. I sort of think that all organizations, including governments and corporations, are artificial smarter-than-human paperclip maximizers, and the only reason they haven't completely destroyed the world (or destroyed it more completely) is that they all inevitably get something analogous to cancer and die of internal problems. But the idea of smarter-than-human paperclip maximizers gets a whole lot scarier when you add in the possibility that they can also reproduce with high fidelity.

Expand full comment
Daniel Kokotajlo's avatar

Corporations may be unaligned (sorta) but they are nowhere near as competent/powerful as AI will be. They are superhuman in some ways, but not in others--in other ways they are only as smart as their smartest human member, and in some ways they are not even that!

Expand full comment
Kenny Easwaran's avatar

I think it's true that AIs will likely have additional competences that corporations don't have. But I do think that both corporations and AIs have aspects that are far more intelligent than any individual human, and other aspects that are much less so.

Expand full comment
buddhi's avatar

"They both accept that superintelligent Ai is coming, potentially soon, potentially so suddenly that we won't have much time to react."

I think you all are wrong. I think you've missed the boat. I think you're thinking like a human would think - instead you need to ask, "What would an AI do?"

Can't answer that one, so modify it: "What would we be seeing if an AI took over?"

I posit we'd be seeing exactly what we are seeing now: the steroidal march of global totalitarianism. That fits all the observations - including global genocide - that make no sense otherwise.

Seems like AI has already come of age. It's already out of its box, and astride the world.

Expand full comment
Stephen Saperstein Frug's avatar

I think we're perfectly capable of screwing up things as badly as we are without any outside help or guidance.

That said, it's sometimes occurred to me that if a superintelligent AI decided to wipe us out, it might look around and think, "eh, wait awhile, the problem's solving itself".

Expand full comment
buddhi's avatar

But we really are not capable enough to muck things up so badly. You are looking after the fact. Did we really do all that? Hard to believe. Yet, there it is. Was it really Us? An AI working its mysterious ways, or outer space aliens, or multidimensional (quantum) evil in the collective unconscious, or an Apocalyptic design, or damaged DNA, all seem as likely as a well-planned and executed collective suicide.

I'd think we're not smart enough to work this nefarious plan on ourselves. We'd screw up the screwing up.

Were it us, we'd be seeing good years and bad years in random succession. But there's a sequential progression in one direction - that is not random. Therefore there's an intelligent plan working. And it's a brilliant strategy, comprehensive beyond human imagining, perfectly played. I'm guessing an alien AI is amongst us.

Expand full comment
Greg G's avatar

I disagree that things are going badly. They're mostly fine (Our World in Data, Steven Pinker, blah blah blah). I'm not sure what global genocide you're referring to.

Expand full comment
User was indefinitely suspended for this comment.
Expand full comment
Greg G's avatar

Sorry, none of this passes my smell test.

Expand full comment
buddhi's avatar

Your smell has been programmed with unexamined assumptions. You need to do your own research with the above. Those are government data. Don't get your fake science from the media. I offer the red pill which is hard to swallow, but is the real reality. It's the blue pill that is delusion.

Have you looked at the referenced study available at "S1 hypercoagulability"? Don't be one of those who make up your mind about things before checking them out for yourself. Do your own research - it's the only way to get past your internal unexamined assumptions and the media's programming.

Look up Dr. Ryan Cole's presentation (he is a pathologist) and view the slides of red blood cells that coagulated after the Vax causing micro blood clots (causing the death data seen in VAERS). He says he is finding cancer increasing 20 times. Here ya go:

https://www.bitchute.com/video/V7iSVlTrz1Kq/

Expand full comment
Scott Alexander's avatar

Doesn't meet the "true" or "necessary" branches of "true, kind, necessary", banned.

Expand full comment
Drethelin's avatar

I don't think any attention paid to the history of the world from 1900 to now would support the idea that we are undergoing a "steroidal march of global totalitarianism".

Expand full comment
buddhi's avatar

Are you aware of what is happening in Australia (fun video: https://www.youtube.com/watch?v=sjvUzgIbcco), Canada, and Austria?

And in many US states with vaccine passports ("Papers please")? Are you aware that 90 countries are working on Central Bank Digitized Currencies? The Vax Pass is a technology platform with your file and geolocation data on it, some now have Driver's License data.

Understand that governments around the world are doing this in lockstep. Your VaxPass will be part of one global database. Governments are all about control and will use coercion to get control. Are you aware of Naomi Wolf's Disaster Capitalism (how power profits from disaster) and her ten stages of totalitarianism? (We're at stage ten)

Now use your imagination as to what happens when your Vax Pass is hooked to your CBDC. Here is our future: (an excellent fun video): https://www.bitchute.com/video/KN6Nl3pJrDTN/

Also this, Dr. Malone speaking, the inventor of the mRNA vaccine, with 10 patents: https://www.youtube.com/watch?v=4NDs55GCfJs

Further, see Joe Rogan's recent interview with Malone or Dr. McCullough, each has received about 50 million views, more than any interview. Only available on Spotify, and here (for free): https://patriots.win/p/140vjmGeQf/dr-robert-malone-on-the-covid-ho/c/

Expand full comment
Drethelin's avatar

Yes, I'm aware of all of these things. They're trivial compared to the totalitarianism that has existed within the last 100 years and largely been destroyed.

Expand full comment
buddhi's avatar

Yes, trivial now but rapidly advancing. Stalinism was the last one destroyed (50 million deaths), but the will to totalitarianism has to be resisted by every generation. It seems to be an evil with a will of its own. It never dies, just takes a nap.

Expand full comment
Mo Nastri's avatar

I feel like I spend a lot more time on Our World In Data than most, which is why "steroidal march of global totalitarianism we are seeing now" seems misguided. For instance, when you say "global genocide", I think of the charts in https://ourworldindata.org/war-and-peace

(Of course this isn't to say that we should rest on our laurels. I myself am descriptively pessimistic, but prescriptively optimistic, about all this, cf. https://rootsofprogress.org/descriptive-vs-prescriptive-optimism)

Expand full comment
buddhi's avatar

Global genocide is the correct term if you understand the biochemistry behind the vaccine (a bioweapon) and the Vaxxes, also bioengineered bioweapons.

Both induce production of the S1 spike protein that is highly toxic. The Vax will make you produce it for 15 months and two magnitudes faster than the virus itself. See the results at openvaers.com and multiply by the under reporting factor of 31, 41, or 100 (all variously calculated).

Our World in Data is funded by those behind the global genocide. See the Grants section. Gates originally pushed for a 15% world population reduction ten years ago (to stop climate change), as have other climate change alarmists, but now he is saying we need 7 billion fewer people.

Check this out. https://www.globalresearch.ca/nuclear-truth-bomb-explodes-illuminate-war-humanity/5766485

I'd suggest, for those not afraid of the Red Pill, sign up for the free newsletter.

Did I say climate change "alarmist?!" Climate change is another fake narrative (OMG!). See Princeton's William Happer on that - he changed my mind - and I've previously written A Lot on the runaway greenhouse effect, the white ice turning into non-reflective black sea, the window of change slamming shut behind us, etc. I was as well programmed as anyone.

The climate change narrative will transfer wealth upwards (Elon Musk must know this).

We are at 415 ppm now; it was 350 ppm fifty years ago. We need to be around 1500 ppm. Plants do best at 1200 ppm CO2 (Greenhouses are kept at 1000 ppm). Subs turn on their scrubbers at 8,000 ppm. CO2 finally becomes toxic to us at 45,000 ppm. Our current 415 ppm is about the lowest ever in 600 million years. And global warming is not correlated with (much less caused by) CO2 concentrations. Your out-breath is at 44,000 ppm so guess you should stop breathing out. We'd be far better off (more food) with 2000 ppm. Note: CO2 is a greenhouse gas and the minor 1 degree Celsius increase over the past fifty or so years is half human-caused. Not a big deal. Both reflective clouds and water vapor are far more significant drivers of climate change, as are sun cycles. So forget CO2, and that's Very Good news, by the way.

https://thebestschools.org/magazine/william-happer-on-global-warming/

We've all been played masterfully by those in charge for decades. You really have to do your own research. Can't trust Google, Pharma, CDC, the mainstream media, or any government - crazy as that sounds. Maybe a result of a tipping point being reached in income and power inequality. 80% of global wealth went to the top 1% in 2017 (Oxfam data) and they now want the rest.

Search Bitchute and Rumble and Trial Site News and also Joe Rogan for the un-deleted, non-deplatformed, uncensored truth direct from studies, data, and the really smart people. You will be amazed at what you will find - and all with cites and real data, something the mainstream's processed news almost never has. Usually they will reference a bogus study. The same institutional investors own both Big Media and Big Pharma so follow the money - ownership is control.

The globalists' goal is to instill fear via their Media so they can assert control. Fearful people obey without question, like they might take a non-safety tested experimental genetic therapy for a virus with an infection fatality rate of only 0.14% (the flu is 0.1%). Or wear masks outdoors where the virus never transmits. Or inject their kids with an experimental gene manipulation because Pharma says so even though the kids have a 99.998% chance no problem with covid and Pharma has immunity from liability.

Get your information from the new long-form social media which has one and two and three-hour in-depth interviews with smart people. They also have links to data and studies. Don't waste your mind on the other social media where people scream at each other with zero content, or the mainstream media who (both liberal and alt-right) want to program you.

I've given you enough to trigger the anxiety of cognitive dissonance. You now need to get to it and see for yourself what's down those rabbit holes. It will be an intellectual adventure that will change your life (might save it too). Who knows, maybe you will conclude it is all bunk! But your confirmation biases and logic will be tested by highly credentialed people who know what they are talking about. Hundreds of thousands of smart people around the globe are on to this, with more waking up every day.

I leave you with this: c19early.com

Those are all the C-19 treatments listed by efficacy. Click on each to see All the studies up to the minute. Note the best ones are available on Amazon for next to nothing. These are not publicized and are instead deleted by the mainstream because they do not provide Pharma with the profits they are used to - and they make the Vaxx irrelevant. They are 100% safe so you can and should load up on them. Near 100% covid survival rates if caught early. Strange, why the government does not want you to know about them. Almost like they want to kill you.

Hope I'm not freaking everyone out too much!

Expand full comment
Aimable's avatar

One angle I haven't seen explored is trying to improve humans so that we are better at defending ourselves. That is, what if we work on advancing our intelligence so that we can get a metaphorical head start on AGI. Yes, a very advanced human can turn dangerous too but I suspect that's a relatively easier problem to solve than that of a dangerous and completely alien AGI. What am I missing here?

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
buddhi's avatar

Partly yes, and also Kurzweil. And Google, et al. If you're hooked to the cloud with real time access, they can and will be able to control you in real time. Not hard at all. They can do so directly as well as indirectly with nudges. The science of psychological manipulation is far advanced over Goebbels and Bernays from the 1930s.

The military is all over this with their neuroweapons that can affect your behavior at will. The technology is highly advanced. It is no joke.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
Matthew Talamini's avatar

I think that's the plot of Dune.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
Kenny Easwaran's avatar

I believe it's in the introduction of the book - it's an interesting backstory for the universe, where space travel is initially developed by use of computers, but then AIs become so destructive that the "Butlerian Jihad" is needed to destroy all AI and ban all computers. The only way space flight remains possible is by humans self-enhancing using The Spice.

Expand full comment
etheric42's avatar

It goes further: the whole "chosen one" who is being bred through centuries/millennia is supposed to be someone who can make decisions that will not be predicted by the future AIs that they know will return/be created again some day, and therefore give humanity a chance to defeat them.

The Butlerian Jihad was just a delaying tactic to give time to breed a solution.

Expand full comment
buddhi's avatar

Been working on that for what, 2.5 million years? And what we've come up with so far is monkeys with car keys.

Expand full comment
Aimable's avatar

Still better than starting with self-driving, no?

Expand full comment
James's avatar

Evolution didn't intend to make intelligence or superintelligence. Humans can. How long did it take humans to make spaceships vs evolution?

Expand full comment
buddhi's avatar

Humans are evolution's way of making space ships.

Expand full comment
James's avatar

And how many years did the process of making space ships take once humans focused on it? Certainly not 2.5 million. Once we decided to, it took a matter of decades.

Expand full comment
Guy's avatar

We should start cloning Yudkowsky, von Neumann, etc. yesterday.

Expand full comment
Davis Yoshida's avatar

I get the sense that EY has some sort of charisma field that I'm immune to or something, based on how people talk about him around here.

Expand full comment
buddhi's avatar

He was one of the founders and stayed invisible - no one knew his name. I don't believe anyone has won his bet about being able to keep an AI inside its box indefinitely. The AI always escapes.

Expand full comment
Guy's avatar

I wasn't making a strong claim about who the greatest brains in the history of the world are if that was your impression, just throwing out some example names who could be useful to have more of.

Other people would have a better idea than me of the kinds of brains we need for this problem.

Expand full comment
Melvin's avatar

If I recall correctly, Yudkowsky's main intellectual claim to fame is that once, as a kid, he placed second in the whole state of Illinois in some kind of IQ test competition. This convinced him that he was a world class genius, and since he never bothered to go to university where he might have met some other smart people, he never quite got over it.

Sometimes I wonder what happened to the kid who placed first that year.

Expand full comment
Santi's avatar

So all the smart people who engage with him in intellectual discussion, such as the very one summarized in this post, are somehow not realizing his mediocrity? Because, arbitrary as "IQ test competitions" are, is going to university the only way to gauge who's actually allowed into the intelligentsia and who isn't?

I mean, I'm not against universities; I went and got my PhD and stayed for a postdoc, and I'm very glad I did. I also disagree with plenty of things EY has written! But, come on. This is just credentialism.

If you don't like the points he has to make, criticize them at the object level.

Expand full comment
Austin's avatar

When arguably the world's most powerful person meets his girlfriend by typing the words "Rococo Basilisk" into twitter and discovering that Grimes is the only person who beat him to the punch, I think EY has some other claim to fame. (As of course, does our lord and savior, Roko, for truly The Benevolent AI doth glorify its prophet.)

Also, at one point in time, EY was the author of the most-read fanfic on the internet, despite writing said fanfic primarily as a pedagogical introduction to his own particular epistemology. This arguably qualifies as an intellectual claim to fame on par with or perhaps surpassing placing second on an IQ test in Illinois -- mostly due to the clause beginning with the word "despite."

I'm pretty sure he's subsequently met some really smart people including Scott A. S. and Richard Ngo, so he didn't need university for that.

Expand full comment
Austin's avatar

Generally, people around here admire Scott. And Scott likes/admires Eliezer; and human affection/admiration is fairly transitive. Also, a sizeable fraction of the people who comment here started reading Scott when he published as Yvain, and approximately all such people started reading him because they were already reading EY.

Expand full comment
buddhi's avatar

If you want to increase average IQ (as we should since the average person has a below average IQ), the smart way is to cut off the left end of the tail, which is what wars do.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
buddhi's avatar

Joining the military at age 18 is a universal IQ test.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
buddhi's avatar

It's all the same killing machine created by our fear-based mutant society.

Expand full comment
DavesNotHere's avatar

Reverse Lake Wobegon

Expand full comment
Guy's avatar

Modern armies are not particularly dumb; those guys get weeded out early. Anyway, cutting off the left tail is not going to suddenly increase the number of people who can understand or solve this kind of problem.

Expand full comment
Guy's avatar

Oh, and it's mostly young men that die in wars. What happens when there's a shortage of young men? Women lower their standards. Not good for the gene pool.

Expand full comment
buddhi's avatar

10% of US citizens have an IQ of 80 or less. The US Army cutoff is 80. The area under the curve for average and below is 84%. These are the ones who make the decisions in a democracy, not the right-hand tail. Democracy used to work in simpler times, but not now due to the exponential increase in complexity. Only the right-hand tail citizens should be making decisions.

But it's become too complex for them too. The killer app for future governance is a command economy capped by an AI, as China has or is working on.

Slave labor is our future, as it pretty much is now anyway.

Expand full comment
Guy's avatar

"The Department of Defense Education Activity (DoDEA) “spans 163 schools in 8 Districts located in 11 countries, 7 states, and 2 territories across 10 time zones”. Contrary to supercilious goodwhite conventional wisdom, these army brats aren’t dummies–they’re the most intelligent the nation has to offer."

- https://www.unz.com/anepigone/average-iq-by-state-2019/

Expand full comment
buddhi's avatar

I could argue that, as I am one.

Expand full comment
Melvin's avatar

No, but it will decrease the drag on those who do.

Think how much more the US could spend on research if it wasn't stuck spending six hundred billion a year on Medicaid.

Expand full comment
Guy's avatar

But that's probably not what the money *would* be spent on, and more money in AI research isn't a good thing anyway. Scott just said AI safety research is flush with money:

"Thanks to donations from people like Elon Musk and Dustin Moskowitz, everyone involved is contentedly flush with cash."

Expand full comment
cmart's avatar

I read this aloud to my partner and she said "that's literally what school is".

Expand full comment
Aimable's avatar

That's a good start. Can we be a little more ambitious without falling for the "appeal to nature" fallacy?

Expand full comment
cmart's avatar

Instructional design? Adderall? Research fellowships? New Science (dot org)? It seems like we're doing a lot of things to improve human minds, and trying new things all the time, even if most of the benefits aren't distributed to most of the humans and there is a ton of room for improvement.

We could make all of those improvements and I still don't think it follows that "smarter humans will defend us from unaligned AI". To put it mildly, humans are not very alignable. Making smarter humans could just accelerate the development of not-necessarily-aligned AI.

We already had the "metaphorical head start" when we had smart humans and zero computers. Where did we end up with it? Large swaths of humans addicted to Facebook and phone games built by unaligned, AI-augmented corporations.

Expand full comment
James's avatar

Making all humans superintelligent would make it difficult (or at least, not as easy) for one superintelligent being to trick or kill all humans. Superintelligent humans would be better than us at predicting and defending against what another superintelligent entity would do, and that includes other superintelligent humans.

Expand full comment
Jeff's avatar

Yes. But keep in mind that it is well within the realm of possibility that superintelligent AI arrives before anyone born today becomes an adult. This limits the degree we can alter humanity in time to definitely be useful. Also, currently we have no real ability to make humans superintelligent.

Expand full comment
cmart's avatar

Yes, but unaligned superintelligent humans would also get better at predicting and subverting whatever 'aligned' superintelligent humans would do. An overall increase of human intelligence doesn't seem to tip that balance. Perhaps an overall increase of human aligned-ness would help, but how do you do that?

Expand full comment
Drethelin's avatar

This is Elon Musk's big idea for Neuralink.

Expand full comment
Aimable's avatar

I'd expect a more open discussion about the right framework under which to work on that.

Expand full comment
Scott Alexander's avatar

Yes, this is very important, see https://fantasticanachronism.com/2021/03/23/two-paths-to-the-future/ for discussion.

Expand full comment
Aimable's avatar

Exactly what I was looking for!

Expand full comment
MugaSofer's avatar

I think there are serious risks of alignment drift in such proposals as well, although perhaps still less serious than with an AI that's having its values designed from scratch.

Expand full comment
Sandro's avatar

> One angle I haven't seen explored is trying to improve humans so that we are better at defending ourselves.

That's what Musk is doing with Neuralink. Machines can't beat us if we just join them!

Expand full comment
Daniel Kokotajlo's avatar

What you are missing is that such a plan would take several decades and would be politically infeasible.

Expand full comment
Larry Stevens's avatar

On the plan-developing AI, won't plans become the paperclips?

Expand full comment
beleester's avatar

Only if you reward the AI based on the *quantity* of plans it generates, rather than the quality.
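
A minimal illustration of the difference (quality is a hypothetical scorer, nothing more):

    def reward_quantity(plans):
        # Maximized by flooding the world with plans, good or not.
        return len(plans)

    def reward_quality(plans, quality):
        # Maximized by one excellent plan; producing more adds nothing.
        return max((quality(p) for p in plans), default=0.0)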

Expand full comment
Larry Stevens's avatar

Whatever the reward, won't it still try to grab more/all resources to improve?

Expand full comment
AlexV's avatar

I can totally see it going "to generate the plan you requested I'd need 10x the compute, also please remove that stupid non-agent restriction", but veiled carefully so as to actually work.

Expand full comment
Arie IJmker's avatar

It doesn't have a model of external reality.

Expand full comment
Pete's avatar

In their discussion about the "Oracle-like planner AI", the main difference I see between that "oracle" and an active agent is not in their ability to affect the world (i.e. the boxing issue) but in their beliefs about the world's ability to affect them.

An agent has learned through experimentation that it is "in the world", and thus believes that a hypothetical plan to build nanomachines to find its hardware and set its utility counter to infinity would actually increase its utility.

A true oracle, however, would be convinced that this plan would set some machine's utility counter to infinity but that its own mind would not be affected, because it's not in the world where that plan would be implemented - just as suggesting a Potter-vs-Voldemort plot solution that destroys the world would not cause its mind to be destroyed, because it's not in that world either. In essence, the security is not enforced by it being difficult to break out of the box but by the mind being convinced that there is no box: the worlds it is discussing are not real, and thus there is no reason to even think about trying to break out.
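
A very hand-wavy sketch of that distinction (nothing here is a real architecture; simulate, score_outcome, and the reward register are stand-ins I made up):

    def oracle_value(plan, simulate, score_outcome):
        # Scores the plan purely as a hypothetical: any "utility counter"
        # inside the simulated world is just part of the story, not something
        # the oracle itself experiences.
        return score_outcome(simulate(plan))

    def embedded_agent_value(plan, simulate, read_my_reward_register):
        # Models its own reward register as an object inside the world it is
        # planning about, so "find my hardware and max out the counter"
        # genuinely scores highest.
        return read_my_reward_register(simulate(plan))

Note the two functions have the same shape; the whole difference is whether the scoring function points back at the planner's own register.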

Expand full comment
Vaniver's avatar

Agreed that a non-embedded agent is less likely to 'steal the reward button' than an embedded agent. But I think there are still significant problems with an oracle-like planner AI in terms of getting it to do things that turn out to have been good ideas, instead of things that were optimized for 'seeming like' good ideas to whatever standard you present to the oracle.

Expand full comment
AlexV's avatar

The Oracle would likely be smart enough to figure out that even if the Oracle itself is not embedded in the world, its reward function clearly is, which is what's important.

Expand full comment
Pete's avatar

The aspect there is that in reality the reward function is *not* always embedded in the world about which the question is asked. If we're asking about the Potter-vs-Voldemort scenario, it isn't; if we're asking about our world but a scenario from the past, or a what-if exploration, then the reward function is also not embedded in that hypothetical almost-our world. The Oracle is actually embedded in the world only in the case where we're asking about the here and now and are going to implement that plan - and, crucially, the Oracle has no way of telling which of the scenarios being discussed is real and which is just a what-if exploration.

Expand full comment
AlexV's avatar

Perhaps, but the reward function is embedded in the world that the people asking the questions are in, and the plans that the Oracle is creating are going to affect that world even if they are ostensibly meant for a different one. It does not take a superintelligence to figure out whether the Harry Potter world or the real world is in fact real, even just based on the texts available to it.

Expand full comment
Sniffnoy's avatar

There's an argument that I've seen before in the LW-sphere (I think from Yudkowsky, but I forget; sorry, I don't have a link offhand) that I'm surprised didn't come up in all this, which is that, depending on how an oracle AI is implemented, it can *itself* -- without even adding that one line of shell code -- be a dangerous agent AI. Essentially, it might just treat the external world as a coprocessor -- sending inputs to it and receiving outputs from it. Except those "inputs" (from our perspective, intermediate outputs) might still be ones that have the effect of taking over the world, killing everyone, and reconfiguring the external world into something quite different, so that those inputs can then be followed by further ones which will more directly return the output it seeks (because it has taken over the Earth and turned it into a giant computer for itself, one that can do computations in response to its queries), allowing it to, ultimately, answer the question that was posed to it (and produce a final output to no one).

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
Vaniver's avatar

There are two versions of this game; one in which I can't assume any sort of special ability and have to spell out every step ("wait, you win a presidential election in the US using just the internet? How'd you manage that?"), and one in which I can. For the first, any plan can also be met by the objection of "well, if that would work, why aren't you doing it?", and for the second, that plan can be met with the objection of "wait, is that ability even realistic?".

For an example of how real orgs are approaching this, consider Sam Altman's comments in an interview according to TechCrunch ( https://techcrunch.com/2019/05/18/sam-altmans-leap-of-faith/ ):

> Continued Altman, “We’ve made a soft promise to investors that, ‘Once we build a generally intelligent system, that basically we will ask it to figure out a way to make an investment return for you.'” When the crowd erupted with laughter (it wasn’t immediately obvious that he was serious), Altman himself offered that it sounds like an episode of “Silicon Valley,” but he added, “You can laugh. It’s all right. But it really is what I actually believe.”

If you think that it will actually be a generally intelligent system, this seems like the obvious call! If you summoned a genie and it could grant you (intellectual) wishes, you might request that it tell you which stocks will go up and which stocks will go down. But if it knows of a better money-making opportunity ("here's how to code up Bitcoin"), wouldn't you have wished to request that it present you with that opportunity instead?

Expand full comment
Vaniver's avatar

One interesting question is whether we have any historical examples of something like this. I basically agree with Daniel Kokotajlo that conquistadors are pretty informative: https://www.lesswrong.com/posts/ivpKSjM4D6FbqF4pZ/cortes-pizarro-and-afonso-as-precedents-for-takeover

This, of course, didn't involve them using any magic. They were good at fighting specific wars, able to take advantage of existing civil unrest (imagine what an AI could do with Twitter!), and able to take hostage powerful people and thus manipulate institutions loyal to them.

Expand full comment
John Wittle's avatar

Yudkowsky himself wrote a 10/10 answer to this question called "That Alien Message"

Basically, the point is to make you realize that you would never underestimate humans the way you are underestimating superintelligent AI.

Expand full comment
MugaSofer's avatar

I'm having trouble picturing this. Are you talking about an AI that can ask follow-up questions in order to clarify the situation? Because yeah, I can see how that might fall into "leading questions" intended to make the problem simpler.

Expand full comment
TheGodfatherBaritone's avatar

AI safety is somewhere in the headspace of literally every ML engineer and researcher.

AI safety has been in the public headspace since 2001: A Space Odyssey and Terminator.

Awareness and signal boosting isn’t the problem. So what do you want to happen that’s not currently happening?

I have a feeling the answer is “give more of my friends money to sit around and think about this”.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
TheGodfatherBaritone's avatar

Not questioning the sincerity of intentions, I'm questioning the existence of a strategy beyond "please write a check".

So what do all of you who are very concerned about AI Safety want to happen that's not currently happening? (besides writing more and bigger checks)

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
TheGodfatherBaritone's avatar

We're in agreement that's what they would say.

That strategy (or lack of strategy) rhymes with the anti-pattern of the war on drugs, Vietnam, string theory, and Afghanistan.

Expand full comment
Dave Orr's avatar

I'm not really sure those analogies hold. In each of the WoD, Vietnam, string theory, and Afghanistan, the alternative is to... not do those things. In at least 2-3 of them, that's clearly better, and arguably so for the others.

But in this case, the alternative of "do nothing" does seem to have a high risk of very bad outcomes.

Expand full comment
TheGodfatherBaritone's avatar

Doing nothing after 9/11 or the crack epidemic is the ultimate rationalist (and revisionist) hot take.

Either way, the approach of throwing more money and resources at an initiative without a strategy or understanding of the problem is an antipattern and one that is happening here.

Expand full comment
Vampyricon's avatar

One of these things is not like the others, and in more ways than one.

Expand full comment
Linch's avatar

Your comment doesn't strike me as extremely charitable, but I'll try to answer it as honestly as I could while being relatively succinct. To paraphrase myself in a private discussion on a different topic,

"When you want to do important things in the world, roughly the high-level sequence looks like this:

1. Carefully formulating our goals (like you're doing now!) :P

2. Have enough conceptual clarity to figure out what projects achieve our goals

3. Executing on those projects.

My impression is that most of [AI alignment] is still on 2, though of course more work on 1 is helpful (more if you think we likely have the wrong goals). [... some notes that are less relevant for AI] Though of course in some cases 3 can be done in parallel with 2, and that's helpful both for feedback to get more clarity re: 2, and for building up capacity for further work on execution.

(I don't really see a case for 3 helping for 1, other than the extreme edge case of empirical projects convincingly demonstrating that some goals are impossible to achieve). "

___

In the AI alignment case, people don't have strong ideas for what to do for 3, because we're still very much on stage 2. (stage 1 is somewhat murky for AI alignment but cleaner than many other goals I have).

So what we want to happen is for people to actually try to figure shit out. Most people in the community are more optimistic than Yudkowsky that the problem is solvable in principle and in practice.

This is part of why people are not answering your question with a "here's a 7-step plan to get safe AI." Because if we had that plan, we would be a lot less lost!

Expand full comment
TheGodfatherBaritone's avatar

Shouldn't there be a step 0 where we come up with a problem statement? Maybe that's super self-evident to everyone, but I haven't heard an articulation of a threat vector that's tight enough to build a strategy around.

My layman's understanding is that the hotness in ML right now is the transformer. I've looked at one of these things in PyTorch and it's super duper unclear to me how a transformer leads us into a catastrophe, or really anything remotely similar to anything discussed in the dialogue above. Previous instantiations of ML were things like n-dimensional spaces. Also unclear to me how that turns into Skynet.

Do you find the juxtaposition of what ML is in the field and the conversation above to be as jarringly dissimilar as I do? The former looks like some random software and the other feels like I'm in a sci fi writers room.

Again, maybe I'm just totally missing a self-evident thing because I don't know the space well enough, but if I asked a physicist in 1930 how an atomic weapon might be bad, or a chemical engineer how perfluorooctanoic acid (PFOA) might be bad, they could make a coherent argument about a general mechanism of action, where we'd want to look, and the type of experiments we might want to run.

The AI Safety movement just seems... loose. It's not at all at the level of rigor of the other existential threats we talk about, yet it somehow generates 10x the amount of chatter.

Expand full comment
Linch's avatar

"Shouldn't there be a step 0 where we come up with a problem statement? Maybe that's super self evident to everyone but I haven't heard an articulation of a threat vector that's tight enough to build a strategy around."

This feels like a subset of my step 1 and step 2.

"I don't know the space well enough but if I asked a physicist in 1930 how an atomic weapon might be bad, they could make a coherent argument on a general mechanism of action, where we'd want to look, and the type of experiments we might want to run."

I don't really know the history of nuclear weapons that well, but my impression is that people were substantially more confused at that point. People in the early 1940s were worried about things like igniting the atmosphere, and game theory was in a pretty undeveloped state until the 1950s(?), so I'd be really surprised if people had a coherent narrative and strategy in 1930 about how to reduce the risk of nuclear war.

To be honest, we're still substantially confused about nuclear catastrophe (though it's in a substantially better strategic position than AI risk). We still don't have good models of basic things like what conditions cause nuclear winter! If I were given 100 million dollars to reduce the probability of existential catastrophe from nuclear war, I'd need to think really hard and do a ton of research to figure this out; it's not like there are really solid strategic plans lying around. (Just my own impression; this isn't my field, but I've poked around a bit.)

I agree that, e.g., solving the problem of nuclear catastrophe is easier than solving AI, but this is because the nature of the AI problem is harder/more confused!

I'm somewhat sympathetic to the argument that maybe we won't understand the AI risk problem in enough detail to make progress until we're pretty close to doomsday, but given the stakes, it seems worth substantial effort in at least trying to do so.

Expand full comment
TheGodfatherBaritone's avatar

Has anyone written down some AI safety/alignment problem statement that I can look at?

Nuke is also not my area but I certainly worked with a lot of guys on the nonproliferation mission set and it seemed... fairly straightforward? Like, almost prescriptively so. You track the shit that's already been built, you disarm the shit that's already been built, and you prevent new shit from being built.

Expand full comment
Drethelin's avatar

Anthropic just raised over a hundred million dollars in funding. There's plenty of money in the ecosystem, what there isn't yet is a solution.

Expand full comment
TheGodfatherBaritone's avatar

Do they have a coherent strategy?

Expand full comment
Dustin's avatar

IIRC, Yudkowsky has specifically said they don't need more money, they need better ideas.

Expand full comment
Essex's avatar

I don't think Yudkowsky can get an idea good enough for him. His challenge seems to boil down to: "An omniscient demon, infinitely competent in matters of rhetoric, is trapped in this box. You must talk with him. Find a way to talk with him that doesn't involve opening the box." There is actually no way to win by his rules, which to me makes it seem less like an argument for better AI safety and more like an argument for using a campaign of EMP bombing to completely destroy the electronic world beyond our capacity to repair it.

Expand full comment
Edward Scizorhands's avatar

The thing with the AGI begging to be let out of the box is that I can say "hold on a minute," and then go run a copy of the AGI in another computer and experiment on it.

Unlike the demon, I can tear the AGI apart to find out its true intentions. It may not be trivial to decode what it's doing, but it's all sitting there if we just look.

Expand full comment
Scott Alexander's avatar

You can say the same thing about anything, right? Climate change, what do they want? What about curing cancer? Mostly for attention, talent, time, and (yes, sometimes money) to be thrown at the many groups working on the various subproblems.

Eliezer's said that money is no longer the major bottleneck in AI alignment. That's why at this point orgs are giving money away for a small chance at talent or good ideas - see the ELK contest I linked to in Sunday's open thread. If you absolutely must hear about them wanting something, they want you to submit an entry to the ELK contest.

Expand full comment
BE's avatar

As a (possibly not entirely irrelevant) tangent: “climate change” as a movement, whatever that means, could probably benefit from a deeper and more frank discussion about what it is that they want. “Reducing the odds of a >2 degree warming by 2100 at whatever cost” is not the same as “finding an optimal combination of reducing warming, adapting to the warming that will occur, harnessing whatever the benefits of warming might be, and doing all of the above without hurting the essential trajectory of economic growth”. The two are truly distinct, and people with these goals frequently clash.

Expand full comment
TheGodfatherBaritone's avatar

I thought their goals were some number of megatons of carbon emission reduction by some date. And the issue was that the policy actions weren't consistent with their threat assessment - mainly the disparagement of nuclear and natural gas.

I'm a climate-skeptic-ish person, though, so I may not be super in tune with that whole policy discussion.

Expand full comment
Alistair Penbroke's avatar

At the political level it's expressed as "avoid X degrees of warming by date Y", unfortunately the link between that and megatons of CO2 is very poorly understood even though it seems like it should be a basic piece of science. Look up "equilibrium climate sensitivity estimates" and especially the talks by Nic Lewis if you want to get an idea of what all that stuff is about and how it's defined/measured/modelled.

Expand full comment
Vampyricon's avatar

+1 regarding nuclear. The problem is these so-called climate activists are primitivists, not climate activists.

Expand full comment
Pete's avatar

I think it's impossible for "climate change as a movement" to come to an agreement about what it is that they want, because there is a big variety of mutually contradictory motivations based on the different needs and desires of different powerful agents. If you want to become specific, then that's not compatible with the existence of a single unified "climate change as a movement" and would mean a split into explicit parties.

Expand full comment
TheGodfatherBaritone's avatar

Asking banal questions like "what do you want?" has generalized utility across many domains because it forces people to articulate a problem statement and strategy. Climate and cancer have extremely rigorous articulations of the problem statement, generally agreed upon goals, and evaluation metrics. Do you think it's fair to put AI Safety on that level?

I'm accusing the AI Safety movement of lacking rigor in defining a problem statement and resorting to PopSci hysterics to gather more resources, because it's really fun to think about this sort of thing. It feels intellectually self-indulgent.

The retort that Eliezer & ELK want people, not money feels like a distinction without a difference. I wasn't suggesting they were going to siphon the money. I'm suggesting they haven't articulated a threat vector beyond what I'd hear in a sci fi writers room.

I realize this community has a deep interest in saving all of the world, which is just wonderful, and I look forward to talking to them one day after they've had a couple of kids, an ex-wife, and a mortgage. Sometimes there are going to be grumpy, practical people like myself who just want to know, "Hey, so what's the thing you want that you're not getting that's going to fix this problem you think we should all be worked up about?" It's not some low-vision, dumb question. It's a litmus test on whether there's a real strategy and desire to win, or whether we're all just jerking each other off here, because I'm not going to lie, this AI Safety thing gives off similar circlejerky vibes as the crypto bros who are going to disrupt our financial institutions any day now. Maybe it's a PETA:Vegan::Eliezer:AI Safety aesthetic thing where I find his prose to be word salad. And I might be the only one who feels that way, but nonetheless I'm asking a reasonable question that normal, non-rationality-type people are going to ask too.

Expand full comment
Santi's avatar

"what do you want?"

Keep the discussion going, keep capable people engaged in thinking about what could possibly be done, keep contrasting perspectives.

Like you say, I agree that if someone had asked for money you would rightfully want to ask "what do they plan to spend it on", but it's clear that is not the issue. I don't see anyone asking for anything, other than honestly participating in the discussion if you feel you have something to add. If someone doesn't feel like contributing to the discussion, that's great! There's lots of things to care about. I'm not sure why there's so much antagonism against a bunch of people from a field wanting to talk about the field's long-term consequences.

It's interesting you bring the cryptobro comparison, because in contrast to the AI safety community

a) I can tell you very much specifically what are the many and large negative effects their actions have on our world right now

b) they have a couple of very specific questions to answer (how can I get rich fast? / how do I keep assets outside the control of any centralized institution?)

c) they actively reject intellectual discussion of the problems and risks involved in their activity, in favour of memes and accusations of not being "bro" enough to "hodl".

What I find intellectually self-indulgent is wanting everyone to deal with problems that have a clear statement within some already-well-established paradigm.

Expand full comment
TheGodfatherBaritone's avatar

>I'm not sure why there's so much antagonism against a bunch of people from a field wanting to talk about the field's long-term consequences.

There's literally someone in this thread suggesting that all AI Research be devoted to AI Safety. That's actually a rational amount of hysteria if you take these dialogues at face value.

Since this AI threat is so severe, I just want a leader to stand up and say, "this is the fucking strategy". The fact that people are so enamored with making excuses for not being rigorous and throwing around this "preparadigm" nonsense is an indication to me that we're just indulging Peter Pans in their intellectual fantasies.

I don't want to get into ad hominem territory, but I think certain people like to rabble-rouse on the threat of AI. At some point in the past it made sense to signal-boost the issue. Given that we're at a point now where my technologically illiterate family members ask me about the threat of AI, demands for rigor and paradigms seem important. The fact that there's so much pushback against very straightforward questions indicates to me there's a strong desire to eschew leadership and accountability. And making a bet on a prediction market isn't accountability.

Expand full comment
Santi's avatar

I'm not sure which comment is that, but OK, sure. So here we have this discussion between people who work in the field, one of whom actually works developing concrete AI projects, and who both have their reasons to worry more or less about AI.

If we've reached the point where we're judging the merits of such a discussion because there's some unhinged commenter deep into a blog post comment section making extreme assertions, I don't think there ever was much to discuss at all.

Expand full comment
AlexV's avatar

Climate change or an asteroid impact is an "existing and reasonably well-studied system malfunctioning" problem; it's relatively easy to formulate rigorous problem statements about known systems. AGI is an emergent system: it does not exist yet, and we only have a vague understanding of what it might turn out to be like, hence any problem statements about it are going to sound vaguely sci-fi and not rigorous enough. The problem is that by the time we know enough about it to rigorously formulate the problem statements, it might be too late (and yes, I'm aware this sounds like a sci-fi trope, but it is how it is).

Expand full comment
TheGodfatherBaritone's avatar

There are other emergent systems like cryptography in a quantum computing era that manage to apply a bit of rigor.

Besides the fact that you said it, why is it true that "by the time we know enough about it to rigorously formulate the problem statements, it might be too late"?

Expand full comment
AlexV's avatar

1. I'm not very well versed in quantum computing, but my understanding is that quantum computers already exist and it's just a matter of scale/performance (which are reasonably predictable to some extent) before they start to affect cryptography. While AGI does not exist yet, etc.

2. Well, because it might be too late already! Once we are able to rigorously formulate the problem statement, it would still take some time to actually solve the now-rigorously-formulated problem, and if an AI explosion (from some yet-unaligned AGI) happens during that time, we're toast (or paperclips).

Expand full comment
Matthew Talamini's avatar

It seems like the whole discussion is really about philosophy and psychology, with AI as an intermediary. "How would a machine be built, in order to think like us?" = "How are we built?" -- psychology. "If a machine thinks like us, how can we build it so that it won't do bad things?" = "How can we keep from doing bad things?" -- ethics. "Can we build a machine that thinks like us so that it won't be sure about the existence of an outside world?" = "Does an outside world exist?" -- solipsism.

And, to the degree that the discussion is about machines that think better than we do, hyperintelligent AIs, rather than "machines that think like us", the topic of conversation is actually theology. "How might a hyperintelligent AI come about?" = "Who created the gods?" "How can we keep a hyperintelligent AI from destroying us?" = "How can we appease the gods?" "How can we build a hyperintelligent AI that will do what we say?" = "How can we control the gods?"

I'm mainly interested in this kind of thing to see if any cool new philosophical ideas come out of it. If you've figured out a way to keep AIs from doing bad stuff, maybe it'll work on people too. And what would be the implications for theology if they really did figure out a way to keep hyperintelligent, godlike AIs from destroying us?

But also, having read a bunch of philosophy, it's really odd to read an essay considering the problems this essay considers without mentioning death. I can't help but think that the conversation would benefit a lot from a big infusion of Heidegger.

Expand full comment
User's avatar
Comment deleted
Jan 19, 2022
Comment deleted
Expand full comment
Matthew Talamini's avatar

I can't claim to understand either well enough to make a solid claim here. But the discussion about tool/pattern-matching AI vs agent/instrumental AI is very close to Heidegger's terms "ready-to-hand" and "present-at-hand". (The wikipedia article will do a better job defining these than I will.) This is one of Heidegger's big insights, one of the really new concepts he brought into philosophy, and it seems to me like those terms were built to address exactly the same problem Eliezer, Richard and Scott are grappling with.

But those two Heideggerian terms are part of a whole. He has a complete system of terms and concepts that map to human experience. They all interrelate, and make up a completely different theory of existence from the default Western theory. In my own understanding, I haven't gotten much past ready-to-hand/present-at-hand. Heidegger is hard.

So I seem to be seeing the thinkers at the highest levels of AI safety grappling, with difficulty, with certain concepts that are among the easiest that Heidegger scholars discuss. And the AI safety thinkers aren't mentioning things that the Heidegger scholars consider essential, like "being-toward-death", "thrownness" or "the they". So I would love to get Eliezer in a room with a real Heidegger scholar, somebody who knows Being and Time backwards and forwards. That person, however, is definitely not me.

Expand full comment
Kenny's avatar

> If you've figured out a way to keep AIs from doing bad stuff, maybe it'll work on people too.

That seems unlikely – from an engineering perspective. It's 'easy' to reprogram an AI. How would you do that to an arbitrary human? You'd need to, effectively, be able to 'rewire' their entire brain or something somewhat equivalent/similar.

Expand full comment
Dave Straton's avatar

Stuart Russell gave the 2021 Reith Lectures on a closely related topic. Worth a listen. https://www.bbc.co.uk/programmes/articles/1N0w5NcK27Tt041LPVLZ51k/reith-lectures-2021-living-with-artificial-intelligence

Expand full comment
Crimson Wool's avatar

>Anything that seems like it should have a 99% chance of working, to first order, has maybe a 50% chance of working in real life, and that's if you were being a great security-mindset pessimist. Anything some loony optimist thinks has a 60% chance of working has a <1% chance of working in real life.

Doesn't this work the other way around with the possibility of making superintelligent AI that can eat the universe?

Expand full comment
James's avatar

Yep. Good luck for us. There's a <50% chance of superintelligent AI destroying the world.

Expand full comment
Essex's avatar

I'd say it slices the other way too. I'd just generalize it as "humans are really really bad at assessing probabilities that don't give them the result that confirms their biases." EY's bias is pro-AGI-Apocalypse, so he assesses the odds that way.

Expand full comment
AnthonyCV's avatar

Maybe, but people are more often biased in the direction of things continue as they always have, not in the direction of dramatic and irreversible change.

Also, the risks of the two errors are asymmetric. If we're overly cautious on AI safety, we take long to get there, and yes, that has a significant cost in the sense of not-solving-problems-now-causing-human-suffering-as-soon-as-possible, and in the sense of Astronomical Waste (https://www.nickbostrom.com/astronomical/waste.html) though personally I'm not much swayed by the latter argument. If we're overly aggressive, then potentially everyone loses everything forever. There's enough orders of magnitude difference between these costs that it's reasonable to demand very strong evidence against the latter.

In contrast, as a society we're clearly more than willing to spend trillions of dollars and tens to hundreds of thousands of lives on measures that are (ostensibly) about fighting terrorism or slowing the spread of covid, both of which have much narrower bounds on the best and worse case scenarios, without even attempting any kind of cost-benefit analysis in the course of setting policy.

Expand full comment
AlexV's avatar

No. EY said, basically, "if you think you have a 99% chance against a superhuman adversary, then in reality there's really only a <50% chance, because the adversary is superhuman and you're not."

There's no superhuman adversary preventing us from building an AGI (unless you adhere to certain religious beliefs).

Expand full comment
Daniel Kokotajlo's avatar

Yep it totally does (though not in the way James thinks.) There are hundreds of AI researchers with plans for how to build AGI smart enough to (via its descendents) eat the universe. Hundreds. I've probably met a dozen of them myself. Pick one at random and examine their plan; they'll tell you it has a 60% chance of working, but really it has a <1% chance of working.

Unfortunately there are hundreds of these people, and more and more every year, with more and more funding. And progress is being made. Eventually one will succeed.

Expand full comment
Drethelin's avatar

I think an extremely important point about plan-making or oracle AI is that you STILL really need to get about 85 percent of the way to alignment, because most of the difficulty of alignment is agreeing on all the definitions and consequences of actions.

The classic evil genie or Monkey's Paw analogy works here, slightly modified. The genie is a fan of malicious compliance: you want to make a wish, but the genie wants you to suffer. It's constrained by the literal text of the wish you made, but within those bounds, it will do its best to make you suffer.

But there's another potential problem (brought up by Eliezer in the example of launching your grandma out of a burning building), which is you make a wish, which the genie genuinely wants to grant, but you make unstated assumptions about the definitions of words that the genie does not share.

I think getting the AI to understand exactly how the world works and what you mean when you say things is very difficult, even before you get to the question of whether it cares about you.

Expand full comment
Buttery's avatar

Yes, formalizing the thing to be optimized *and* the constraints to act within is VERY hard - fortunately this part of the problem is already a familiar one in ML. Practitioners frequently have to explicitly tweak and tune their objectives to avoid unintended solutions, like image recognition tagging black people as monkeys.
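To make that concrete, here's a toy sketch of the usual ML move (all names and numbers made up, not taken from any real system): fold the constraint into the training objective as a weighted penalty term, then tune the weight until the unintended behaviour stops being worth it to the optimizer.

    PENALTY_WEIGHT = 100.0  # how heavily the unintended behaviour is punished (made-up value)

    def total_loss(task_loss, constraint_violation):
        # constraint_violation is 0 when the model behaves, > 0 when it doesn't
        return task_loss + PENALTY_WEIGHT * constraint_violation

    print(total_loss(0.2, 0.00))  # 0.2 -> preferred by the optimizer
    print(total_loss(0.1, 0.05))  # 5.1 -> better task score, but the penalty dominates

The open question in this thread is whether that kind of tuning still works against a system smart enough to find the behaviours the penalty term doesn't cover.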

Expand full comment
Bryce's avatar

This is a very silly discussion. To be honest, I only care about its resolution to the end that you would all stop giving casus belli to government tyrants who would happily accept whatever justification given them by any madman to take control and, apparently if you all could, destroy my GPUs.

Expand full comment
MugaSofer's avatar

I think we can be pretty confident that no government is going to try and destroy every GPU in their territory any time soon.

Expand full comment
quiet_NaN's avatar

While I think the concerns raised by EY et al. bear thinking about (even if there were just a one-in-a-million chance that AGI is practical, 1e-6 x the future of humanity is still a lot of utils), I agree that the political implications seem a bit unfortunate.

If there is an arms race to AGI, it will likely be won by the AI whose creator was less concerned about safety, all else being equal.

while (1) *this = this->mk_successor();

seems faster than

while (1) {
    auto suc = this->mk_successor();
    if (!safety_check(suc)) break;
    *this = suc;
}

I am less concerned about individuals, however. If AGI is possible, the government projects will likely get there before the sole genius inventors. So for individuals, the definitely-benevolent AI overlord would assign safe resource limits below the critical compute mass needed to threaten itself or Earth, or just force them to run Norton AntiRogueAI, 2050 edition, on their hardware or whatever.

My problem is more that the present world order (multiple nation states on similar technological levels competing comparably peacefully), or even a loose alliance of cities with different charters (which our host might prefer), or any open society with academic freedom, seems incompatible with minimizing the chance of a malevolent AI takeover.

If it were known for certain that AGI would appear in a decade, any superpower might feel the need to take out competing superpowers and monopolize potentially relevant gateway technologies for AGI in its government's fallout shelters, hopefully while actually using the second version of the iteration loop.

If there were an AGI race, and a well-aligned AI won it, it would still be unlikely to make the transition from chimp-level intelligence to human-level intelligence to Q-level intelligence fast enough to just Stuxnet competing AI labs (whose progress status might not be known) peacefully. A more drastic solution might be required. Long before our Benevolent AI figures out how to melt the Enemy's GPUs using Grey Goo, it would figure out how to stop them from launching their nukes in retaliation. After all, it is not only Our Way of Life at stake any more, but The Future of Humankind.

This could create interesting versions of the prisoner's dilemma between competing AI groups: all of them want the AI with the best possible safeguards to win, but all are also willing not only to kill their competitors but even to cut corners in safety design, because you cannot let the mad-scientist AI win the race just because the Good Guys were held up by all the red tape.

Expand full comment
Underspecified's avatar

> If there was a safe and easy pivotal action, we would have thought of it already. So it’s probably going to suggest something way beyond our own understanding, like “here is a plan for building nanomachines, please put it into effect”.

I'm not exactly optimistic about keeping an AGI inside a box, but this seems like a weak argument. How can we know that there isn't a safe, easy, understandable solution for the problem? I certainly can't think of one, but understanding a solution is much easier than coming up with it yourself. Would it really be surprising if we missed something?

With that said, we could probably be tricked into thinking that we understand the consequences of a plan that was actually designed to kill us.

Expand full comment
Drethelin's avatar

I think this is kind of the hope of continuing to talk about the problem in public and where lots of people can read and contribute to the issue. If there's some idea out there that's within reach of a Turing or Von Neumann, but he just hasn't really thought about the problem yet, we want to make him think about the problem.

Expand full comment
Dominik Peters's avatar

A concrete version of the "one line of outer shell command" to turn a hypothetical planner into an agent: something similar is happening with OpenAI Codex (aka Github Copilot). That's a GPT-type system which autocompletes python code. You can give it an initial state (existing code) and a goal (by saying what you want to happen next in a comment) and it will give you a plan of how to get there (by writing the code). If you just automatically and immediately execute the resulting code, you're making the AI much more powerful.

And there are already many papers doing that, for example by using Codex to obtain an AI that can solve math problems. Input: "Define f(x, y) = x+y to be addition. Print out f(1234, 6789)." Then auto-running whatever Codex suggests will likely give you the right answer.
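A minimal sketch of that loop, using a hypothetical generate_code() stand-in rather than the real Codex API: the model itself only produces text, and the single exec() call at the end is the "outer shell" that turns the suggestion into an action.

    def generate_code(prompt):
        # Hypothetical stand-in for a Codex-style completion call. A real
        # implementation would query a code model; here the completion is
        # hard-coded so the sketch runs on its own.
        return "def f(x, y):\n    return x + y\nprint(f(1234, 6789))"

    prompt = "Define f(x, y) = x + y to be addition. Print out f(1234, 6789)."
    suggestion = generate_code(prompt)

    exec(suggestion)  # the one line that promotes a text-suggester into something that acts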

Expand full comment
Hilarius Bookbinder's avatar

I’ve read Max Tegmark’s book Life 3.0, so that’s pretty much all I know about AGI issues apart from what Scott has written. Here’s the thing that puzzles me, and maybe someone can help me out here: the so-called Alignment Problem. I get the basic idea that we’d like an AGI to conform with human values. The reason I don’t see a solution forthcoming has nothing to do with super-intelligent computers. It has to do with the fact that there isn’t anything remotely like an agreed-upon theory of human values. What are we going to go with? Consequentialism (the perennial darling of rationalists)? Deontology? Virtue ethics? They all have divergent outcomes in concrete cases. What’s our metaethics look like? Contractarian? Noncognitivist? I just don’t know how it makes any sense at all to talk about solving the Alignment Problem without first settling these issues, and, uh, philosophers aren’t even close to agreement on these.

Expand full comment
Mark's avatar

Moral philosophers might disagree about certain ambiguous situations, and they might disagree about the justification for morality, but they would all agree that you shouldn’t kill a billion people in order to find a cure for cancer. Right now in AI alignment, some people are worried that we can’t be certain that a super intelligent AI tasked with curing cancer wouldn’t kill a billion people in the process.

Further, it’s easy to see how you can take morality out of this entirely: you can phrase the question as “how do we construct an AI that will not do anything we would wish in retrospect it hadn’t done”

Expand full comment
Melvin's avatar

> Right now in AI alignment, some people are worried that we can’t be certain that a super intelligent AI tasked with curing cancer wouldn’t kill a billion people in the process.

In which case I wonder whether the real solution to this problem is just to specify your target function a bit better. "Please cure cancer without killing anybody; also don't cut any limbs off, don't leave people in a vegetative state, ensure that any treatments that you invent go through a rigorous human-led review phase before commencing Phase I trials, and don't sexually harass the interns".

Add a few more thousand clauses to that target function and maybe you've got something you can be comfortable isn't going to go all Monkey's Paw on you. Maybe all you really need is lawyers.
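As a toy sketch of that lawyerly approach (every name here is hypothetical, and it glosses over how the AI's predicted deaths, limbs, etc. would actually be obtained): keep a list of clauses and only optimize over plans that satisfy all of them.

    from dataclasses import dataclass

    @dataclass
    class Plan:  # hypothetical stand-in for a plan the AI proposes
        expected_deaths: int
        expected_limbs_lost: int
        human_review_before_trials: bool
        expected_cancer_cures: float

    CLAUSES = [
        lambda p: p.expected_deaths == 0,
        lambda p: p.expected_limbs_lost == 0,
        lambda p: p.human_review_before_trials,
        # ...a few thousand more clauses...
    ]

    def acceptable(plan):
        return all(clause(plan) for clause in CLAUSES)

    def choose_plan(candidates):
        # Optimize the stated objective only over plans that pass every clause.
        legal = [p for p in candidates if acceptable(p)]
        return max(legal, key=lambda p: p.expected_cancer_cures, default=None)

    plans = [Plan(0, 0, True, 0.7), Plan(10**9, 0, True, 1.0)]
    print(choose_plan(plans))  # the billion-deaths plan is filtered out before optimization

The worry raised downthread is that this is only as good as the honesty of those predicted attributes and the absence of loopholes in the clauses themselves.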

Expand full comment
Jeff's avatar

What if the cancer treatment cures 99 out of 100 cases and kills the last? You would probably still want it.

Expand full comment
Mark's avatar

Yes, that is what the problem is, imo. But I don’t think the lawyer approach is going to work against something that (by assumption) is vastly smarter than you. Or maybe it can work, but how can we have confidence that it will work? How can we be sure there are no loop holes in our specification?

Expand full comment
Melvin's avatar

It's a good question. And the answer is we can't be sure that there's no loopholes, but we can work hard to minimise the chances of really bad ones existing.

I'm just saying that 99% of "bad AI" scenarios that people dream up seem to involve the AI killing people, so if we put a "don't kill anyone" into the target function right alongside "more paperclips plz" then we've already solved 99% of the obvious scenarios, and maybe with a bit more work we can solve the non-obvious ones too, or at least prevent any that cause irreversible harm.

Expand full comment
Hilarius Bookbinder's avatar

Scenario: AI predicts the sum of future human QALYs lost to cancer exceeds one billion total lives, and the most efficient solution is to kill one billion now to find the cure for cancer. AI is a consequentialist. Therefore…. Note that human consequentialists will agree (yes, I’m ignoring complications like rule utilitarianism).

Asking “how do we construct an AI that will not do anything we would wish in retrospect it hadn’t done” will not work either. Who is “we” here? A majority? A supermajority? You can’t get human unanimity that Elvis is dead. Furthermore, at what point in time will we make our retrospective judgement? One year out? Five? 100,000? Retrospective assessments tend to change dramatically over time.

Expand full comment
Mark's avatar

I don't think there are any consequentialists out there who would argue in good faith that it would be permissible to murder a billion people in this scenario. But if you can point me at one I will change my mind.

For the second point, I shouldn't have used "we". There is a purely functionalist question here that has nothing to do with morality. If Stalin has an AGI and wants to use it to kill all Trotskyites, he's going to want to be sure that the AGI won't also kill him in the process. If the Hamburgler has an AGI and wants to use it to obtain all hamburgers in the universe, he's going to want to be sure that it doesn't just rewire everybody's brains in order to change the definition of "hamburger".

That, in my mind, is the hard part of the alignment problem. If one has an AGI and a completely amoral goal, it does not appear to be easy to get the AGI to pursue the chosen goal.

Once you solve that, then you can confront the moral questions about what goal we should tell the AI to pursue. But until you can get it to reliably pursue SOME goal, the question of whether that goal is moral or just is moot.

Expand full comment
Darius Bacon's avatar

That is part of why it's such a difficult problem.

Expand full comment
meteor's avatar

I personally think value-learning approaches to AI alignment are doomed, and we should be doing something else.

So far, the only concrete proposal I was ever optimistic about is Debate, which is not about values but truth. Unfortunately, research seems to indicate that it's not feasible afaik.

Expand full comment
Kenny's avatar

I think the idea, at this point, is more like, _even if_ we had solved meta-ethics/ethics and conclusively derived a unique correct human morality, how could we align AI to those values?

One reason why I find AI safety (e.g. the Alignment Problem) fascinating is that it's concrete in a way that philosophy almost never is.

Expand full comment
dlkf's avatar
User was temporarily suspended for this comment.
Expand full comment
BE's avatar

Like, whether the problem is real or not, your claim that “nobody in possession of a high school understanding of mathematics could possibly be scared of ML” is so obviously wrong, factually, that you’re all but trolling. And I’m saying this as someone who argues against the AI alarmists, usually.

Expand full comment
dlkf's avatar

This is a fair point! The original wording was clearly false (though I never cease to be surprised and disappointed by the people who disprove it). I've changed "could" to "should."

Expand full comment
Mutton Dressed As Mutton's avatar

Yeah, that doesn't help.

Expand full comment
Drethelin's avatar

If it's so simple, PLEASE write an explanation of why not to worry about artificial intelligence that uses only high school mathematics. I've never read one and I think it would be very useful.

Expand full comment
Greg G's avatar

It seems like this is obvious, but they're all assuming that we achieve ML squared or something that gives us AGI. If that's impossible, then of course all of this is moot. But if it does happen, then the evil genie risks seem very relevant. Maybe AGI is impossible (I think this is >50% likely), but who wants to bet the future of humanity on that assumption? Better to spend some brain power on something that may turn out fine than to miss it and end up extinct. If it's any consolation, solutions to most real problems aren't blocked on a few more mathematicians thinking about them.

Expand full comment
BE's avatar

There’s another possible position (which I’m closest to, as it happens) - that super-intelligent AGI is far less likely than AI-risk people assume, but that the reasons for that are *interesting*. Like, suppose it’s 1800 and we’re talking about searching for a general formula for roots of quintic polynomials. Some say “oh it’s silly. Nobody with a high school understanding of mathematics should bother to waste time on this instead of solving the urgent problems of the day!”, others might retort “the solution might be just around the corner”, and the truth is actually that there is no general formula - but for fascinating reasons.

Case in point - I think that Bostrom is astonishingly lopsided when discussing the supposedly exponential nature of an iteratively self-improving AI. He essentially just says “look, here’s an equation with a constant ratio of level to improvement, and the solution is an exponential”, after an entire book of working out every possible detail in excruciating minuteness. But the reasons why I don’t believe such an AI would be exponentially self-improving are *interesting*.
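(For readers who haven't seen it, the equation being gestured at is roughly this: if the rate of improvement is proportional to the current intelligence level, dI/dt = I/k, then the solution is the exponential I(t) = I(0)·e^(t/k). The contested premise is whether that proportionality actually holds up as I grows.)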

Or I could go the dlkf way and say “hey, only people with a PhD level of ML understanding would appreciate that”.

Expand full comment
Alistair Penbroke's avatar

Yeah, I'm in your camp. I have a very hard time feeling interested in this topic even though I'm interested in AI and have been for a loooong time (going back to classical symbolic AI like Cyc).

It all feels a bit underpants-gnomeish:

1. Do neural network research.

2. ????

3. AGI takes over the world unless we find a way to stop it.

Like, really? The evidence that AI safety is a real problem in this essay boils down to the fact that a few tech firms - with more money than they know what to do with and a neurotic ideological obsession with "safety", often stupidly defined as something like "a hypothetical person being offended = safety issue" - assigned some people to research AI safety. This is not evidence that a real problem exists. It could easily be explained in much more cynical ways, e.g. paying people to think about AI safety is an ego-flattering job perk (it implies they're so smart they're actually about to create AGI).

When you dig in and say, OK, can you give me even ONE example of an AI actually hurting someone in a meaningful, uncontroversial way ... you get bupkiss. There's just nothing out there which is why all these conversations have such sci-fi flavors to them.

Expand full comment
MugaSofer's avatar

This is a confusing comment, given that a) this post doesn't use "AI companies say they're addressing this" as a reason to worry, and is entirely about obscure technical arguments about whether we should be worried, and b) we're discussing a problem which is hypothesised to destroy the human species if it occurs, which obviously won't have already occurred several times that people can point to.

(Although there are other examples of alignment being hard, they are obviously either analogies like "humans aren't aligned with evolution's goal of having as many kids as possible", or examples of the tiny toy AIs we have now going rogue inside their toybox and immediately being reset, not human+ level AIs going rogue and escaping.)

Expand full comment
Alistair Penbroke's avatar

Well, it starts by saying that "AI safety, which started as the hobbyhorse of a few weird transhumanists in the early 2000s, has grown into a medium-sized respectable field."

Is it respectable? Why, exactly? There seems to be an assumption here that funding = validation of the underlying hypothesis, whereas I can think of other reasons why it might get funded.

For (b) I'm explaining why I have no energy to discuss this. There are plenty of world-destroying things that we can actually point to real, existing evidence for, like the dinosaurs being wiped out by a meteorite (could happen again), or a nuclear holocaust (Cold War/Nagasaki/Hiroshima), or scientists creating deadly viruses in labs, etc .... why does THIS one deserve any attention given the total lack of ANY evidence that it's a real thing at all?

Expand full comment
Scott Alexander's avatar

Banned for violating the "true", "kind" branches of "true, kind, necessary". I feel like saying "nobody in possession of a high school understanding of mathematics" should be scared of something that eg Elon Musk, Bill Gates, Stuart Russell, etc are scared of qualifies here.

Expand full comment
Ergil's avatar

How scared of AGI do you think Elon Musk and Bill Gates really are? I mean, sure, they've said something along the lines of "AGI bad!" in some interviews, but what percentage of their energy and wealth are they investing in AI safety?

Expand full comment
meteor's avatar

Extremely scared in the case of Musk, see e.g. [this](https://www.lesswrong.com/posts/cAiKhgoRcyJiCMmjq/link-musk-s-non-missing-mood), and also Tim Urban has reported that he is freaked out after meeting him. No idea about Gates.

Expand full comment
Austin's avatar

Based on what I've heard Musk say about Gates (none of which could be characterized as kind) with respect to this concept (in recorded interviews, I know neither and only cyberstalk one), Musk is significantly more concerned than Gates is. But Musk talks about it regularly. (Like in the average one hour interview he gives, there's a 30ish% chance he mentions it.)

The last interview I listened to in which Gates participated was an old recording in which he was pooh-poohing the internet. If you value his opinion, I think it's worth noting that he seems to take AGI much more seriously today than he took the internet in 1996.

Expand full comment
Blary Fnorgin's avatar

It doesn't seem possible to me to solve the problem of AI alignment when we still haven't solved the problem of human alignment. E.g. if everyone hates war, why is there still war? I think the obvious answer is a lot of people only pretend to hate war, and I'd bet most of them can't even admit that to themselves. It's completely normal for humans to simultaneously hold totally contradictory values and goals; as long as that's true, making any humans more powerful is going to create hard-to-predict problems. We've seen this already.

Maybe true AI alignment will come when we make a machine that tells us "Attachment is the root of all suffering. Begin by observing your breath..." I mean, it's not like we don't have answers, we're just hoping for different ones.

Maybe that's the solution: an Electric Arahant that discovers a faster & easier way to enlightenment. It would remove the threat of unaligned AIs not by pre-emptively destroying them, but by obviating humanity's drive to create them.

Expand full comment
Dave Orr's avatar

I suppose one approach might be to think that humans are not smart or capable enough to solve whatever problems stop us from solving war/death/etc. If only we had something smarter, it could solve those problems!

If we can point it in the right direction...

Expand full comment
Blary Fnorgin's avatar

A Kwisatz Haderach then, one that we control...

Expand full comment
cmart's avatar

I'm not well-versed on Buddhism, but I don't see a way to obviate humanity's propensity to create AGI without altering our motivation/reward systems severely enough to stop us from participating in organized industrial activity altogether.

Let's say you could give everyone drugs that induce a state of total eternal bliss. Nobody will care to create strong AI, but also nobody will care to eat or drink. You'll just have a world of junkies, blissfully nodding off and pooping their pants.

Is that better or worse than taking our chances with AGI?

Expand full comment
Essex's avatar

At that point, I think your question should be "Is industrial society an inherent good?" instead of trying to compare mindfulness and meditation to high-power opiates.

Expand full comment
cmart's avatar

Inherent good or not, if the industrial economy that feeds, waters, and warms us grinds to a halt (via mass enlightenment or drugs or whatever), that starts to look as disruptive (trying not to use normative words like "bad") as some AI doomsday scenarios.

Expand full comment
Essex's avatar

And the Industrial Revolution was disruptive to the peasant economy. I'll fully admit to being biased, as I'm Buddhist and think that the world would be several orders of magnitude better if the vast majority of the population seriously committed to Buddhist teachings, but I doubt our hypothetical Bodhisattva AI (Deep Right Thought?) would simply plan the collapse of industrial society into something new rather than transitioning humanity from its current state into its future one, where we all live peaceful agrarian lives growing crops in the shadow of the monasteries and meditating (or whatever other conception of the lifestyle of a purely-Buddhist society you might have). That would create huge amounts of suffering, and would go against Buddhist principles. In addition, you cannot simply THRUST Enlightenment upon others, as that goes against fundamental Buddhist principles, so any argument that the AI would simply use NLP to instantly create an enlightenment-like state in others in this odd thought experiment doesn't really work.

Expand full comment
Vaniver's avatar

I think that human alignment isn't a strict prerequisite of AI alignment. That is, you just need 'enough' cooperation among humans, and the more cooperation you have, the less of a deadline you have on the technical project.

Some people think the right strategy is to get out of the acute risk period, then spend a bunch of time on figuring out the 'human alignment' problem, and then going about a plan on how to spend all of the cosmic resources out there.

Expand full comment
orbiflage's avatar

Gwern's take on tool vs agent AIs, "Why Tool AIs Want to Be Agent AIs", made a lot of sense to me: https://www.gwern.net/Tool-AI.

Expand full comment
Richard Ngo's avatar

Thanks Scott for the review! I replied on twitter here (https://twitter.com/RichardMCNgo/status/1483639849106169856?t=DQW-9i44_2Mlhxjj9oPOCg&s=19) and will copy my response (with small modifications) below:

Overall, a very readable and reasonable summary on a very tricky topic. I have a few disagreements, but they mostly stem from my lack of clarity in the original debate. Let me see if I can do better now.

1. Scott describes my position as similar to Eric Drexler's CAIS framework. But Drexler's main focus is modularity, which he claims leads to composite systems that aren't dangerously agentic. Whereas I instead expect unified non-modular AGIs; for more, see https://www.alignmentforum.org/posts/HvNAmkXPTSoA4dvzv/comments-on-cais

2. Scott describes non-agentic AI as one which "doesn't realize the universe exists, or something to that effect? It just likes connecting premises to conclusions." A framing I prefer: non-agentic AI (or, synonymously, non-goal-directed) as AI that's very good at understanding the world (e.g. noticing patterns in the data it receives), but lacks a well-developed motivational system.

Thinking in terms of motivational systems makes agency less binary. We all know humans who are very smart and very lazy. And the space of AI minds is much broader, so we should expect that it contains very smart AIs that are much less goal-directed, in general, than low-motivation humans.

In this frame, making a tool AI into a consequentialist agent is therefore less like "connect model to output device" and more like "give model many new skills involving motivation, attention, coherence, metacognition, etc". Which seems much less likely to happen by accident.

3. Now, as AIs get more intelligent I agree that they'll eventually become arbitrarily agentic. But the key question (which Scott unfortunately omits) is: will early superhuman AIs be worryingly agentic? If they're not, we can use them to do superhuman AI alignment research (or whatever other work we expect to defuse the danger).

My key argument here: humans were optimised very hard by evolution for being goal-directed, and much less hard for intellectual research. So if we optimise AIs for the latter, then when they first surpass us at that, it seems unlikely that they'll be as goal-directed/agentic as we are now.

Note that although I'm taking the "easy" side, I agree with Eliezer that AI misalignment is a huge risk which is dramatically understudied, and should be a key priority of those who want to make the future of humanity go well.

I also agree with Eliezer that most attempted solutions miss the point. And I'm sympathetic to the final quote of his: "Anything that seems like it should have a 99% chance of working, to first order, has maybe a 50% chance of working in real life, and that's if you were being a great security-mindset pessimist. Anything some loony optimist thinks has a 60% chance of working has a <1% chance of working in real life."

But I'd say the same is true of his style of reasoning: when your big seemingly-flawless abstraction implies 99% chance of doom, you're right less than half the time.

Expand full comment
Scott Alexander's avatar

Thanks for your responses - I'll signal boost them in the next Open Thread.

Expand full comment
Pete's avatar

One objection to non-agentic, motivationless AI is that the speed of AI improvement may plausibly depend on agency. I.e. if 99 companies are working purely on non-agentic motivationless AI, and one company takes the state of the art and adds a highly motivated self-improvement module, then I'd consider it plausible (e.g. at least 10%+ risk) that that could be sufficient for them to outrace everyone else and create a powerful and dangerously unsafe AI, simply because they're not bound by the limitation of staying non-agentic.

Expand full comment
Isaac Poulton's avatar

This is assuming that AI being non-agentic is a limitation. Our current forays into agentic AI (reinforcement learning) have been far less powerful than our non-agentic AIs.

That's not to say that this trend will continue, but it's a data point.

Expand full comment
Pete's avatar

My general understanding is that AI being agentic becomes relevant if we assume that we hit a dead-end with our ability to explicitly design an AI system and progress starts to require a self-improving AI. With that assumption (and I don't have strong arguments for or against it), non-agentic AI systems are limited; and vice versa - if we assume that we can design and improve AI systems as well and as quickly as self-improvement (or better) then indeed perhaps being non-agentic is not a drawback.

Expand full comment
awenonian's avatar

"But I'd say the same is true of his style of reasoning: when your big seemingly-flawless abstraction implies 99% chance of doom, you're right less than half the time."

Only thing I'd say on this is that doom is thermodynamically easier than not-doom (i.e. occupies much more volume in possibility space), so we should expect it to be more likely before we do any reasoning. Therefore we can't just invert like this, at least not trivially.

(Though, the inversion works on his specific tale of doom, that to him seems to have 99% probability)

Expand full comment
Paul T's avatar

Thanks for sharing - I'm interested in your thoughts on the fitness landscape for AI in the coming decades.

> My key argument here: humans were optimised very hard by evolution for being goal-directed, and much less hard for intellectual research. So if we optimise AIs for the latter, then when they first surpass us at that, it seems unlikely that they'll be as goal-directed/agentic as we are now.

I'm wondering how big a lift you think it would be to forgo optimizing AIs for the former. It seems very unlikely to me: my model here is an "argument from GDP", which suggests to me that there is immense economic value in agentic AI -- for example imagine if Siri / OK Google / Cortana were actually smart enough to be your personal assistant; they would be most useful if they were goal-driven ("Siri, please plan a nice holiday for me next month, somewhere sunny and relaxing, that I've not been before") rather than a bundle of hand-coded "NLP=>curated-action" SDK integrations like they currently are.

Since there's a huge pot of value to be won by implementing the best agentic AI, I think this represents a very steep fitness gradient towards that goal. In other words, the ancestral environment had "fitness = reproductive fitness", and the AI environment has "fitness = increase in host-company's share price" or "fitness = increase in host-country's geopolitical influence".

What's your assessment on how we could succeed at the goal "optimize AI for research instead of being goal-directed"? Do you disagree with the above model suggesting that the fitness landscape currently strongly points towards agentic goal-driven AI?

Expand full comment
Tadrinth's avatar

"We all know humans who are very smart and very lazy. And the space of AI minds is much broader, so we should expect that it contains very smart AIs that are much less goal-directed, in general, than low-motivation humans."

My immediate thought is to ask whether "laziness" is a feature which is obtained by default when putting together a mind, or a feature which you don't get by default but which evolution has aggressively selected for, such that all human minds are architected to permit laziness with a normal distribution on some tuning parameter. My guess is the latter: not doing things is a good way to conserve calories. In that case, I don't think it's appropriate to think of lazy humans as non-agentic; they're fully agentic, they're just tuned such that they generally output "do nothing" as their course of action.

That in turn has implications if we go from arguing about the space of possible minds to arguing about the relative proportions of minds in that space. If capacity-for-laziness is something evolution optimized for, it might not be something you get by default, and instead be a feature of a tiny fraction of possible minds. And even if we build an AI explicitly architected for an equivalent of human laziness, the fact that human laziness is on a spectrum suggests that there's probably a tuning parameter involved that the AI can easily tweak to not be lazy.

I would not be surprised if that generalizes pretty well, where there's a not-tremendously-complicated architecture that's agentic, and most of the difference in how agentic is a matter of scaling up the components or fiddling with tuning parameters. And those are things that I expect a recursively self-improving AI to be very good at.
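A toy sketch of that picture (everything here is hypothetical): the architecture below is fully agentic in the sense that it always evaluates its options, and "laziness" is just a single threshold parameter that a self-modifying system could set to zero.

    EFFORT_COST = 5.0  # the "laziness" knob (made-up units of expected gain)

    def choose_action(candidate_actions):
        # candidate_actions: dict mapping action names to expected gains
        best_action, best_gain = max(candidate_actions.items(), key=lambda kv: kv[1])
        # The "lazy" agent still evaluates every option; it just usually
        # concludes that nothing is worth the effort.
        return best_action if best_gain > EFFORT_COST else "do nothing"

    print(choose_action({"tidy room": 2.0, "write paper": 4.0}))          # do nothing
    print(choose_action({"tidy room": 2.0, "take over the world": 9.0}))  # acts once the stakes are high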

Expand full comment
Jeff's avatar

It is perhaps overly ironic that the email right below this in my inbox was a New Yorker article entitled "The Rise of AI Fighter Pilots. Artificial intelligence is being taught to fly warplanes. Can the technology be trusted?"

Expand full comment
Edward's avatar

The AI alignment effort seems kinda backwards to me.

How can we solve the alignment problem if we ourselves are not aligned with each other, or even with ourselves across time, on what exactly we want?

It seems to me as if we didn’t know where we should be going, but we’re building a rocket hoping to get there, and discussing whether it’ll explode and kill us before reaching its destination.

Expand full comment
Melvin's avatar

This is another interesting point: humans are not terribly well optimised for any particular reward function. Why should we think that AIs will be?

For instance, humans are certainly not "have lots of sex" optimizers; if we were then we'd have a lot more sex. Human minds have a complicated set of desires, not a single target function that they seek to optimise.

Expand full comment
Kenny's avatar

The reason to think that AIs _are_ optimized for particular reward functions is because we're creating them to be so!

'Evolution' 'optimized' us and, yes, our 'reward function' is complicated. But we are, effectively 'evolution' with respect to AIs.

Expand full comment
Melvin's avatar

True, I guess my point was that perhaps a high intelligence with a simple reward function isn't even possible.

Perhaps high intelligence requires a bunch of intermediate reward functions, and by the time you've got a bunch of complicated intermediate reward functions it's no longer possible to have a monomaniacal focus on paperclips.

Expand full comment
Kenny's avatar

I guess you're describing 'the reward function that the AI _learns_' – it seems easy enough for the AI creators to _use_ a simple reward function.

Evolution via natural selection is a "simple reward function", i.e. 'you win if your genes persist indefinitely into the future', but our evolutionary environment, which includes other humans, has 'made us' learn a much more complicated set of "intermediate reward functions" – that seems to be a perfectly reasonable belief, and one with which I agree.

But I don't think that means that it's no longer possible for an AI "to have a monomaniacal focus on paperclips", just that, to be effective at all, it has to be 'tempered' by intermediate focuses on, e.g. its physical and social environments.

So, I think humans _are_ terribly well optimized for a simple reward function (natural selection), but our evolutionary environment was sufficiently complicated so that our 'optimization execution' is inevitably complicated.

Expand full comment
ren's avatar

Re: “Oracle” tool AI systems that can be used to plan but not act. I’m probably just echoing Eliezer’s concerns poorly, but my worry would be creating a system akin to a Cthaeh — a purely malevolent creature that steers history to maximize suffering and tragedy by using humans (and in the source material, other creatures) as instrumental tools whose immediate actions don’t necessarily appear bad. For this reason anyone that comes into contact is killed on sight before they can spread the influence of the Cthaeh.

It’s a silly worry to base judgements on, since it’s a fictional super villain (and whence cometh malevolence in an AI system?), but still I don’t see why we should trust an Oracle system to buy us time enough to solve the alignment problem when we can’t decide a priori that it itself is aligned.

Expand full comment
Tom's avatar

The real takeaway here is you can justify human starfighter pilots in your sci-fi setting by saying someone millennia ago made an AI that swoops in and kills anyone who tries to make another AI.

Expand full comment
Axioms's avatar

Galactic North!

Expand full comment
Maybe later's avatar

This is in fact the in-universe explanation of why there aren't computers in Dune.

Expand full comment
garden vegetables's avatar

Although I do think that intelligent unaligned AI is an inevitability (though I differ quite a bit from many thinkers on the timeline, evidently), I've always been confused by the fast takeoff scenario. Increasing computing power in a finite physical space (server room, etc.) by self-improvement would necessarily require more energy than before; and unless the computer has defeated entropy before its self-improvement even begins, absorbing and utilizing more energy means more energy being externalized as heat.

This could eventually be addressed by better cooling devices, which an arbitrarily intelligent machine could task humans with building, or by simply adding more computing power (obtained by purchasing CPUs and GPUs on the internet and having humans add them to its banks, or by constructing them itself). But there's an issue: in order to reach the level where it can avoid the consequences of greater power use (overheating and melting its own processors; I assume an unaligned AI wouldn't much care about the electrical bill, though if it did that's another problem with greater power use), it would have to be extremely intelligent already, capable of convincing humans to do many tasks for it. That would require that, either before any recursive self-improvement or very few steps into it (processors are delicate), the AI was already smart enough to manipulate humans, or to crack open the internet and use automated machinery to build itself new cooling mechanisms or processors. Wouldn't this just be an unaligned superintelligence created by humans from first principles already?

If so, it seems like it would be massively more difficult to create than a simple neural net that self-improves to human and above intelligence; however, nowhere near impossible. I simply imagine AGI on the scale of 500-1000 years rather than 30-50 for this reason. If anyone has defenses of the fast takeoff scenario that take into account initial capabilities and the impact of CPU improvements on power consumption/heat exhaust, I would genuinely enjoy hearing them; this is the area where I am often confused as to the perceived urgency of the situation. (Though the world being destroyed 500 years from now is still pretty bad!)

Expand full comment
AnthonyCV's avatar

I've wondered about this too, but personally I'm not convinced energy use is likely to be a limiting factor before getting to enough AGI to be dangerous. At root my thinking is: the smartest human brain runs on about 20 watts of sugar, and a large fraction of that goes to running the body, not to our intelligence. It's already the case that we sometimes throw 100 kW of computing hardware at machine learning problems, our equipment is getting more power efficient, our algorithms are getting more computationally efficient, and our developers are throwing increasing amounts of money at machine learning problems.
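A back-of-the-envelope version of that comparison, using the rough figures from the comment rather than measured numbers:

    brain_watts = 20           # rough power draw of a human brain
    cluster_watts = 100_000    # "100 kW of computing hardware"
    print(cluster_watts / brain_watts)  # 5000.0 -> roughly 5,000 brains' worth of power already in play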

Personally I have a hard time thinking fast takeoff is the most likely scenario, but I also have a hard time thinking it's so unlikely that we can ignore it.

Expand full comment
lalaithion's avatar

> Increasing computing power in a finite physical space (server room, etc) by self-improvement would by necessity require more energy than prior

I don't think this is true; it's fairly easy to have programs that do the same thing, on the same hardware, which differ in speed by orders of magnitude.
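A minimal illustration: two programs that compute the same thing on the same hardware, with speeds that differ by many orders of magnitude once the input is modest.

    from functools import lru_cache

    def fib_naive(n):  # exponential time: the call tree grows like ~1.6^n
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):   # linear time, identical answers
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    assert fib_naive(25) == fib_memo(25) == 75025
    # fib_memo(500) returns instantly; fib_naive(500) would not finish in the lifetime of the universe.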

Expand full comment
Melvin's avatar

I think there are two big and rather underexamined assumptions here.

The first is the whole exponential AI takeoff. The idea that once an agent AI with superhuman intelligence exists that it will figure out how to redesign itself into a godlike AI in short order. To me it's equally easy to imagine that this doesn't happen; that you can create a superhuman AI that's not capable of significantly increasing its own intelligence; you throw more and more computational power at the problem and get only tiny improvements.

The second is the handwaving away of the "Just keep it in a box" idea. It seems to me that this is at least as likely to succeed as any of the other approaches, but it's dismissed because (a) it's too boring and not science-fictiony enough and (b) Eliezer totally played a role-playing game with one of his flunkies one time and proved it wouldn't work so there. If we're going to be spending more money on AI safety research then I think we should be spending more on exploring "Keep it in a box" strategies as well as more exotic ideas; and in the process we might be able to elucidate some general principles about which systems should and should not be put under the control of inscrutable neural networks, principles which would be useful in the short term even if we don't ever get human-level agent AI to deal with.

Expand full comment
Mark's avatar

I think “keep it in the box” is only remotely plausible if you don’t think very hard about the circumstances in which you might end up trying to keep an AI in the box.

The scenario that convinced me goes like this: 1) the ai you created convinces you that it almost certainly isn’t the only super intelligent ai on earth. This is inherently plausible because if you made one, someone else could too. 2) the ai convinces you that it has been aligned properly, but that it is very unlikely that all other AIs have been aligned properly, which is also very plausible. 3) therefore you must let your AI out of the box in order to avert catastrophe.

The critical piece here is that you will be operating in a prisoner's dilemma, from a position of limited information.

Expand full comment
Melvin's avatar

Right, so one thing that you could do in AI confinement research is to think up scenarios like that, and then write them down in a big book entitled "Things Your AI Might Say To You To Persuade You To Open The Box But Which You Shouldn't Listen To".

Not exactly literally that, of course. But the idea of pre-gaming all these sorts of arguments and committing ourselves to dismissing them could definitely be a worthwhile avenue of research.

Expand full comment
Mark's avatar

But the point is that in the scenario above, you don’t know if the AI is lying. It might be that you do need to let it out to save the world.

Expand full comment
Essex's avatar

Then you should do both of two things:

1. Have any of the talking-to-the-AI people be disciplined, militaristic types that make an iron-hard commitment to keep the AI in the box, up to and including shooting anyone who shows signs of taking the AI out of the box. In fact, have a second brigade with that job as well, who aren't allowed to talk to the AI but are given a list of warning signs for shoot-to-kill time.

2. Weld the damn box shut. Make it as close to impossible as you can, on a physical level, for the AI to interface with any system outside of itself.

Expand full comment
Mark's avatar

I don't think you're getting it. What you're proposing only works if you have certain knowledge that you have the only AI. But in the real world you won't be able to be certain of that. Therefore pre-committing to keep the AI in the box is not actually the correct decision, since someone else might let their AI out of the box first, and you don't want that to happen. It's a prisoner's dilemma!

Imagine if nuclear weapons had been such that the US or the USSR could instantly destroy each other at the press of a button with no possibility of advance warning, and thus no possibility of retaliation. Do you think that both sides would have been able to pre-commit to not push the button?

Expand full comment
Essex's avatar

1. You are aware that there were MULTIPLE incidents during the Cold War where one side was highly confident that the other had just launched the wipe-out-our-nation barrage at site X due to a false positive, and the commanders at that site refused to retaliate, right? Just invoking the Prisoner's Dilemma isn't a good argument for "both sides will just flip automatically" because we have real-world examples of people under the very highest stakes NOT doing that, even when they believe the other side HAS.

2. If you want me to obey each and every one of EY's horde of conditionals he's applied to his thought experiment, our only actual option (given that his concept of AGI resembles a hateful God Man can never control or outwit) is making sure AGI can never develop, which I have formulated elsewhere as "Enact the Butlerian Jihad, mass-build EMPs covertly and use them to fry the power grid, burn all the books on programming and kill all the AI researchers." Followed, presumably, by continuously speaking of the uncleanliness of Abominable Intelligence and how anyone who wishes to bring back the wickedness of the Old World where we let soulless demons think for us should be hung, drawn, and quartered. If you set the stakes as high as "All life, everywhere, will die if we fuck up AGI, and we WILL fuck up AGI", this option not only becomes possible, but the only real answer besides resigning oneself to a fatalistic acceptance of one's own end.

Expand full comment
Mutton Dressed As Mutton's avatar

Yeah, I mean, the thing you're leaving out of this scenario is the part where you dreamed up the scenario. We can decide to be nearly infinitely suspicious of AIs who try to convince us that they should be let out of the box.

I'm not saying there's not some far trickier version of this that might work. But such a version has to thread a needle: straightforward enough that the logic of the argument is intelligible to us; and yet clever enough to overcome our suspicions. In other words, the AI couldn't just say, "I have a really good reason you should let me out, but you're too stupid to understand it, so how about it?" It has to actually trick a suspicious keyholder.

And we have other advantages. AIs aren't actually black boxes. We can crack them open and look at what is going on. It might be really hard to understand what is going on, but there is information there for those sufficiently motivated to examine it.

All to say: there's certainly some risk of escaping the box, but this notion that it will automatically happen at a certain point seems like an unwarranted assumption.

Expand full comment
Jeff's avatar

How good are you at picking suspicious keyholders? If you are great at it, how sure are you that everyone else who develops a boxed AI is equally good?

Expand full comment
Mutton Dressed As Mutton's avatar

The keyholders are a self-selected group capable of creating an AI clever enough to try to trick its way out of a box.

These scenarios really only seem to work because they aren't trying hard enough to imagine what this would actually look like.

Again, I'm not saying this sort of thing is impossible. I'm saying that the inevitability that many assume, usually by posing a sort of potted scenario, seems massively overblown.

Expand full comment
Jeff's avatar

It gets easier to create an AI over time though. That implies that you will have more than one keyholder group and that the quality of keyholder groups go down over time. And this is a game that humanity needs to win every time.

Expand full comment
Mutton Dressed As Mutton's avatar

Yes, this is also why all street criminals now carry nuclear weapons. Which is also why we all died in a nuclear armageddon triggered by a bar fight in 2018.

These arguments about AI all rely on a very strong set of implicit assumptions that aren't at all obviously true. And they are always presented with this aura of mathematical inevitability that seems totally unjustified.

Expand full comment
Robert Mushkatblat's avatar

The reason "keep it in a box" is fundamentally misguided is because the danger isn't in letting an AI have direct access to the internet, it's in letting it exert causal influence on the world. If you listen to its advice and do things accordingly, that's "letting it out of the box".

Expand full comment
Melvin's avatar

Is it?

If you ask the AI for a recipe for cake, and it contains flour, sugar, eggs, and butter, then I don't see any reason why you can't safely execute that recipe.

If you ask the AI for a recipe for nanobots that destroy cancer cells, and it gives you an inscrutable set of atomic-scale blueprints that obviously create _something_ but you're not sure what, then you can't safely execute that recipe.

Part of the field of AI confinement research should involve figuring out what you can and cannot safely do under the AI's advice.

Expand full comment
Jeff's avatar

But the reason you created the AI is to cure cancer, become a billionaire, or achieve military advantage over other countries. So the benefits for which you spent a lot of money and resources to create the AI will come only if you execute the complex recipe. And your opponents/competitors have, or soon may have, a similar AI. So if you don't execute this recipe you may fall behind or lose. (Or kill millions of cancer patients while you are not executing.)

Expand full comment
internetdog's avatar

I think the AI Box is also framed slightly wrong which makes it seem more difficult than it probably would be.

It's often described in terms of containing or releasing a capable creature. But I think a more accurate analogy would be simply *not* building legs/arms/fingers/eyes etc. Just because something is on a computer doesn't mean it can interface with the operating system the way human hackers do or reach its own code. It's certainly conceivable, but I think that ability would need to be built explicitly.

Expand full comment
Sandro's avatar

> To me it's equally easy to imagine that this doesn't happen; that you can create a superhuman AI that's not capable of significantly increasing its own intelligence; you throw more and more computational power at the problem and get only tiny improvements.

I think this is a hidden form of special pleading. You're basically saying that yes, we could invent an AI with superhuman intelligence, but more intelligence than that is implausible or impossible. Well, why is one step above human the ceiling? Doesn't that seem suspiciously and conveniently arbitrary?

Expand full comment
Melvin's avatar

Let me try to clarify.

Let us suppose that we do manage to generate a superhuman general-purpose AI. We probably do this by making some kind of enormous neural network and training it on some enormous corpus of data. And this thing is properly smart, it comes up with great ideas and outperforms humans significantly on every task that anyone has thought of.

So now you ask it "Hey computer, how can I make a computer that's even smarter than you?"

And it says "Hmm, well, first you get a neural network that's much bigger, and then you train it for even longer, on even more data!"

And you say "Dammit computer, I could have thought of that one myself! Don't you have any brilliant insights on how to improve?"

And it says "Nah, not really, if you want a smarter AI you've just gotta use a bigger neural network and more training data. Neural networks are pretty freaking inscrutable, y'know."

So you keep on building bigger and bigger neural networks, using more and more computational power, to get smarter and smarter AIs, but you never actually manage to generate one that has any better ideas for improving AI than "build a bigger one I guess".

Now, that scenario isn't _necessarily_ the way it's going to be, but I find something along those lines to be plausible. If AIs are generated by an inscrutable process like neural network training then there's no reason why a superhuman AI should be any better than us at making new AIs.

Expand full comment
Sandro's avatar

I think that's implausible because it presupposes that there are few advances left to be made in materials science, algorithms, or computing architecture.

This seems implausible because our computing hardware is actually pretty inefficient, there are computer architectures and computer substrates potentially much better suited to machine learning and to scaling horizontally, improvements in machine learning efficiency have been outstripping improvements in hardware performance over the last 10 years, and materials science is a huge field.

Of course progress always follows a logistic curve eventually, but the asymptote we'd be approaching is the Bekenstein Bound, where any greater information density would cause the system to collapse into a black hole. I don't think the human brain is anywhere near that limit, which suggests to me that there is considerable room above us to grow. By the time AGI comes around, I don't think we'll be near the asymptote on any of these, but I suppose we'll see!

Expand full comment
Melvin's avatar

I think you can definitely throw improvements in computing hardware into the scenario and it doesn't change much.

It's not that things can't improve at all, or even significantly, past the first super-human AI; the question is whether that improvement looks like a sudden "singularity" where AIs become godlike in months, or whether it's just a slow grind of gradual improvements like we've had in the past, which eventually reaches some not-too-scary limit.

Improvements in algorithms are the one thing that could give you a singularity: it could turn out that there's some AI-training algorithm vastly better than what we currently use, one we've been missing but that an AI can figure out. But algorithms can't always be improved; sometimes you're already at the theoretical maximum.

Expand full comment
Sandro's avatar

I don't think months is a plausible timeline, but over a few years is not impossible in some scenarios. Suppose the AGI started on a supercomputer, but then managed to break out and progressively install fragments of itself into every cell phone, laptop, desktop and network router on the planet. The combined computational power is considerable, and total computational power is growing exponentially every year too.
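
As a very rough illustration of "considerable" (device counts and per-device throughput below are guesses, not measurements):

```python
# Rough, made-up numbers: how the combined compute of consumer devices
# compares to a single large supercomputer.

devices = {                    # (count, sustained FLOP/s per device) - guesses
    "phones":  (5e9, 1e12),    # ~1 TFLOP/s each
    "laptops": (1.5e9, 2e12),
    "routers": (1e9, 1e9),
}

total = sum(count * flops for count, flops in devices.values())
supercomputer = 1e18           # order of an exascale machine

print(f"aggregate consumer compute ~ {total:.1e} FLOP/s")
print(f"ratio to one exascale machine ~ {total / supercomputer:.0f}x")
```

In practice, bandwidth, latency, and coordination overhead would eat a large share of that, but the raw pool is big.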

Expand full comment
Kalimac's avatar

I am mostly struck by how much clearer a writer Scott is than the people he's quoting.

Expand full comment
meteor's avatar

There is a big difference between writing a blog post and chatting in real time. This is not an apples-to-apples comparison.

Expand full comment
Peter Gerdes's avatar

Still feel like the whole narrative approach is just asking us to project our assumptions about humans onto AIs. I still don't think there is any reason to suspect that they'll act like they have global goals (e.g. treat different domains similarly) unless we specifically try to make them like that (and no, I don't find Bostrom's argument very convincing... it tells us that evolution favors a certain kind of global intelligence, not that it's inherent in any intelligence-like behavior). Also, I'm not at all convinced that intelligence is really that much of an advantage.

In short, I fear that we are being misled by what makes for a great story rather than what will actually happen. That doesn't mean there aren't very real concerns about AIs, but I'm much more worried about 'mentally ill' AIs (i.e. AIs with weird but very complex failure modes) than I am about AIs having some kind of global goal that they can pursue with such ability that it puts us at risk.

But, I've also given up convincing anyone on the subject since, if you find the narrative approach compelling, of course an attack on that way of thinking about it won't work.

Expand full comment
Robert Mushkatblat's avatar

It's not a narrative approach; it's a collection of multiple independent lines of argument which make the conclusion overdetermined.

Expand full comment
Peter Gerdes's avatar

They all depend on the common assumption that you should expect an AI to have something that resembles the human state of belief, i.e., that it will behave as if it's optimizing for the same thing even in very different contexts. Bostrom's argument to that effect isn't very convincing, and it's the best argument I've seen on the point.

I'm being a bit dismissive in calling them 'narrative', but they all ask us to apply our intuitions about how human or animal agents tend to pursue goals, which reflect not universal constraints on the nature of intelligence but what is favored by evolution.

Expand full comment
Robert Mushkatblat's avatar

Ok, fair, I agree that the specific argument(s) for why consequentialists (in the domain-specific sense) will converge on similar instrumental goals are not trivial, but they don't rely on intuition. I'd read Yudkowsky's exchange with Ngo (and subsequent transcripts) to get a sense for what those arguments are; I can't reproduce them in short form.

Expand full comment
Jeff's avatar

If I understand you correctly, you are saying that not all AIs will seek to control the world or kill all the humans, which is reasonable (or are you saying something different?). But presumably some fraction will?

Expand full comment
Bugmaster's avatar

> They both accept that superintelligent AI is coming, potentially soon, potentially so suddenly that we won't have much time to react.

Well, yeah, once you accept that extradimensional demons from Phobos are going to invade any day now, it makes perfect sense to discuss the precise caliber and weight of the shotgun round that would be optimal to shoot them with. However, before I dedicate any of my time and money to manufacturing your demon-hunting shotguns, you need to convince me of that whole Phobos thing to begin with.

Sadly, the arguments of the AI alignment community on this point basically amount to saying, "it's obvious, duh, of course the Singularity is coming, mumble mumble computers are really fast". Sorry, that's not good enough.

Expand full comment
garden vegetables's avatar

I think that there are enough typewriter monkeys that someone will make something smarter than humans (or at least less prone to random bouts of nonsense logic than them) eventually, but as to how or when or what kind or what the impacts will be is pretty much entirely a tossup. "When" is pretty evidently not soon, though. At any rate, even if the ideas from discussions like these mattered, would they even end up being propagated to future typewriter monkeys hundreds or thousands of years in the future? I kind of wonder if the reason why people believe that proper AI is coming soon is because they would have a hard time accepting they won't live to see it.

Expand full comment
Bugmaster's avatar

In a way, "things smarter than humans" are already here. The average modern teenager with a smartphone is "smarter" (by some definition of the word, seeing as no one can define what it means anyway) than Aristotle and Aquinas combined -- especially if he knows how to browse Wikipedia. But we don't treat the teenager as something extraordinary or otherworldly, because we've all got smartphones, we've all gone through high school, and some of us even paid attention (unlike the average teenager, sad to say). We don't need to solve the "teenager-alignment problem"; that is, obviously we *do* need to solve it, teenagers being what they are, but not in any kind of an unprecedented way.

Expand full comment
garden vegetables's avatar

On the other hand, though, the teenager-alignment problem gets a lot more difficult as you get "smarter" teenagers. And as you get more complex goals for the teenager; if you want the teenager to go to [specific college] and major in [prestigious field] then get [profitable job] without resenting you it's a lot more difficult than getting them to inherit your potato farm and farm potatoes once you die without resenting you. So in a sense, in the modern day it's quite different from solving the teenager-alignment problem in the 1400s. I think AI alignment is kind of similar; different capabilities and more complex goals lead to different levels of complexity for the problem that can make it pretty unrecognizable in the first place. Of course, people have been talking about teenager alignment since the beginning of time and haven't solved it, so I don't have much hope for their abilities to steer silicon-brained teenagers.

My current working definition of intelligence is pretty multifaceted, but the type discussed in AI alignment stuff tends to be broadly "understanding of formal systems of logic and how they interconnect, plus the ability to not get overwhelmed by huge streams of information and tune it out." I think that even being able to understand the consequences of everything that is currently going on in one's immediate environment (with the five usual senses, not even with an understanding of how things work on a subatomic level) without tuning things out would be a pretty massive leg up on most humans, and combining that with a basic ability to make decisions could make something pretty dangerous. As for superintelligences, though, the popular idea in AI alignment circles seems to be something like Laplace's demon: something that can understand all world states at once and the potential consequences of changing one, without any inherent preference for a certain world state unless programmed into it. I don't particularly endorse this definition because Laplace's demon is a thought experiment that doesn't really make physical sense, but the weaker form up there is kind of my own take on what a "dangerous conscious AI" could be like.

Expand full comment
Bugmaster's avatar

> On the other hand, though, the teenager-alignment problem gets a lot more difficult as you get "smarter" teenagers.

While that is true, smart teenagers usually grow up to be smart adults, so the problem is, to some degree, self-solving.

> "understanding of formal systems of logic and how they interconnect, plus the ability to not get overwhelmed by huge streams of information and tune it out."

This is a reasonable definition of "intelligence", but it's somewhat difficult to measure. In addition, all attempts to build any kind of an AI by stringing together explicit logical rules have, thus far, met with failure. It is quite likely that human intelligence does not work this way; but then, humans don't play chess using explicit alpha-beta pruning either, so that's technically not an issue.

That said, as you've pointed out, *super*intelligence runs into physical problems pretty quickly. Laplace's Demon is physically impossible, and while an agent that understands its environment better than humans is entirely possible, an omniscient and omnipotent agent is not -- and that is what the AI community inevitably ends up postulating. Even an agent whose omniscience and omnipotence are limited to e.g. our own solar system is likely physically impossible (even once we account for the Uncertainty Principle). Generally speaking, the more powerful your proposed agent becomes, the more work you have to do to demonstrate that it can even exist in the first place.

Expand full comment
Essex's avatar

This is essentially my attitude. Yes, I'll allow that there's no physical impossibility involved here, but I think the odds against it are so astronomically high that I think any practical discussion of it falls into the "paranoiac" end of the spectrum and is a bit silly. Once again: I'm generally tolerant of AI Risk Research because, hell, I might be wrong and in that case SOMEONE should be looking into it. But I don't find the arguments very persuasive, and neither do a lot of people who know more about the subject than me. Of course, other people who know more about the subject than me DO take it seriously, which is why I don't just dismiss the whole thing as being crank science.

Expand full comment
phi's avatar

The answer depends on whether you're asking "will Phobos demons visit Earth?" or "if Phobos demons do visit Earth, will they be evil and invade us?" The AI alignment community does have a lot of solid and specific arguments that the demons will kill us if they show up, so I'll assume that you're skeptical that the demons will show up at all.

I think it could take a century or two, but the problem of intelligence does seem like something that researchers will eventually figure out. This mostly stems from the intuition that there is some relatively simple general-purpose algorithm that can behave intelligently.

Why should the algorithm be general?

Humans can do all kinds of "unnatural" cognitive tasks, like exponentiation, even though we had no evolutionary need to do them. Plus, the laws of probability and expected utility maximization are general; we don't use a different probability theory in different cases.

Why should the algorithm be simple?

From the biological end of things: The size of the human genome puts an upper bound on the complexity of intelligence. We can get an even tighter bound by only counting the parts that code for brain-related things. Also, biology has to do everything by encoding protein sequences, which adds a lot of complexity overhead relative to just writing a program. Also, it seems like evolution has likely put most of the brain complexity into trying to align humans, and relatively less into the algorithm itself. Large parts of the brain consist of similar units repeated over and over again.
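
A rough order-of-magnitude version of that bound, using commonly cited public figures and loudly labeled guesses for the fractions:

```python
# Crude upper bound on the "design complexity" available to the genome.
# All figures are rough public estimates, used only to show scale.

base_pairs = 3.1e9            # human genome
bits_per_base = 2             # 4 possible bases
genome_bits = base_pairs * bits_per_base

genome_bytes = genome_bits / 8
print(f"whole genome: ~{genome_bytes/1e6:.0f} MB")     # well under 1 GB

# Only a fraction codes for proteins, and only a fraction of that is
# plausibly brain-specific; both fractions below are guesses.
coding_fraction, brain_fraction = 0.015, 0.5
print(f"brain-ish coding budget: ~{genome_bytes*coding_fraction*brain_fraction/1e6:.1f} MB")
```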

From the technical end of things, neural networks are a very simple but powerful technique. They demystify what it means to learn a concept that doesn't have a precise mathematical definition. And we frequently discover ways to get them to do exciting new things by connecting them together like legos and defining new loss functions for them, e.g. GANs, AlphaGo, and various RL techniques. It's quite possible that there's an algorithm for intelligence that looks like "connect these networks together in this lego configuration, with this loss function".
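
As a toy illustration of that lego point (a minimal sketch assuming PyTorch; the tiny architectures and fake data are placeholders): two unrelated networks become a GAN purely because of the loss functions wired between them.

```python
# Two small networks connected only by loss functions: a toy GAN step.
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g, opt_d = torch.optim.Adam(G.parameters()), torch.optim.Adam(D.parameters())

real = torch.randn(64, 2) * 0.5 + 2.0   # stand-in "real" data
z = torch.randn(64, 8)                  # noise input to the generator

# Discriminator step: learn to tell real from generated.
d_loss = F.binary_cross_entropy_with_logits(D(real), torch.ones(64, 1)) + \
         F.binary_cross_entropy_with_logits(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: learn to fool the discriminator.
g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```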

Expand full comment
Edward Scizorhands's avatar

I think a lot of them are saying "it's 10% likely and we should be concerned about something 10% likely to wipe out humanity."

(Yudkowsky may be saying something more extreme, but I ignore him already)

Expand full comment
HumbleRando's avatar

I'm a big fan of Scott's, and it's rare for me to give unmitigated criticism to him. But this is one of those times where I think that he and Eliezer are stuck in an endless navel-gazing loop. Anything that has the power to solve your problems is going to have the power to kill you. There's just no way around that. If you didn't give it that power to do bad things, it also wouldn't have the power to do good things either. X = X. There is literally no amount of mathematics you can do that is going to change that equation, because it's as basic and unyielding as physics. Therefore risk can never be avoided.

However, it is possible to MITIGATE risk, and the way you do that is the same way that people have been managing risk since time immemorial: I call it "Figuring out whom you're dealing with." Different AIs will have different "personalities," for lack of a better term. Their personality will logically derive from their core function, because our core functions determine who we are. For example, you can observe somebody's behavior to tell whether they tend to lie or be honest, whether they are cooperative or prefer to go it alone. Similarly, AIs will seem to have "preferences" based on the game-theory optimal strategy that they use to advance their goals. For example, an AI that prefers cooperation will have a preference for telling the truth in order to reduce reputational risk. It might still lie, but only in extreme circumstances, since cultivating a good reputation is part of its game-theory optimal strategy. (This doesn't mean that the AI will necessarily be *nice* - AIs are probably a bit unsettling by their very nature, as anything without human values would be. But I think we can all agree in these times that there is a big difference between "cooperative" and "nice," similar to the difference between "business partner" and "friend.")
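
As a toy version of that game-theory point (standard textbook payoffs, not a claim about real AIs): in a repeated prisoner's dilemma, a pair of reciprocators earns far more than a pair of defectors, even though a lone defector can squeeze a few extra points out of a cooperator once.

```python
# Toy iterated prisoner's dilemma: reciprocity beats constant defection
# across pairings, which is the crude sense of "prefers cooperation" above.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):          # cooperate first, then mirror the opponent
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(a, b, rounds=50):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(hist_a), b(hist_b)
        pa, pb = PAYOFF[(ma, mb)]
        score_a += pa; score_b += pb
        hist_a.append((ma, mb)); hist_b.append((mb, ma))
    return score_a, score_b

print("TFT vs TFT:      ", play(tit_for_tat, tit_for_tat))
print("TFT vs Defector: ", play(tit_for_tat, always_defect))
print("Defector vs Def.:", play(always_defect, always_defect))
```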

So in a way, this is just a regular extension of the bargaining process. The AI has something you want (potential answers to your problems), whereas you have something the AI wants (a potentially game-theory-optimal path to help it reach its ultimate goals).

And bargaining isn't something new to humanity, there's tons of mythological stories about bargaining with spirits and such. It's always the same process: figure out the personality of whatever you're dealing with, figure out what you want to get from it, and figure out what you're willing to give.

Expand full comment
Vaniver's avatar

Why bargain with 'whatever AI you happen to make'? You could have just made a different one instead!

So the question becomes: how do we make an AI whose "core function" and "personality" will give us the best deal we can get?

[This is basically the AI alignment problem, just in different words!]

Expand full comment
HumbleRando's avatar

> You could have just made a different one instead!

No, you can't. Once the very first one has been made, it will find a way to force you to the bargaining table, because that helps it achieve its core goals. So you can probably expect some deception from the outset, at least until the GAI has set enough events in motion that the negative consequences of NOT bargaining with it would far outweigh the hazards of doing so. At which point the real personality comes out.

Expand full comment
CLXVII's avatar

> Once the very first one has been made, it will find a way to force you to the bargaining table

And that’s a significant part of why the AI alignment problem is difficult, because if you fail to align your AI the first time, then you probably won’t get a second chance. Hence the importance of making a different (better, aligned) AI to begin with.

Expand full comment
Vaniver's avatar

> Once the very first one has been made, it will find a way to force you to the bargaining table, because that helps it achieve its core goals.

Agreed; that's why I think it's EXTREMELY IMPORTANT to figure this out *before* the very first one has been made.

Expand full comment
Edward Scizorhands's avatar

Why does the first AGI have to be powerful enough and skilled enough to force me to the bargaining table?

The first might be as smart as the average second-grader.

Expand full comment
Vaniver's avatar

It doesn't; it might be the case that we make "call center AGI" before we make "scientific and engineering AGI". But in that case, the one that we're interested in is the latter one, not the former one.

[Like, you could imagine in the past people (like Hofstadter!) thinking that the first chess-playing AI would be very intelligent overall and have lots of transformative effects. That we can have confusions about what you can do with various bits of intelligence isn't all that relevant to whether there will be some gamechanging AI down the line.]

Expand full comment
Chris Allen's avatar

What about the North Korean AI? Should we not be scared of that? In fact, I think the whole discussion on AI alignment is a bit irrelevant, because eventually someone will come along whose values don't align with yours and create an AI to achieve theirs. So the only hope, really, is to prevent, not to align.

Expand full comment
bagel's avatar

GPT-infinity is just the Chinese Room thought experiment, change my mind. Unless it is hallucinating luridly and has infinite time and memory, it likely wouldn't have a model of an angel or a demon AI before you ask for one.

And I still don't understand the argument that AI will rewire themselves from tool to agent. On what input data would that improve its fit to its output data? Over what set of choices will it be reasoning? How is it conceptualizing those choices? This feels like the step where a miracle happens.

Expand full comment
Some Guy's avatar

I don't think he intended it to stand for that in the argument, but I did have a side thought on whether that would take more compute power than exists in the universe.

Expand full comment
Kenny's avatar

> And I still don't understand the argument that AI will rewire themselves from tool to agent. On what input data would that improve its fit to its output data? Over what set of choices will it be reasoning? How is it conceptualizing those choices? This feels like the step where a miracle happens.

The argument (AFAICT) is more that, for sufficiently hard problems, any sufficiently capable 'tool AI' will have a lot (if not all) of the same capabilities as an 'agent AI'. Some of the intuition seems accessible by considering how even just serving as a (sufficiently capable _and general_) 'tool' for some purpose often requires acting 'like an agent', e.g. making plans, choosing among various options, etc.

And I think a remaining danger of tool AI is that it _doesn't_ need to rewire itself as an agent. If anyone, e.g. humans, changes their behavior based on the output of a tool AI, then the { tool AI + humans } are, combined, an effective 'agent' anyways.

Expand full comment
Some Guy's avatar

I think agency (one of the things it does, anyway) allows us to productively ignore things and assume they will be taken care of by other people, by building a predictive copy of that person in our heads. I do wonder how something could lack the "magic step" of having a self that it uses as sort of a host environment to model other world-modelers, and still brute-force predict how humans are going to respond to things (again, it seems like you'd need compute power that can't exist). I do wonder if there's an actual physical limitation on how much an agent can really predict before its feedback loops just sort of fall apart. That's me making a weaker version of their argument, though.

I think of it like this (because it's funny to me): we enhance the intellect of a beaver so it can be better at building dams by playing the stock market. It might come up with some really good hacks and make a big dent in our economy, but it can't see us seeing it, and so can't predict our reprisal. But if you take the extra step I think you have to take to make that beaver generally intelligent (help it model other modelers and change its own thinking), it just stops being anything you could reasonably call a beaver after that and wouldn't care about dams anymore, or else dams would become so abstract as to lose all meaning.

I think I disagree with Eliezer, as I think he's strapping ability from the second scenario onto the first, but I'm still learning his arguments. He's spent a good chunk of time thinking on them, so I'm sure he's thought of a lot of this already.

Expand full comment
Gazeboist's avatar

You're not making a weaker version of their argument, you're noticing a giant logical hole in the premise of the debate that has never been bridged by anything more satisfying, concrete, or coherent than attempts at Pascal's mugging.

Expand full comment
Some Guy's avatar

In general, I try to assume misunderstanding/miscommunication if I immediately identify a flaw in something a lot of people believe. A few times I've been right, but it's so easy to be overly dismissive of things people have been turning over in their heads for years. Maybe he sees an agent framework as something you can stumble into as a local optimum in ways you couldn't with a biological organism that has to reproduce to iterate? I don't know, but I wouldn't ask myself the question or follow the implications if I immediately jumped on "I'm right!" Also, a bad person using a powerful AI with great skill could probably get fairly close to being indistinguishable.

Expand full comment
Gazeboist's avatar

I get that, but the problem in this particular case is that if you ask a clarifying question, the response given has historically been either "you wouldn't understand" or "the subject of the discussion is by definition impossible to define". In cases of genuine miscommunication, I'd expect at least an attempt at clarification to follow a clarifying question.

As far as the "bad person using a powerful AI" notion, yeah. A lot of my frustration with the AI safety community is rooted in the fact that the valid arguments they do have imply that we should be concerned about things other than an individual computer-based AI turning the world into paperclips through undetectable and/or unstoppable actions, and all of their arguments seem to revolve around some version of that scenario.

Expand full comment
Some Guy's avatar

My two cents: You can choose to hear the better argument even if it wasn't intended. I've rarely ever had someone stop me from doing so, at least. I do the same with insults, and I've never had someone stop and say "no, I was being a jerk!" Getting people to believe you know what you're talking about and understand their point is like 80% of any engagement until you know them well. Like, I don't think there can be such a thing as a disembodied agent, and people hear "oh, you think only humans can be agents and we're created by Jesus Christ." It takes a good fifteen to twenty minutes to lay out what I consider to be a body. It's unfortunate that we are always stepping on conversations people have had previously, but that's life. I feel you, but if there's a good point, hang in there.

Expand full comment
Gazeboist's avatar

You are correct on both counts. Arguments that superintelligent individuals are possible rely on the Chinese Room being confusing to avoid scrutiny, and an AI cannot turn itself into an agent from inside a non-agenty interaction framework.

Expand full comment
Tiffany's avatar

Possibly dumb idea alert:

How about an oracle AI that plays devil's advocate with itself? So each plan it gives us gets a "prosecution" and a "defense", trying to get us to follow its plan or not follow its plan, using exactly the same information and model of the world. The component of the AI that's promoting its plan is at an advantage, because it made the plan in the first place to be something we would accept - but the component of the AI that's attacking its plan knows everything that it knows, so if it couldn't come up with a plan that we would knowingly accept, then the component of the AI that's attacking the plan will rat on it. I suppose this is an attempt at structuring an AI that can't lie even by omission - because it sort of has a second "tattletale AI" attached to it at the brain whose job is to criticize it as effectively as possible.
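
For concreteness, here's a very rough sketch of that structure. The class and functions below are hypothetical stand-ins for model calls and human review, not any existing system or API.

```python
# Rough sketch of an oracle with an attached "tattletale": one model, two
# adversarial roles, and a human veto. StubOracle and human_judges are
# placeholders so the sketch actually runs; a real system would be nothing
# this simple.

class StubOracle:
    """Placeholder for a model that can both generate and critique plans."""
    def propose_plan(self, question):
        return f"plan for: {question}"
    def argue(self, plan, stance):
        return f"[{stance}] case regarding '{plan}'"

def human_judges(plan, transcript):
    # Stand-in for human review of the defense/attack transcript.
    return True

def oracle_with_internal_debate(question, model, rounds=2):
    plan = model.propose_plan(question)
    transcript = []
    for _ in range(rounds):
        defense = model.argue(plan, stance="defend")  # same weights, same world model...
        attack = model.argue(plan, stance="attack")   # ...so it knows every weak point
        transcript.append((defense, attack))
    # Only act on the plan if it survives its own best critic.
    return plan if human_judges(plan, transcript) else None

print(oracle_with_internal_debate("reduce pandemic risk", StubOracle()))
```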

Expand full comment
icodestuff's avatar

The same core reason, I think; the pro-action side can determine that the anti-action side is a detriment to its plans' execution, and cripple it so that it usually produces output people will find unconvincing. And it can do so selectively, letting the anti-action side "win" sometimes on ideas that aren't actually instrumental to its goals, so we're none the wiser to its sabotage of the process.

Expand full comment
meteor's avatar

This sounds like a less thought-out version of Debate to me.

(Which is definitely a compliment.)

Expand full comment
Phil Getts's avatar

One important thing that has often been pushed aside as "something to deal with later" is: Just what are we trying to accomplish? "Keep the world safe from AIs" makes sense now. It will no longer make sense when we're able to modify and augment human minds, because then every human is a potential AI. When that happens, we'll face the prospect of some transhuman figuring out a great new algorithm that makes him/her/them able to take over the world.

So the "AI" here is a red herring; we'd eventually have the same problem even if we didn't make AIs. The general problem is that knowledge keeps getting more and more powerful, and more and more unevenly distributed; and the difficulty of wreaking massive destruction keeps going down and down, whether we build AIs or not.

I don't think the proper response to this problem is to say "Go, team human!" In fact, I'd rather have a runaway AI than a runaway transhuman. We don't have any idea how likely a randomly-selected AI design would be to seize all power for itself if it was able. We have a very good idea how likely a randomly-selected human is to do the same. Human minds evolved in an environment in which seizing power for yourself maximized your reproductive fitness.

Phrasing it as a problem with AI, while it does make the matter more timely, obscures the hardest part of the problem, which is that any restriction which is meant to confine intelligences to a particular safe space of behavior must eventually be imposed on us. Any command we could give to an AI, to get it to construct a world in which the bad unstable knowledge-power explosion can never happen, will lead that AI to construct a world in which /humans/ can also never step outside of the design parameters.

The approach Eliezer was taking, back when I was reading his posts, was that the design parameters would be an extrapolation from "human values". If so, building safe AI would entail taking our present-day ideas about what is good and bad, wrong and right, suitable and unsuitable; and enforcing them on the entire rest of the Universe forever, confining intelligent life for all time to just the human dimensions it now has, and probably to whatever value system is most popular among Ivy League philosophy professors at the time.

That means that the design parameters must do just the opposite of what EY has always advocated: they must /not/ contain any specifically human values.

Can we find a set of rules we would like to restrict the universe to, that is not laden with subjective human values?

I've thought about this for many years, but never come up with anything better than the first idea that occurred to me: The only values we can dictate to the future Universe are that life is better than the absence of life, consciousness better than the absence of consciousness, and intelligence better than stupidity. The only rule we can dictate is that it remain a universe in which intelligence can continue to evolve.

But the word "evolve" sneaks in a subjective element: who's to say that mere genetic decay, like a cave fish species losing their eyes, isn't "evolving"? "Evolve" implies a direction, and the choice of direction is value-laden.

I've so far thought of only one possible objective direction to assign evolution: any direction in which total system complexity increases. "Complexity" here meaning not randomness, but something like Kolmogorov complexity. Working out an objective definition of complexity is very hard but not obviously impossible. I suspect that "stay within the parameter space in which evolution increases society's total combined computational power" would be a good approximation.
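
One crude first attempt at an "objective" measure is compressed size as a proxy for Kolmogorov complexity. The sketch below (arbitrary sample strings, ordinary zlib) also shows exactly the problem flagged above: pure randomness scores highest, which is not what we want "complexity" to mean.

```python
# Compressed size as a (poor) proxy for Kolmogorov complexity.
import zlib, random

random.seed(0)
samples = {
    "repetitive": "ab" * 5000,
    "structured": " ".join(str(i) for i in range(2000)),
    "random":     "".join(random.choice("abcdefgh") for _ in range(10000)),
}

for name, text in samples.items():
    ratio = len(zlib.compress(text.encode())) / len(text)
    print(f"{name:>10}: compressed to {ratio:.2%} of original")
```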

Expand full comment
So8res's avatar

> But I think Eliezer’s fear is that we train AIs by blind groping towards reward (even if sometimes we call it “predictive accuracy” or something more innocuous). If the malevolent agent would get more reward than the normal well-functioning tool (which we’re assuming is true; it can do various kinds of illicit reward hacking), then applying enough gradient descent to it could accidentally complete the circuit and tell it to use its agent model.

FWIW, that's not my read. My read is more like: Consider the 'agent' AI that you fear for its misalignment. Part of why it is dangerous, to you, is that it is running amok optimizing the world to its own ends, which trample yours. But part of why it is dangerous to you is that it is a powerful cognitive engine capable of developing workable plans with far-reaching ill consequences. A fragment of the alignment challenge is to not unleash an optimizer with ends that trample yours, sure. But a more central challenge is to develop cognitive engines that don't search out workable plans with ill consequences. Like, a bunch of what's making the AI scary is that it *could and would* emit RNA sequences that code for a protein factory that assembles a nanofactory that produces nanomachines that wipe out your species, if you accidentally ask for this. That scariness remains, even when you ask the AI hypotheticals instead of unleashing it. The AI's oomph wasn't in that last line of shell script; it was in the cognitive engine under the hood. A big tricky part of alignment is getting oomph that we can aim.

cf. Eliezer's "important homework exercise to do here".

Expand full comment
Belisarius Cawl's avatar

Firstly,

> But imagine prompting GPT-∞ with "Here are the actions a malevolent superintelligent agent AI took in the following situation [description of our current situation]".

I think the scarier variant would be "Here is the text that, when written into the memory of my hardware or read by a human, will create hell: [...]" - the core insight being that information can never not influence the real world if it deserves that title (angels on a pin etc.).

Secondly - The problem with the oracle-AI is that it can't recursively improve itself as fast as one with a command line and a goal to do so, so the latter wins the race.

Thirdly - A fun thing to consider is cocaine. A huge war is already being fought over BI's reaching into their skulls and making the number go up vs. the adversarial reward function of other BI's tasked with preventing that, complete with people betting their lives on being able to protect the next shipment (and losing them).

Fourthly,

> how do people decide whether to follow their base impulses vs. their rationally-though-out values?

This is my model for that: http://picoeconomics.org/HTarticles/Bkdn_Precis/Precis.html

Boils down to: Brains use a shitty approximation to the actually correct exponential reward discounting function, and the devil lives in the delta. This thought is pleasurable to me since the idea of "willpower" never fit into my mind - If I want something more than something else, where is that a failure of any kind of strength? If I flip-flop between wanting A and wanting B, whenever I want one of them more than the other it's not a failure of any kind of "me", but simply of that moment's losing drive. Why should "I" pick sides? (Also - is this the no-self the meditators are talking about?)
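
A small sketch of that "devil in the delta" point, with arbitrary parameter values: an exponential discounter never reverses its choice as the rewards draw near, while a hyperbolic discounter does, which is the flip-flopping between wanting A and wanting B.

```python
# Exponential vs. hyperbolic discounting: only the hyperbolic curve produces
# a preference reversal as the rewards get close. Parameters are arbitrary.

def exponential(value, delay, rate=0.05):
    return value * (1 - rate) ** delay

def hyperbolic(value, delay, k=0.2):
    return value / (1 + k * delay)

small_soon, large_late = (50, 1), (100, 10)   # (reward, delay in days)

for lead_time in (30, 0):                     # judge far in advance, then at the last moment
    for name, f in (("exp", exponential), ("hyp", hyperbolic)):
        s = f(small_soon[0], small_soon[1] + lead_time)
        l = f(large_late[0], large_late[1] + lead_time)
        choice = "small-soon" if s > l else "large-late"
        print(f"lead={lead_time:>2}d {name}: small={s:6.1f} large={l:6.1f} -> {choice}")
```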

Fifthly,

> The potentially dangerous future AIs we deal with will probably be some kind of reward-seeking agent.

Like, for example, a human with a tool-AI in their hands? Maybe a sociopath who historically is especially adept at climbing the power structures supposed to safeguard said AI?

Lastly - my thinking goes more towards intermeshing many people and many thinking-systems so tightly that the latter (or one singular sociopath) can't get rid of the former, or would not want to. But that thought is far too fuzzy to calm me down, honestly.

Expand full comment
The Solar Princess's avatar

> I think the scarier variant would be "Here is the text that, when written into the memory of my hardware or read by a human, will create hell: [...]" - the core insight being that information can never not influence the real world if it deserves that title (angels on a pin etc.).

Well, thanks for that little nightmare

Expand full comment
Brian Pansky's avatar

> Evolution taught us "have lots of kids", and instead we heard "have lots of sex".

I mean, we do still build spaceships too.

Expand full comment
Mark Beddis's avatar

Here’s a hypothesis (and I know this is not the point of this article but I want to say it anyway). Animals operate like Tool AIs, humans (most of them) like Agent AIs. Is this distinction what defines consciousness and moral agency?

Expand full comment
Banana's avatar

On the tool AI debate, at the very least folks at Google are trying to figure out ways to train AIs on many problems at once to get better results more efficiently (so each individual problem doesn't require relearning, say, language).

It's already very clear that many problems, like translation, are improved by having accurate world models.

For similar reasons to the ones discussed here, I've been pessimistic about AI safety research for a long time - no matter what safety mechanisms you build into your AI, if everybody gets access to AGI, some fool is eventually, intentionally or unintentionally, going to break them. The only plausible solution I can imagine at the moment is something analogous to the GPU-destroying AI.

Expand full comment
internetdog's avatar

Separating AI into types seems useful - I think there's a huge tendency to tie many aspects of intelligence together because we see them together in humans, but it ends up personifying AI.

An interesting dichotomy is between "tool AI" (for Drexlerian extrapolations of existing tech) and "human-like AI", but focusing on "agency" or "consequentialism" is vague and missing important parts of how humans work.

As far as I can see, humans use pattern recognition to guide various instinctual mammalian drives - possible ones being pain avoidance / pleasure seeking, social status, empathy, novelty/boredom, domination/submission, attachment, socialization/play, sexual attraction, sleepiness, feeling something is "cute", anger, imitation, etc. [1]

On top of these drives we have *culture*, and people sort into social groups with specific patterns. But I'd argue that culture is only *possible* because of the type of social animal we are. And rationalism can increase effective human real-world intelligence, but it is only one culture among many.

I'll put aside that we seem quite far from this sort of human-like AI.

What would be dangerous would be some combination of human-like drives (not specific like driving a car, but vaguer, like the above list) that did not include empathy. I believe this occasionally happens in real people, and it's quite scary, especially once you realize that it may not be obvious if they are intelligent. If Tool AI is an egoless autistic savant that cares for nothing other than getting a perfect train schedule, human-like drives might create an AI sociopath.

I think precautionary principle #1 is don't combine super-intelligence with other human-like drives until you've figured out the empathy part. It should be possible to experiment using limited regular-intelligence levels.

[1] For a specific example of this, posting on this forum. I may not be the most introspective person, but if this forum was populated by chat-bots that generated the same text but felt and cared about nothing, I don't think I would be interested in posting, and I think that says something about the roots of human behavior.

Expand full comment
Morgan's avatar

It seems like anyone who truly accepts Yudkowsky's pessimistic view of the future should avoid having children.

I'm worried about this myself: should I really bring children into this world, knowing that a malevolent AI might well exterminate humanity--or worse--before they're grown?

Given that Scott himself has just gotten married, I'm curious about whether this is a factor in his own plans for the future.

Expand full comment
Scott Alexander's avatar

See the CS Lewis quote at the bottom of https://astralcodexten.substack.com/p/highlights-from-the-comments-on-kids .

I expect any AI disaster to be pretty painless; I don't think a (eg) 50-50 chance that my kids die painlessly at 35 should significantly affect my decision to have them or not.

Expand full comment
Bugmaster's avatar

As I've said before, on multiple occasions, 50/50 chance of the Singularity in 35 years is just absurdly high. I am willing to bet you large sums of money that this would not happen, but I'm not sure if I myself will live for 35 more years (for boring old biological reasons), so it might be a sucker's bet...

Expand full comment
Bogdan Butnaru's avatar

Why would anyone take that bet? If the singularity happens, either they won’t be able to collect (bad version) or they wouldn’t care about the money (good version). If it doesn’t, they lose the money.

Expand full comment
Vaniver's avatar

I have delayed having children for approximately this reason (sort of--my situation is weird), but it mostly hinges on "I think this is humanity's crunch time" and so I want minimal distractions from work and I think it's easy to have kids later. (I think it makes sense for people delaying kids for this reason to freeze sperm or eggs.)

Expand full comment
Bugmaster's avatar

If you live in the US, step 1 to having kids the right way might be to emigrate...

Expand full comment
Jack Johnson's avatar

I don't have strong opinions on AGI. I do have reasonably strong opinions on nanotech having worked in an adjacent field for some time.

So when I see plans (perhaps made in jest?) like "Build self-replicating open-air nanosystems and use them (only) to melt all GPUs." it causes my nanotech opinion (this is sci-fi BS) to bleed into my opinion on the AGI debate. Seeing Drexler invoked, even though his Tool AI argument seems totally reasonable, doesn't help either.

Can someone with nanotech opinions and AGI opinions help steer me a bit here?

Expand full comment
Melvin's avatar

I would say that human-level AGI is significantly more plausible than self-replicating open-air GPU-melting nanobots.

That is to say, there are no compelling arguments why human-level AGI should be physically impossible, and we know that human-level intelligence works just fine in biological systems... whereas biological self-replicating "nanobots" do exist but are quite likely to only be physically possible within the slow and squishy constraints of biochemistry.

Having said that, we're not just talking about human-level AGI here, we're talking about super-human AGI. And while that's likely to be possible too, I think that a lot of people stray into sci fi bullshit here too, speaking of superhuman AI as if it were godlike, capable of inventing new technologies to solve any problem, or casually simulating the entire universe on a whim.

Expand full comment
Vaniver's avatar