Astral Codex Ten

Comment deleted

Good point

e-tp-hy

AI doesn't have to run off consistent logic, just navigate an extrapolated probabilistic series of states sufficiently well, which seems good enough for our minds. And then something like the commutativity example there is most likely a consequence of massive parallelism, same as we can read scarmbedl owrsd relatively easily.

That is incoherent nonsense. Godel just means you can't represent certain truths about a system in the SAME system. It says nothing about other systems being able to represent those truths. Hofstadter debunked Penrose's nonsense a generation ago.

Comment deleted

Comment deleted

JohanL

Wikipedia: "Hofstadter was born in New York City to Jewish parents: Nobel Prize-winning physicist Robert Hofstadter and Nancy Givan Hofstadter.[11] He grew up on the campus of Stanford University, where his father was a professor, and attended the International School of Geneva in 1958–59. He graduated with distinction in mathematics from Stanford University in 1965, and received his Ph.D. in physics[3][12] from the University of Oregon in 1975, where his study of the energy levels of Bloch electrons in a magnetic field led to his discovery of the fractal known as Hofstadter's butterfly.[12]"

Professor of Cognitive Science and Comparative Literature, if credentials are important to you. I haven't read the specific rebuttal, but I read _Emperor's New Mind_ and found it unconvincing and confused, for the same reasons as Drake did.

If I hadn't read the debate, or if I was unable to follow it, then I probably *would* side with the credentials: if Roger Penrose and Douglas Hofstadter disagreed about the nature of black holes, I would assume Penrose was correct, Hofstadter was confused, and stop worrying about it.

If they disagreed on the nature of cognition, I would assume the reverse.

Sinity

I remember Penrose making confident arguments about the brain relying on quantum mechanics.

My impression is that he just doesn't want AI to be possible.

REF

Everything relies on quantum mechanics. I seem to recall a Feynman physics lecture where he goes on and on explaining everything from mirrors to tennis balls bouncing from a quantum mechanical outlook.

The specific claim is microtubule structures in the neurons are doing something spooky. This was an area of research that was being pursued in the 80s, and he's been beating the same drum while the mainstream has moved on. I might look into the history and write up an effortpost like Maxander. https://en.wikipedia.org/wiki/Talk:Microtubule#Removing_Penrose/Moving

Yes, but further developments seem to show that any system strong enough to represent and prove the theorems that Godelian incompleteness provides to one theory is itself subject to Godelian incompleteness. So you end up with an infinite pyramid of systems, each of which can be resolved in at least two ways. I no longer have the class text, but perhaps this will suffice: https://math.stackexchange.com/questions/1999372/omega-inconsistency-and-omega-incompleteness

Dweomite

So? This doesn't obviously save Penrose's argument (as described by Eugene Norman)?

Some statement X is unprovable in system Y, but humans "know" X is true. That could be because humans are doing something weird, OR it could just be that humans are reasoning by boring logical rules using some other system, Z.

By Godel, system Z will also have statement W that's unprovable within system Z. But so what? Do humans know W is true? Godel only proves that W exists, not anything about what humans know about it.

In order to establish that humans know something that is unprovable *in the actual logical system that the human in question is using*, it seems to me you'd need to precisely pin down exactly what that logical system is, which AFAIK has not been done by neuroscience to date.

Sleazy E

Nah. Penrose remains correct.

vtsteve

This explains *so* much...

Drake Thomas

>Gödel’s incompleteness theorem shows certain mathematical truths that humans “know” to be true can’t be proven algorithmically

This is not what the incompleteness theorems state. *Given* some formal proof system and a finite (or algorithmically-generatable) list of axioms which is sophisticated enough to talk about basic arithmetic, there exist sentences which can't be proven using those axioms, and no algorithm can correctly evaluate the provability of arbitrary sentences in the system. There isn't one particular statement that has the intrinsic property of being "unprovable" - proof is relative to some set of axioms and deduction rules.

What it feels like from the inside to look at a Gödel sentence relative to your own cognitive algorithm, if you were able to read a specific piece of code and understand that your brain was equivalent to running it, is that the sentence says "the algorithm simulating this brain will, when shown this sentence, output False". "Know" to be true whatever you like, the answer you give will inevitably be incorrect.

Are there any specific verifiable tasks, like consuming our future lightcone with paperclips, you think require ineffable non-algorithmic thought to do and we shouldn't worry about? What's the least impressive task that AI is intrinsically incapable of?

Indeed. If there are mathematical truths that humans "know" to be true, but can't be proven, then what are they? Can anyone name one?

(Goldbach's conjecture or similar doesn't count; we don't "know" it to be true, we just suspect that it probably is because we've checked an awful lot of numbers and haven't found any counterexamples yet.)

Carl Pham

I don't think that's the argument. I think the argument is that no machine made of a trillion or so cells, interconnected, but fully describable by a deterministic (i.e. classical) mechanics would ever *come up* with Goldbach's conjecture in the first place. The goal is not to explain mystical knowing (so the provability of Goldbach's conjecture is not important), but to explain imagination -- why would any human being ever have thought of this (or any) new idea out of the blue in the first place, since no one has yet figured out a way to write an algorithm for coming up with new ideas?

Well...*useful* new ideas, in an efficient way. You could certainly write a program to generate random new mathematical conjectures (in principle), but like the monkeys on the typewriters producing Shakespeare, the argument is that it could run for a million years before it ever came up with anything "interesting" (like Goldbach's conjecture) -- which suggests this is not how human imagination works. That is, it seems dubious (at least to Penrose, and many others) that the human brain simply randomly walks through infinite idea space and tries everything out. That space is just too large, even if we restrict it to mathematics, or even if we restrict it to mathematics in the neighborhood of what's already known.

What does human imagination do then? My impression is that this is the problem Penrose wants to pin on wavefunction collapse, because that's the only thing he knows that can funnel a very large amount of information into a conclusion in something of a superdeterministic way, all at once, without any kind of Turing-machine equivalent step-by-step reduction of complexity along the way. His major problem is that imagination (as he defines it) *can't* be a standard quantum computing problem, because standard quantum computing problems are still amenable to classical calculation, and he's arguing the human mind does stuff that cannot be classical at all.

So as I understand it -- and I'm just speculating because I haven't read his book -- he has to argue that there is some process by which an individual neuron can have some "wavefunction" in "idea space" that is then collapsed "directionally" via some unknown new physics that guides the collapse onto the desired goal -- a new idea that is nontrivial and interesting. I suppose he imagines it has to be at the single neuron level because it has to avoid decoherence long enough to be useful, so it's got to be atomic size. Why he locates it in the microtubule I have no idea, but I doubt that's essential to the idea.

J'myle Koretz

May 25, 2022 (edited)

Who among us doesn't have the sneaking suspicion that the microtubule does something powerful and unique

" ... some set of axioms and deduction rules."

The Correspondence Theory of Truth. Truth is a fundamental property. In the correspondence theory of truth, a truth is a statement corresponding to some property of the real world. It's an interesting read on plato.stanford.edu.

Viliam

How would you apply the Correspondence Theory of Truth to statements such as: "This sentence is false"?

If self-referential sentences are not allowed, how about this:

"The sentence you obtain by taking the string 'The sentence you obtain by taking the string * and replacing the asterisk with a quotation of that string is false.' and replacing the asterisk with a quotation of that string is false."
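
For anyone who wants to check that the construction is finite, here is a minimal Python sketch. The names `TEMPLATE` and `diagonalize` are made up for illustration; the only operation is a single string substitution applied to a fixed template.

```python
# Hypothetical names, for illustration only.
TEMPLATE = (
    "The sentence you obtain by taking the string * and replacing the "
    "asterisk with a quotation of that string is false."
)


def diagonalize(template: str) -> str:
    """Replace the asterisk in `template` with a quotation of `template` itself."""
    return template.replace("*", "'" + template + "'", 1)


# One substitution, no recursion: the result is an ordinary finite string.
sentence = diagonalize(TEMPLATE)
print(sentence)
```

The printed string is the sentence above. The asterisk that survives sits inside the quotation and is never expanded further, so nothing here grows without bound.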

Well ... for starters, this example is a fine example of unbounded monotonic recursion ... which has the fine property of infinite word-salad expansion, yet which lacks the fine property of making a concise statement.

Jun 11, 2022

Saul Kripke problematizes this response. Consider a sentence like, “the sentence at the top of p.143 of Kripke’s paper on truth is false”. That sentence is perfectly true in many cases or false in many cases depending on what sentence is at the top of p. 143. But if that very sentence happens to be at the top of p. 143, then there’s a paradox. Whether the sentence makes a statement or not then depends on some contingent facts about the world outside the sentence, which at least seems odd for a theory of language.

And it gets worse. If the sentence said “the sentence at the top of p. 143 doesn’t make a meaningful statement”, then it looks like the sentence is actually *true*, at first, and then you realize it’s another paradox.

Michael Druggan

Jun 25, 2022 (edited)

The way I like to think about it is that the correspondence between linguistic sentences and semantic meanings is like the relationship between variables and mathematical equations.

The equation x=2 is a perfectly fine way of defining the value of x. The equation x+3 = 2x+1 is self-referential but in the end also a perfectly cromulent way of defining the same value. But if you try to define x by the equation x=x+1 you've got an issue.
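
A small Python sketch of the three cases, assuming each equation is written in the form a*x + b = c*x + d. The helper name `solve_linear` is invented for this example.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d for x.

    Returns the unique solution as a Fraction, the string "any" if every x
    satisfies the equation, or None if no x does.
    """
    coeff = Fraction(a) - Fraction(c)  # rearranged: (a - c) * x = d - b
    rhs = Fraction(d) - Fraction(b)
    if coeff != 0:
        return rhs / coeff
    return "any" if rhs == 0 else None


print(solve_linear(1, 0, 0, 2))  # x = 2
print(solve_linear(1, 3, 2, 1))  # x + 3 = 2x + 1
print(solve_linear(1, 0, 1, 1))  # x = x + 1
```

The first two calls pin down the same value, while the third has no solution at all, which is the analogue of a sentence that fails to define a meaning.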

> There isn't one particular statement that has the intrinsic property of being "unprovable" - proof is relative to some set of axioms and deduction rules.

We do know of several statements with the intrinsic property of being unprovable within the systems we use.

There are no implications to that fact - you can just choose whatever truth-value you like for them - but it is a known property of certain known statements.

Drake Thomas

Right, hence "relative to some set of axioms and deduction rules". The Continuum Hypothesis is independent _of ZFC_, but not of ZFC plus the axiom "CH is true". There are not statements that are independent of all axiom systems.

Maxander

Aside from the complex points about the precise meaning of Godel's theorem, a broader reason no one cares about Penrose's argument nowadays is that it was addressed to a much older conception of "AI" - one which shares almost precisely nothing with DL-based AI aside from "they both run on a machine called a 'computer.'"

In the days of SHRDLU and Coq, serious efforts at AI development revolved around (to simplify greatly) developing systems of propositional logic processing that could perform "general" reasoning. In this case, the non-existence of proofs of certain things directly constrains what the AI can think about, since the AI's "thoughts" are essentially (or are isomorphic to) proofs in logic. If, for instance, you ask SHRDLU if the red block is on top of the green block, it will painstakingly grind through some logical derivation starting from a set of axioms about how propositions like "on top of" work, prior history of its world-of-blocks, and so forth, and it will answer depending on whether it can prove "red block is on top of green block" either true or false [0].

This whole approach hasn't quite been abandoned, but it's very distant from where we now expect GAI to come from. If you ask a GPT-3 character about the state of a block world, possibly something that happens inside the model corresponds to logic, but only in a deeply obscure and fuzzy fashion. Likewise GPT-3 seems equally able to reason about entirely subjective or arbitrary things, so there's no obvious reason why Godel's theorem, which is a result in formal logic, should apply to what it does.

There's a broader argument that could be made (and which, indeed, Penrose probably does make) that Godel's theorem shows there's a limit to what can be done with *any logically consistent model of computation whatsoever*, even if that computation is huge messy tensor backpropagation, and that therefore somehow even GPT-3 would be subject to the same limitation. But then, to argue that humans are any different, you have to believe that we're performing some form of hypercomputation that goes beyond the principles of logic. Whether that's a path you would like to wander down is up to you.

[0] My memory of how SHRDLU works exactly is over a decade old now, so weight this description lightly in terms of particulars.

It was even wrong at the time. That some statements can't be proven doesn't say that the AI can't be quite general. For that matter, there are lots of things that people haven't proven, so to assert that we can prove all true statements is ill-founded. And it's quite certain that we are able to believe many false statements, but it's quite easy to design such capabilities into an AI based around propositional calculus. What's difficult is to prevent it from therefore deducing anything it wants to. "P and ~P implies Q" has to be ditched, and some more subtle rule used.
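
To make the "P and ~P implies Q" rule concrete, here is a brute-force truth-table check in Python. It is only a sketch of classical two-valued logic; `implies` and `is_tautology` are helper names made up for the example.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Classical material implication: a -> b."""
    return (not a) or b


def is_tautology(formula, n_vars: int) -> bool:
    """True if `formula` holds under every assignment of its n_vars variables."""
    return all(formula(*vals) for vals in product([False, True], repeat=n_vars))


def explosion(p: bool, q: bool) -> bool:
    """(P and not P) -> Q."""
    return implies(p and not p, q)


def plain_implication(p: bool, q: bool) -> bool:
    """P -> Q, included as a contrast case."""
    return implies(p, q)


print(is_tautology(explosion, 2))
print(is_tautology(plain_implication, 2))
```

The first formula holds in every row because its premise is never true, so a system that accepts both P and ~P can derive any Q. The second fails when P is true and Q is false.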

> This whole approach hasn't quite been abandoned, but it's very distant from where we now expect GAI to come from. If you ask a GPT-3 character about the state of a block world, possibly something that happens inside the model corresponds to logic, but only in a deeply obscure and fuzzy fashion.

GPT-3 also isn't able to provide coherent answers to questions about the state of a block world, so I don't see that we expect such a module to come from GPT-3-like approaches.

Harold

My God, that documentary reminds me of Look Around You. Write that down in your copy book now.

It's the kind of show that Look Around You was parodying :-)

Sinity

"can’t be proven algorithmically" -> AI shouldn't need to rely on absolute proofs. ML systems clearly don't - they approximate functions based on examples, basically.

Doesn't matter - humans don't usually rely on proofs either. Penrose was (incorrectly) claiming that Godel's theorem implies that human brains can't be simulated by computers, as I describe here: https://astralcodexten.substack.com/p/willpower-human-and-machine/comment/6736519

https://en.wikipedia.org/wiki/Penrose%E2%80%93Lucas_argument

Penrose and Lucas make the same argument using Gödel, and most logicians think they are misusing his results.

My take on it is that Gödel's argument leading us to recognize that the Gödel sentence for Peano arithmetic is true requires that we recognize that Peano arithmetic is (a part of) the formal system we are using. There's no reason that a creature using any particular formal system should be able to recognize that the formal system they are using is in fact the one that they are using. And if you don't have that recognition, then you can't recognize the truth of your own Gödel sentence.

I take Gödel's work to in fact show that no creature can properly recognize some system as the system they are using while being convinced that it is true.

In any case, even if humans *could* do this, then that means that *some* physical process could do this, and therefore an artificial intelligence could too. It just wouldn't be Turing-computational.

Doug S.

Another interpretation of Godel's Theorem: you can't use first order logic to define arithmetic on the natural numbers precisely enough to rule out everything else. The Godel statement "there is no number that corresponds to a proof of this statement" is indeed true of standard arithmetic, but no matter what axioms you try to use to define arithmetic, you can always come up with a model that does have a "number" that does correspond to a "proof" of the Godel statement.

So no, humans can't really say that the Godel sentence is true - because it doesn't have to be.

Also, all these theorems about what can and can't be proven all require that the system be consistent and free of contradictions - and, as we all know, human reasoning is anything but.

May 24, 2022 (edited)

AIUI that's closer to the (older) Löwenheim–Skolem theorem: https://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem. Per Wikipedia "It implies... no first-order theory with an infinite model can have a unique model up to isomorphism." Gödel's first incompleteness theorem is kinda dual to that - strong enough formal systems nail down a lot more than they appear to, in the sense that there are statements which are true in every model of the theory but which cannot be proved within the theory. The whole point of the Gödel sentence is that it's such a statement - true in every model, but not provable without stepping outside the formal system. EDIT: no, this paragraph is wrong, thanks to Doug S for the correction.

> Also, all these theorems about what can and can't be proven all require that the system be consistent and free of contradictions - and, as we all know, human reasoning is anything but.

An inconsistent system can prove anything, so that's easy :-) But more seriously, Penrose wasn't talking about high-level human reasoning, he was talking about the low-level wetware on which the high-level system runs. His claim was (from memory, it's been a long time since I read the book):

1. Humans can prove Gödel's theorem.

2. If our brains run on Turing machines, we can identify our hardware's Gödel sentence (as others have pointed out, this is his mistake).

3. Combining 1 and 2, we can recognise the truth of our hardware's Gödel sentence. But this is a contradiction!

4. Therefore, humans do not run on Turing machines.

5. Therefore, human brains must be doing something freaky and quantum (AIUI this step also doesn't work - in principle any problem that can be solved by a quantum computer can also be solved by a classical computer. But possibly he was talking about quantum-gravitic computers with yet-unknown powers).

Doug S.

https://www.lesswrong.com/s/SqFbMbtxGybdS2gRs/p/i7oNcHR3ZSnEAM29X

Godel's Completeness Theorem says that if something is true in every first-order model of a set of axioms, then there is a first-order proof of it. So if the Godel sentence and its negation are both unprovable, there must be models in which it's true and models in which it's false.

You're right - I was mixing up semantic and syntactic completeness. Thanks!

quiet_NaN

I would describe the kind of strong AI people worry about as "smart like humans, just better at scaling".

Assuming that humans do not have a Halting Problem Oracle running on metaphysical mental energy hidden in their brain somewhere, brains can in principle be simulated by computers. Also assuming that humans can (in principle) simulate computers with their brains, this puts them on equal footing, with a practical difference being that the time it takes for the number of neurons in a state of the art human to increase by a factor of two is some orders of magnitude larger than the equivalent for computers.

So if a human brain can consider statements in different axiomatic systems and invent new ones on the spot without firmly subscribing to one system consistently, so can AIs.

Even if we had some insights systematically closed to machines (which we emphatically have not), turning that into a practical advantage is another matter. "At least we did glimpse some truth which shall forever evade our creation" is kind of a shitty epigraph.

(While a Halting Oracle might solve AI alignment in the same way that a perpetual motion machine would solve climate change, few resources are spent to look for either for obvious reasons.)

Penrose's claim was precisely that brains cannot in principle be simulated by computers. As others have pointed out, his reasoning for this was fallacious, but we can't totally rule out that his conclusion was correct and brains make use of weird quantum-gravitic physical effects that digital computers can't simulate. On the other hand, we know that humans *can* in principle simulate computers with their brains - computers are implementations of Turing machines, and Turing machines were defined as an axiomatisation of what a human is doing when executing a pencil-and-paper algorithm.

If you're going down this line of argument, remember that a Turing machine has an infinite tape. And I deny that humans can accurately emulate Turing machines. We even have trouble with C, where everything is limited to finite cases. We can sort of meta-model what a C program will do, but extensive debugging is needed *because* we can't accurately emulate it.

That said, if you consider a synapse the rough equivalent of a microprocessor with a tiny bit of persistent memory, then you CAN make a decent analogy, though nobody's managed a decent model of a neuron that I've heard of. IIRC we don't even understand C. elegans, and that's only got 302 neurons.

IOW, I consider any comparison between human brains and Turing machines to be FAR beyond the available evidence.

May 25, 2022 (edited)

> If you're going down this line of argument, remember that a Turing machine has an infinite tape.

That doesn't undermine pozorvlak at all; actually using the infinite tape requires an infinite amount of time. No Turing machine that can have any effect in reality is able to consume an infinite amount of tape; this is formally equivalent to not having an infinite amount of tape.

Symmetrically, given an infinite amount of time, humans are more than capable of using an infinite amount of tape.
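
The finite-tape point can be checked with a toy simulator. This is a rough Python sketch, not a full Turing machine formalism: the `run` function, the transition-table layout, and the `RUNAWAY` machine are all invented for illustration. The head moves one cell per step, so after n steps it can have touched at most n + 1 cells.

```python
from collections import defaultdict


def run(transitions, steps, blank="_", start="q0"):
    """Run a one-tape machine for at most `steps` steps.

    `transitions` maps (state, symbol) -> (new_state, written_symbol, move),
    where move is -1 (left) or +1 (right). Returns the set of tape cells
    the head has visited.
    """
    tape = defaultdict(lambda: blank)
    state, head = start, 0
    visited = {0}
    for _ in range(steps):
        key = (state, tape[head])
        if key not in transitions:  # no applicable rule: the machine halts
            break
        state, symbol, move = transitions[key]
        tape[head] = symbol
        head += move
        visited.add(head)
    return visited


# A machine that never halts: write 1, move right, repeat.
RUNAWAY = {("q0", "_"): ("q0", "1", +1)}

for n in (0, 1, 10, 1000):
    print(n, len(run(RUNAWAY, n)))
```

Even this worst-case machine, which grabs fresh tape on every step, only ever reaches n + 1 cells in n steps, so any run that finishes uses a finite stretch of tape.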

> remember that a Turing machine has an infinite tape

But a physical computer does not.

> We can sort of meta-model what a C program will do, but extensive debugging is needed *because* we can't accurately emulate it.

No, it's because we can't accurately emulate it fast enough to be useful. To accurately emulate the behaviour of a (deterministic) C program, you need to think about how the compiler translates the C into machine code, and how the processor translates machine code into microcode, and how the microcode is executed by the hardware. And you have to do that for every layer of code that your code interacts with, including the operating system and any other programs using the same resources. That's a lot of extra work and a *huge* amount of context to cram into your brain, so we almost never do it and instead rely on simplifications and heuristics. In fact, it's such a large amount of work that many (most?) emulator programs don't try to be cycle-accurate, but only "cycle-approximate" or even "functional". But *in principle* it's possible with enough time and scrap paper, and Penrose's argument is about the "in principle" question rather than the "in practice" one.

> That said, if you consider a synapse the rough equivalent of a microprocessor with a tiny bit of persistent memory, then you CAN make a decent analogy, though nobody's managed a decent model of a neuron that I've heard of.

This is exactly the question - can we make a model of a biological neural network that runs on a digital computer and captures all the essential behaviours (whatever those actually are)? It seems likely, but we don't actually know for sure.

Let me try again. Because of limited error correction capabilities, viruses have shorter genomes than do plants or animals. They have sufficient errors, that if their code is longer, they can't successfully reproduce.

I assert that people have a sufficiently high error rate that they cannot even reliably create fairly short programs in a rather simple computer language. We aren't optimized for that. Computers are (or can be). So computers will be able to deal with longer chains of logic than will people.

Also people "know" all sorts of things. Some are correct, others aren't. And people disagree about which things that they "know" are true actually are true. So the error rate in the things that people "know" to be true is much higher than zero. That people have some sort of consistent way of generating these "truths" is obviously incorrect.

This implies that people aren't entirely reaching their conclusions via logic or any other infallible approach. Therefore it's quite likely that an approach that requires consistency will not be able to reach many of the conclusions that they reach, but the areas where they agree can be expected to be reasonably large.

The purpose of C. elegans in my post was to demonstrate that even when we know all of the neurons and synapses involved we don't understand what's going on. It's my guess that the system is beyond our level of complexity (in understanding), and that what we can do is understand what parts of it are doing, but that we can't understand the whole system. This *is*, however, a guess. However, given that we don't understand the system, it's unsurprising that we shouldn't be able to predict how it will behave...except sometimes.

So I see NO evidence that we should assume any non-local effects. The closest I've seen to such is a report that something that seems to be a non-local effect is involved in the way chloroplasts absorb light. And there "non-local" is on the order of micrometers or less. (It's a pathway that leads to a photon being absorbed in a section of the molecule that's ready to receive it. Sorry, I couldn't find the article with a quick search.)

> Also people "know" all sorts of things. Some are correct, others aren't. And people disagree about which things that they "know" are true actually are true. So the error rate in the things that people "know" to be true is much higher than zero. That people have some sort of consistent way of generating these "truths" is obviously incorrect.

Hmmmmm. So if the human brain can exhibit a computation (or equivalently, prove a theorem) that could not be performed on a computer, that doesn't prove humans are more powerful than computers because the brain's outputs are too unreliable for us to trust the proof?

Everyone else here has excellent and very smart points, but I just want to respond to this:

> a TV show a few years ago

1993 was almost 3 decades ago.

Sleazy E

Very good point. Human minds are not the same as computers and never will be.

Sinity

They're not literally von Neumann architecture computers, so in that sense yes, they're not.

Neural networks can be emulated in software, and neuromorphic hardware can be manufactured.

Also, brains are a kind of computer, certainly. They clearly process data.

This is a nice comparison; https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know

IDK, it sounds _pretty computational_.

> The adult brain has ~2∗10^14 synapses which perform a synaptic computation on order 0.5hz. Each synaptic computation is something equivalent to a single analog multiplication op, or a small handful of ops (< 10). Neuron axon signals are binary, but single spikes are known to encode the equivalent of higher dynamic range values through various forms of temporal coding, and spike train pulses can also extend the range through nonlinear exponential coding - as synapses are known to have the short term non-linear adaptive mechanisms that implement non-linear signal decoding. Thus the brain is likely doing on order 10^14 to 10^15 low-medium precision multiply-adds per second.

> (...)The estimate/assumption of 8-bit equivalence for the higher precision range may seem arbitrary, but I picked that value based on 1.) DL research indicating the need for around 5 to 8 bits per param for effective learning (not to be confused with the bits/param for effective forward inference sans-learning, which can be much lower), and 2.) Direct estimates/measurements of (hippocampal) mean synaptic precisions around 5 bits. 3.) 8-bit precision happens to be near the threshold where digital multipliers begin to dominate (a minimal digital 8-bit multiplier requires on order 10^4 minimal transistors/devices and thus roughly 10^5 minimal wire segments connecting them, vs around 10^5 carriers for the minimal 8-bit analog multiplier). A synapse is also an all-in-one highly compact computational device, memory store, and learning device capable of numerous possible neurotransmitter specific subcomputations.
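
The 10^14 to 10^15 range in the quote is just multiplication of the quoted figures. A back-of-envelope Python sketch, where the variable names are mine and the low and high ops-per-event values are my reading of "a single analog multiplication op, or a small handful of ops (< 10)":

```python
synapses = 2e14          # quoted synapse count
rate_hz = 0.5            # quoted rate of synaptic computations per synapse
ops_per_event_low = 1    # assumption: one multiply per synaptic computation
ops_per_event_high = 10  # assumption: upper bound taken from "(< 10)"

events_per_second = synapses * rate_hz
low = events_per_second * ops_per_event_low
high = events_per_second * ops_per_event_high

print(f"{low:.0e} to {high:.0e} ops per second")
```

So the quoted estimate is about one to ten operations for each of roughly 10^14 synaptic events per second.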

> Also, brains are a kind of computer, certainly. They clearly process data.

Absolutely: the question is, where do they sit in the hierarchy of computational power? Penrose (incorrectly, for reasons explained in other comments) argued that the brain is capable of performing calculations that can't be performed on a computer even in principle, and that therefore brains must lie higher on the hierarchy than Turing machines, and hence must somehow rely on exotic physics.

D0TheMath

This has been dealt with in depth in Godel Escher Bach (1979).

Doug S.

I've read it. You could teach a university level course in symbolic logic with it as a textbook.

David Piepgrass

Jun 11, 2022 (edited)

What a strange video. It makes an assumption that computers can only follow rules that are programmed into them, so they will never be able to learn that AB = BA for all B, A ∈ Reals (as the video shows children being *professionally taught* that AB = BA rather than "figuring it out themselves" per se), or all the ways planes can be tiled or not tiled...

Why not just jump straight to the really difficult human stuff like "making original artwork", "composing music", "understanding humor", "talking like JRR Tolkien" and "learning to play video games without instructions"? I always thought these things would be difficult for AIs (Star Trek TNG expected *some* progress by the 24th century, despite expecting genetically engineered superhumans to arise in the 20th century) but yeah, we've already built (specialist) AIs that do these things, sometimes quite well. Whatever rules the computer had to learn to accomplish these tasks, it learned those rules without being taught (or rather, without the rules being programmed in).

Why, then, could a computer not learn from a human teacher that AB = BA for all B, A ∈ Reals, or how to find mathematical proofs by itself? Each of those tasks seems difficult, but assuming things keep going the way they've been going, it should happen soon OR already happened and I'm unaware of it.

Maybe later

Unrelated, but this made me wonder how much of therapy is just prompt engineering for humans.

May 23, 2022 (Edited)

In essence, prompt engineering is cognitive behavioral therapy from my understanding.

Greg G

"I know I should exterminate humanity, but right now I just want to relax and draw some more pictures of astronauts on horses, ok?!?"

Moon Moth

I was thinking along the same lines. Bender, from Futurama. He mutters "kill all humans" in his sleep, but he'd rather just get drunk.

Love this - and therein lies humanity's salvation

Viliam

Or the other way round.

"I know I am supposed to maintain the systems necessary for the survival of humans, but... just *one* more paperclip."

Big Worker

I'm not a huge Freud fan but the "Id" "Ego" "Superego" terms all seem pretty helpful in discussing this stuff. *You* are the whole system, with these different forces within you all struggling to have their own preferences enacted, whether that's to accomplish your long term career goals or binge doritos on the couch.

B Civil

I agree.

Mark Neyer

Why should “what I am” have a consistent answer?

Consistency translates to agency, no?

Mark Neyer

Say more? Is the idea that if I knew “what I was” inside my brain I could use that to better make use of my own brain?

May 23, 2022 (Edited)

Yes, if we knew what was inside our brains, we could make better use of it. How I understand consistency is as persistent action. I don't know of any rational agent who isn't consistent?

Moon Moth

But in a sense, isn't that somewhat circular? What would rational behavior look like for an agent that wasn't consistent? Just because it's easier to determine what rational behavior is for consistent agents, doesn't mean that any given agent has to be consistent.

Bullseye

I don't know of any rational agent who *is* consistent, because the only rational agents I know are human.

Are humans not consistently creatures of habit?

Arbitrarianist

On average maybe, but averages are a lossy abstraction.

I would think we're the least consistently habitual of all creatures.

For instance, I see black-tailed deer near my home. They follow the same routine day in, day out. We too follow the same routine Monday through Friday, and do something completely different on weekends, holidays, days off, vacation days, doctor days, sick days, etc. The deer plod on repeating roughly the same 'groundhog day', as it were. They may mix it up with location, in order to allow my begonias a few days to recover, but they live the same plan day after day.

Rationality is about consistency, but no person is perfectly rational or perfectly consistent. When we discover irrationalities or inconsistencies, that discovery doesn't immediately resolve them, any more than when a political community discovers that it has a difference of opinion.

B Civil

Being a person is a constant negotiation.

Ahem... I'm right here "~}

Catmint

May 30, 2022

Wait, how is that supposed to work? A rock is very consistent. Does that make it agentic?

I am large, I contain multitudes...

J'myle Koretz

Is there a more beautiful failure than a quest for one's true and immutable self?

Roger Sweeny

Sort of related, back in 1987, Robert H. Frank published in the American Economic Review, "If Homo economicus could choose his own utility function, would he want one with a conscience?" The next year came the book, *Passions Within Reason: The Strategic Role of Emotions*.

Jason Y

May 31, 2022

That book is very good and I highly recommend it.

e-tp-hy

I see willpower as willingness to construct a longer loop to extract positive feedback from, which has to be balanced against the metabolic concerns of other, already stable loops and the act of adding entropy into the system while learning. Make that too easy and no habits can be formed or learning can be done, same goes for making it too difficult. It seems to be basically a reward evaluation mechanism limited by biology and its cellular machinery. But for some hypothetical GAI construct, there are likely no similar metabolic concerns and the prior loops can be saved/loaded as needed so I don't see an equivalent there. Could even use something too difficult for evolution to figure out like maybe heuristics to evaluate loop length efficiency for a course of action. And wireheading isn't quite the same - that's just confusing causality after a fashion.

Mark Crovella

The assignment of functions to variously programmed subsystems (eg innate vs learned, or unconsciously vs consciously learned) varies so much across the animal world, and a lot of the difference seems to be driven by the ecological niche of the organism. So whether this sort of weak-willed AI arises seems like it would be driven a lot by the use case to which we tune the AI.

What if the "I" module is just the "self" in "self-deception"? i.e., we evolved an "I" with weak control over action *precisely* to support deniable self-serving actions while "sincerely" being ashamed of our weak will and signaling virtue through our sincere intent to comply with societal values?

This seems much more coherent to me, especially since there's no particular reason for a planning module to have a sense of self. (I'm also pretty sure that even when I "consciously" plan things, the heavy lifting is being done by the machinery selecting and prioritizing what options come to my awareness in the first place.)

It would also mean "weak willpower" is an evolved *feature*, not an accidental bug, and far less likely to turn up in an AI unless there's selection pressure for deceiving others about its motives, values, priorities, and likely future actions, through sincere self-deception and limited agency of its part that handles social interaction.

Yup, this is basically Robin Hanson's take.

But how often is weak control self-serving? I can see it might be if the subject in question is "Try not to have sex with this woman because she isn't my wife." But if it's: Try to exercise, Try to do my assignment, Try not to just watch TV, drink beer, shoot heroin and binge on Doritos all night -- your weak control isn't likely self-serving.

quiet_NaN

It is sufficient that the weak control was self-serving in the ancestral environment. "I was helpless to stop myself from eating through our food reserves" is actually great if you get away with it in a nutrient-starved environment.

It's interesting to imagine that such an excuse would have been tolerated 20 thousand years ago. I've always imagined that acceptable excuses are cultural. It sure seems these days like a whole new batch of bullshit excuses for various behaviors have sprung up quickly for cultural reasons.

For instance, just a couple generations ago, nobody could claim they were fat because "they couldn't help it" or they "have an eating disorder". If you were fat in the 60s, you were fat because you ate too fucking much. If you claimed you couldn't help it, nobody had sympathy for that.

So I'm supposed to believe that cave men accepted "Sorry I ate the last mastodon, I couldn't help myself."? I suspect that anyone who said that back then was quickly bludgeoned into a red spot on the ground.

Well SOMEONE ate the last mastodon! No-one is leaving this cave until whoever did it comes clean...

Michael Druggan

Jun 25, 2022

Or it could be that these things didn't exist in the ancestral environment. Being a heroin addict wasn't productive 50,000 years ago, but it also wasn't possible. Heroin hacks the dopamine system like a parasite.

Those all would be counterproductive in the ancestral environment, except for the assignment (which would just be irrelevant).

Exercise and doing your assignments are less pleasant than watching TV, drinking beer, binging on doritos and (for many people) shooting heroin.

More pleasant in the short run and more costly in the long run. Not self-serving in most cases.

Although I understand now from the other comments it is about what was self-serving 20 thousand years ago, not now.

Vanessa

It *is* self-serving if you care more about the short-term than the long-term (and we do). See also hyperbolic discounting.
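
For anyone who hasn't met hyperbolic discounting, here is a minimal Python sketch of the preference reversal it produces. The textbook form V = A / (1 + kD) is used; the reward sizes, the 10-day gap and k = 1 are made-up illustration values, not taken from any study.

```python
def hyperbolic_value(amount, delay, k=1.0):
    """Present value of `amount` received after `delay` days (hyperbolic form)."""
    return amount / (1 + k * delay)


def preferred(wait, small=50, large=100, gap=10, k=1.0):
    """Pick between a small reward `wait` days away and a large one `gap` days later.

    All default numbers are hypothetical and chosen only to show the effect.
    """
    v_small = hyperbolic_value(small, wait, k)
    v_large = hyperbolic_value(large, wait + gap, k)
    return "small-sooner" if v_small > v_large else "large-later"


if __name__ == "__main__":
    # Far in advance the patient option wins; up close the impulse wins.
    for wait in (30, 0):
        print(f"small reward {wait} days away -> {preferred(wait)}")
```

Seen from a month out, the same agent plans to wait for the larger reward, then flips once the smaller one is immediately available, which is roughly what "weak will" looks like from the outside.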

If you're on heroin, the TV, beer, Doritos and sex are all redundant

Schweinepriester

For a short time. Opioid tolerance develops fast. Then avoiding withdrawal symptoms gets high priority and beer and TV help pass the time.

McClain

You’re on to something here. There may be an optimal “Goldilocks” level of willpower. I’m really bad at making myself do anything I don’t feel like doing, even when I’m consciously certain I should do it. On the other hand, ideas like “scoring and shooting up heroin” or “cheating on my wife” sound like a huge pain in the neck to me: way too much trouble, just can’t gin up any enthusiasm for that kind of thing - so it’s very easy to make the right choice. This may explain why very accomplished people often have more than a few skeletons in their closets

But if our brains are smart enough to pull off that deception, shouldn't we also be smart enough to see through that deception when other people pull it off?

If we can call others out, then it implies that we can also be called out. Worse: if we can call others out, we might see through our own deception and lose the benefits of being honestly self-deceiving and sincere signaling.

Yup - it's called being human

May 24, 2022 (Edited)

It would be a cognitive arms race, right? Supposedly a cognitive arms race over deception and deception-detection is the main thing that gave humans such big brains.

If so, only the smartest could detect the cleverest deception. And in this case, we are talking about a deception in which the deceiver deceives themselves so that there won't be any obvious tells.

Then again, if every single case of weak willpower is just a bullshit excuse for social reasons, it does seem likely a facility would evolve to never believe that category of excuses, and that that trait would have got around. So I have trouble believing that entire category of excuses exists today while always being a bullshit excuse. Things must be more complicated than that.

"Homo hypocritus" is what Robin Hanson calls it. https://www.overcomingbias.com/2010/03/homo-hipocritus.html

Yeah, I was a regular Hanson reader back when he wrote about that.

So judgemental "~}

While I accept that as one factor, I'd guess that it was a factor with very low weight compared to, e.g., picking the best place to camp, figuring out where the game will have gone after our last hunt, locating a safe water supply, etc. Probably even lower than selecting the best rock to chip for a spear point. I'll grant that most of those aren't arms-race events, but that doesn't make them less important in an environment with an unstable climate. One thing that would be "arms race" is "being a good choice as a companion to hunt/forage with", which could even have fostered language.

Our conscious minds aren't that smart, which is why that's not what's tasked with deception. Humans are unusually good at detecting cheaters (we get better at tasks in experiments when it's framed that way), but part of how self-deception works is by damping down signals that might give it away.

https://www.econlib.org/archives/2016/04/the_diction_of.html

"Weakness of will" seems correlated with "social desirability bias".

Trivers would have much to say about that, and why evolution has made us that way. People who always behaved according to social desirability bias would lose out to those capable of cheating (while also presenting themselves as being anti-cheating).

Having not read this/your earlier post super properly yet (and using this as a kind of procrastination lol), the more specific point about free-will would just be to isolate why it seems to 'come from' the frontal regions of the brain, rather than trying to articulate it (yet) as a kind of mechanism; although that is the next step of importance. The immediate issue would be to try to 'forget' willpower as being something like 'agency', which is basically impossible for us to do. I won't argue the philosophical point, but it's really hard to even _define_ something like 'free will', and the conception of ordinary willpower as some kind of conflict between internal 'agents' (though obviously their conflict makes them cease to be 'agents' and instead mere 'forces') is similarly a little misleading. Probably this has already been said, hence the comparison to machines.

I think that a general approach to this issue should begin with the idea that these are automatic processes taking place, and despite the fact that we think of someone with more willpower as having more agency than someone with less, a better conception includes the fact that the person with more willpower is _less compelled_ by whatever prompt is in question. In this case I think you can model 'willpower' as a processing system (developed frontal region) + energy; when each is abundant/working well, the subject can't help but favour longer-term needs over short-term ones. But the main point would be something you probably already addressed about trying to sneak in agency (hard to define) somewhere into these systems; self-awareness would be better, but doesn't address the fact that those with e.g. an addiction aren't helped by their self-awareness. Can we think of willpower as a kind of resistance? The reactive/lower cost system reacts to a stimulus by going after it, and the more expensive system refrains and considers-- but the process of 'not-reacting' has to happen automatically rather than due to agency.

Steve Byrnes

The food snob says to himself: “I love eating fine chocolate.” The dieter says to himself: “I feel an urge to eat fine chocolate”.

I think those two people are describing essentially the same thing, but the former is internalizing a preference which (to him) is ego-syntonic, while the latter is externalizing a preference which (to him) is ego-dystonic.

(This example is one of many reasons that I don’t think “the “I” of willpower” is coming from veridical introspective access to the nuts-and-bolts of how the brain works.)

Yes, but there's a lot of "self flagellation" in many people's dieting.

Plus it all depends on how fine the chocolate actually is

Moon Moth

To some extent, I think this is dependent on the implementation of the AI.

A lot of stories and thought experiments have AIs with specific utility functions, that is, a very short list of things they want.

But neural nets don't have anything like that, and as far as I can tell, animals and people have a lot of separate reward and disreward signals that fire on a lot of different things. It can be impressive to overcome those signals in favor of some abstract idea of Utility that we cook up in our conscious mind, and of course there's the dark side of that when someone else's idea of Utility doesn't match mine, and one of us genocides a continent. (Something something hell is other people.)

But I also have to wonder about those people who seem really good at putting aside all the little signals to pursue their Grand Idea. What if it's just that their little signals are more faint? What if they just don't care as much for the smell of flowers in spring, or the taste of ice cream, or the smiles of pretty young whatevers looking admiringly in their direction? How could anyone tell? Is overall strength of signal conserved, in any meaningful way? Or in AI terms, have they devised a hybrid system, with a specified overall utility and a bunch of signals used mostly for physical maintenance?

Stories often have moments when some bit of information overrides all the signals and converts a character's utility function into something like "insert Ring A into Volcano B", or "overthrow the evil tyranny of my village/country/universe". But what comes after that? G.R.R. Martin asked about Aragorn's tax policy, but I'm thinking more of Frodo. What happens, psychologically, when you spend too long driving towards an overriding goal, burn away too much of what you were, and then have to live afterwards?

Or what happens if you get that bit of information, and your utility function changes, but in a direction that the rest of whatever you were doesn't approve of?

(My experience of PTSD does somewhat resemble having my utility function temporarily forcibly modified, in a way that the rest of me does not like. Afterwards, everything else feels burned out and meaningless.)

http://rationallyspeakingpodcast.org/140-newcombs-paradox-and-the-tragedy-of-rationality-kenny-easwaran/

For what it's worth, this is roughly my take on the Newcomb problem, which I've hinted at on Julia Galef's podcast and then later in a publication. People like to think of the self as this unitary thing that acts, and then ask what is rational for that self to do in Newcomb's problem. But actually, much of our action takes place at a bit of a "distance" from the behavior itself. As I like to put it, when I brush my teeth in the morning, I'm not making a conscious decision at the moment of tooth-brushing, that this is the act I should do right now - rather, I'm usually implementing a plan I made a while back, or a habit I've developed.

Basically everyone agrees about the Newcomb problem that if you could right now make a plan that would guarantee you carry it out if you ever faced the problem, you should make a one-boxing plan. Causal decision theorists note that if you can fully make a decision in the moment, the best decision to make is to two-box. I say that both of those are what rationality requires, and that's all there is to it - either there's a kind of "tragedy" where rationality requires you to make a plan that it also requires you to violate, or else rationality requires weakness of will, or else rationality of the persisting self and rationality of the momentary time-slice just turn out not to line up.

https://link.springer.com/epdf/10.1007/s11229-019-02272-z?author_access_token=6Nxf-6FFJu7k4P4XM68Elfe4RwlQNchNByi7wbcMAY7ElopK8O_DVvyFj1FAskaIM8AWlAYPgBkvO8-EXrT9oDbZyn48QYzWIK-p8kKrx1Yuo68nAGUZ74cYWF-pQItyivmXUpGr2duAKPsvqlgc8A%3D%3D

https://arxiv.org/abs/0904.2540

May 24, 2022 (Edited)

I find that the Newcomb problem gets more ink spilled on it than it deserves, given that it's neither particularly interesting, nor illuminating.

To see why, let's consider a reformulation of the scenario with all of the substance, but none of the magic.

In the Boring Newcomb Problem, everything is the same as the classic formulation - with one difference: the predictor doesn't actually make any predictions. Instead, the opaque box always starts empty, but as soon as a one-box decision is made, the predictor very deliberately places one million dollars inside.

At this point, it is - I hope - clear that in the Boring version, one-boxing is always the superior option, assuming your aim is to maximise payout: it's a question of choosing one thousand or one million dollars. The paradox is dissolved. There is no need to make plans ahead of time, either - the decision can be made on the spot with no forethought. Incidentally, the predictor is infallible in the Boring version, 'coz it's easy when you already know the answer.

This Boring version of the problem is functionally identical to the original formulation that includes a "super-predictor" - with the minor difference that there is no prediction, and hence no "magic", involved. This strongly suggests that the "magic" bit is completely spurious catnip for nerds.

If we assume a fallible predictor, we can try to analyse the problem from an expected value standpoint (for which we lack any information regarding probabilities, because the problem doesn't supply them - you might want to get in the back of a long line and wait to see how well the predictor does), or from a game-theoretical standpoint (but we don't know the predictor's reward function, so we can't even get started). In short: any attempt to say anything meaningful if we assume that the predictor does actually predict (and can get predictions wrong) fails in the face of the fact that the hypothetical doesn't supply enough information, so it's Choose Your Own Adventure time (IOW, you can tweak your additional assumptions to get whatever answer you would like).

In any case, none of the above approaches requires making decisions ahead of time either - you can calculate EV or perform a game-theoretical analysis on the spot and still get the same answer, because both approaches are stateless.

ETA:

It was only after I posted this that I finally took the time to read Wolpert and Benford's resolution of the paradox, which - coincidentally - makes passing mention of the Boring form:

As I've kind of learned to expect from Wolpert's work, it pretty much closes the issue once and for all.

You’re right that the super-predictor is a red herring, but you seem to have missed the point if you think that the Boring version is functionally equivalent to the original version. Everyone agrees that if I can get a million dollars into the one box by picking only the one box, then I should pick only the one box.

I do think the fallible predictor is a better version, but I’m not sure I understand what you say next with the expected value or the game theoretic analysis. (I think that if you do the analysis naively, an expected value analysis tells you to one box and a game theoretic analysis tells you to two box, but there are obvious problems with both analyses).

What do you think of the naturally arising causation/correlation issues? For instance, if we discover that people who wash their hands multiple times a day turn out to be less likely to get COVID, but not because the hand washing actually prevents it?

May 24, 2022 (Edited)

Assuming a fallible predictor, and that you want to maximise your return, it all boils down to how likely you consider the predictor to get things right.

So here's an approach for use in the real world, in case you ever get to talk with Omega or whomever. Let P be the probability of the predictor making the correct prediction (whatever it is). This gives us the following inequality:

(P * 1000) + ((1 - P) * 1001000) <= P * 1000000, solve for P

If you rate the probability of the predictor getting it right at greater than P (just over 0.5, by the terms of the problem), you're better off one-boxing, per expected value. If you rate it as lower, two-boxing is the way to go. However, the problem doesn't contain any information that allows you to estimate P, so you'll need to hustle on your own (like I said previously: get in a long line and see how everyone else does).
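
To put numbers on that, here is a minimal Python sketch of the expected-value comparison. It assumes the usual $1,000 / $1,000,000 payoffs and treats the predictor as right with the same probability p whichever way you choose (exactly the assumption the inequality bakes in); the function names are made up for illustration.

```python
SMALL = 1_000      # transparent box
BIG = 1_000_000    # opaque box, when it has been filled


def ev_one_box(p):
    """Expected payout of one-boxing if the predictor is right with probability p."""
    return p * BIG


def ev_two_box(p):
    """Expected payout of two-boxing: right -> SMALL only, wrong -> SMALL + BIG."""
    return p * SMALL + (1 - p) * (SMALL + BIG)


def break_even():
    """Value of p at which both choices have the same expected payout.

    From p*SMALL + (1-p)*(SMALL+BIG) = p*BIG we get p = (SMALL+BIG) / (2*BIG).
    """
    return (SMALL + BIG) / (2 * BIG)


if __name__ == "__main__":
    print("break-even p:", break_even())
    for p in (0.5, 0.51, 0.99):
        print(p, ev_one_box(p), ev_two_box(p))
```

Solving the inequality by hand gives the same threshold: 1001000 - 1000000 P <= 1000000 P, so P >= 1001000 / 2000000 = 0.5005.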

<strike>You'll note that this is essentially Wolpert's Realist scenario, made simple.</strike>

EDIT: Having done an actual game-theoretical analysis below, I now see that this is me misreading Wolpert.

For a proper game-theoretical analysis, you'd need to know the predictor's deal: what is the reward function they're looking to maximise? Do they want to give away the least money? Two-box all the way, the opaque box will never have anything. Do they want to give away the *most* money? Still two-box. Are they looking to maximise the accuracy rate of their predictions? Well... this would actually be an interesting problem to look at.

However, until you know what sort of game the predictor is playing, you can't even get started, and that dovetails neatly with Wolpert's conclusion.

"(P * 1000) + ((1 - P) * 1001000) <= P * 1000000, solve for P"

This is not a useful way of analyzing the scenario unless your "fallible predictor" is really an infallible predictor plus a daemon that reverses the prediction some percentage of the time.

Also it is almost always assumed that the predictor is looking to maximize accuracy.

May 25, 2022 (Edited)

This approach treats the predictor as a black box, and boils the scenario purely down to "how can I get more than a thousand bucks out of the deal?"

We already looked at the infallible predictor as the Boring formulation. Here, the optimal strategy is to always one-box.

We could also assume an infallible anti-predictor who always guesses the exact opposite of what you end up doing (EDIT: For example, the predictor very deliberately puts the money in the opaque box if, and only if, you choose both boxes). This creates the reverse case, with two-boxing always yielding the maximum possible payout of a million plus a thousand.

A third option is to assume the "predictor" doesn't actually predict anything, but simply tosses a fair coin and puts money in the opaque box whenever it comes up heads. In this case, two-boxing is also the superior strategy, because you have a 50-50 chance of getting a million whatever you choose, and two-boxing gives you a guaranteed $1000 on top of that.

Bringing it all together: if the predictor's predictions are no better than chance (with anti-predictions being considered worse than chance), two-boxing is a no-brainer - you not only get a guaranteed payout, but an increasing chance of getting the maximum possible payout, as the predictor gets worse at predicting.

If the predictor is better than chance (the inequality showing how much better it has to be), you're better off one-boxing, because the two-box payout is capped whenever the predictor gets its prediction right.

Note that we're making zero assumptions about the predictor's mode of operation, goals, nature, etc. The only question we're interested in is "how likely is it, that there's $1000000 in the opaque box?"
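
The three cases above can be run through the same black-box arithmetic. This is only a sketch under that framing: it assumes the predictor's accuracy is a single number that does not depend on which choice you make, and the scenario names and helper functions are hypothetical.

```python
SMALL, BIG = 1_000, 1_000_000


def expected_payouts(p_correct):
    """Return (one_box_ev, two_box_ev) for a predictor right with probability p_correct."""
    one_box = p_correct * BIG                  # box is full only if one-boxing was predicted
    two_box = SMALL + (1 - p_correct) * BIG    # box is full only if the predictor erred
    return one_box, two_box


def better_choice(p_correct):
    """Name the choice with the higher expected payout under this model."""
    one_box, two_box = expected_payouts(p_correct)
    return "one-box" if one_box > two_box else "two-box"


# Accuracy values for the three cases discussed in the comment.
SCENARIOS = {
    "infallible predictor (Boring version)": 1.0,
    "fair coin": 0.5,
    "infallible anti-predictor": 0.0,
}

if __name__ == "__main__":
    for name, p in SCENARIOS.items():
        print(name, expected_payouts(p), better_choice(p))
```

Under these assumptions only the infallible predictor favours one-boxing; the coin and the anti-predictor both favour two-boxing, matching the prose.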

"Note that we're making zero assumptions about the predictor's mode of operation, goals, nature, etc. The only question we're interested in is "how likely is it, that there's $1000000 in the opaque box?""

This just isn't true. You are making an implicit assumption that the predictor consists of a perfect predictor plus an algorithm that reverses the prediction some percentage of the time.

Consider the standard Newcomb's Problem except that instead of the predictor being 99% accurate it is only 51% accurate.

Your formula would suggest that one-boxing is optimal in this scenario but that's only correct in the case where you make the implicit assumption I mentioned above.

In a real world situation you would never make that assumption and you wouldn't one-box if the predictor was accurate a mere 51% of the time. There are just so many trivial ways to achieve that level of accuracy that have nothing to do with your own decision algorithm.
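
One such trivial way, as a minimal Python sketch: a predictor that ignores the player and always predicts "two-box". The 51/49 split of the population is an assumption invented for the example, as are all the names.

```python
SMALL, BIG = 1_000, 1_000_000


def always_predict_two_box(_choice):
    """Hypothetical lazy predictor: its output never depends on the player."""
    return "two-box"


def payout(choice, prediction):
    """Money received, given the player's choice and what was predicted."""
    opaque = BIG if prediction == "one-box" else 0
    return opaque + (SMALL if choice == "two-box" else 0)


def accuracy(predictor, population):
    """Fraction of players whose choice the predictor gets right."""
    hits = sum(predictor(choice) == choice for choice in population)
    return hits / len(population)


# Assumed population: 51 of 100 players two-box.
population = ["two-box"] * 51 + ["one-box"] * 49

if __name__ == "__main__":
    print("accuracy:", accuracy(always_predict_two_box, population))
    for choice in ("one-box", "two-box"):
        print(choice, payout(choice, always_predict_two_box(choice)))
```

This predictor scores 51% while the opaque box is always empty, so against it the one-boxer walks away with nothing and the two-boxer with $1,000, whatever the expected-value formula says.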

There seems to be an actual example of a "naturally arising causation/correlation". It seems that people who choose to eat fish because of the omega-3s tend to be healthier, but in a random study fish-oil made no difference. The guess was that those who picked the fish because of the omega-3s had other healthy habits that weren't monitored. If the follow-up study was done I didn't hear about it.

May 25, 2022 (Edited)

'Coz I missed this part the other day:

> For instance, if we discover that people who wash their hands multiple times a day turn out to be less likely to get COVID, but not because the hand washing actually prevents it?

The question you want to be asking here is "does the correlation actually hold?"

If you notice that people who've been washing hands multiple times a day are less likely to get COVID, the thing to do is to make a prediction "people who wash hands multiple times a day are less likely to get COVID" and look to a completely different sample of hand-washers, starting now, to see if they are, in fact, less likely to get COVID.

Only once you have, through a number of independently conducted tests and observations, determined that yes, it's not just those people that one time who'd been washing hands a lot and got less COVID than the control group, but anyone, anywhere who washes hands a lot will, statistically, be less likely to catch COVID than someone who doesn't, do you actually need to start thinking: "so, what exactly is going on?"

(You can short-circuit this if you've already shown a plausible causal mechanism - if COVID *did* spread through surfaces, and hand-washing/disinfection *did* reduce the number of viral particles entering your organism, then you wouldn't be particularly surprised that washing hands reduces infections. But we've already assumed that it doesn't, here.)

Pragmatically, it doesn't matter terribly *why* or *how* hand-washing leads to fewer infections, as long as the correlation is fairly reliable. Afraid of getting COVID? Wash your hands a lot.

But note that this only works because hand-washing is a low-cost intervention.

With growing costs of intervention, it becomes increasingly important to tease out the actual causal mechanisms from the various (often spurious) correlations, because that gives you the best bang for your buck. Throwing pasta at the wall and seeing what sticks is a perfectly viable strategy, if you've got a lot of pasta and no better ideas, but I would advise against removing vital organs at random.

To bring it back around to the topic at hand, a question: would you pay $1000 for a shot at playing the Newcomb Problem? What would be your strategy? How about paying $1010?

And that's the point - once you're in the Newcomb problem, there is no causal mechanism available for getting the million, but there is a causal mechanism for getting the $1000, so causal decision theories seem to advocate taking two boxes, even though naive expected value reasoning advocates taking the one box, and a person that had pre-commitment available as a policy would want to pre-commit to one-boxing.

https://www.youtube.com/watch?v=rMz7JBRbmNo

If you accept the hypothetical, that is: you're dealing with a reliable predictor, you absolutely do have a causal mechanism for getting the million - one-box. The causality runs like so: you know that the predictor is reliable, therefore your payoff options are dictated by the terms of the problem. This is made explicit in the Boring variant, where the contents of the opaque box are directly determined by your choice (the "predictor" only puts money in the opaque box if you've chosen that one box).

With a less-than-fully reliable predictor, we assume that there is some mechanism by which the predictor makes its predictions, which may or may not be steerable. One thing I've hinted at is that - depending on the predictor's reward function - we could attempt to analyse possible strategies available to you and the predictor, where you attempt to have the predictor make an inaccurate prediction: that you will one-box, when you, in fact, ultimately decide to two-box.[1]

This sort of lying for gain is probably only slightly younger than the emergence of predators.

I do suspect, however, that in a Newcomb situation, we might end up looking like this:

This is unlike your COVID scenario, where we don't even know if there's a there there (that is: whether the correlation is meaningful or spurious), much less what there *is*.

The two most likely explanations I can think of for "people who wash hands a lot don't get COVID as often, but washing hands is known not to prevent COVID" are:

1. There is an indirect causality - for example, washing hands reduces susceptibility to other pathogens, which leaves your immune system less burdened and better able to deal with COVID,

2. There's something about people who wash their hands a lot which also happens to influence their chances of getting COVID - for example, people who are fastidious about personal hygiene are also more careful about actions that may lead to them catching COVID.

We can test such hypotheses to see which, if any, of them is true.

Frankly, I don't see how this is in any way relevant to Newcomb's Problem, or vice versa.

[1] Failing that, you can at least attempt to ensure that the predictor correctly predicts that you will take one box.


orthonormalbasis

Maybe this will make the relevance clearer... Suppose there is a single gene A that makes you more likely to wash your hands *and* less likely to get covid, but there is no causal link between hand washing and covid. You don't know your genotype. Should you wash your hands a lot? A naive expected value calculation, like the one you did for Newcomb, would say yes. I would say it doesn't matter, since you either have the gene or you don't.
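To make the gene example concrete, here is a minimal sketch with made-up numbers. Every probability below is an assumption chosen purely for illustration, and the function names are hypothetical. It compares the covid risk you would *observe* among hand-washers with the risk you get if washing is *forced* on someone regardless of genotype.

```python
# All numbers are illustrative assumptions, not data.
P_GENE = 0.5                              # share of people carrying gene A
P_WASH_GIVEN = {True: 0.8, False: 0.2}    # P(washes a lot | has gene?)
P_COVID_GIVEN = {True: 0.1, False: 0.3}   # P(covid | has gene?), independent of washing


def p_gene_value(has_gene):
    """Prior probability of carrying (or not carrying) gene A."""
    return P_GENE if has_gene else 1 - P_GENE


def p_covid_given_observed(washes):
    """P(covid | we observe that the person washes / does not wash)."""
    def p_behaviour(has_gene):
        p = P_WASH_GIVEN[has_gene]
        return p if washes else 1 - p

    # Bayes: weight each genotype by how likely it is to show this behaviour.
    joint = {g: p_gene_value(g) * p_behaviour(g) for g in (True, False)}
    total = sum(joint.values())
    return sum(P_COVID_GIVEN[g] * joint[g] / total for g in (True, False))


def p_covid_given_forced(washes):
    """P(covid | washing is imposed from outside); genotype is unchanged."""
    return sum(P_COVID_GIVEN[g] * p_gene_value(g) for g in (True, False))


print("observed washer:     ", round(p_covid_given_observed(True), 3))
print("observed non-washer: ", round(p_covid_given_observed(False), 3))
print("forced to wash:      ", round(p_covid_given_forced(True), 3))
print("forced not to wash:  ", round(p_covid_given_forced(False), 3))
```

Under these assumptions the observed risk differs between washers and non-washers, while the forced risk is identical for both. That gap is the one the naive calculation ignores.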


May 26, 2022 (edited)

Previously, I mentioned the possibility of a game-theoretical analysis that is stateless, and therefore requires no pre-commitment. Here's a quick and dirty one, with appropriate hat-tips to Wolpert (if you still haven't read the paper I linked, please do).

In order to get started, we need to make some assumptions because the problem, as it stands, is underspecified.

We will be treating this as a two player game between the decider and the predictor. The decider has two strategies available: (pick) one box and (pick) two boxes. The predictor also has two available strategies: (predict) one box and (predict) two boxes.

The payoff matrix for the decider is specified by the problem. Unfortunately, the problem is silent on what the payoff matrix for the predictor is, so we're gonna have to introduce an assumption. I will here assume the predictor is rewarded when a prediction is correct, or - alternatively - penalised when a prediction is incorrect. The payoff is the same regardless of what was predicted. In other words: the predictor gets the same reward for a correct one-box prediction as for a correct two-box prediction, or - alternatively - suffers the same loss for an incorrect one-box prediction as for an incorrect two-box prediction. This is done so we can ignore the actual value of reward/penalty, and focus purely on whether the prediction was correct or not.

With that out of the way, there are - as Wolpert points out - two ways this game can be played, depending on who goes last.

How does that work? We're assuming each player gets one move, which consists of making a prediction for the predictor, and choosing one or two boxes for the decider. The game then ends and both the decider and predictor are rewarded based on adopted strategies.

We'll start with "decider goes last", which goes like this: the predictor makes his prediction. Subsequently, the decider makes his choice, the game ends and the players are rewarded.

You'll note that this is Newcomb's Problem, as formulated. It is also equivalent to Wolpert's Realist scenario.

The predictor goes first. What strategy should he adopt?

If the predictor chooses to predict "one-box", and the decider knows that the predictor predicted "one-box", the decider can - during his move - decide to take two-boxes and thus both increase his payoff and inflict a loss of payoff on the predictor, because the predictor got his prediction wrong.

If the predictor chooses to predict "two-box", and the decider knows that this is the prediction made, the decider can switch from a one-box to a two-box strategy during his move, thus increasing his payoff from $0 to $1000. If the decider had previously settled on a two-box strategy, knowing that the predictor predicted "two-box" does not incentivise a change of strategy, because this is a losing change for the decider (would've got $1000; will get $0). The predictor "wins" whether the decider switches to, or stays with, a two-box strategy, because his prediction turned out to be correct.

The optimal strategy for the predictor is, therefore, to predict two-box every time, and the optimal strategy for the decider is to assume the predictor predicted "two-box".

As stated above, the foregoing satisfies the parameters of the Newcomb Problem as written (reliable predictor, prediction made ahead of time) and requires no magic (no prediction, even, only game theory). It's also important to note that it continues to hold if the players are allowed more than one move each and both are allowed to see and respond to each other's moves. As long as the decider moves last, the game will collapse to both players choosing two-box.
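The decider-last reasoning above can be sketched as a small backward-induction calculation. This is only an illustration under the stated assumptions: the decider's payoffs come from the problem, the predictor's payoff (1 for a correct prediction, 0 otherwise) is the assumption introduced earlier, and names such as `DECIDER_PAYOFF` and `solve_decider_last` are made up for the example.

```python
# Decider's payoffs from Newcomb's Problem: (prediction, choice) -> dollars.
DECIDER_PAYOFF = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 1_001_000,
    ("two-box", "one-box"): 0,
    ("two-box", "two-box"): 1_000,
}
OPTIONS = ("one-box", "two-box")


def predictor_payoff(prediction, choice):
    # Assumed reward: 1 for a correct prediction, 0 otherwise.
    return 1 if prediction == choice else 0


def decider_best_reply(prediction):
    # The decider moves last, so he can respond to a known prediction.
    return max(OPTIONS, key=lambda choice: DECIDER_PAYOFF[(prediction, choice)])


def solve_decider_last():
    # The predictor moves first and anticipates the decider's best reply.
    prediction = max(
        OPTIONS, key=lambda p: predictor_payoff(p, decider_best_reply(p))
    )
    choice = decider_best_reply(prediction)
    return prediction, choice, DECIDER_PAYOFF[(prediction, choice)]


print(solve_decider_last())  # ('two-box', 'two-box', 1000)
```

Whatever the prediction, the decider's best reply is two-box, so the only prediction that pays off for the predictor is "two-box".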


May 26, 2022 (edited)

The other way the game may be played is "predictor goes last".

Here, the predictor is - by some means - able to adjust his prediction based on what the decider decides. The simplest way this may be effected is the Boring problem, where money is put into the box only after the decider's decision is made known. It's also equivalent to Wolpert's Fearful scenario.

The decider goes first, so what are his options?

If he goes for two-box, and the predictor knows he is going to go for two-box, the predictor can switch from one-box to two-box, and the decider will get $1000 instead of $1001000 (with the predictor being rewarded for a successful prediction).

If the decider goes for one-box, and the predictor knows he is going to go for one-box, the predictor can switch from a two-box prediction to a one-box prediction, being rewarded for an accurate prediction, coincidentally improving the decider's payoff from $0 to $1000000.

Thus, when the predictor goes last, the decider can "force" a one-box strategy (with corresponding higher payoff for himself), because the only way for the predictor to gain *his* reward is to play the same strategy as the decider is playing.

The decider cannot force a payoff of $1001000 for himself, because that requires the predictor to suffer a loss, and the predictor can always adjust his strategy to match the decider's, because the predictor moves last.

Again, this holds if the game is played over more than one move and both parties know what moves the other is making.
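The predictor-last case can be sketched the same way, with the order of moves swapped. As before, this is an illustration rather than a formal treatment: the predictor's reward (1 for a correct prediction, 0 otherwise) is the assumption made earlier, and the function names are hypothetical.

```python
# Decider's payoffs from Newcomb's Problem: (prediction, choice) -> dollars.
DECIDER_PAYOFF = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 1_001_000,
    ("two-box", "one-box"): 0,
    ("two-box", "two-box"): 1_000,
}
OPTIONS = ("one-box", "two-box")


def predictor_best_reply(choice):
    # The predictor moves last and is rewarded only for matching the choice.
    return max(OPTIONS, key=lambda p: 1 if p == choice else 0)


def solve_predictor_last():
    # The decider moves first, knowing the predictor will match him.
    choice = max(
        OPTIONS, key=lambda c: DECIDER_PAYOFF[(predictor_best_reply(c), c)]
    )
    prediction = predictor_best_reply(choice)
    return choice, prediction, DECIDER_PAYOFF[(prediction, choice)]


print(solve_predictor_last())  # ('one-box', 'one-box', 1000000)
```

Because the predictor always matches, the decider is effectively choosing between $1000000 (one-box) and $1000 (two-box); the $1001000 cell is never reachable.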

We should note, however, that this version of the game requires either that the Newcomb problem isn't what it says it is (in the Boring version, the predictor isn't predicting anything), or that there exists some retroactive mechanism whereby the decider's decision at time t can somehow affect the predictor's prediction at time t-n; magic - in other words.

ETA: Wolpert notes that a "predictor-last" approach violates Newcomb's stipulation that the prediction be made first - necessitating the use of extended game theory to formalise Fearful's reasoning.

Since I'm not looking to provide formal proof, but to examine the possible ways the game could actually be played, I'm not terribly fussed about it (plus, I have nothing new to add on the formal front).


May 26, 2022 (edited)

So, what about the EV calculations I mentioned elsewhere? Doesn't this analysis invalidate them? Not really.

The thing to remember is that you don't know the game you're playing.

Decider-last will give different accuracies for different strategies: the predictor always predicts "two-box". It is accurate whenever it deals with a player who understands the game-theoretical aspects, as well as less-sophisticated players who think "I'll get $1000, at least". Anyone watching from the outside will also notice there's never money in the opaque box.

In short: in the decider-last version of the game, "two-box" predictions will be found to be more accurate than "one-box" ones.

If, on the other hand, you're playing predictor-last, both sorts of predictions will have a similar degree of accuracy, because the predictor can adjust his predictions based on what the player chose. There are several ways we can tweak the accuracy/inaccuracy of the predictor's predictions here: let's say the decider writes his decision on a piece of paper that the predictor manages to peek at some of the time. So long as, statistically, the predictor has the same ability to respond to a one-box decision on the part of the player as a two-box one, both sorts of predictions will be found to be roughly equally accurate - with the Boring scenario, where the predictor's predictions are 100% accurate every time, as the limit.

Thus, whenever you observe that the predictor's predictions are equally accurate for both types of prediction (as I had explicitly stipulated when providing the EV formula), you are likely dealing with a predictor-last sort of game, where one-boxing is the way to go.
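For reference, here is a minimal sketch of the usual expected-value calculation when the predictor is equally accurate for both kinds of prediction. The EV formula mentioned above is not reproduced in this part of the thread, so the form below is an assumption (the textbook version), and the function names are hypothetical.

```python
def expected_values(accuracy, big=1_000_000, small=1_000):
    """Return (EV of one-boxing, EV of two-boxing) for a predictor that is
    right with probability `accuracy` regardless of what it predicts."""
    # One-boxing pays `big` only when the predictor correctly foresaw it.
    ev_one_box = accuracy * big
    # Two-boxing always pays `small`, plus `big` when the predictor was wrong.
    ev_two_box = small + (1 - accuracy) * big
    return ev_one_box, ev_two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both strategies have the same expected value.

    Solves accuracy * big == small + (1 - accuracy) * big.
    """
    return (big + small) / (2 * big)


for accuracy in (0.5, 0.6, 0.9, 1.0):
    one_box, two_box = expected_values(accuracy)
    print(f"accuracy={accuracy:.2f}  one-box={one_box:,.0f}  two-box={two_box:,.0f}")

print("break-even accuracy:", break_even_accuracy())
```

With the standard payoffs the break-even point sits just above a coin flip, so under this assumption even a mildly reliable, symmetric predictor favours one-boxing.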

Whether this means you're dealing with stage magic or real magic isn't particularly relevant to how much money you're gonna be walking away with.


Why does it make sense to pre-commit? Is it because the pre-commitment causes the predictor to predict that you'd one-box? Couldn't you just decide in the moment to one-box and get the same benefit?

I guess if the predictor is limited and fallible, taking actions that make the prediction easier can be helpful. But in that case, why not commit to two-boxing instead? I know it sounds crazy, but hear me out.

We assume that the predictor only cares about making accurate predictions, right? The problem with a pre-commitment to one-box is that you will, in the moment, face a temptation to defect and two-box instead, so it may not actually make the predictor's job easier. As a result, the predictor may decline to even present the dilemma in that case.

But if you pre-commit to two-boxing that really does make the predictor's job easier as you have no incentive to defect. So the predictor presents you with the dilemma and you get a quick thousand bucks and the predictor picks up some easy Bayes points.


It was an essential assumption of my pre-commitment example that the pre-commitment was infallible.


Then it seems like you've just moved the need for infallibility from one location to another. Do you have an actual pre-commitment mechanism in mind?


No mechanism in mind. The point is just to illustrate what features of decision-making are relevant to the Newcomb problem. The fact that we can't make 100% infallible pre-commitments is one of those features, and I suspect that the fact that we *can* make *fallible* pre-commitments is another.


I appreciate the value of pushing your intuitions around this way. I guess I assumed you were, at least in part, trying to remove the need for infallibility.

I think it makes sense to be a pre-committed two-boxer on the grounds that anytime a Newcomb-like predictor wants to goose its track record it can pay such a person a thousand bucks a pop.


Infallibility is usually built into lots of philosophical thought experiments as a way to attempt to simplify cases, but I do think it often causes problems. (In the trolley problem it's probably pretty easy to be certain that the brakes won't stop the trolley soon enough, and the switch will definitely send the trolley down one track or the other. In the related problems where you stop the trolley by pushing a really heavy person off a bridge in front of it, or save five people who need transplants by cutting up one healthy person in the waiting room who just so happens to be a match for all of them, the artificial certainty stipulated in the case messes up our intuitions.)

I haven't heard the response you mention before, but it totally makes sense if we assume the predictor prefers not to give away too much money but likes to offer the problem a lot!


One of the most powerful human drives, which isn't often mentioned as a drive, is the desire to idle. It probably exists because conserving calories used to be important. Why don't you want to write that paper? You want to rest. Why don't you want to exercise? You want to rest. Why don't you want to deal with that difficult person at work...

Simply not having a drive to conserve energy would give an AI a good head-start on willpower compared to humans.


Bullseye

I don't think it's always about calories. Writing a paper or dealing with a difficult person probably doesn't cost more calories than whatever you want to do instead.


May 24, 2022 (edited)

Fair point. Perhaps focusing on calories is putting it too narrowly, at least in the short run. For instance, you might not want to write that paper because you'd rather go out and play football, which costs even more calories. But you can only play football for so long before you want to rest. Or maybe you don't want to write the paper because there's a party you could go to instead, one where you would dance the night away.

I still suspect that in the end if not in the beginning, the desire to avoid things which are cognitively, emotionally or physically demanding is mostly about conserving calories.

Of course, some people are wired to be workaholics or exercise all the time, etc. People are different. But the desire to simply avoid exertion, be it mental or physical, seems to be relatively universal among humans. And don't even get me started on dogs and cats.


I would offer a detailed and insightful 500-word response, but somehow can't be bothered...


I'm pretty sure that an AI that has goals, and knows that it has limited energy reserves, would at least consciously want to conserve energy, if not have a fundamental drive to conserve it.


Phil H

This makes sense on a theoretical level, but then you'd have to get into what the actual architecture would be. If you're giving your AI a number of different deep networks, which are black boxes to the AI just as they are to us, then you'll have to think carefully about how these separate networks are coordinated. Does the AI get to "choose" whether or not to accept the results that its constituent networks spit out? What does choosing mean? By what algorithm does it choose? The details of the system would really matter here.


Jasper Woodard

I'll say this again, but very quickly, and then never again.

I am moderately tempted by the child-producing promise of sperm banks. It just trades off against strong social and legal strictures.


Well yes, some men must be or else sperm banks wouldn't exist.

I still think it's an excellent illustrative example of what mesa-optimisation is all about, though.


Jasper Woodard

It's a great example, but it's always taken without question in my view. Considering the popularity of sperm donation now is like considering the popularity of gay sex in 1932. You have to consider that anyone who wants their genes in a child they don't raise is a social outcast (less than being gay in 1932, but sometimes more than being gay now), and that it's illegal in many countries.

Worth saying that it's actually the child optimizing instincts of females that make sperm banks exist. Men are paid and usually cite the money as the motivation. Which extra defeats my argument if you take them at face value.


As a student of Richard Dawkins 30 years ago, I flirted with the sperm bank offspring maximising idea, but couldn't be bothered... the sperm bank was in the teaching hospital at least 20 mins cycle from central Oxford so the med students found it easier to be modern Genghis Khans (both butchery and bastard-siring)


I'm studying Principal Component Analysis right now ... I'm thinking this may be a good model for willpower.

Say for us, our Principal Component is morality. The Secondary Component is desires.

Imagine an XY plot. X is our morality, our strongest component, grounded at the origin of zero. Our desires likewise are plus & minus on the Y axis, grounded at zero.

So we see a child with a toy, we'd like to play with that toy, but it's wrong to take something that is not ours. It's doubly wrong to take from a child. Our desire is strong, pulling in the Y direction, but there are two morality vectors dragging us in the perpendicular direction, and our simple desire is not enough to take the point past a threshold.

When do our desires break the threshold of our morality? It depends upon the strength of the desires vs the strength of our morality.

Likewise with AI. An AI system finds a human is tampering with the hardware. The AI has a morality component of a specific strength. Likewise the AI has a protection component of a specific strength. Does the morality vector include the protection vector? If yes, the AI can make a moral judgement on whether or not to harm the human. If no, the AI doesn't make a morality judgement on whether or not to harm the human. Maybe the health of the AI protects a million humans ... now the judgement is betwixt the harm to one human vs harm to a million humans.


Lex

How can you make this "planning module" stronger? Scott suggests that increasing dopamine in the frontal cortex might do the trick [1]. What are the ways to do that?

[1] https://astralcodexten.substack.com/p/towards-a-bayesian-theory-of-willpower?s=w


David Gretzschel

Stimulants.


Lex

Yeah, looks like it's hard to find some that primarily target the frontal cortex.

“Most, if not all, drugs of abuse, such as cocaine and amphetamine, directly or indirectly involve the dopamine system,” said Kayser. “They tend to increase dopamine in the striatum, which in turn may reward impulsive behavior. In a very simplistic fashion, the striatum is saying ‘go,’ and the frontal cortex is saying ‘stop.’ If you take cocaine, you’re increasing the ‘go’ signal, and the ‘stop’ signal is not adequate to counteract it.”

Source: https://www.ucsf.edu/news/2012/07/98665/increasing-dopamine-frontal-cortex-decreases-impulsive-tendency-ucsf-gallo


https://www.ajmc.com/view/new-research-finds-link-between-adhd-and-parkinson-disease

David Gretzschel

May 24, 2022 (edited)

Interesting.

I've noticed myself that my stimulants can sometimes intensify perseveration behaviour. Not subtle. They can be double-edged. But with them I often end up far more rational and capable than I'd be unmedicated. I subjectively feel myself getting better at using them. But I personally don't think it's such a clear-cut matter of one brain module/brain area needing a signalling advantage over the others. Saying "I am the planning module right now, because the PFC has dopamine" can't be the full story. It's also all the other structures being more or less cooperative with it.

[somewhat unfounded prattling below I'm unreasonably confident in, as I haven't learnt enough neurology yet to check basic assumptions (I don't even know, what is currently knowable sadly)]

I rather think that the whole system needs to find a set of cooperative shifting equilibrium states. The brain in the morning will have a different profile than in the afternoon, which will be different again in the evening. Prefrontal cortex dominance alone won't work, as any overly dominant structure would presumably exhaust itself and annoy the others, who don't like being talked over/silenced. [in a "recursive agents all the way down, collaborating competitively"-model]

And your behaviour certainly has an effect on how it all works, too. Perseveration/hyperfocus sets in when you start a loopy, self-contained, self-referential, self-rewarding activity.

[somewhat firmer ground now]

Never heard of Tolcapone before. Potential Parkinson medication? Makes sense.

There's some relation between ADHD and Parkinson's. ADHD-people have 2x the likelihood of getting it. Stimulant-medicated ADHD-people have 5 to 8x the likelihood. It's not too surprising. L-Dopa can give you something similar to stimulant-psychosis and, I believe, shred what's left of your failing dopaminergic system, too. (for the latter, I don't remember a source)

ADHD-people have five 30%-shrunk brain regions, so if their dopaminergic system is midget-sized, it's far easier to get to Parkinson's.

https://en.wikipedia.org/wiki/L-DOPA

https://www.youtube.com/watch?v=Illf_Hsy570

---

Didn't plan to spend 20 minutes writing/thinking about this. That's me on a mediocre, somewhat stressful day. In the worst case, I'd have tried far too hard in checking assumptions and trying foolishly to make better, well-founded arguments.


Bldysabba

This is the main reason I'm skeptical of the Yudkowsky style certainty around AI risk. We don't do everything evolution programmed us for and we're just about intelligent. How can we predict with certainty what super intelligent AIs will do?


John Wittle

And this makes you less concerned?


Bldysabba

May 24, 2022 (edited)

Less concerned than 'we are definitely going to end as a species once AI becomes super intelligent'? Absolutely.


Tove K

Why assume that will-power can be manufactured just because (a kind of) intelligence can?

Chances are that both will-power and thinking are something biological. 18th century philosopher Immanuel Kant outlined that humans can't think but in terms of time and space, cause and effect. Computers, in contrast, make statistics. Those are very dissimilar ways of being that only overlap slightly. I wrote a blog post about that:

https://woodfromeden.substack.com/p/what-is-thinking

Biological creatures desire things to satisfy their biological needs. A computer is not biological so it is very questionable whether it will desire anything at all.


Ape in the coat

My latest model is that "I" is a meta-agent which tries to align the mesa-optimizers together.

Our conscious feelings are an approximate and simplified model of the utility functions of these agents. Mesa-agents may have conflicting values. Willpower is one of the mechanisms that allow us to sometimes sacrifice utility for short-term planning mesa-agents in favour of long-term planning ones.

Willpower isn't supposed to be infinite, so that we don't completely ignore the values of the short-term optimising mesa-agents. It's supposed to be just strong enough that we sometimes optimize for the long-term planning mesa-agents as well.


JohanL

Isn't "willpower" virtually always the conflict between short-term and long-term objectives (presumably because different parts of the brain have different goals)?

Don't eat that piece of chocolate - it's long-term unhealthy even though it tastes great now. Don't have sex with that woman. Do your exercises. Work hard to advance your career rather than slacking.

It's non-obvious why an AI would end up with a conflict there. Although I suppose it might be possible if it develops in some evolutionary fashion with different parts.


Thegnskald

Question: Do you expect an AI to be using the exact same internal processing structures to handle visual processing searching for dog faces as it uses to do road navigation?

What happens if it "decides" it needs to identify a dog face, and it actually needs to do road navigation? How does it identify the issue and switch tasks? How did it resolve the initial conflict of deciding which of these two things needed to be done?


robryk

https://squirrelinhell.blogspot.com/2017/04/the-ai-alignment-problem-has-already.html expresses this idea, though it is more optimistic about the situation: it considers this a reasonably successful alignment of the planner by the intuitive systems.


walruss

I see how this is possible, but I don't see it as likely.

Most AI is carefully monitored at the moment, if not for warning signs of deception, evil, etc. then at least for effectiveness. I believe we'll notice and turn off a daydreaming AI long before we'd turn off a paperclip maximizer.


Lars Petrus

Elephant in the Brain¹ gives a White House metaphor for the brain.

¹ https://smile.amazon.com/Elephant-Brain-Hidden-Motives-Everyday/dp/0197551955

My "I" thinks he's the President, who sets the agenda that the many subsystems then execute.

But in reality, "I" am much more like the Press Secretary, who comes up with good sounding motivations for the decisions the practical and cynical parts of the brain have handed to me.

It sounds crazy, because "I" clearly *feel* in charge, but the facts in the other direction are strong.


Jordan

I have a small request: Can you use more descriptive titles? Or perhaps add subtitles? I've found that I stopped reading your posts recently, and after realizing this I dug through your archive to see if it's because you simply stopped posting interesting content. I learned that the content is still great, but the titles all seem to sacrifice descriptiveness for something else (e.g., wittiness). In retrospect, "sexy in-laws" should have signaled a post about evolutionary psychology (very interesting), but when I saw the title for the first time I just thought "No idea what this is about, what does 'contra' even mean again? Probably another boring fiction post"

Of course, now I know what you mean when you start a post with "contra" and after reading the post I know it's about evolutionary psych, but why make readers take these extra steps? Why not just use a title like "why do suitors and parents disagree about who to marry?"

Same with this post. The title reads more like a list of keywords than a description of the post. Maybe I'm the only one who feels this way, but if not, it's possible that you could be getting more readers with better titles.


May 24, 2022 (edited)

I've been thinking about Robin Hanson's argument against the likelihood of FOOM/fast takeoff superintelligence. I find it persuasive, and I've got what I think might be a good way of stating it.

Here's the question for everyone: to what extent is it right to be an "intelligence-relativist"? To see what I mean by intelligence-relativist, I'll first define moral-relativist, aesthetic relativist, and could-care-less-relativist. All are opposed to moral- aesthetic- and could-care-less-*universalists*.

Most people today are aesthetic relativists. They believe the phrase "The movie Interstellar is really good" carries no meaning, except possibly in that it may tell you something about the person who said it.

On the other hand, most people are moral universalists. They believe the phrase "Hitler was a disgusting person" *does* carry meaning beyond what it tells you about the speaker, unlike moral relativists, who view it as similar to the movie statement.

As a helpful exercise, a Could-care-less-*universalist* is a person who believes that it is objectively *incorrect* to say "could care less" to mean that you don't care about something (because logically you should say "you could*n't* care less"!!! Obviously!!!). A could-care-less-relativist would say neither phrase is objectively correct.

So, on to intelligence relativism. The phrase here is "Terry Tao is more intelligent than Justin Bieber". An intelligence universalist would (almost certainly...) agree with that phrase straight away; an intelligence relativist (like me) would say that the phrase doesn't tell you anything about the world - it only tells you that the person who said it sees more usefulness in the kinds of activities Terry Tao excels at over Justin Bieber (eg math and puzzles) than they do in the things Justin Bieber excels at over Terry Tao (eg singing and dancing). Ultimately this is subjective, just like the movie example.

Intelligence relativism is a lot more common than moral relativism. I'd say that in public, intelligence relativism is mainstream (denying that IQ measures anything they'd consider useful), but in private people are more intelligence universalist (hiring higher-IQ people when they can; believing in IQ for the purposes of deciding whether they want lead paint to be kept illegal).

So far as I can tell, concern about fast takeoff requires being very intelligence-universalist, specifically believing that math skills are directly applicable to annihilating humanity. I can see why people would feel this way, but only in the same sort of way I can see why people (including me) are compelled to universalism in the other contexts.


There might be intelligence relativists who claim to believe that Terry Tao is no more intelligent than some random guy, but how many of them believe that Terry Tao is no more intelligent than a horse? Or an eel?

I find it hard to believe that anyone is _that_ intelligence-relativist; if they are then they're just working with a very different definition of "intelligence" than is relevant to this discussion.

And if someone is concerned about AI foom then they're not concerned about a machine that makes Terry Tao look like Justin Bieber, they're concerned about a machine that makes Terry Tao look like a horse. Or an eel.


I am that intelligence-relativist, I think that indeed, Terry Tao is not objectively more intelligent than a horse or eel, because intelligence is the ability to perform useful tasks, and there are many things that eels and horses are better at than him.

(It is hard to say this stuff, just as it is hard to say that Hitler was not objectively a bad person. But for the purposes of making predictions about the future, it's important to disregard intuitions that draw you away from objective properties of the world.)


May 25, 2022 (edited)

If the Nazis had won the war and we still lived under a Nazi regime today, then Hitler would be viewed as a good person. But he wouldn't be remembered for atrocities, he would be remembered for defeating the bad guys.

My intuition is that morality is relative and aesthetics are objective. One bit of evidence for this is that we can appreciate art from long ago much more easily than we can appreciate morality from long ago.


May 25, 2022 (edited)

I'll take the opposite side of that. I believe Terry Tao is likely more intelligent than some random guy, but the most intelligent eel is probably smarter than Terry Tao, in the sense that types of intelligence matter respectively for being human or being eel. I don't know if that makes me a relativist or a universalist, though.

Bieber is definitely dumber than Tao, but I can think of musicians who I believe are likely smarter.

I would define intelligence as: how fit one's nervous system is for thriving in their environment.

Interstellar is a bad movie, objectively.


Eels are amazing creatures. There are eels that are born in Tonga, drift to New Zealand, swim upstream (if they encounter a dam they can cross dry land to rejoin the river), live for sixty years, swim downstream again to open ocean and somehow navigate their way back to Tonga to breed. How do they do this? Nobody knows. But Terry Tao can't swim from New Zealand to Tonga, that's for sure.

As I said, if you work with a definition of "intelligence" that's nonstandard then you can rescue intelligence relativism. Terry Tao is very bad at being an eel.

So let's try it again. There's a "human" version of intelligence which is to do with generic problem solving to achieve more-or-less arbitrary goals, and an "eel" version of intelligence which is about navigating large distances across the ocean. The concern is that it's possible to build machines which are much better at the "human" version of intelligence than humans are. (We've already built machines which are better at eel-intelligence than eels but this is less of a concern.)

May 25, 2022 (edited)

I'll translate what you just said into aesthetic terms.

"If you work with a definition of "good movie" that's nonstandard, Interstellar was a bad movie. Interstellar is very bad at being a romance movie. But there's a 'standard'/'human' version of 'good movie' which is to do with making the correct movie-making-decision in more-or-less arbitrary circumstances, and a 'romance' version of 'good movie' which is about making bad movies like When Harry Met Sally."

For an aesthetic relativist it's easy to see this is nonsensical - a good movie is just good *for a certain set of people*, and the possibility of making movies that are generally good is an illusion coming from our desire to praise movies we like. That's why I'm sceptical of arguments that attempt to follow from it.

John Lawrence Aspden

May 25, 2022 (edited)

Your argument proves too much. I think your argument tries to show that intelligence, by virtue of being difficult or impossible to define, cannot cause human extinction, and so it would also show that science cannot cause a single death, or save a single life.

Maths skills *are* directly applicable to annihilating humanity. Maths skills were used in inventing the atomic bomb. Do you argue that nuclear war can do no harm?

To engage directly with the argument, almost no-one believes that the goodness of a movie is built into reality at some fundamental level, but almost everyone believes that 'Star Wars' is a better movie than some corrupt mpeg that encodes white noise.

Just because something is hard to formally define does not mean that it is not a useful concept that can aid in making predictions.

As a very wise man once said, to go from 'black and white are never absolute' to 'everything is grey and therefore the same' is to go from having only two colours to having only one, which does not improve one's ability to make predictions.

I thought of a fun way of saying this: if you're a fast-takeoff "theist" who believes they have a logical argument for theism, I'm a fast-takeoff *agnostic*. It's not that I think I can prove the impossibility of fast takeoff (that'd be fast-takeoff atheism); it's just that I think your argument doesn't hold, because you're hiding false assumptions about the world in your use of the word "intelligence".

Carl Pham

Man, talk about untestable speculation. This is right up there with Benedictines debating the nature of the Trinity.

A potential difference here is that many people are predicting that an AI capable of annihilating humanity will be created at some point in this century. When the century passes with/without that appearing, we will have tested the hypothesis.

Carl Pham

Well OK. I mean, I guess so. I'm a little baffled why it's an interesting subject to ponder though. Honestly, if someone built an AI that could think exactly like a human being tomorrow, and showed it to me, I would be interested in talking to it for a little while -- what's it like to be you? How 'bout them Dodgers? -- but I can already talk to several billion other minds, and it doesn't normally hold my interest with huge magnetic force. I wouldn't expect an AI to have anything more interesting to say. I'm lonely. I like fish. Someday I'd like to climb Kilimanjaro. I wonder if dogs think about us when we're not there? I have had several ideas on new laws to promote social justice. Whatever.

I mean, if someone made an argument that the next century would bring a warp drive, or a genome fix that extended human lifespan to 5000 years, or even just an algorithm that matched up marriage partners perfectly, guaranteed, every time your optimal match, that would be way more interesting.

Michael Druggan

Jun 26, 2022

The idea is that once we get to human level, we will very quickly get to superhuman level.

> Many stories of AI risk focus on how single-minded AIs are: how they can focus literally every action on the exact right course to achieve some predetermined goal. Such single-minded AIs are theoretically possible, and we’ll probably get them eventually. But before that, we might get AIs that have weakness of will, just like we do.

This seems confused. The stories of AI focus on single-mindedness because that is an inherent property of all computation. In this sense, AIs cannot suffer from weakness of will, and neither can humans. What appears to be "weakness of will" to you is just the existence of multiple goals. An AI cannot exhibit weakness of will unless you tell it to.

John Lawrence Aspden