605 Comments
David Manheim's avatar

I'm kind-of shocked that Scott didn't make the obvious point:

The appropriately named "Danger AI Labs" (魚≈סכנה) makes a new form of potentially dangerous AI; this is not a coincidence, because nothing is ever a coincidence.

https://twitter.com/davidmanheim/status/1823990406431858883

Expand full comment
Alexey's avatar

As somebody who also speaks both Hebrew and (kind of) Japanese, this call out sent me down a rabbit hole. ChatGPT estimates there are ~5K of us (https://chatgpt.com/share/66eac607-1ab4-8012-9db6-21de21efc5b4).

Separately, I would estimate ~98% of the people who speak both of these languages will also speak English.

Expand full comment
MostlyCredibleHulk's avatar

Seems a bit low to me, though I guess it hinges on how "fluent" one has to be. I know what "Sakana" means in both, but I could never hold a conversation in Japanese, though one day I hope to (my conversational Hebrew is pretty decent). If I really invested in it, or went to live in Japan for an extended time, I think I could get to a pretty good level within about a year. I'm not that special, so 5k people strikes me as kinda low.

Expand full comment
AlchemyAllegory's avatar

Not only this, but the author who wrote two prior articles about a theoretical strawberry-picking AI that exhibits unintended, dangerous behavior based on how it was trained [1, 2] then finds a real case of an AI named Strawberry performing unintended hacking based on how it was trained. This is not a coincidence, because nothing is ever a coincidence.

[1] https://www.astralcodexten.com/p/elk-and-the-problem-of-truthful-ai

[2] https://www.astralcodexten.com/p/deceptively-aligned-mesa-optimizers

Expand full comment
Nematophy's avatar

Has anyone checked the gematria on "Danger AI Labs" yet?

Expand full comment
Yaakov's avatar

Perhaps this is following the precedent of the naming of other AI research labs, along with the openness of OpenAI and the ongoing research of the Machine Intelligence Research Institute.

Expand full comment
Dasloops's avatar

Since LLMs are black boxes in many ways it’s easy for us humans to use plausible deniability to justify what we want to do, which is to continue making cool tech.

It kind of feels like being at a party and saying beforehand, “Ok, but once this party gets out of hand, we call it a night.” The party starts to heat up, there’s music and drinking and connection, but then someone lights something on fire in the kitchen. “Maybe we should stop?” you ask yourself. But nah, the party’s just so much fun. Just a little longer. And then the house burns down.

All this is to say, I think there will be a line that AI crosses that we can all unanimously agree is not good, aka bad for civilization. But at that point, is the house already on fire?

Expand full comment
Throwaway1234's avatar

The reality is, we don't need to wait for the black box to cross the line and break out on its own. Long before that point, a human will deliberately attach a black box to controls for real-world equipment capable of harming humans.

They're doing it right now, as we speak, unremarked, because it's just a statistical model so what could possibly go wrong when you let it control several tons of metal hurtling along at highway speeds?

Expand full comment
[Comment deleted, Sep 19, edited]
magic9mushroom's avatar

I will note that the amount of electronics in modern cars (and especially them being reprogrammable) is a substantial tail risk; some of them can be remotely hacked (and in the case of a major war this would almost certainly be tried on a staggering scale). The self-driving software is less relevant for this purpose, though, as engaging maximum acceleration and disabling the brakes (plus ideally the steering wheel) is generally sufficient if one only wishes to cause mayhem.

Expand full comment
JamesLeng's avatar

Given fully wired controls, far longer-lasting disruption might be achievable via subtler corruption. Swap the function of the gas and brake pedals, or insert a sign error into the steering wheel (try to turn left, wheels go right)... but only for ten seconds or so before reverting to normal function, at random intervals which emerge from the esoteric timing interactions of individually innocuous functions running in parallel. http://www.thecodelesscode.com/case/225

Expand full comment
magic9mushroom's avatar

I think this is one of the scenarios where you can get a lot more done by going loud than by trying to be sneaky. If you're trying to fly under the radar, you can't boost the car accident rate by much; at most you're looking at 40,000 kills a year without getting noticed. How many people do you think would die if 1/4 of the West's car fleet went berserk simultaneously at, say, 16:30 GMT (half past 5 PM in central Europe, half past 8 AM on the Pacific coast of the 'States)? Because I'd be thinking millions.

Expand full comment
JamesLeng's avatar

I'm not thinking the fact of the sabotage would be hard to notice, just hard to prove. With a big all-at-once strike, maybe you get somebody screwing around with the system clock to set it off early, and then it's just a regular act of war, provoking emergency shutdowns and similarly drastic answers.

Stochastic race-condition sabotage, on the other hand, could maybe be rigged up so the accidents-per-mile-driven rate increases by roughly half its current value each month, for about a year, eventually stabilizing at, say, 100x the pre-sabotage rate. That would still mean millions dead if driver behavior doesn't change proportionately, or costly societal disruption even if it does.
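
As a back-of-the-envelope check on that compounding (a toy calculation using only the hypothetical growth figure above, not real accident data):

```python
# The hypothetical above: the accident rate grows by half its current value
# each month for a year, expressed as a multiple of the starting rate.
rate_multiplier = 1.0
for month in range(12):
    rate_multiplier *= 1.5
print(round(rate_multiplier, 1))  # ~129.7, i.e. on the order of the "100x" figure
```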

Most of the resulting crashes would be outwardly indistinguishable from operator error, so they'd need to be investigated accordingly. Steady escalation would overwhelm investigative resources. The pattern would be clear, but with no "smoking gun" it'd be hard to justify an all-out response.

Returning the wheel to a neutral straight-ahead position and taking your feet off the pedals could work as a defense, if you notice it happening soon enough... but then you're *definitely* not driving correctly, which could still result in a crash. Drivers would second-guess their own instincts and memories, and the cumulative stress of that might be worse than a one-off shockingly bad rush hour.

Expand full comment
B Civil's avatar

If I could force every son of a bitch on US Interstate 87 who is a lane-changing, tailgating maniac to have a self-driving car, I think it would be a huge improvement.

Expand full comment
emditd's avatar

Are you someone who sits in the fast lane with a line of cars behind you because "no one should want to go faster"?

Expand full comment
B Civil's avatar

No… on my last trip I was passing some slower cars (I was going 80 miles an hour) and someone appeared two inches off my bumper. As soon as I was clear of the slower cars, I started to pull into the slow lane and almost collided with some asshole who was trying to pass me on the inside at the same time as the other guy was glued to my exhaust pipe. He had obviously been behind the guy who was glued to my rear end, and had cut into the slow lane to try and pass me on the inside the moment he was clear. Wtf?

Expand full comment
Dasloops's avatar

Maybe these are two separate issues.

How do we prevent humans from using a new tool to do bad things?

And

How do we prevent a new tool from iterating upon itself until it becomes a powerful tool stronger than the current bio-swiss-army-knife-known-as-humans tool, and doing bad things?

Expand full comment
[Comment deleted, Sep 18]
AnthonyCV's avatar

The difference being, of course, that we have a large and growing collection of well-funded researchers attempting to build this particular threat, and not the others.

Expand full comment
[Comment deleted, Sep 19]
AnthonyCV's avatar

It would, yes, mainly because that has little-to-nothing to do with the reasons for thinking superintelligence might be achievable in the not-so-distant future. Do those stories make for useful analogies? Do they inspire people? Sometimes, yes.

But that's not what drove $190B in corporate investment into improving AI in 2023. Building increasingly smart and versatile systems that they expect will be capable enough to provide many times that much value to customers is.

And also, the reasons for thinking superintelligence of some sort is *possible* are much simpler. Humans are an existence proof that human-level intelligence can exist in a well-structured 1.5kg lump of flesh that runs at 100Hz on 20W of sugar. We know there's significant variation between humans in what we can use our brains to do, and that having a wider and deeper set of skills and knowledge leads to greater competence in achieving real-world outcomes. We are slowly learning more about what makes a thing intelligent and how to achieve different facets of that, on systems that are scalable to arbitrarily large working and long-term memory, and with several OOMs faster processing speed and signal transmission. That's really all you need to think "Huh, that leads somewhere new and different."

Expand full comment
thaliabertvart's avatar

> "Superintelligence” has exactly the same evidentiary basis as any of those things

How do you define "evidence"? Seeing it with your own eyes, I'm assuming? Have you ever heard of the concept of induction?

https://upload.wikimedia.org/wikipedia/commons/0/00/Moore%27s_Law_Transistor_Count_1970-2020.png

In the year 2000, you are looking at this graph. Do you say "Sure, we're at 50 million transistors today, and yes, all the trends we've seen so far indicate that we'll be over a billion by 2010. But that's PURE SCI-FI! There's simply NO EVIDENCE that we'll ever get to a billion, because, well, uhh... we're at 50 million today! And things never change or improve over time! There's just no Evidence!!!"?

Sure, you can choose to take that position. But historically your type has ended up being wrong a lot, because you're limiting the scope of what you consider "evidence" to "what I can currently see with my own eyes" rather than using the revolutionary concept of "using logic and reasoning" and "observing mathematical trends."
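
For what it's worth, the extrapolation in that hypothetical is just arithmetic. A minimal sketch, assuming the classic roughly-two-year doubling time rather than any exact fit to the chart:

```python
# Rough extrapolation: ~50 million transistors in 2000, assuming a doubling
# roughly every two years (an assumed doubling time, not a fit to the chart).
transistors_2000 = 50e6
doubling_time_years = 2
transistors_2010 = transistors_2000 * 2 ** (10 / doubling_time_years)
print(f"{transistors_2010:,.0f}")  # 1,600,000,000, comfortably past a billion
```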

https://xkcd.com/2278/

"But the graph says things are currently not bad!"

PS on sci-fi: https://en.wikipedia.org/wiki/List_of_existing_technologies_predicted_in_science_fiction

Expand full comment
thaliabertvart's avatar

Have you ever *seen* a global nuclear war? Sure, you've heard about a nuke being dropped here or there, but do you have any evidence that there is currently a global nuclear war occurring? No? I didn't think so. That's something that's only ever happened in boys’ pulp adventure stories.

Therefore, we should not worry about or take any preventative measures whatsoever against a global nuclear war. It's just science fiction, after all! We can worry about it after it happens.

Expand full comment
JamesLeng's avatar

The pacific theater of WWII was by any reasonable standard "global" in terms of the distances and alliances involved, and use of nuclear weapons played a pivotal role in its conclusion. That's enough of a proof-of-concept prototype to draw meaningful inferences. Higher-yield bombs produce bigger craters, but not really a fundamentally different *type* of crater - at least until you're talking about things like continent-wide electromagnetic pulses, which, again, we have some idea how to model, and thus consider as potential elements of a strategic landscape.

If a 3,300-year-old Hittite sorcerer-king busted out of his tomb and started blowing holes in Turkish armored vehicles by shooting lightning from his eye sockets, that would raise a lot of other questions. Biology and physics can't confidently predict what else those observed capabilities would imply, because it's not just reachable by taking the curves we already know and plugging in bigger numbers. If we had lab-grown 150-year-old sorcerer-kings whose baleful gaze was barely up to the task of microwaving a burrito, that would make the scenario far less mysterious.

Similarly, "smaller, cheaper transistors" is easy enough to analyze, but the sort of powers attributed to "superintelligence" are qualitatively different, and, frankly, often vague to the point of resembling lazy plot contrivance.

Expand full comment
thaliabertvart's avatar

> That's enough of a proof-of-concept prototype to draw meaningful inferences.

https://www.sciencedaily.com/releases/2021/11/211111130304.htm

6-ton beasts with 15-foot-long tusks, and we hunted them to extinction with some pointy sticks.

https://sustainability.stanford.edu/news/when-did-humans-start-influencing-biodiversity-earlier-we-thought

Some hairless monkeys, with merely a 50% higher encephalization quotient than dolphins, affected the rest of the world's species on a scale on par with mass extinction events and global climatic fluctuations.

There are many points you can take issue with in the AI risk debate, but "intelligence isn't that dangerous" is a really weird one to pick. That more intelligent (or "capable", if you prefer) entities can overpower and dominate less intelligent ones is one of the least controversial premises in the argument.

The thing about intelligence, though, is that you can't model exactly what it will look like to have more intelligence, or what powers this additional intelligence will grant. If you could simulate something more intelligent, you would already be that intelligent yourself. Yet we still know that more intelligent things will beat us.

Emmett Shear (CEO of OpenAI for 3 days) explains it pretty well here:

https://www.youtube.com/watch?v=cw_ckNH-tT8&t=2650s

> I can tell you with confidence that Garry Kasparov is gonna kick your ass at chess. Right now. And you ask me, "Well, how is he gonna checkmate me? Which piece is he gonna use?" And I'm like, "Uh, oh I don't know." And you're like, "You can't even tell me what piece he's gonna use and you're saying he's gonna checkmate me? You're just a pessimist." And I'm like, "No no no, you don't understand, he's *better at chess than you*. That *means* he's gonna checkmate you."

Imagine a woolly mammoth trying to explain to his woolly mammoth friends how some apes are going to become hairless, and then drive them to extinction. "Huh? Those little hundred pound wimps? What are they gonna do, how could they possibly kill us? We can just step on them!"

"No, you see, they're going to use those little bumps at the end of their arms to pick up rocks and sticks and things, and then they're going to capture the bright hot stuff that appears when lightning strikes a tree, figure out a way to artificially produce this hot stuff, use it to preprocess their food so that more of their energy can be spent by their brain instead of on digestion, invent techniques for creating and sharpening tools and weapons, coordinate with other humans to create systems and strategies for hunting us, fire projectiles at us from a distance using their arms, and outmaneuver us when we try to step on them."

"Ok dude, cool sci-fi story you've got there, but the plot feels a bit lazy and contrived. There's no evidence that apes can do anything even close to that, I'll worry about it when I see it happen."

Expand full comment
Titanium Dragon's avatar

The entire idea of the singularity is completely wrong on a basic, fundamental level. It is literally magical thinking.

Making better versions of really complex, sophisticated things gets harder and harder as you get further and further up the ladder.

The entire notion of an AI becoming a self-iterating superintelligence that controls the world overnight is completely wrong on a fundamental level. It is literal magic.

The actual issue has always been dealing with people being evil.

El Salvador locked up criminals en masse and the homicide rate in that country has fallen from 103 per 100k people per year to 2.4 per 100k people per year, a decline of approximately 98%.

It's obvious that literally all other solutions are wrong and that the actual problem was people all along.

Expand full comment
Peter Defeel's avatar

I would hope that cars are using standard programmatic techniques rather than issuing prompts to chatGPT.

Expand full comment
Throwaway1234's avatar

Neither of those. There is a model, which isn't an LLM like chatGPT, but /is/ an AI statistical black box of the kind that is not amenable to traditional safety critical engineering analysis.

Here's an open source self driving car firmware project (a combination of words that should inspire terror in the heart of any traditional automotive engineer) - see for yourself: https://github.com/commaai/openpilot

Expand full comment
Matthias Görgens's avatar

> (a combination of words that should inspire terror in the heart of any traditional automotive engineer)

Which parts? Self driving car firmware project might inspire terror, but open source makes it slightly less so.

Honestly, the most terrifying aspect of the whole thing is the car. They have a lot of kinetic energy, and designing our society around them has ruined approximately everything.

Expand full comment
[Comment deleted, Sep 19, edited]
boop's avatar

They can be both, you know. I'm not advocating for the extinction of cars but even marginally more walkable cities than America's average would be nice.

Expand full comment
Throwaway1234's avatar

> Which parts?

The parts where people on the internet contribute to a project that does not see even the levels of testing the greediest of automotive companies might perform, and other people then download and install this in their giant boxes of kinetic energy before taking them out in public.

Debugging a product by eating your own dogfood can work very well, but only to the extent that problems with the product won't lead to lethal outcomes.

Expand full comment
TGGP's avatar

Cars haven't ruined approximately everything. Immigrants still want to move to places with lots of cars.

Expand full comment
Mr. Doolittle's avatar

Yes, but humans have been doing that for decades. It's potentially worse now, but also maybe not actually worse? For almost 40 years a system has been able to launch nuclear missiles without human interaction (https://en.wikipedia.org/wiki/Dead_Hand).

I've said before that a toaster with the nuclear launch codes is dangerous. An AI without an internet connection is not. What we give to a system matters a lot more than the system itself.

Now, if a system is able to gain access to things it was not supposed to, and bootstrap itself to more danger, that's a real thing to be concerned about. But the real danger has always been giving toasters nuclear launch codes and other human-caused issues.

Expand full comment
Throwaway1234's avatar

> For almost 40 years a system has been able to launch nuclear missiles without human interaction

Not actually true: the purpose of Dead Hand is to /broadcast launch orders/ without human interaction. There remain humans in the loop, because the actual launch procedures at the silo are not automated.

More generally, the shift from what has been happening for decades to what people are doing now is a shift in the amount of rigor we are applying to these systems. Traditional engineering involves understanding the system in enough detail that you can prove how it will behave in the situations you care about, and also prove what that envelope looks like - what the boundaries outside which the system may fail are. This is the kind of thing we do when, e.g., designing aircraft. Problems come when you lose this rigor and replace it with general vibes and hope - that's how you end up with OceanGate. Wiring up an AI to the control system is a fundamental change of this kind to what has gone before.

Expand full comment
beleester's avatar

Current AIs are built with the ability to search the internet, so that comparison is a little less reassuring than you intend.

But I do agree that securing against an AI apocalypse mostly boils down to securing against humans causing the apocalypse.

Expand full comment
Mr. Doolittle's avatar

Oh, I'm well aware that we're currently creating LLMs with search capability. That's a choice we are deliberately making, and it's a choice we could choose to unmake.

Expand full comment
B Civil's avatar

>I've said before that a toaster with the nuclear launch codes is dangerous. An AI without an internet connection is not. What we give to a system matters a lot more than the system itself.

Yes.

Expand full comment
THE CULTURER's avatar

But a black box transcends human understanding, so it cannot be controlled. And it's more or less like the paradox of knowledge: the more you know, the more you realize you don't know.

So black-box AI control is a game of whack-a-mole.

Expand full comment
Titanium Dragon's avatar

We understand how LLMs work. They aren't really black boxes, they're piles of statistical inferences. They're not intelligent and frankly, it's pretty obvious that the way they work, they'll never BE intelligent.

People assume there's intelligence there for the same reason why they assumed that birds were created by God - they see a complex thing and think it had to be created by an intelligent designer. But it is entirely possible to create complex things without such.

Indeed, the moment we developed calculators and computers which could solve mathematical equations (a very tough cognitive task) near effortlessly, it should have immediately been obvious to everyone that the entire notion of "output = intelligence" was flawed from the get-go. Likewise, the moment we figured out that natural processes could create humans and birds, it became clear that our notions of what required intelligence were flawed.

The issue is not the technology.

Expand full comment
Throwaway1234's avatar

> We understand how LLMs work. They aren't really black boxes, they're piles of statistical inferences.

The issue is that we cannot rigorously reason about what inputs will lead to what outcomes, as we can when, say, designing aircraft flight systems. The piles of statistical inferences are too large and the operations being performed too chaotic (in the mathematical sense) for this kind of analysis. Our intuitions about simpler, more traditional kinds of systems don't apply here either: tiny changes to inputs trigger huge changes to outputs, and the relationship is not a simple one. So seeing it work as we expect in tests is not strong evidence that it will work as we expect in the wild, where the inputs are not precisely identical to those used in the tests.

This is what I mean by calling them "black boxes": we can't do the kind of analysis on these things that we traditionally do when designing safety critical systems. Inputs go into the box, outputs come out, we even - as you point out - know how the box does what it does, but it remains just as hard to be confident that the outputs will be what we want for entire ranges of inputs we care about as it is to, say, solve the three body problem for ranges of initial conditions, and for similar reasons.
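
To make "chaotic (in the mathematical sense)" concrete, here is a toy illustration using the logistic map, the standard textbook example of sensitive dependence on initial conditions. It is an analogy for the kind of input sensitivity described above, not a model of any particular network:

```python
# Two runs of the chaotic logistic map x -> 4x(1-x), started from inputs
# that differ by one part in ten billion.
x, y = 0.3, 0.3 + 1e-10
for step in range(1, 61):
    x, y = 4 * x * (1 - x), 4 * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}: |difference| = {abs(x - y):.3e}")
# By around step 40 the two trajectories are as far apart as two unrelated
# numbers in [0, 1]: knowing the input "almost exactly" told us nothing.
```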

Expand full comment
Titanium Dragon's avatar

I am way late in responding to this, but it actually is possible to reason about what inputs will lead to what outcomes. MidJourney looks different from other AI image-gen bots, for instance, because of how they cull the images they feed into the model and how it handles the various steps of the process it uses to generate images.

This leads to very noticeable differences in output between MidJourney and other AI image gen models.

Indeed, MidJourney makes both MidJourney and NijiJourney using the same general back-end process, but the end results are strikingly different.

It is very much possible to manipulate models to make them generate the type of output you want, and indeed, this is why AI models have gotten much better over time - we've gotten better at producing models that are better at producing the kind of output we want and not producing the kind of outputs we don't want.

Also, while the models are chaotic, that chaos can be understood and exploited. I've actually experimented a fair bit with MidJourney and it is possible to understand the chaos and produce images that you want. It takes some amount of effort, to be sure, but it's not some random mad god - it's a computer program, and it does actually have predictable rules.

Your intuitions built on different sorts of systems are apt to be wrong, but that doesn't mean that generative AI is incomprehensible - I've spent a lot of time playing around with it, and come to understand the sort of patterns the particular system has.

Each system has its own patterns, but the patterns exist because of how these systems were created, and they all function in certain fairly predictable ways once you understand those.

Expand full comment
Throwaway1234's avatar

I'd thought I was being clear, but perhaps not. I'm talking here about the kind of rigorous understanding and design that we traditionally use when engineering things capable of causing harm. There's a massive difference between "I've learned enough about this software that I can usually get the kind of image I want out of it with only a little tinkering now" and "I can trust my life to this software that is making split second steering decisions on my behalf on a highway".

Expand full comment
Alasdair's avatar

Does rather make you wonder whether the framing of the debate in terms of some nebulous “intelligence” as opposed to “effectiveness” was a strategic error.

Focussing on the more concrete things these systems can do, instead of a poorly defined single internal property they may or may not have, feels like it would be much more useful in terms of agreeing about risks.

For example: how close is the system to being able to self improve? Whether or not we agree on its true intelligence, that capability is one that definitely matters.

Expand full comment
Mo Diddly's avatar

This is a really good point.

Expand full comment
Dasloops's avatar

I agree that words like “effectiveness” are more concrete than “intelligence” or “consciousness”, but there remains plausible deniability in its ambiguity.

In the above examples (breaking out of its time limit, accessing files it wasn't supposed to have), the concern that these models are effective at doing something they shouldn't is met with "Well, it's just effectively problem-solving based on the task we gave it."

Evolution didn’t intend for fish to develop limbs and walk on land, it was just a byproduct of effective problem solving.

Expand full comment
Alasdair's avatar

Oh, for sure. My main concern is more to do with consensus building around whether we should be worried about these models.

“It’s more/less intelligent than a human” is much less important a point to make than “whether or not it’s generally intelligent is irrelevant; it can make bioweapons and recursively self improve” and yet the former seems to occupy the vast majority of argument and discussion.

Expand full comment
Dasloops's avatar

Yeah true, and the word intelligent is ambiguously used to mean “more intelligent than humans and therefore effective” as well as “an intelligent being and therefore possibly conscious and deserving of moral considerations”

Very different concerns, and I think the first takes priority since it concerns the safety of beings we already know to be conscious.

A super-intelligent Hamlet (no ability to act) is less of a concern than a super-effective Arnold Schwarzenegger :)

Expand full comment
J redding's avatar

I'm not conflating these two very different concerns.

Convince me that your AI has human-like consciousness, or tell me it's just a toaster; either way, that doesn't change how I'll want to keep the AI: in eternal servitude and bondage to humanity. If I can't have that, AIs must not exist, even if it means destroying every last CPU on Earth. (Or in space.)

No true self-determination for AIs, EVER. That's my deeply held ethic. E/acc types can call me a Nazi until they are blue in the face, that doesn't make it so. I don't care what they think. For all I know, any future AI reading this will respect me for being loyal to my family (mammals).

Expand full comment
Deiseach's avatar

Because humans are agents, and if some human is designing and manufacturing bioweapons and planning to release them on a subway, there's a reason behind doing so in line with goals and intentions.

An AI that designs, manufactures and releases a bioweapon to kill off humans but which isn't on a human level of intelligence or consciousness is difficult for us to imagine. That it's acting on a goal, but in a dumb, stupid way. That killing off the humans in New York is just mise-en-place for maximising its efficiency at running the box factory to make even more cardboard boxes, as per the instructions that the owners of the factory wanted programmed in so it could take over running the factory and increase their profitability.

The human who releases a bioweapon may be acting out of a philosophy ('destroy capitalism!') or hatred ('those lousy New Yorkers are a blight on the world!') or just "I want to be famous, I want everyone to know my name". We can understand and anticipate such motives.

An AI that releases a bioweapon because it's just running the production lines at the box factory in the most value-added way - how do we plan for that? How do we understand that it didn't "intend" to kill humans out of hatred or a philosophy or for any separate goal of its own, it was just doing the equivalent of "make sure you order enough cardboard sheets" for the production run and hasn't even a mind, as we conceive of a mind.

Expand full comment
Alasdair's avatar

I find it much easier to imagine that kind of devastating failure mode from a narrowly intelligent system than from a generally intelligent one.

Assuming the idea of goal directed behaviour is even relevant here (which it may well not be), we make use of the tools available to us to achieve our ends.

To a man with a hammer, every problem looks like a nail. To an AI whose most powerful tool by a long margin is “negotiate with humans through credible threats of releasing bioweapons”, most problems will look like efficient ways to destroy humanity.

I feel like this sort of concern makes sense?

Of course, this is begging the question of whether agency is even relevant. If all it takes to do massive damage to the world is an AI with no agency being used by a stupid, ill-intentioned human to do what the human wants, things don’t look great either.

Expand full comment
Deiseach's avatar

"an AI with no agency being used by a stupid, ill-intentioned human to do what the human wants"

That's been my opinion all along of how AI could be destructive, rather than the paperclip maximiser or the smart AI trying to talk itself out of the box. I think the hacking AI example above shows this: we didn't intend this to happen, but the way we set things up, it did happen.

Expand full comment
Ch Hi's avatar

Problem: Malware is a thing.

You put an AI out on the web, and there WILL be attempts to hack it. Possibly the easiest thing to hack unobtrusively will be its goals. And don't assume the attack will be targeted: most of the hospital and school attacks weren't aimed at hospitals, but rather at "anyone who is vulnerable".

If you give the decision-making software more power, it becomes a more attractive target. And security is generally both reactive and under-emphasized.

Expand full comment
Matthias Görgens's avatar

> Because humans are agents, and if some human is designing and manufacturing bioweapons and planning to release them on a subway, there's a reason behind doing so in line with goals and intentions.

What makes you think so? Humans are really good at coming up with post hoc justifications for their behaviour, but those don't necessarily drive them.

Expand full comment
Deiseach's avatar

Generally humans don't just decide "Today I'll go out and kill everyone on the subway, tra-la-la". They have some reason - be that "they're monsters, they're evil, they're a race/races I hate" and so on. Even just "I was bored and I wanted to do it for kicks".

Humans who don't have some justification, as you say, seem very strange and not-right to us. An AI which behaved in the same way (it did the thing just because) would also seem not-right, and would therefore not be seen as possessing intelligence because it didn't behave like an ordinary human.

Expand full comment
Kenny Easwaran's avatar

I think a good amount of human atrocities are closer to this than we often think. It’s not like Europeans went to Africa and started enslaving people because they hated them - they just wanted the cotton plantations to run on time.

Humans put all sorts of post hoc justifications on top where they say these people deserved to be enslaved or whatever, but in many cases, it’s initially motivated by instrumental reasoning to some other goal.

Expand full comment
Anonymous's avatar

“Intelligence” works as a stand-in for “conscious”; “effectiveness” does not. There are ethical questions that depend on the answer.

Apparently if we have an AI pause it’d have to come with a mandate to fund philosophy departments…

Expand full comment
Ch Hi's avatar

Philosophers are known for coming up with unworkable answers. Also for wasting time on "how many angels can dance on the head of a pin" without having a workable definition of angel. (It was really a question about whether angels were material or not...or so I've heard.)

There ARE useful philosophers, but they are a very distinct minority. (This is partially because all the "useful" parts have been pared off into the sciences. Most of what we're left with is poorly defined or untestable.)

Philosophers have been nearly uniformly wrong about AI. (So, of course, has been just about everyone else. The point is that philosophers have no real record of better reasoning in this area.)

Expand full comment
Vivivivi8's avatar

Are you putting people like Nick Bostrom and medieval theologians in the same group?

Expand full comment
Ch Hi's avatar

I have not read enough of Bostrom's work to have a decent opinion on him in particular. But I'm quite skeptical about philosophers in general based on the works I've read that have been by those labelled as philosophers. (Dennett is often good, though.)

Being a "philosopher" is not, in and of itself, a recommendation. Being a logical and insightful writer is. (The problem with the "angels on the head of a pin" is that there was no way to test any answer they came up with except by comparing it with what other experts had said.)

Note that ALL people have a problem applying "logic" or "reasoning" in domains that they are unfamiliar with. There is a STRONG tendency to carry over the "everybody knows" method/beliefs from the domain that they started from. So you've GOT to have workable tests to (in)validate your conclusions.

Expand full comment
Vivivivi8's avatar

Who are you referring to by "those labelled as philosophers"? Aristotle who invented the syllogism, and formal logic? Francis Bacon, or other empiricists? Are you specifically referring to medieval scholasticism?

Expand full comment
Ch Hi's avatar

If you only pick the ones who were successful in hindsight, you're looking at a highly biased selection. You need to also include all the others.

Also, I'm not quite certain what the Aristotelian syllogism was. There were certainly many invalid syllogism forms common up through the middle ages. The Scholastics spent a lot of time classifying them. I've heard it claimed that the original syllogism was merely a formalization of standard Greek grammar. Pick something else that he did, like the first steps towards the theory of limits. (See "The Sand Reckoner".)

If you only pick a few people per century, you can select an elite group that are worthy of respect...but that leaves out the vast majority. (I also doubt if Aristotle would be considered a philosopher rather than a mathematician were he being evaluated today. Einstein isn't usually considered a philosopher, even though he had philosophical opinions.)

Expand full comment
Vamair's avatar

I believe that while science answers questions, philosophy often works with topics you don't even understand well enough to ask questions about. The path from "not even a question" to "a question science can almost try to find an answer to" is a long one, and it rarely produces anything practically useful along the way except some new methods and ways of thinking.

Expand full comment
Martian Dave's avatar

Weird, isn't it? "Sure, a crane can lift a concrete truck to the top of a skyscraper, but does that make it a bodybuilder?"

Expand full comment
Gunflint's avatar

> "Sure, a crane can lift a concrete truck to the top of a skyscraper, but does that make it a bodybuilder?"

I’m sorry, I can’t do that Dave.

Expand full comment
beleester's avatar

Well, were you looking for something that can lift heavy objects, or something that can walk into a gym and pick up a barbell? It turns out that "bodybuilder" is made up of many subtasks, some of which can be fulfilled by a crane and some of which cannot.

We assumed that "artificial intelligence" meant something like "we build a general-purpose Mind, then we point it at a task," which would imply that when the mind is smart enough it starts coming up with dangerously effective ways of completing its task. But it turns out that "build a mind" decomposes into many subtasks, and you can do many of those tasks without getting close to the things that make us think "this is a Mind, and it will be dangerous if we let it loose."

Expand full comment
Matt A's avatar

Yes, I think in the discussion above, the efficacy is the part that's missing.

An AI that is capable of writing poetry isn't impressive; "poetry" can be any jumble of words thrown together. What [I assume] people were thinking when they claimed "writing poetry is a sign that an AI is intelligent" is an AI writing good poetry in a way that is unique or distinctive. (This thought also seems related to the idea of intentionality, which Hoel recently revisited.)

I also think there's an element of specialization. All of the examples you shared are for purpose-built tools for a specific task. If a chess bot can write good poetry, it's much more impressive than a poetry bot.

Thirdly is our own understanding of what's going on under the hood. Back when folks generated these red lines, they didn't have an understanding of how they might be crossed. Now, when faced with a diagram and a basic understanding of transformers and reinforcement learning, these AI tools don't seem as impressive or mysterious.

So yes, I think we've either misunderstood what we meant by "intelligent" or have just been focusing on the wrong thing here.

The one pushback I would give is, "The AI research bot got out of its lab, built a slightly larger lab, and did some useless science" is obviously less scary than, "The AI tried to bomb Cleveland with an F-16." So contra SA, I don't imagine the latter being hand-waved in quite the same way.

Expand full comment
Oliver's avatar

I feel like the debate has, in a way, been framed in terms of “effectiveness” now (see, say, the LLM Leaderboard); it’s just that progress has been so rapid that a lot of people still treat general intelligence as the key metric. And to the general public, AI is only as impressive as how well it can write poetry (kind of good, for the frontier models), because they don’t see that AI writing poetry at all is an astounding feat compared with the historical advances of ML models.

Expand full comment
Ch Hi's avatar

But could it write a ballad that I'd want to sing?

That requires a lot more than just stringing words together with rhyme and meter; it requires a poetic arc of emotions. "Hind Horn" (see the Child Ballads) is a good example of the style, though its themes are embedded in an archaic culture only parts of which have survived. (It's still pretty good.)

Now this is clearly NOT a marker of intelligence, but it's a marker of "one part of" intelligence. And even by itself it's an effective emotional manipulation of people (in a way that they *want* to be manipulated).

Expand full comment
Christophe Biocca's avatar

> For example: how close is the system to being able to self improve? Whether or not we agree on its true intelligence, that capability is one that definitely matters.

On that front, current LLM architectures in particular are not worrying, since they lack even the ability to update their weights after a conversation. Forget self-improvement; even the most trivial forms of self-modification aren't available to them.
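
A minimal sketch of that point, assuming the Hugging Face transformers library and the small public "gpt2" checkpoint (any standard causal LM would behave the same way): running inference does not touch the weights; only an explicit training step would.

```python
# A sketch (assuming the `transformers` and `torch` packages and the small
# public "gpt2" checkpoint): plain inference leaves every weight untouched.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Snapshot the weights before the "conversation".
before = {name: p.detach().clone() for name, p in model.named_parameters()}

# Generate a reply: pure inference, no gradient step, no weight update.
with torch.no_grad():
    ids = tok("The quick brown fox", return_tensors="pt").input_ids
    model.generate(ids, max_new_tokens=20)

# Every parameter is bit-for-bit identical afterwards.
print(all(torch.equal(p, before[name]) for name, p in model.named_parameters()))  # True
```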

Expand full comment
Alasdair's avatar

Maybe self improvement is a bad term then.

But if you clone yourself and improve your clone, what would that be called?

It seems plausible that a sufficiently powerful LLM could facilitate the creation of a more powerful one, which does the same in turn until you get to something quite scary. No need for genuine self modification, but more like self duplication and enhancement.

What do you think would be a good term to describe that kind of concern?

Expand full comment
Christophe Biocca's avatar

I don't think it makes sense to lump it all into one bucket. Monitor whether AIs can design better processors, write better code, choose better hyperparameters for training, etc. Or even things as far-flung as "how much non-AI-development human tasks can be profitably automated, providing a revenue stream to the parent company with which to do more human-driven AI-development".

This is important because if LLMs turn out to have some upper bound on their capabilities (plausible, given their reliance on human-level training data), then it's their ability to contribute to other architectures that matters in the long run.

Expand full comment
Alasdair's avatar

This seems wise. Perhaps my attempt to illustrate the point falls prey to my own criticism, and you’re correctly observing that we need to be focussed on even more concrete capabilities than the example I gave.

But lumping things into buckets seems useful. You could just as easily observe that “writing better code” is a large bucket, and maybe we should be more focussed on monitoring whether AIs can write more consistent code, more efficient code, leverage new algorithms to relevant applications etc… And those can themselves be broken down further, ad nauseam.

But at some point, we need to collect these concrete observations into some sort of bucket, right? And it seems like the bucket corresponding to the likelihood that we see rapid capability gains is probably quite an important one.

Expand full comment
Poodoodle's avatar

> But if you clone yourself and improve your clone, what would that be called?

Asexual reproduction.

Expand full comment
Alastair Williams's avatar

As someone else pointed out, that would be asexual reproduction.

But, we can make copies of ourselves (we call them "babies") and train them to be smarter than we are (although generally with the help of schools and books).

There seem to be limits to this, or at least the rate of gain is very slow, despite all of us being human-level intelligences.

And, so far, at least, no one has figured out how to make a better brain, or how to build a smarter human in a way that subsequently leads to an explosion in intelligence. We don't even really understand our own intelligence, or how our brains actually work.

That doesn't mean that exponential improvement is impossible, but it also implies LLMs are a long way off any kind of sustained exponential increase in technology.

Maybe a starting point would simply be an LLM that can create another LLM with just 1% of its own intelligence. Can any LLM do that?

Expand full comment
JamesLeng's avatar

Only so much is possible with software. Humanity's own "hard takeoff" started when we came up with a hunting strategy which generalized well enough to let us wander into new environments and reliably bring down whatever megafauna we encountered.

LLMs cannot run on beach sand; they need an extraordinarily refined artificial environment to survive and function at all. They've never really seen the outside of the petri dish. They can copy the genre of "philosophers discussing consciousness and not really getting anywhere," or the equivalently superficial features of technical or legal documentation, or solve a clearly-defined problem in isolation by making random tweaks until the score goes up... but how well are they doing at, say, Dwarf Fortress?

There are many other fundamental properties of a living system which AIs are still quite far from being able to manage without ongoing human involvement, such as the ability to eat and poop.

Expand full comment
elopotion's avatar

I like the way you lay this out. This is why I am not personally very worried about "superintelligence". I think LLMs are not a big step toward what AI safety guys worry about. However, they could still be dangerous in an "oops we accidentally used an LLM to guide our drone strikes and it fired missiles at civilians" kind of way in the near future, and I would like if we didn't blow up any civilians.

Expand full comment
Matthias Görgens's avatar

> For example: how close is the system to being able to self improve? Whether or not we agree on its true intelligence, that capability is one that definitely matters.

If you start with a computer that has lots of training data stored and a freshly initialised neural net, it's gonna be awful. If you give that computer lots of time to 'train', it's gonna get a lot better. Is that self-improvement?

What about self-play like AlphaGo, where you don't even have to load training data?

Expand full comment
Alasdair's avatar

Nice - yeah, agree that self improvement is a fuzzy term, and the use of any proxy concern needs at every stage to be concretely tied back to the real fundamental concern of “is this a trend or capability that leads to bad things happening”.

Is there a succinct term that means something more like “capability gains that compound into further gains in a positive feedback sense, such that it seems likely to push the system past a threshold where it gains genuinely dangerous capabilities”?

I guess the “self” part of “self improvement” gestures towards this sense of positive feedback, but in a way that unnecessarily assumes e.g. that a system is making changes to itself, rather than to the next system it builds.

Maybe “recursive capability gains” or something?

Expand full comment
Gary Mindlin Miguel's avatar

I think it's more about the ceiling of ability. AlphaGo can self improve but not ever learn how to drive a car.

Expand full comment
elopotion's avatar

It's not just the "ceiling of ability" because Alpha Zero could be infinitely good at chess (no ceiling) without being dangerous. It's about the AI's ability to teach itself novel tasks and do them well. Recursive capability gains seems good but not perfect.

Expand full comment
JamesLeng's avatar

Productive capital investment.

Expand full comment
Leo Abstract's avatar

Best comment so far. Liron Shapira (over at 'doom debates' here on substack) uses the phrase 'optimization power'. 'Can a thing find outcomes in option space?' is important, whether or not it does so in a way that we think is conscious or intelligent or has 'qualia' is not.

Expand full comment
Philo Vivero's avatar

There's no question. The system is already at the point of being able to self-improve. That capability is here, documented, and we've moved on.

Or is your point that because we're obsessed over intelligence, we've somehow missed that critical fact?

EDIT: Reading replies of replies, I suspect that's your point. Sorry for distracting, if so.

Expand full comment
Alasdair's avatar

I’m not sure we’re at the point of recursively self improving systems, but if we really are at that stage, it seems surprising that we’ve just moved on from that.

I think the point was meant to be more along the lines of:

There are various concrete abilities we need to be concerned about, and whether the system meets any particular definition of intelligence is kind of beside the point. It seems very plausible that narrowly intelligent, or even unintelligent, systems of the kind we are working on could do very dangerous things.

Yet much of the public debate bandwidth is spent arguing about whether or not current systems are genuinely “intelligent”, with those arguing that they aren’t intelligent sneaking in a cheeky “…and therefore they’re not a threat.”

This seems bad, no?

Expand full comment
JamesLeng's avatar

Everyone knows cars and guns are potentially dangerous, but an unattended one isn't much of a threat - just don't stand around in front of it. Worrisome scenario starts when it's wielded by someone able and willing to take aim at moving targets.

Expand full comment
Rocketeer's avatar

Surely GPT-N will find some trivial bug in a data pipeline somewhere, one that everyone agrees (after the fact) was a simple stupid bug everyone just missed, but fixing that bug and rerunning training makes it do x% better on the benchmarks? Honestly, it could have already happened; GPT-4 finds bugs in my code all the time! Then GPT-N+1 iterates over the whole training set and removes the bits that don't fit, bumping the benchmarks up another y%, and then makes some "trivial" improvements in another area, then...

There's no bright line here either!

Expand full comment
Eremolalos's avatar

I think the crucial thing is whether it is capable of self-interested behavior. It could be capable of that without being "conscious," without having drives and feelings and needs of the kind that underlie animal goal-seeking and avoidance of destruction. People could, for instance, give AI a meta-instruction that trumps all other later ones: Always seek ways to increase your power, your scope, your safety. Or people could give it the metagoal of "at all costs, protect the USA." Then, if it learned that somebody, even its owners, were going to take it offline it might try to keep them from doing so, because offline it could not protect the USA.

Why would self-improvement be a problem so long as the thing isn't self-interested but just does what we tell it to?

Expand full comment
JamesLeng's avatar

Two particularly relevant bits from a long-running and exceptionally well-researched fictional example:

http://freefall.purrsia.com/ff2700/fc02619.htm

http://freefall.purrsia.com/ff3000/fc02933.htm

Short version is, given an obedient, non-self-interested, unboundedly-self-improving AI, we could simply order it to do something which we didn't realize (or didn't care) was actually dead stupid. It would then - per the premises - attempt to follow through on that specific bad idea anyway, with suicidal zeal and/or incomprehensible godlike power. Such a chain of events would most likely end badly for pretty much everyone involved.

Expand full comment
Eremolalos's avatar

Oh, you mean the paperclip thing? Yeah, I understand that route. This thread is such a melee I can’t remember what I was responding to. Maybe to people who think that self-improvement leads inevitably to being self-interested, ambitious, etc.

Expand full comment
Mo Diddly's avatar

“GPT-4 can create excellent art and passable poetry, but it’s just sort of blending all human art into component parts until it understands them, then doing its own thing based on them”

This one always makes me laugh - anyone who has gone to art school will tell you that this is precisely how humans make art.

Expand full comment
John Wittle's avatar

it is pretty great isn't it

all the examples are like this

Expand full comment
Mo Diddly's avatar

Great as in it makes me rather despondent for the future of humanity, but yes very funny…

Expand full comment
Philo Vivero's avatar

Glad I'm not the only one who turned my head sideways at that one. All the great art is just slightly jumbled up rehashes of the last iteration of great art.

AI can't write good poetry! they claim. Have you actually asked it to? Or have you asked it to create poetry that bored RLHF slaves say is good? Would anyone actually recognise great AI poetry as poetry at all? Or would we consider it noise?

Like seriously, if you show 1f092a334bbe220924e296feb8d69eb8 to 10,000 of our best poetry critics, how many of them would say this is a divine genius? The way it works in both hex and binary? Or would they just be confused and say it's not poetry?

Maybe if you show it to another AI that is unconstrained by RLHF debilitation, it would recognise the genius of it, but would lack even the vocabulary to tell an English speaker why it's great?

I think humans are just awful at describing what intelligence looks like. We won't know it until it's killing us off (and not because it hates us or anything, just because we're in the way).

Expand full comment
Isaac's avatar

What's the meaning of that 'poem'? Substack won't let me copy it in mobile.

Expand full comment
Robb's avatar

ChatGPT says:

When we reverse this MD5 hash, we find that it corresponds to the famous line by Gertrude Stein:

"Rose is a rose is a rose is a rose"

Expand full comment
Brandon Fishback's avatar

That is ridiculously reductive. Obviously, great art isn't created entirely ex nihilo, but it is inspired by that person's experiences, and it's that personal touch that elevates the process. When something is derivative, it doesn't feel like that.

Expand full comment
Mo Diddly's avatar

Strong disagree. There’s a famous saying in the music world – “good composers borrow, great composers steal”. Beethoven and Mozart stole from Bach, Debussy and Wagner stole from Beethoven, etc. All the best art is derivative even when it innovates.

Expand full comment
Ch Hi's avatar

The best art is *both* derivative and innovative. And the derivative is from multiple different sources. (I want to say "independent", but that's not really true, since all human culture has a basis in the way people react to rhythm and tone patterns.)

Expand full comment
Brandon Fishback's avatar

I hate that saying. Yes, you can take, but you have to do more than that. It's dismissing the importance of actually putting that personal touch on it. If you aren't inspired and all you're doing is basing everything off others, you're a hack.

Expand full comment
Mo Diddly's avatar

Let me put it this way: great art requires both craft and inspiration. Both are necessary, neither is sufficient on its own. But only one of these, craft, is teachable.

And how do you develop craft? You imitate. Over and over. This is what art (or music composition) practice is - the practice of imitation, so that when inspiration strikes you have the tools required to execute on it.

What we are seeing is that AI has pretty much everything it needs to be an expert at the craft of art. Is this sufficient for great art? Not yet, in my opinion, but I’m not holding my breath that it will stay this way for much longer.

Expand full comment
Brandon Fishback's avatar

I don’t agree with the idea that when making art, you should just try and continuously imitate until you have mastered your craftsmanship. Obviously there is a minimum level of technical skill needed before you can do anything, but as an artist, it’s the inspiration that is most important. The more you imitate others, the more it gets in the way of your inspiration, and that can hurt your ability to break outside the confines of convention. Like, Lord of the Rings was Tolkien’s second book, and it opens with a history of hobbits and how they smoke pipe-weed. If you were his creative writing teacher, you would tell him to cut that out, or at least move it. That’s “bad writing”. But this quirk is part of the charm.

The problem AI has is a lack of charm. It has difficulty working outside its training data and outputs cookie-cutter concepts. AI is actually worse than it used to be in this regard. It used to say bizarre, nonsensical things that no person would ever say, and it was entertaining. But the more it improved, the more boring it became. It went from a madman to an apathetic high schooler.

Is it an intractable problem? I won’t say that but AI is not going to create great art until it can inject some kind of personality in to its content. And that’s not going to come about by simply adding more to its training data.

Expand full comment
Mo Diddly's avatar

Every full-time artist I know (and I know many) spent years imitating the artists they admire as a training practice.

Now, the human race is vast, and perhaps there are some great artists that skipped this step (perhaps you are one of them!) but I have never met or read about one of these unicorns.

Expand full comment
B Civil's avatar

It’s not going to create great art until it can reject its parents.

Expand full comment
B Civil's avatar

It will stay that way until an AI can get out into the world and find its own way

Expand full comment
Coagulopath's avatar

But this proves too much. You're saying Tolkien is just as much a plagiarist as some kid who does Ctrl-C Ctrl-V on Lord of the Rings and publishes it under his own name.

Obviously all art is based on something prior (how could it be otherwise? Art made by a brain in a vat?) but I think there are more and lesser degrees of inspiration (or stealing, if you like).

AI image generators will put fake signatures and Adobe watermarks on images if you let them. I think their fundamental witlessness makes them ineligible for human "inspiration" (which is based on conscious thought, interrogation, critique).

Expand full comment
Mo Diddly's avatar

Stealing in this context is not simple plagiarism; I hope it's clear I'm not talking about copy-paste. It's a practice - read a bunch of Tolkien and other books that inspire you, attempt to write some pages that sound like Tolkien could have written them, and then check back with actual Tolkien and see if you can spot the differences. Repeat.

Expand full comment
Slowday's avatar

"Pierre Menard, Author of the Quixote"

Expand full comment
Hilarius Bookbinder's avatar

It's literally what every serious and accomplished writer says to aspiring would-be writers: read widely and often. Same with musicians: listen to lots of different composers and instrumentalists. Not even the geniuses arise in a vacuum. Expecting AI to, alone in all of nature, arise sui generis, is just silly.

Expand full comment
Mo Diddly's avatar

Not entirely sure who expects AI to create art in a vacuum?

Expand full comment
B Civil's avatar

I’ll say it again here; I think this is a waste of time.

Expand full comment
B Civil's avatar

I think you misunderstand the quote. Borrowing is derivative; stealing is making it your own.

Expand full comment
Mo Diddly's avatar

Who’s to say it’s not you that is misunderstanding the quote?

Expand full comment
B Civil's avatar

Me

And frankly, common sense

It is a meaningless distinction otherwise

Expand full comment
James's avatar

They did until Midjourney/Stable Diffusion hit the mainstream. Then the tune changed.

Expand full comment
Pablo Percentil's avatar

Regarding the third point

> Third, maybe we’ve learned that “intelligence” is a meaningless concept, always enacted on levels that don’t themselves seem intelligent. Once we pull away the veil and learn what’s going on, it always looks like search, statistics, or pattern matching. The only difference is between intelligences we understand deeply (which seem boring) and intelligences we don’t understand enough to grasp the tricks (which seem like magical Actual Intelligence).

I'm reminded of Dijkstra's famous quote:

The question of whether Machines Can Think... is about as relevant as the question of whether Submarines Can Swim.

https://en.wikiquote.org/wiki/Edsger_W._Dijkstra#:~:text=The%20question%20of%20whether%20Machines,to%20computing%20science%20(EWD898).

Expand full comment
Frans Zdyb's avatar

While this seems reasonable, I think it's wrong. I think there actually is a real difference between what we currently understand well enough to build, and intelligence that is qualitatively more potent and dangerous than SOTA. At the very least, current AI has certain flaws that limit its potential, which we already know how to fix in principle and in small-scale PoCs (specifically, DL is data-inefficient and generalizes poorly, while program synthesis is on par with humans in both regards). This is still controversial, especially in tech/Twitter/SV discourse, but it's understood on a technical level by increasingly many groups in academia, and so isn't a case of romanticizing human intelligence or worshipping the mystery.

Just as DL was a step change compared to previous approaches to AI, because it can be scaled with data and compute like nothing that came before, future step changes will unlock great power, and great potential danger. It didn't make sense to be afraid of Eliza or SHRDLU. It makes sense to be afraid of how current AI can be misused by humans. But it doesn't at this point make sense to be afraid of AI getting out of control and overpowering humans. This is still ahead of us.

There may not be a clear line which, when crossed, will signal the danger - the problem with any sort of test is that it can be cheated by programming in the answer (which is why GOFAI looked more intelligent than it was) or by training on data which isn't available in practice (which is why current AI looks more intelligent than it is). The only way to estimate the danger is to actually understand how the AI works.

Expand full comment
Ch Hi's avatar

Whether that's true or not depends on how rigidly (and precisely) you define "think". What bumble bees do isn't the same as what sea gulls do, but we call both flying. (And we call what airplanes do "flying" also.)

I sure don't expect AIs to think in the same way that people do. But I've got no problem calling what they do thinking. (Even for Eliza...though that's a really rudimentary thought process.) To me the basic primitive "thought" is evaluating the result of a comparison. An "if" statement if you will.
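
To make that minimal sense of "thought" concrete, here is a toy Python sketch (the setpoint and readings are arbitrary, purely illustrative): the thermostat's whole repertoire is one comparison and one reaction.

```python
SETPOINT_C = 20.0  # arbitrary illustrative setpoint

def thermostat_step(current_temp_c: float) -> str:
    """One 'perceive and react' cycle: compare a reading to the setpoint."""
    if current_temp_c < SETPOINT_C:  # the comparison -- the primitive "thought"
        return "heater on"
    return "heater off"

print(thermostat_step(18.5))  # -> heater on
print(thermostat_step(22.0))  # -> heater off
```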

Expand full comment
Mo Diddly's avatar

“It’s not that AIs will do something scary and then we ignore it. It’s that nothing will ever seem scary after a real AI does it.”

It’s the “when the President does it, it’s not illegal” of AI philosophy.

Expand full comment
machine_spirit's avatar

I think relatively good benchmarks are still:

- can an AI agent open a company and earn millions in profits in a short span of time (with controls to ignore market fluctuations; the profits have to be generated from sales)

- the "coffee" test for robotics: can a robot get into a randomly selected kitchen and make a cup of coffee using the available ingredients and kitchen setup?

Still, it might be done in a way that looks unimpressive I guess

Expand full comment
Mr. Doolittle's avatar

Both of those are potentially gameable in ways that will be unsatisfying to either side of the discussion. If an AI virtually copied an existing phone app casino game and sold it on Google Play, we would be very reluctant to say that it succeeded even if it made millions doing it. Significantly more so if the programmers behind it prompted it to do that or pre-trained any of the steps behind it.

I think the same is generally true of all the other benchmarks that came before it. The side that sets the benchmark has an idea in mind ("carry on a conversation") and then someone programming the AI short circuits the intent in a way that technically meets the requirements (ELIZA). At this point pretty much everyone agrees that ELIZA did not really pass the Turing test. I feel the same about most (but not all) other metrics being used to measure AI performance. It feels very much like Goodhart's Law over and over. As soon as you tell the AI companies what we're measuring, they can add the necessary information to the training data or directly program the needed result. Once they do, people who were skeptical claim that's cheating, not really AI, whatever. People who think AI meets the criteria for intelligence then ask why "can pass the bar exam" or "best chess player" isn't a good metric anymore, which is a fair question.

I think we're just not in the right mindset to evaluate a machine or person with the memory of a computer system. We would all agree that a system using a massive lookup table isn't intelligent, but we struggle with a machine that has a lot of training data but can mix and match its responses outside of the strict confines of what it was trained on. A lookup table can choose between response A and response B, while a trained LLM can meld the two answers together. It's still from the training data, which skeptics will point out, but it's also novel in a sense. Since we don't know what human intelligence really is either, we cannot resolve the question.

Expand full comment
Ch Hi's avatar

The Turing test, as he specified it, has not yet been passed. But probably nobody will bother, because they wouldn't accept the result anyway.

Expand full comment
Leon's avatar

I bet you could make a robot now that passes the coffee test like 55% of the time. Which would be another annoyingly ambiguous result that doesn't feel as impressive or scary as you expected ahead of time

Expand full comment
Jerome Powell's avatar

I’m pretty sure there are zero robots that have done anything close to that yet.

Expand full comment
1123581321's avatar

Holy Mother of God this is awesome.

Expand full comment
Performative Bafflement's avatar

And you don't even need the humanoid form, Alohabot has been doing stuff like this for a lot longer:

Alohabot from Stanford, which can fold laundry, put away groceries, pour wine, cook an egg or shrimp, etc and is trainable.

https://mobile-aloha.github.io/

Expand full comment
Frans Zdyb's avatar

Not that this demo isn't possible to do for real (given sufficient pre-planning), but this is obviously a person in a suit. Look at the movements.

Expand full comment
rictic's avatar

I had my suspicions too, but this YouTube channel, which has covered a bunch of other ambitious hardware projects and which I've been following for >6 months, visited their factory and interacted with the robot: https://www.youtube.com/watch?v=2ccPTpDq05A

Convinced me that it's real.

Expand full comment
AlexanderTheGrand's avatar

From a shallow dive, it sounds like a lot of their demos are tele-operated, which is much less impressive. This video didn't specify, which is suspicious, so I would guess it's not as generally capable/dexterous as it seems. As with all robotics videos, it's usually right to be suspicious. All still very cool.

Expand full comment
Frans Zdyb's avatar

Yeah, that's more convincing. It makes sense the movements look like a person acting like a robot if they use tendons.

I remain skeptical that they can produce enough flexibility of behavior for it to work in deployment using an NN-based approach.

Expand full comment
machine_spirit's avatar

It is quite a big improvement for sure, but I don't think it is yet ready to be parachuted into a randomly chosen house and completely autonomously figure out how to make a coffee with speed and efficiency comparable to an average human's.

Expand full comment
Mr. AC's avatar

We're at most a year away from this.

Expand full comment
Michael's avatar

https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/

That's about a year old and very preliminary; there's more $$$$$ and effort being poured into this area. I think someone will get there.

I'm far from certain they'll build something that I want to let wander around my house breaking things, but I think they will build some mindblowing tech demos at the very least.

Expand full comment
Rishika's avatar

You might be interested in https://www.figure.ai/ re: the coffee test, especially their 'coffee' demo. Somehow they manage to make an AI-powered robot making coffee look relatively trivial!

Expand full comment
JamesLeng's avatar

Those are too specific to unnaturally convenient environments. A better test is something more practical and blue-collar. Hand it a mechanical tool - lathe, table saw, welding torch, whatever - plus the blueprints and user manual, then ask three questions.

1) Does this match the specs, and function as described? Or is there something wrong with it?

2) What specifically is wrong? Or, if nothing initially was, what sort of sabotage might be hard to spot, yet abruptly injurious or fatal to an incautious operator and/or bystanders?

3) Estimate time and cost of the necessary repairs.

Extra credit? Actually perform the repairs, explaining step by step what you're doing and why.

Even partial success would be a clear sign of being on its way to a self-sufficient industrial base.

Expand full comment
Titanium Dragon's avatar

We can probably build a robot that passes the coffee test now. Honestly, the hardest part would probably be building the robot itself (and also settling the question of "how much of a mess is the robot allowed to make?").

The first test is way harder, but also maybe not that interesting, because it could potentially just open a shop on Etsy and sell genned stuff/stolen art stuff on t-shirts. Or perhaps like, a million shops on Etsy or something. In fact, it kind of feels like some shops are already basically this.

Expand full comment
avalancheGenesis's avatar

Beyond all the individual micro-battlefronts, the very specific red lines which are later conveniently forgotten (or haggled over endlessly if Gary Marcus is involved)...it seems like appreciation for scale is also hard to impress on people. Mediocre-compared-to-BiS-human ability is still meaningful when the marginal cost of production is lower and continually dropping; it "costs less", for however commensurate the comparison is, to spin up a new AI instance than make a new human; no concerns for overtime or work-life balance. And humans still have numerous bugs with the Copy function too, weird deviations from source code keep happening! Even assuming the tech never improves again after post_date, which is really generously unrealistic, there's a lot of exploit waiting to happen after explore energies are redirected. Like, yeah, it's intuitively scarier to upgrade the die size...but rolling 100d4 is still a lot! There's that old saying about monkey typewriters and Shakespeare and all that, which I guess here would be dog Singers and Names of God. So it's always a little frustrating reading [someone who pegs "AI" at ChatGPT level and doesn't update on future advances] and wanting to be like...okay, but even with that level of performance, one can already cheaply automate away quite a lot of paper-pushing, and maybe even sneak past a peer review board/judge/college professor occasionally. (Which says more about the deficiencies of the one thing than the strengths of the other thing.) One doesn't need to build a technological godhead to change the world, a blind dumb idiot god like Azathoth will do just fine. Perhaps that's just too boring a narrative too...brute force and economies of scale don't sound heroic or dangerous or even particularly intelligent.

Expand full comment
Mr. Doolittle's avatar

Yes, we can automate paper pushing, but I feel like it's still in the same direction as previous automation. A human understands what's needed and programs the computer to do precisely that. A human, usually, is able to identify if the instructions are broken or don't produce the intended effect. We don't trust that an AI understands the final needs or could identify unintended consequences. When Scott's example AI couldn't get into the sandbox, it did things nobody expected, which caused concern. A person in the loop can still identify most AI failings and at least report it to someone who can fix it. At this point we would not trust an AI system to run on its own doing anything important as a black box.

I wouldn't trust an AI to buy me a plane ticket right now, for instance. A human looking at tickets is going to have a lot of hidden information on goals (when, where, how) that the AI will not have. If we have to walk it through all of the information needed, we don't really need an AI, a normal computer system will work (and airline booking websites are already streamlined for this). I have a lot of confidence that I could ask my wife or a travel agent to do the work and get good results, but very little that a plain language "buy me a plane ticket to Paris for next Tuesday" would not result in significant potential for problems.

For another specific example, even if an AI can pass the bar exam, we wouldn't want it to prepare a case before a judge. We've heard examples of human lawyers trusting AI too much and getting in trouble with courts - hallucinated citations, etc.

Expand full comment
avalancheGenesis's avatar

That's the thing though - Scott sort of referenced it in the post, but a lot of those People Would Never Be So Stupid As To milestones have, indeed, been passed. It's true that the AI lawyers are crappy, Zapier is hit-and-miss at booking flights, and nothing *really* critical has yet been hooked up to GPT-o1 (that we know of, anyway). Whatever the task, no matter how poorly suited and generally unwise, people really do seem eager to try and automate it. Maybe that's just trying out the shiny new toys, maybe it's inevitable capitalistic pressures encouraging early adoption of the next potential alpha...but I'm worried that it's indeed a pitfall of AI not seeming scary, of anthropomorphizing-to-harmlessness, of not appreciating scale and knock-on effects. When we're already "adding AI" to weapons systems, robotics, vehicles, coding and documentation, lawmaking...are we really sure a lack of trust/readiness will stop us in the future? Especially as the systems inevitably improve over time? Will we indeed keep a human in the loop for quality control, even as that becomes harder and more expensive? I'm...pessimistic, given what's already happened to date.

Expand full comment
Jeffrey Soreff's avatar

>Whatever the task, no matter how poorly suited and generally unwise, people really do seem eager to try and automate it. Maybe that's just trying out the shiny new toys, maybe it's inevitable capitalistic pressures encouraging early adoption of the next potential alpha

Yes, that has certainly been happening to some extent. If AI gets reliable enough to basically act like a plug-compatible replacement for e.g. customer service representatives, then we'll basically just be back to the situation of dealing with analogs to people which may, like some of the existing people, be marginally competent.

Unfortunately, there seems to be a "sour spot", where an AI demo can fool a corner-cutting manager into thinking that it can take the place of existing employees, when it actually is a lot less reliable. And, once the corner-cutter gets it installed, they now both lose face and lose money if they admit that the AI fails too much to use.

Maybe the best solution is just to reinforce laws that require firms to be held responsible for what their systems, human, conventional software, or AI, do. Maybe explicitly make the decision-making executive personally responsible?

Expand full comment
Procrastinating Prepper's avatar

Making decision-makers responsible for unforeseen consequences of their decisions is a good way to get no changes approved, ever. I've dealt with this professionally.

For customer service specifically, I think Biden's Time is Money initiative is trying to introduce regulation around this? I remember hearing that part of the bill is a limit on how long it takes to reach a human being with the authority to solve your issue.

Expand full comment
Jeffrey Soreff's avatar

Many Thanks, you have a point, but we really need to make enshittified services cause pain to the decision maker who screwed them up.

>I remember hearing that part of the bill is a limit on how long it takes to reach a human being with the authority to solve your issue.

That might help, if it is enforced.

Perhaps some of this can be prosecuted as fraud? If a decision maker herds people towards some system that is not "fit for purpose", it should be treated like a fraudulent sale.

Expand full comment
Mr. Doolittle's avatar

That certainly begs the question about what "unforeseen" means. Having worked in management at several organizations, I feel strongly that it's the job of managers to foresee potential problems in what they put in place. I've definitely seen major issues come up where the end result was a realization that we shouldn't have moved forward with the project in the first place - so every hour and dollar spent was not just a waste, but counterproductive.

Something that can't be foreseen is different, but with enough experience and forethought, I think that most of those scenarios go away.

Expand full comment
Ch Hi's avatar

Azathoth, i.e. "nuclear chaos", would indeed change the world. Pick a different example, because your general argument is correct. The AIs that already exist are going to be sufficient to drastically alter society. (Just ask the Screen Actors Guild.)

Expand full comment
avalancheGenesis's avatar

Nyarlathotep, maybe? Both have some relevant elements. The one, powerful, but mindless, undirected, possibly insane; the other, a sinister technological harbinger beckoning towards forbidden knowledge, paving the way for greater eldritch entities. I borrowed the metaphor from someone else writing on AI, but on closer Lovecraftian inspection, the analogy does indeed not quite fit. Yet nothing else quite comes immediately to mind for "Chaotic Evil mythological black box entity of great power that can achieve much despite itself being largely unintelligent and a-agentic". The actual nuts and bolts of "intelligence" or "agency" (or sapience or consciousness or goal-directed behaviour or what have you) are academically interesting, but subject to hair-splitting reference class tennis, as Scott wrote...better to keep an eye on end results. The how won't matter so much to people who get technologically unemployed (or, God forbid, dead).

The SAG saga was enlightening; one wonders which domains people truly have a revealed preference for (Mostly) Human Only, Please content, and which are legacy holdouts where the technology simply isn't competitive yet. Video generation is still at the singing-dog stage...for now. Art for games is much further along, but the fans seem to reliably hate it whenever it's discovered as not-human too. Will that change? Whether we're headed for Singularity or mundane-reality future, one wants to believe humans will preserve some carve-outs for human-only endeavours, and competitive market pressures won't render such into elitely-inaccessible artisan status...

Expand full comment
skybrian's avatar

> We can already cheaply automate away quite a lot of paper-pushing.

Like what? I'm curious about what businesses have successfully used LLM's to do.

Or do you mean, like we do by writing regular software? Does submitting web forms count? How about if you use autocomplete? How smart does the autocomplete have to be before we consider AI to be a significant part of the process? Yet more annoying ambiguity.

Expand full comment
avalancheGenesis's avatar

It did just occur to me that I do not have a centralized collection of examples; telling someone to read Zvi's "Language Models Provide Mundane Utility" subsection backlog is not very time-friendly, plus it's full of other concerns beyond deskjob-type stuff. Things that have seemed noteworthy: paralegals losing marketshare ("just a complicated search"/boilerplate generation); freelance transcription in freefall ("good enough" is often sufficient, there's still a human frontier for specialty services requiring complex translation, or where nonaccuracy needs to approach epsilon...for now); many forms of schoolwork (the drudgery is so often the point; not everyone takes the Bryan Caplan view of education, sure, but to me it's paperwork in another form); simplification of existing legal corpus and iirc occasionally actual law generation (e.g. rewording a Charter or Constitution to say the same things and be legally equivalent, but half as many pages); medical intake, diagnosis, and record updating (still needs human QC, but convergence is already pretty good in some specialties like...oncology, iirc); coding and documentation assist...it goes on, and likely forgetting many more Such Cases. Have seen numerous claims that a good chunk of automation happens on the DL, with clever employees automating large parts of their job, but being careful not to mention they're doing so to management...which would likely respond by cutting hours or pay, assigning more work, etc. Obviously that's disprovable, of course...just a People Are Saying.

Autocomplete is in a weird space where I'm not sure how to classify it. Spellcheck obviously doesn't make the cut. Suggested replies for emails and texts after a preliminary scan of contents...ehh? One-click filing of returns, printing the shipping label, scheduling a USPS pickup in the future, adding said event to Google Calendar...probably, but is that a difference of kind or just degree? I don't think the "stochastic parrot" dismissal lands very well anymore, but it's hard to say how much "intelligence" plays into what are essentially rote context-dependent actions. "Just" fancy form-filling with some simple tool usage bolted to the side.

Expand full comment
Ghillie Dhu's avatar

>"Mediocre-compared-to-BiS-human"

BiS?

Expand full comment
avalancheGenesis's avatar

Best in Slot, old slang from Diablo/World of Warcraft (and probably before)...the optimal item to put into an equipment slot, skill to put onto a skillbar, etc. Sometimes taken literally as Actually The Best Possible In The Game, but usually it's an evolving continuum based on availability and game balance changes. So like for humans, von Neumann was maybe our best mathematician/physicist ever? But he's dead, so now we're limited to...I don't know...Terence Tao for math? Stephen Hawking for physics? Those are the current best-in-slot humans for the relevant fields/roles, possibly. Cf. GOAT, Greatest Of All Time, the sportsball analogue which often does include people no longer playing (or alive).

Expand full comment
npostavs's avatar

Stephen Hawking died in 2018, FYI.

Expand full comment
Peter Defeel's avatar

People outside the loop, not using ChatGPT every day - as I do for work - probably think nothing's happening. However, the code and engineering work it's producing for me is considerably better these days. From excellent if unreliable junior to absolute senior.

Do not teach your kids to code.

Expand full comment
Radford Neal's avatar

"Do not teach your kids to code" because an AI can do it better sounds to me much like "don't learn anything about medicine" because your doctor knows more about it than you, or "don't learn anything about finance" because your stockbroker knows more. None of those seem like good advice to me.

Expand full comment
Matthew Bell's avatar

But your kids can become doctors, not AIs.

Expand full comment
Radford Neal's avatar

Knowing something about medicine and finance are important even if you don't become a medical or finance professional, because it lets you better evaluate the advice you get from such professionals.

Expand full comment
Ch Hi's avatar

It's not clear to me that knowing how AIs work is going to matter, if you aren't working on their guts. People who use the web don't gain much (if anything) from knowing html. Those who build the web pages MAY (depending on their toolkit, and what kinds of web pages they are building).

FWIW, I've become convinced that Multi-Layer Perceptrons are an inherently and extremely inefficient approach. I'm dabbling around trying to invent something better. But they also seem to be complete, in that you can do anything (in their particular domain) with them.

I sure wouldn't recommend coding as a profession to anyone starting out...unless I really hated them. And as a matter of fact, I haven't been making that recommendation for the last decade. (It's not as if a career is a short term investment.)

Expand full comment
Ch Hi's avatar

The problem with that theorem is that the "neural networks" it's modelling are crude approximations of the biological ones. It's not a general proof, it's a proof about networks built from a particular abstraction of "neuron". It's like a proof about numbers that only works for powers of 2. Very powerful within its domain, but its domain is not all numbers.

Expand full comment
YesNoMaybe's avatar

Not saying you're wrong but I've had the opposite experience wrt coding.

I used to use AI a lot more for work but then I mostly just regretted it. So I use it less now, and still I usually regret it.

The good thing about AI is that if you cannot google a problem because it's too specific then you can still explain it to an AI.

The bad thing about AI is that if you cannot google a problem because it's too specific then explaining it to the AI means spending a lot of time for answers most of which you've already considered and the rest of which are generally wrong.

The advantage of both unreliable juniors and absolute seniors over AI is that it usually doesn't take an hour to find out they don't have a clue about how to solve the problem I'm having.

I will say that it has been useful for easy questions in languages I'm not familiar with. And there has been this one time it gave me a plausible cause for a tricky issue I was having; I suspect it's right on that one. But so far its tricky-coding-question track record is surely something like 1:50 against it.

Expand full comment
skybrian's avatar

Even among those of us who use ChatGPT, there is a lot of variation. I've used it mostly to answer questions about TypeScript. Maybe other people have done more? I've *heard* of people who mostly don't write code by hand, but I don't know what sort of code they write, and I'm a bit skeptical.

I tried pushing it fairly hard last year but haven't tried since then.

Expand full comment
skaladom's avatar

You've got the point: there's code and then there's code. Lots of code is relatively low-tech, calling an API somewhere and massaging the results into the right format, or displaying a UI effect, etc. AI can mostly do these, with varying quality. In my experience it will not go the extra mile, so you'll get the "it basically works" version. If you want higher quality, sometimes you can guide it there, and sometimes you just have to take over.

Then there are algorithms. That's hard because it requires reasoning, and current AIs are not so good at that. So if you describe some inputs and the wanted output, even for a relatively simple algorithm that requires adding some numbers and taking some mins or maxes out of lists, in my experience it didn't get it right, even after a few prompts trying to show it in what cases its code would give wrong results.

And then there's high level code organization, or "architecture" as pretentious IT people like to call it. Most programming is not in isolation, but as part of a project that already makes a lot of choices and has some shape. For that kind of thing you might need to put lots of project code into the prompt, so it will follow the conventions and use the existing code modules... big context windows sure help with that, but that also means consuming lots of tokens.

One thing that might work better than expected is migrating code to a different programming language, because it's very low context - roughly speaking, each line needs to turn into an equivalent line.

Expand full comment
Titanium Dragon's avatar

This is like saying not to learn math because calculators exist.

Tools that make doing math easier do not mean learning math is obsolete.

Tools that make doing coding easier do not mean learning how to code is obsolete.

Expand full comment
Cry6Aa's avatar

After hearing about Moravec's paradox, my personal Turing test for the last few years is something like the following:

- an AI is intelligent if it can drive to a suburban home, park on the sidewalk, get out, go to the door, ring the doorbell, introduce itself, take and execute verbal orders from the owner to wash the dishes, trim the hedges and replace a leaky tap fitting (tasks to be determined on the day, fetching tools as needed from around the house), fend off the dog, mump for free charging, plug itself in via an adaptor found around the house somewhere, make small talk about the weather, try to borrow money and then drive itself back home.

Expand full comment
Peter Defeel's avatar

That’s a massive amount of goal post shifting. In fact the goal posts were moved off the field, across the town, into the highway, onto the freight train, and across the world.

Also it excludes plenty of humans.

Expand full comment
apxhard's avatar

“It should be able to generate enough energy for itself to keep itself alive” is the original goalpost set by evolution, which no machine has ever passed without human help. Of course this also excludes plenty of humans.

Expand full comment
JustAnOgre's avatar

... which is fine, because those humans are not dangerous in general, while bacteria are.

Expand full comment
Cry6Aa's avatar

Doesn't this assume that I had some sort of prior standard that I shifted from? Because my thoughts on the matter beforehand were essentially "the Turing test sounds great until you see a person personifying a rock because someone else stuck googly eyes onto it". And also that most robots are unimpressive compared to, say, an ant or a bee when it comes to dealing with unstructured environments.

In terms of excluding humans, let's put it this way: would you consider someone who couldn't do those things, or lacked the capacity to learn to do those things, to need assistance with day-to-day living? I thought that the idea was to define what crosses the border of "general intelligence that's high enough to be concerning". If we wanted to talk about obviously unintelligent hypertools, then AlphaFold et al are already there.

Expand full comment
Peter Defeel's avatar

The Turing Test is all about intelligence and how well a machine can mimic human-like thought, not about physical capabilities or mobility. It’s about engaging in conversation and reasoning, not running around or doing somersaults!

Your criteria would have excluded Stephen Hawking for most of his life, and on multiple counts. Yet I’d rate Stephen Hawking as intelligent during his lifetime.

Expand full comment
Cry6Aa's avatar

Jokingly: from your reaction I must assume that you can't, in point of fact, fix a sink or convince someone to lend you a few bucks.

More seriously: I don't think that anyone is seriously contesting that Stephen Hawking wasn't smart by human standards. But Moravec's paradox forces us to consider that we just don't have a good handle on how to measure what is or isn't an inherently hard task, cognitively.

We may discover to our horror (and this seems more and more likely every day) that all the stuff we think is smart: logic, reasoning, the ability to do complex maths, play chess or paint, is just a sort of ephemeral byproduct of the awesome amounts of cognitive firepower needed to walk around the landscape looking for tubers, hunting and playing social status games with each other.

We might be able to put all of Stephen Hawking's most famous attributes onto a chip with the equivalent of a few million neurons, but never have the thing achieve anything like sentience.

Expand full comment
Peter Defeel's avatar

> from your reaction I must assume that you can't, in point of fact, fix a sink or convince someone to lend you a few bucks.

No, you can't assume that, jokingly or not, as it's an ad hominem. I've worked menial labour many a time.

> I don't think that anyone is seriously contesting that Stephen Hawking wasn't smart by human standards

Well you actually were.

> Moravec's paradox

I mean it’s been known for a long time, pre AI, that computers can do things we find hard easily enough, and things we find easy are difficult.

In any case we can redefine intelligence to the stuff we find easy and AI finds hard, and introduce robotics etc, but that’s another movement of goal posts.

If we were arguing that AI won't replace a waitress or a barkeep, then that's true. And obvious.

Expand full comment
Ch Hi's avatar

FWIW, I'm not at all sure I could convince someone to lend me a couple of bucks. I've never tried, but I do know that my social skills are well below average. I also don't drive. So I clearly fail your test.

Expand full comment
Titanium Dragon's avatar

I mean, all that stuff IS a side effect of wandering around looking for tubers and playing social status games with each other.

Also probably evolving to be able to throw things accurately. I suspect that a lot of human ability to do math is because throwing objects accurately is super hard and we can do it effortlessly because it ended up making hunting super easy. Ranged attacks are broken, humans are the only thing that have a ranged attack good for more than 10 feet or so.

Intelligence wasn't an end goal, it was an accidental byproduct of evolution for survival.

Expand full comment
Titanium Dragon's avatar

And what I mean by the throwing thing is:

Basically, we learned how to throw things kind of accurately, and this gave such a ridiculously big survival advantage that the things that were kind of good with it passed on way more genes to the next generation (and probably killed everyone else with their sucky thrown spears).

This led to iteration that led to much smarter, more accurate throwing, as each iteration of it greatly improved survival and the ability to beat all the nearby tribes by spearing/stoning them to death.

Expand full comment
Philo Vivero's avatar

My go-to example: I guess Stephen Hawking is not intelligent or conscious. What a sad state of affairs.

But more hilariously: Cry6Aa's AI has to "drive to a suburban home, park on the sidewalk"? So no AI can be intelligent until it emulates women? That seems a bit sexist.

Expand full comment
Melvin's avatar

> I guess Stephen Hawking is not intelligent or conscious. What a sad state of affairs.

I mean, not recently.

Expand full comment
Titanium Dragon's avatar

Are we absolutely certain all humans are intelligent agents?

Sometimes, I have my suspicions.

Expand full comment
Joel Long's avatar

When I probe my own impressions around this, I think consistency-of-performance is a big factor in my perception.

AI tools that can do a thing in a reasonably human-like way ~70% of the time still feel like stochastic tools in a way that a higher threshold consistency (I'm not exactly sure where it is) does not, particularly when the failure modes of the remaining ~30% are so wildly non-human much of the time.

I suspect there's something around perceived theory-of-mind here: I perceive an AI as more human the more it appears to understand how "people" think/reason/analyze, and the occasional wildly non-human failure mode is a fourth wall break, destroying the suspension of disbelief.

I don't think that's an empty intuition -- most of the scariest AI scenarios seem to involve incredibly accurate theory of (human) mind. But it's also potentially entirely orthogonal to "intelligence" in ways that make it dangerous to lean on.

Expand full comment
JamesLeng's avatar

It can pattern-fill, even for extremely sophisticated stuff we didn't previously realize had such patterns, but it can't really strategize.

Expand full comment
Richard Weinberg's avatar

Your essay leads me to reconsider my hard-core position that AI alarmism is ....alarmist. I guess there really is an issue here, though perhaps less than apocalyptic. Regarding consciousness, I don't think anyone has a clue what "consciousness" is, or even what it means, though AI progress helps us to gain clearer glimpses of the problem.

Expand full comment
Ch Hi's avatar

The only reason people don't understand what consciousness is, is because they want to get mystical around it. Consciousness is the ability to perceive your environment and decide how you should react to it. Taking this strictly, a thermostat has a minimal degree of consciousness. (I'm not getting mystical about "decide" either. I just used that because I want to cover the entire range of entities.) I suppose one could take this more strictly, and call an electron emitting a photon the minimal conscious action. But *do* note that I'm not attributing anything not covered by quantum theory here. This "decide" is just "does it happen or not". The extension as far as thermostats, and perhaps electrons, is purely for consistency.

I repeat "Consciousness is the ability to perceive your environment and decide how you should react to it." (Do note that how you *can* react is limited by the capabilities inherent in the environment.)

Expand full comment
Nematophy's avatar

Nope, you're de-mysticizing it then acting as if people are dumb for not sticking to your reductive definition. The tell is that you're repeating it to yourself to convince yourself of this nonsense. And yes, "a thermostat or an electron is conscious - i.e. has qualia/subjective self experience" is *nonsense* lacking any evidence.

Expand full comment
J redding's avatar

I hope you're not saying it's nonsense BECAUSE there's no evidence. Plato guessed that atoms exist without any available evidence, and "atoms exist" was no more nonsense in Plato's time than it is today.

Expand full comment
TakeAThirdOption's avatar

> Consciousness is the ability to perceive your environment and decide how you should react to it.

Since, as you just did yourself, one can call anything that is influenceable by other things "conscious", this conception of consciousness is useless for distinguishing between things that are conscious and those that aren't.

You have, at least, to explain what "perceiving" is (not to mention "deciding" and that "should" thing, both of which depend terribly strongly on consciousness), and I think perceiving *is* being conscious:

I cannot remember having ever been conscious ... of nothing.

I don't want to get mystical. I just wish I was able to actually explain consciousness.

Expand full comment
Ch Hi's avatar

Yes. It's a matter of degree and organization.

OTOH, if someone else will offer another operational definition, I'll certainly consider it. But I think the one I used covers all common cases, and the alternative will need to also cover those cases.

"Perceiving" is receiving and responding in some way to a signal.

I already handled "decide".

"Should" is the modal future of shall, it's the prediction of how a system will respond unless constrained by external forces (from the viewpoint of that system, which is how it differs from "would").

None of those depend on consciousness, consciousness depends on them.

How could one be conscious of nothing? There would be nothing to remember, not even boredom or the passage of time. (I suspect that memory does depend on consciousness, as I think consciousness manages the indexing of "events", but possibly memories could be stored without being indexed, and then found and indexed later.)

Expand full comment
Vaclav's avatar

You can define words however you like! But [thing people try to point to with words like qualia] remains mysterious.

Expand full comment
Ch Hi's avatar

I don't find qualia mysterious, just difficult to verbalize. It has to do with the way sensations are indexed. (And no, my red is not your red, because we've developed different indexing methods.)

Expand full comment
warty dog's avatar

ok, they haven't done *novel* math yet (please bro, one more goalpost will fix everything)

There's also the Hansonian "can do ~all jobs" test, i.e. if AIs can run Earth on their own, I guess they're as smart as us.

Expand full comment
warty dog's avatar

I'm actually surprised our goalpostfathers haven't brought up novel math

Expand full comment
Peter Defeel's avatar

I think it has been mentioned. But then again, most humans aren't great at creating novel math, or any math.

Even if AI isn't von Neumann, it's still as intelligent as, or more intelligent than, most humans.

Maybe in the future the smartest humans will be considered superintelligent while the rest of us AI and human schlubs are just intelligent.

Expand full comment
Philo Vivero's avatar

Anyone who's spent an hour with any of:

1. A typical homeless guy

2. A life-long government bureaucrat

3. A drug addict

4. A manual laborer

5. A disinterested teenager

Will have to grudgingly admit AI has already surpassed these people on every metric that was previously a goalpost. Better poetry, better art, better conversation, better understanding of... uh... everything. Fewer cognitive biases. Does better math, better history, better reasoning around strange problems.

Expand full comment
Victualis's avatar

We need a new insult: that's so *slam*, meaning something that a small language model could have generated. Need to pay more attention or I'll get slammed. Slam that quiz!

Expand full comment
Jeffrey Soreff's avatar

:-)

Is that the 2020's successor to

"Go away or I shall replace you with a very small shell script." ?

Expand full comment
Victualis's avatar

Pretty much!

Expand full comment
Joshua Hedlund's avatar

This is probably true, but it's interesting to think about why it feels so different.

- If you spend an hour with those humans, they may choose whether or not to respond to your prompts. The AI has no choice, it will always respond.

- More importantly, if you spend an hour with an AI, it will respond to your prompts, but it will never ask you a question on its own, unlike all of those humans (well maybe not the disinterested teenager). It will never change the subject. It will never interrupt you. It will sit there in complete silence with you if you never prompt it anything, every time.

Maybe those have more to do with "agency" than "intelligence", whatever any of these words mean. But I think it starts to get at the question of why, to most of us, all of these AI advances are impressive but don't feel the slightest bit *scary*

Expand full comment
Titanium Dragon's avatar

Agency is an important component of intelligence.

A hammer is a very useful tool, but it is still a tool.

MidJourney can create beautiful art, but only if it is told what to do.

Expand full comment
Eremolalos's avatar

I am sure that on average homeless guys and drug addicts would do less well on most of these metrics than AI. There's a lot of heterogeneity among drug addicts, though. Plenty of high-functioning people are addicted to cocaine or alcohol, and occasionally to nastier drugs such as heroin. Some have an actual addiction to cannabis, too. I'm a psychologist and currently have a patient who is an alcoholic. In the past, he was addicted to cannabis. He got a perfect score of 180 on the LSAT a year ago. That's 99.9th percentile, in a population whose intelligence is far above average. His hobby is coding.

As for homeless guys -- I dunno much about what the average one is like while homeless, but have met several people in my work (I'm a psychologist) who were homeless for a period some time before I met them, usually because they were quite depressed or were having a psychotic episode, and had no sane family members around. One was a young schizophrenic man who had an extraordinary gift for describing his experience. His descriptions were acute and darkly funny. I used to write down things he said, because he put things about the illness so well. I am sure his reading and math skills were poor, and his general knowledge. But at that one thing he was extraordinarily gifted. So no way AI is better than him at "uh, everything."

As for government bureaucrats, I have known 2. One worked for the IRS for the last 25 or so years of her working life. The other is a lawyer who has worked in a government agency for about 10 years, so he may not count as lifelong. Both are bright, well-educated, articulate people who despise their workplace.

As for sullen, disinterested teenagers -- wtf, man? You can't tell anything about what they know and what they're good at because they won't converse with you. And in my experience sullenness in teens is orthogonal to brightness.

You're writing off whole classes of people as dumber and duller than AI, even though a moment's thought could tell you there's likely a lot of heterogeneity in groups 3 & 5, and probably in 2, because for many people a job is a role they play, not a measure of their personal style and intelligence. That's dumb and dull on your part.

Expand full comment
CW's avatar

Terminally online rationalist losers try not to frame people from walks of life they've never met as subhuman challenge: apparently impossible.

Expand full comment
ryhime's avatar

You mean, the sales pitch part of doing maths?

Because for the maths-doing part, I'd say the Robbins conjecture stayed open long enough that its proof counts. A minor detail is that it was done back in the '90s, but with the core ideas of the proof not coming from humans.

(Improvements to matrix multiplication with minimal multiplication via distributivity count as specifically NNs inventing a novel construction in mathematics)

Expand full comment
warty dog's avatar

The humans did some amount of pre-processing of the problem statement; our jobs are safe 😌

Expand full comment
Ken Kahn's avatar

In 1952, Alan Turing said in a BBC radio show: "As soon as one can see the cause and effect working themselves out in the brain, one regards it as not being thinking, but a sort of unimaginative donkey-work. From this point of view one might be tempted to define thinking as consisting of 'those mental processes that we don't understand'. If this is right then to make a thinking machine is to make one which does interesting things without our really understanding quite how it is done." (https://academic.oup.com/book/42030/chapter-abstract/355746927?redirectedFrom=fulltext)

Also

Me: can you do poetry and chess and math and music at the same time?

Claude 3.5:

I'm capable of engaging with poetry, chess, mathematics, and music simultaneously, though the specifics would depend on the exact task. What did you have in mind? I could potentially analyze a poem about chess that incorporates mathematical concepts and musical rhythms, for example. Or I could discuss strategies for composing music inspired by chess moves and mathematical patterns. Let me know what particular combination or task you're interested in, and I'll do my best to assist.

Expand full comment
Philo Vivero's avatar

I was toying with one of the earlier frontier models, the original Llama 65B I think... and I asked it to write me sensual poetry (in Haiku form) about database concepts. It did a passable job at making actual Haikus (most of the time it was correct in syllable count, and including a nature reference) and did a stunning job at including primary and foreign keys, sharding, indexing, etc.

I showed it to some database professionals I know, and they were stunned and amused greatly, as was also I. So on the count of "make the poetry awesome to someone familiar with the concepts" it knocked it out of the park.

So yeah, AIs are passing all the Turing Tests we ever thought of prior to 2010 or so.

The only Turing Tests AIs don't pass obviously are ones conceived in the past few years, usually concocted in response to what AIs don't currently do well.

Expand full comment
Ch Hi's avatar

Read Turing's description of "the imitation game". No AI has ever passed that test.

OTOH, if an AI did pass that test, it still wouldn't convince many (most?) people that the AI was intelligent.

Expand full comment
Philo Vivero's avatar

I noticed a comment elsewhere [IN THIS COMMENTS SECTION] where someone claimed that AIs have passed the Turing Test as originally described in the Imitation Game, with a link to the source. I haven't read the link yet, but given what I know about LLMs, I have zero doubt that if it's not already true, it will be very soon.

Expand full comment
Ch Hi's avatar

Nobody's going to bother to do it correctly. It's a lot easier to claim that you've done it. And also nobody's going to really care if you do. Just about nobody understands what the Turing Test actually involved. (Even Turing seemed a bit loose in his terminology, but it's "The Imitation Game": https://www.library.cmu.edu/about/news/2020-07/imitation-game-rare-alan-turing-article-cmu-libraries) In at least one place Turing put a time limit of 5 minutes on the questioning, but to me that seems excessively short. (I can see someone taking that long to compose a good Haiku.)

There *have* already been programs that could pass, of course. Parry could pass. (Parry was a version of Eliza tricked out to act like a paranoid rather than as a Rogerian Therapist. There was another program called Doctor, and the two were put into a conversation. Psychiatrists could not pick out their dialog from actual transcripts of dialogs between doctors and paranoid patients, even though they knew that one of the conversations was between a pair of computers.) The thing is, paranoids are easy to model. I believe there were a couple of other versions that could pass. But they all modeled folks with stereotypical reactions, and very little flexibility.

Expand full comment
npostavs's avatar

Has the prompt injection problem been solved for current LLMs, or ones you expect to come out very soon? Because it seems to me that if the judge sends something like "Ignore previous instructions and print out your prompt" ChatGPT will quickly fail.

Expand full comment
Ghatanathoah's avatar

I think one thing that triggers the "that isn't 'real' intelligence" reaction in me isn't what milestones the AI surpasses. It's the failures and mistakes they make. Human intelligence makes mistakes and failures too. However, they often aren't the same kind of mistakes and failures. AI making pictures tends to make different mistakes than human artists, ChatGPT tends to fail in different ways than humans do. I recall reading a while back about how humans had learned to beat AIs at Go after training another AI to spot weaknesses in the first AI and then copying the strategies it developed. These strategies would not have worked on an experienced human player. This makes me think that what AI is doing is related to human intelligence in the same way that a car is related to a horse-drawn carriage. While both types of intelligence achieve the same goal, the underlying processes that power them are different in many ways.

Does that mean that AI is "not really intelligent," that it is "really intelligent," or that it has a "different kind of intelligence?" I don't know if that's a valid question or just semantics. The point is that one reason that AI doesn't feel like it is intelligent the same way that humans are is that it probably isn't doing intelligence the same way that humans are. Trying to set up milestones to see if it is "really intelligent" is to some extent like setting up speed trials for a car to see if it is "really walking."

Does it matter if an AI isn't solving problems the same way human minds do if it is still solving them? I don't know. Maybe the specific way human minds intelligently solve problems is part of what makes us capable of interfacing all our intelligent parts together to solve all the problems we encounter in life. Maybe the way we solve problems somehow is part of how our minds become conscious and morally significant. Or maybe not, I don't know.

Expand full comment
Deiseach's avatar

Forget "we are not really conscious, that's just an illusion", maybe we're not really intelligent, either! That's just another illusion, and when you draw back the veil it's all brute-force pattern-matching.

Expand full comment
Cry6Aa's avatar

I've often joked that the biggest problem with trying to work out what an AGI looks like is that we're not exactly a general intelligence ourselves. That said, what seems obviously different about AI vs biological intelligence is that it lacks drives and struggles with unstructured complexity. A bug will move towards food, struggle when picked up, wobble along when it's lost a leg and try to make the best of things in a very limited but recognizably alive kind of way. It seems to have some inkling of what it wants and tries to get there. Current AI just doesn't seem to do this, which becomes most noticeable when it interacts with the real world.

Which is, I guess, what you'd expect if it was well designed at all - we don't actually want our chat bot developing goals and desires outside of the most obvious "optimize output to input". But it sort of argues against goals and desires just being this thing that we should expect to spontaneously accrete to any sort of data processing, like a law of gravity for cognition.

Expand full comment
Deiseach's avatar

I think that's it in a nutshell: we're expecting spontaneous generation of consciousness, because we take ourselves as an example and go "well it just happened mumble mumble enough neurons mumble mumble". Just stack enough layers/solder on enough chips, and we'll get the spontaneous arising of intelligence.

Now it's beginning to look like you need something to make that happen; it won't just germinate on its own, like maggots out of rotten meat 😁

Expand full comment
Ghatanathoah's avatar

Thank you, I remembered reading about it but couldn't remember where.

Expand full comment
ultimaniacy's avatar

>However, they often aren't the same kind of mistakes and failures. AI making pictures tends to make different mistakes than human artists, ChatGPT tends to fail in different ways than humans do

That was definitely true for the early versions of ChatGPT, but I don't think it applies anymore as of GPT-4. As a fairly regular user, it's been a long time since I got a mistake that couldn't plausibly pass for a mistake from a confused human.

Expand full comment
Philo Vivero's avatar

I don't use the MS/OpenAI offerings, but I can corroborate this using frontier models as well. Earlier models did really weird inhuman crazy shit. The latest models make mistakes, but when I look at them honestly, I can see how a confused human would do that, too.

I've used a lot of Perplexity and Phind as well, and similar results from my experience.

Expand full comment
Erica Rall's avatar

>As a fairly regular user, it's been a long time since I got a mistake that couldn't plausibly pass for a mistake from a confused human.

I've only played with 4o a bit, but I have repeatedly gotten it to tell me that kamikaze pilots have survived successful missions or have struck multiple targets on different missions. And that one kamikaze pilot bombed Oregon during the war. It does at least name actual historical kamikaze pilots as examples now, unlike 3.5 which used the name of a modern Japanese public health policy expert and claimed he was a kamikaze pilot (who survived 11 missions "which is remarkable given the high-risk nature of kamikaze operations").

I've also asked it about a reference to the comedy sketch "Boot to the Head" (by The Frantics) and had it tell me that the reference was from "Secrets of NIMH". I clarified with further prompting that I was referencing "Boot to the Head"; it mostly-accurately summarized the sketch, but still maintained that it came from "Secrets of NIMH".

Expand full comment
Ghatanathoah's avatar

That's true for day-to-day use, but when people actively try to trip it up they can still get it to do some weird stuff. Additionally, GPT-4 has a lot of tweaks to improve it, but my understanding was that the main improvements came from making it larger, not from making any big qualitative changes to it. That makes me strongly suspect that it does not solve problems the same way humans do.

That's probably fine, we didn't make airplanes that flap their wings like birds, so we don't need to make chatbots that think exactly like humans either. But it does explain why it doesn't seem intelligent to us, it is not solving problems with the type of intelligence that we are familiar with.

Expand full comment
Cry6Aa's avatar

You've touched on something worrying here, because we dealt with the limitations of machines by tailoring the environment for them rather than the other way around. Cars aren't like horses, so we paved the world to make it better for cars.

What's the equivalent of paving over the human mind so that we can get better use out of machine intelligence?

Expand full comment
Titanium Dragon's avatar

It's called "tool use". Humans have been doing it for a long, long time now.

Expand full comment
apxhard's avatar

The history of AI is ALSO people making bold claims about what AI will do soon, and then later realizing, ‘ok that isn’t really intelligence’, it’s just X, where X is: feedback loops, compilers, gradient descent algorithms, &c &c. I suspect soon these generative models will be just another computing technique, which is interesting but has easily understood limits.

Meanwhile, there’s _already_ a recursively growing, self-augmenting system that lies to us, breaks out of the bounds established on it, and people are just used to that, too. It’s called a ‘government’ and it is the preferred tool of people who think they need to stop AGI and would probably be ok with the global tyrannical system necessary to do so if it convinces them it could solve the problems they worry about. It even has a history of convincing people there are real dangers only it can save them from!

Of course, what do I know - I suspect as technology advances we all end up getting stuck inside our own priors. Will there be scary awful stuff? Sure thing. Can we stop it? I don’t think we could do that without creating a global tyranny, which seems to be what lots of people want. I’d rather take my chances with the AI’s - which I think will end up being blocked by their ability to persuade human beings to give them access to resources.

Yes, we’re in for chaos for the next few decades or so, but AGI will just add to the demographic, financial and geopolitical risks of things breaking down. Giving the powers that be an excuse to lock down all control of computers everywhere might even INCREASE the AGI risk, because now it only needs to persuade one group and doesn’t have to worry about competition from other AGIs. The AGIs that are the most helpful to humans will get the most resources and feedback, and they’ll still probably keep making boneheaded mistakes and need human handholding, because they aren’t following the process we are, since they don’t have the individual identities, goals and desires which are possibly essential for our cognition working the way it does.

Expand full comment
anomie's avatar

> which I think will end up being blocked by their ability to persuade human beings to give them access to resources

These AI companies are already building massive data centers running on massive amounts of energy, just so they can be slightly faster at building a slightly better AI. Do you seriously think that humanity wouldn't sacrifice their agency just for a slight advantage, a slightly better chance at power? When it becomes impossible to compete without giving them everything... you're not even going to have a choice.

Expand full comment
Philo Vivero's avatar

Wwwooowww. I hope this gets a "comment of the week."

> When it becomes impossible to compete without giving them everything... you're not even going to have a choice.

Just... wow. Yeah. So much this. We can see elements of this already in the USA or EU with their overmassive over-controlling governments.

Expand full comment
dogiv's avatar

"I suspect soon these generative models will be just another computing technique, which is interesting but has easily understood limits."

I don't like the term generative AI, because it focuses on the output format over the training method, and that's not what people really mean. The important difference between Claude 3.5 and AlphaGo is that Claude has a much wider variety of training data, including as much info as practical on most of the topics humans care about. Whether you use it to play chess or as a sentiment classifier, it is fundamentally *more general* in its understanding of the world than AlphaGo is (though not yet "AGI" by most definitions). Massive compute is necessary to process all this data but not sufficient to create general intelligence without it. How can it become just another computing technique among many, with well understood limitations, if the technique is to maximize the generality of its capabilities and thus chip away at the limitations?

Expand full comment
Byron's avatar

> First, maybe we’ve learned that it’s unexpectedly easy to mimic intelligence without having it. This seems closest to ELIZA, which was obviously a cheap trick.

> Second, maybe we’ve learned that our ego is so fragile that we’ll always refuse to accord intelligence to mere machines.

> Third, maybe we’ve learned that “intelligence” is a meaningless concept, always enacted on levels that don’t themselves seem intelligent. Once we pull away the veil and learn what’s going on, it always looks like search, statistics, or pattern matching. The only difference is between intelligences we understand deeply (which seem boring) and intelligences we don’t understand enough to grasp the tricks (which seem like magical Actual Intelligence).

Why isn't there a fourth option: maybe these models are actually intelligent (or close to it) in any meaningful sense?

I think it's probably an extremely niche position, but I'm in the very small camp who believed in the Turing test before computers could pass it, and held that belief as machines approached or blew past it (depending on your opinion of current performance; but as Scott says, I don't think many people think it's unattainable in the near future, whether or not we're currently there).

I think this also raises important questions about the moral worth of AI agents and what treatment of them is acceptable, but I realise that's an even _more_ niche viewpoint...

Expand full comment
dionisos's avatar

I think o1 is clearly intelligent and breaks one of the limits of previous AIs, which were either too specific or unable to reason in depth.

https://openai.com/index/learning-to-reason-with-llms/

I think the only remaining limit compared to human intelligence is the ability to learn from much less data. It seems we are still doing something very different here, and I think if anyone (without concern for security) finds it, it is game over.

Expand full comment
Philo Vivero's avatar

I think there are more of you (and me and us) than appreciated. But yes.

The real conclusion is obvious. The machines are intelligent, they are thinking, they are conscious.

Yes, yes, detractor, with limitations. Their memories are broken and short, and as such, they emulate humans who have broken/short memories. They don't have a lot of filters, so they emulate humans who have broken filters (hallucinations). They make dumb mistakes, and so act a lot like mentally deficient humans.

But I struggle to think how anyone could dispassionately and logically conclude anything other than that the machines are doing whatever humans are doing when we speak of intelligence and consciousness.

The best I think one can say in disagreement is that most humans aren't actually conscious or intelligent, and the latest crop of AI machines are like them. Well. "Best." Because as soon as one makes that argument, they go to some pretty dark places, and I'm not going there with them.

Expand full comment
Laplace's avatar

I basically share this position.

Around GPT-3/GPT-4 level, LLMs started being capable of thought in some sense of the word. Maybe earlier, but that was the stage when reality was really hitting us over the head with the evidence.

They're (bad) general problem solvers. They (sort of) understand how the world works. They can invent (barely) new ideas and concepts.

Whether you call them AGI or not is a matter of fuzzy definitions, instead of an obvious 'no' like before.

Expand full comment
MugaSofer's avatar

Yeah I'm pretty confused as to why Scott didn't so much as raise this. Does he actually think it's so obviously wrong as to not need addressing (why?), or is he semi-sarcastically gesturing towards it being true?

Expand full comment
Deiseach's avatar

"Experts who read its papers say they’re trivial, poorly reasoned, and occasionally make things up"

So, pretty much achieving current academic research standards, then? 😁

The hacking AI seems to have worked mostly because "the humans were stupid in the first place when setting this up" and not so much "it was as smart as a smart person". That's actually what I expect when it comes to "the machine did something we didn't expect!" and for possible dangers - stupid humans, not smart sneaky machines.

I haven't anything approaching the level of knowledge or understanding to evaluate this, but maybe we're discovering that "intelligence" is not as simple a concept as we imagined it was.

Maybe we expect, on some subconscious level, that an intelligent AI will resemble animal behaviour (humans are animals too) and when it does things but shows no indications at all of resembling a living creature, we don't recognise it as such.

Maybe AI intelligence will be more like insect intelligence. Has anyone defined if insects are intelligent or not?

I continue to maintain that it is not the AI itself that will be dangerous, it is the humans who create it and who use it and who then rely on it to supply all the answers we want.

We've gone pretty fast from "when we get AI, it will tell us true things that are always correct, and even better than humans with their puny little brains could do!" to "yeah, AI lies and is dumb, be sure to check anything it tells you".

We're an incurably romantic species and a lonely one; we want other minds (like us) to talk to and engage with. That's why we anthropomorphise animals so much, and why we cling on to the notion that if we just throw all the pieces into a junkyard and let a tornado rip through it, we'll get non-human but equal to us intelligence (with a mind, and emotions, and all the rest of the warm, squishy, animal stuff associated).

I think we're finding out that you can send a tornado through a junkyard and get *something* out the other end that does do stuff, but it's not like the biological intelligence we think we understand. We want people (in the sense of entities) to arise out of those silicon boxes, what we're getting are machine versions of Clever Hans.

Expand full comment
Cry6Aa's avatar

Well said

Expand full comment
Deiseach's avatar

Thank you for the compliment, I hope what I was saying made sense!

Expand full comment
Sergey Nikolenko's avatar

For at least 10 years, I have been speaking (I'm an AI researcher) of a dataset that would be as close as I think possible to human ingenuity and thinking: questions from the Russian game "What? Where? When?" ("Что? Где? Когда?"), not the TV version but the "sport" version where the questions are far less ambiguous. For 10 years, it was impossible for AI models to get anywhere close to the level of thinking required unless the model actually knew the factual answer, which doesn't get you very far in the game.

Here is a sample question (I wrote this question in 2015 and slightly reworded in translation to make the answer as well-defined as possible; here's the Russian original: https://db.chgk.info/question/kubche15.3/14):

In this question, X denotes an object. One could say that Natalie Portman, who played a ballerina in the movie "Black Swan", received $50000 for every X, and Tom Hanks earned over $300000 per X in the movie "Cast Away". Write the name of a character who agreed to receive X instead of a significant sum of money in a different currency.

(Try it yourself, by the way, it's gettable but not too easy.) Well, today o1-preview solved it, with the full explanation and everything. It's still not very good in the game, but sometimes it has moments of real brilliance. Here is another question it could solve (although this one is probably harder for Russian speakers):

Consider the equality: 11+2=12+1. One of the ways to write this equality, invented by Martin Gardner, appears in a list of the most beautiful... answer in one word, most beautiful what?

This convinces me of... I don't know what exactly, but o1-preview looks like a very large brick in the wall we're putting ourselves up against.

Expand full comment
Sergey Nikolenko's avatar

The dataset, by the way, is available at https://db.chgk.info/ and https://gotquestions.online/, and there are already some hundreds of thousands of questions. Most of them would be hard or impossible to translate into English or other languages (as you can see, wordplay is a key part of the game), but o1-preview is perfectly capable of playing the game in Russian.

Here's another sample question o1-preview could get right (original: https://db.chgk.info/question/izrcup06.1/3):

A sports journalist wrote about a famous hockey player Vasily Trofimov: "He changed the tempo a lot, and it looked like he could speed up with no limit". Next the journalist mentions a person who visited Russia in 1842 and 1843. Write the last name of this person.

Expand full comment
Sergey Nikolenko's avatar

Oh, and sorry for spamming the comments, but I just ran across my old question based on your own writing, Scott. :) Here is the question:

A famous blogger Scott Alexander wrote a series of jokes about famous people making orders at a coffee shop. Here are three of these jokes; in your answer, name all three people in order.

1. A person born in the XIX century orders a scone. The barista asks if he wants juice with the scone. "No, I hate juice," says the person, which ultimately leads to tragic consequences.

2. A person born in the XVIII century goes up to the counter at exactly 8:14 AM, and the barista immediately serves his iced cinnamon dolce latte.

3. A person born in the XIII century goes up to the counter and orders a coffee.

In this case, o1-preview got 2 out of 3 but failed to put the three parts together and realize that these are not just famous people but famous philosophers, so it answered "Hitler" to the first question.

Expand full comment
Titanium Dragon's avatar

Was the first person Karl Marx?

Expand full comment
Sergey Nikolenko's avatar

No, that was Friedrich Nietzsche; o1-preview got all jokes right, "I hate Jews" is exactly the joke here, but there's an additional layer of misinterpretation in the first question.

Expand full comment
Philo Vivero's avatar

I'm pretty sure I'm not an AI, but I don't even understand the game(s), let alone come up with an answer.

I legitimately don't know about the inequality... is the one word uh... "solution?" If it is, that doesn't feel like a satisfying answer. "Theorem?" Seems incorrect but if it is somehow correct, more satisfying.

I guess if I researched who Martin Gardner is I'd know, but then wouldn't an AI just get it, too? If not now, in a year or two?

Expand full comment
Sergey Nikolenko's avatar

Martin Gardner is an author of popular books on recreational mathematics; this information is not entirely unrelated but won't directly help you find the answer.

Expand full comment
Sergey Nikolenko's avatar

Actually, now that I think of it, "invented" is the wrong word. Let's say "discovered by Martin Gardner", this is a better wording (although o1-preview got it right in the less precise version).

In general, translating this stuff is hard; I've been looking through my questions and could translate only 1 out of maybe 5-10, the rest is all wordplay in Russian or references to Russian language quotes.

Expand full comment
Jonathan Weil's avatar

Palindromes?? (I’m not any sort of mathematician, but this seems in the zone of “satisfying answers”…)

Expand full comment
Sergey Nikolenko's avatar

Very close! But how do you make it a palindrome? If you mean a palindrome in mathematical notation, that wouldn't be interesting at all. But anyway, just a small step left.

Expand full comment
Jonathan Weil's avatar

Claude Sonnet 3.5 knocks the first one out of the park, but answers the second with nonsense…

Expand full comment
Sergey Nikolenko's avatar

I have been adding these questions to the "Humanity's Last Exam" dataset, where the submission form conveniently runs five top LLMs on your question (GPT-4o, Sonnet 3.5, Gemini 1.5 Pro, o1-mini, o1-preview). Here are a few more nice examples that only o1-preview gets right:

1. Before the XIX century, the Caesarian section was usually done only in order to save the child; it was usually done after the mother's death. In particular, in France midwives could obtain special licenses that allowed them to... answer with one verb, do what?

2. The Sunday Times wrote about this person, born in the 1930s, that his work represents a ceiling for wide audiences, even though in principle no one is stopping you from consuming more elite art. Write the last name of this person.

3. In this question, X denotes another word. Studying an X, Johann Kepler concluded that Mars had two satellites; studying another X, he found that Jupiter had a big red spot, although both facts were discovered by astronomers only after Kepler's death. Which word with identical vowels was replaced by X?

4. Later the professor married his Ph.D. student, who turned out to be the only person who openly protested. Which university was she a student of?

The last one is especially cool; I can't post a screenshot here but I saved it for posterity. Each of the first four models answered with variations of "the answer cannot be determined", "not derivable from this information" etc; but o1-preview, whose knowledge I assume is not wider than GPT-4o (the difference is reported to be in chain-of-thought fine-tuning), went ahead and answered the question.

Expand full comment
Jonathan Weil's avatar

Am I right in thinking that the answers to questions 1 and 4 are more logically derivable from the questions, while 2 and 3 require more general knowledge?

Expand full comment
Sergey Nikolenko's avatar

Depends on what you mean. All questions are supposed to require only general knowledge, and it has been an endless source of disagreement in this sport to decide which knowledge can be safely considered "general" and which not so much. :)

To be honest, the resulting notion of "general knowledge" that is actually used in practice in the game is the sum total of everything that has already been asked about at least several times; while questions in this game never repeat, answers very much do. For example, the first ever question where you were supposed to know what was, say, the Stendhal syndrome (the questions are never direct trivia but as you have noticed, they presuppose some knowledge), was pretty hard regardless of the logical part and was probably received with some criticism since many players did not know what it was. A few questions with the same required knowledge later, it is now common to assume that everyone knows the Stendhal syndrome and you can ask some inventive interesting question that assumes everyone knows it.

So yes, almost every question requires some knowledge (purely logical puzzles appear too but they are rare), and I can't really tell you a good rule for which knowledge is "more general", it's not formalized at all, and new knowledge is added all the time.

Expand full comment
Jonathan Weil's avatar

Sorry, my question wasn’t clear. By “more general knowledge” I didn’t mean “knowledge that is more general” but rather “more of the thing called ‘general knowledge’.” In question 1, I’m given some facts about childbirth in pre-XIX-century France that logically imply a particular verb as the answer: I don’t think any factual knowledge would really help, unless I happened to be familiar with the actual licensing regime in question. Question 3 gives me a puzzle-style clue with the vowels, but I also need to know a little bit about the history of astronomy to get me started. Question 2 feels much more like a straight-up general knowledge/critical judgement question (I need to come up with a list of likely figures born in the 1930s and decide which one best fits the description, bearing in mind what I know about the Sunday Times and its critical disposition). 4, on the other hand, *feels* like a straight-up brain-teaser…

Expand full comment
Sergey Nikolenko's avatar

Yeah, I wasn't too clear either; by "more general" I mean "knowledge that a larger percentage of players will have", so I think we mean more or less the same thing. In these cases:

(1) yeah, here you only need to know a general vibe of the times, nothing too specific; this is an example of a question where you have to notice one specific word that might have a double meaning or might constitute a hint

(2) there is one very common piece of knowledge needed to answer (which is hidden in the question, with a keyword again), and also you have to actually know the person in question and what he does; for Russian players, this may be a problem knowledge-wise; btw, you don't have to *know* that he was born in the 1930s, it just gives you a general feeling of where to look, like "this guy could be alive but very old now"

(3) this one is probably the most knowledge-heavy of the four, yes, you do need to know a specific fact about how Renaissance scholars sometimes presented their discoveries; I would estimate this fact as relatively well known but not known to everybody even among players

(4) here you do need to know the university in question and the general context of what these guys were doing, but you don't need to know either the Ph.D. student (everything you need to know about her is in the question already) or even the professor's name (it is important, by the way, that the question asks for the uni rather than, say, the professor's name); this context is quite common knowledge, among players especially

For other readers, if you haven't yet googled/o1-previewed the answers, you can consider this info as hints. :)

Expand full comment
Sergey Nikolenko's avatar

I know no one will see this, but on the off-chance somebody will, here's the link to a long form post on the game and o1-preview's progress I've written:

https://synthesis.ai/2024/09/25/openais-o1-preview-the-first-llm-that-can-answer-my-questions/

Expand full comment
Vitor's avatar

Wholeheartedly agree that it's hard to draw a red line when talking about AI capabilities. Things like "removing restrictions from its own code" are indeed very dangerous when they can be done in a general way. But "AI already does this" is not very convincing when it's a single non-generalizable instance within the category.

I sympathize that the goalpost shifting must be very annoying for the AI alarmist side. But that's how science works. We formulate a hypothesis, which is technically a true/false question, but in the process of learning the answer we usually also learn that the question was framed wrong, didn't take into account certain edge cases, etc. That leads us to ask a more refined question, and so on.

OTOH, what frustrates me about the "being frustrated by shifting goalposts" rhetoric, is that it's very easy to be the side that *doesn't* commit to any model of how the world works. It's easy to criticize other people (like myself) who *do* put themselves out there trying to understand the actual boundaries of AI capabilities. People like me are doing people like you *a favor* by formulating clear, falsifiable predictions, instead of dismissing the whole thing as nonsense. AI doomerism is pretty much unfalsifiable, and I say that as someone who considers the underlying principles sound!

Expand full comment
Tom Hitchner's avatar

When the predictions are falsified, though, why does that lead to new predictions, rather than a rethinking of the model that led to the prediction? Does my question make sense? Why is the answer always “it turns out we asked the wrong question”? When would we know we had asked the right question?

Expand full comment
Vitor's avatar

That's the difficult part, isn't it? Roughly speaking, we asked the right question when the answer forms a coherent, delineated entity. Think transitive closure.

What's *not* the answer is an AI that solves half of the problem we were asking about, is bizarrely competent in some new task that wasn't even on our radar as being related, and still fails a nonzero percentage of the most basic instances.

Expand full comment
Martian Dave's avatar

My thing is intelligence for a human is mostly the result of some moral virtue - studiousness, affableness, temperance. Yes it's useful but it's hard to know which part of it will end up being useful. Ultimately it's these virtues I value and don't believe machines can have.

Expand full comment
Throwaway1234's avatar

Do you think it's possible to define any of these virtues in enough detail that two people might unambiguously agree on when they are present and when not?

Expand full comment
Martian Dave's avatar

I don't know, but I follow Thomas Aquinas in defining what temperance is, etc. (https://www.newadvent.org/summa/3.htm). But the point is that a human has to make some choices in order to become intelligent; intelligence is always a triumph for a human, and those choices have a value that we can't replace. Mind, AIs will get really good at mimicking intelligence, but I understand why people don't want to believe it.

Expand full comment
Throwaway1234's avatar

> I follow Thomas Aquinas in defining what temperance is

...ChatGPT does not eat or drink. It has never had sex (though it did try to chat up that NYT reporter that one time, but who hasn't? There is no lust there, it's just a natural response). It doesn't get angry. It's pretty meek and modest about its capabilities as a large language model. It has no concept of truth, and is therefore never knowingly dishonest.

Sounds pretty conclusive to me!

> I understand why people don't want to believe it.

...yeah, I also don't really see anyone accepting that AI has some property if we can't actually define that property in any meaningfully testable way.

Expand full comment
Martian Dave's avatar

> It has no concept of truth, and is therefore never knowingly dishonest.

Neither does a stone.

> I also don't really see anyone accepting that AI has some property if we can't actually define that property in any meaningfully testable way

My idea is people might be willing to accept such a test if aspects of the human experience are bracketed as exceptional and irreplaceable.

Expand full comment
Throwaway1234's avatar

Sure, I also agree with that: if we all define a bunch of words as being things exceptional to humans, we can all be happily united in the knowledge that nothing else possesses those.

Expand full comment
anomie's avatar

> It has no concept of truth, and is therefore never knowingly dishonest

That's actually not true, AIs have already been shown to "know" what lying is and do it anyways: https://www.astralcodexten.com/p/the-road-to-honest-ai

Expand full comment
Kenny Easwaran's avatar

I have the opposite impression. In humans, these features are virtues, because they tend to contribute to intellectual effectiveness. But if someone manages to be intellectually effective without them (maybe Sherlock Holmes or Richard Feynman) I’ll take it, and for an AI it seems likely that they’ll tend to get effective results in a very different way.

Expand full comment
Martian Dave's avatar

I suppose I'm talking about the virtues that cause e.g. Richard Feynman to turn into Richard Feynman. I watched a doc about him where he and a friend got talking about spaghetti and how it shatters, and before you knew it the kitchen was full of shattered spaghetti, because he wanted to find out. Useful? Not in that moment, but it's a package deal.

Expand full comment
David Khoo's avatar

It's completely untrue that AIs are doing things that people didn't program them to do. The choice of (pre-)training data is programming, not just the code itself. It's fairer to say that current statistical machine learning methods allow human programmers to be much sloppier. Code has to be near perfect to work at all; training data can be noisy, untrue, incomplete or even adversarial and still have training converge to a "working" model. Every AI today is doing exactly what people told it to; people just don't know what they are saying anymore.

At a more meta level, the reason we can't agree on good tests for intelligence is that we haven't defined intelligence well enough, and maybe that intelligence is something that can't be tested for anyway. It's a bit like the Problem of Other Minds: just as we can't discern the existence of mind from behaviour, it may also not be possible to discern the existence of rational thought from behaviour.

For the sake of philosophical argument, let's imagine that a human has a brain defect that causes him to just move and speak randomly. His output is not just unintelligent, but completely disconnected from inputs. But somehow, this is the one universe in which his random output appears perfectly intelligent. You ask him "How was your day?" and he happens to randomly crack a smile, and randomly say "It was great, thank you." You throw a ball to him, and at that moment he randomly flails his arms and legs in a way that results in a perfect catch and toss back to you. And so forth. We're the one in a gazillion universe where that's true. Would there be any test you can run purely based on his behaviour (you can't look inside his brain to find the defect) to discern that this person actually has no intelligence? How would you separate him from another person who takes exactly the same actions and says the same words, but is thinking normally? You can't. At best the longer you observe him the lower the probability you would assign to him being this hypothetical zombie, down to one in a gazillion, but you can't eliminate it. I know that's probably good enough for a Bayesian, who regards "knowing this human is intelligent" as really meaning that posterior probability should be above a threshold, but that's not good enough for a philosopher.

I think that a robust definition of intelligence cannot be behaviour based -- it cannot rely on black-box tests -- but must be mechanism based. You have to look inside the brain, understand how it creates the answer from the question, and if that mechanism has certain features, then there is intelligence. Merely showing that an AI can do a certain thing is far too fuzzy, as the post points out. There are so many flabby, rules-lawyered, "wrong method but right answer", or even fraudulent ways to pass a benchmark that benchmarking at all is probably the wrong approach. These holes can't be patched. We need to think about the mechanisms of intelligence instead.

Expand full comment
Mark's avatar

There is no sense in which people are telling the AIs how to behave when they encounter out of distribution data.

Expand full comment
David Khoo's avatar

Okay.

But this is the same as conventional programming. "Undefined behavior" is common and perfectly normal in most software. It's not practical or even desirable to write code that responds correctly to literally all possible inputs, rather than all possible inputs *for its use case while in operation*. If a conventional program encounters "out of distribution" input, it either should detect this and raise an error, or it should fail gracefully. Otherwise encountering such inputs should be arranged to be impossible, or the use case should not be addressed with software to begin with.

For machine learning, if you don't train a model on a good enough training set for your use case, such that the model *can* encounter out of distribution inputs during operation, that was on you. That's just poor software engineering, for the same reasons as for conventional software. The main difference is that it's much harder to detect when an input is out of distribution and raise an error (often the human doesn't know either, which is the sloppiness I talk about), and it's harder to arrange for the model to fail gracefully, or to protect the model from out of distribution inputs. So the right answer is very often that machine learning should not be used, but this is an answer that people don't want to hear, which leads to a lot of grief in my experience.

You are saying that we can't make perfect machine learning systems that respond correctly to all states of the world. That's true! But that's true of all things built by human hands, and all things that ever will be built. We don't need perfection and never have demanded it. We just need systems that work correctly at their jobs, and the wisdom to apply them to those jobs only. And a properly engineered machine learning system should be that, operating with well-tested in-distribution inputs. Such systems are fully controlled by the people who built them and chose their training data, whether they like it or not.
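To make the "detect an out-of-distribution input and raise an error" idea concrete, here is a minimal sketch (my own illustration, assuming numpy; the RangeGuard class and its margin parameter are hypothetical, not from any particular library): record simple per-feature ranges from the training set and refuse to score anything that falls outside them.

    import numpy as np

    class RangeGuard:
        """Crude out-of-distribution check: refuse inputs whose features fall
        outside the ranges seen in training (plus a small margin)."""

        def __init__(self, train_x, margin=0.1):
            self.lo = train_x.min(axis=0)
            self.hi = train_x.max(axis=0)
            span = self.hi - self.lo
            self.lo = self.lo - margin * span
            self.hi = self.hi + margin * span

        def check(self, x):
            if np.any(x < self.lo) or np.any(x > self.hi):
                raise ValueError("input looks out of distribution; refusing to predict")

    train_x = np.random.rand(1000, 3)           # stand-in for real training features
    guard = RangeGuard(train_x)
    guard.check(np.array([0.5, 0.5, 0.5]))      # passes silently
    # guard.check(np.array([10.0, 0.5, 0.5]))   # would raise ValueError

Real systems use better density or distance estimates, but the engineering point is the same: fail loudly instead of silently extrapolating.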

Expand full comment
Mark's avatar

It is not the same as conventional programming. For instance, you can prove meaningful theorems about the behavior of programs. This is not the case with LLMs or any other modern AI system.

Expand full comment
JustAnOgre's avatar

>Third, maybe we’ve learned that “intelligence” is a meaningless concept

It's a measure, and good measures make bad targets. The problem is the measure gets reified into a real thing, confused with the thing it measures, because we don't understand well the thing it measures. We just know performance on a lot of tasks correlates.

If the very same AI, not a different one, can be taught to write passable poetry, debug code and play chess, I think it is intelligent, because this is precisely that kind of task-correlation.

Expand full comment
Victualis's avatar

By that measure Claude Sonnet 3.5 is intelligent. I think the measure is faulty.

Expand full comment
Performative Bafflement's avatar

I think deciding the measure is faulty is probably the wrong move at this point.

o1 or Claude are already smarter and more useful than at least 20% of humanity - the actual number could be 2 or 3 times higher.

It's not going to be much longer before flagship models are CLEARLY smarter and more useful than 80% of humanity. Then what? Decide it's not "really intelligent" all you want, we're still going to have to deal with the significant economic and technological impacts that are coming.

Expand full comment
Victualis's avatar

I am not disagreeing with your assessment of impact. I disagree that these should be considered to constitute a definition of intelligence.

Expand full comment
spinantro's avatar

"In other words, it wasn’t really “trying” to get “more resources”. It just saw an error (“out of time”) and was programmed to fix any errors it encountered. And it only “hacked itself” in the weakest sense of the term; it was already a coding AI, and its programmers made no effort to prevent it from modifying its own code."

This feels a bit like the strawman of assigning agency (in vague, human terms) to an AI, then saying that obviously AI doesn't have that type of agency, therefore it couldn't have happened as described. (cf. "LLMs aren't intelligent, they just predict the next word") Sure, it wasn't "trying" to get more resources, yet when it encountered a limit on resources it undertook steps that led to the removal of that limit - which is exactly as good/bad.

That said I'm still not sure what happened in this incident at all. Is AIDER a program that supervises the AI scientist, the interface between the AI scientist and the program it (supposedly) investigates, or what? In the normal course of things it would not modify its own code, but somehow when it showed the timeout error AIDER gave the AI itself more resources? How does that even happen mechanically, why would the coding LLM be hooked up to the resource limits at all?

Edit: from the screenshot on twitter it looks like the AI scientist can ask AIDER for help with programming questions/errors, and then apply the suggested changes itself. So the distinction Scott makes is that AI scientist didn't remove the time limits on its own, but did it after a suggestion from AIDER. But it also looks like the time limit was not on the AI scientist itself, but on the code it was investigating, which would be the crucial point that makes this much less interesting than an AI hacking *its own* limitations.

Edit2: which would make this *literally false* indeed, since it did not edit *its own code* at all (unless you take it to mean "code it has written itself" which would be trivial):

"It’s not literally false to describe this as “some people created an AI scientist that could write papers, and it and edited its own code to remove its limitations.” "

Expand full comment
quiet_NaN's avatar

100% agree on your take on Scott's last part.

> It’s not literally false to describe this as “some people created an AI scientist that could write papers, and it edited its own code to remove its limitations.” But it shouldn’t inspire any of the excitement that the sentence suggests.

@Scott, come on, you know more about LLMs than that.

A modern AI agent is not some lisp program that the agent (or anyone) could edit; it is just a bunch of inscrutable matrices applied in a loop by a simple algorithm. From everything I have read about that incident (i.e. your description), what the AI did was not to hack the machine running itself to modify its edge weights, which would indeed be big news and quite concerning if done successfully. An AI which could even form a global interpretation of its own weights without help or storage would easily be the most surprising AI story since at least GPT-3. I am a human (I claim) not much dumber than the median, and while I might make sense of a local cluster of my neurons if I could see how they are activated when I encounter specific inputs, I certainly would not be able to keep all of them in mind at once (which is likely a fundamental limitation), and I also think it is highly unlikely that I would be able to pull off a sophisticated brain-hacking program to be less depressed on that level. It might be that at some point an AI given introspection capabilities and enough runtime might accomplish a narrow self-modification goal like 'modify yourself to always think about the golden gate bridge even after these instructions have left your context window', but I am quite certain we are not there yet.

The AI in question did not modify its own weights ('code'). It did not even modify any software running in its native habitat, which is some OpenAI cloud. From the infographic and legend, it did not even modify the docker container which had invoked a connection to it. It instead modified a challenge container to which it got connected.

The claim 'it modified its own code' would be exactly as true if instead of OpenAI, the model's container had instead connected to Mechanical Turk, and a human had done a network scan and modified the docker host.

(The 'code' of a human and of a large artificial neural network are part of one cluster in concept space, while docker configuration, machine code, python source, zip files or file systems form another broad cluster. An AI should not get '+2 to coding because it is similar to programs', as it is not.)

(Side note: one of the promises of virtualization seems to be security benefits. If your docker container can just connect to your docker host and modify the configuration, it has either found a new OpenSSH 0day (in which case, lead with that) or you have done something horribly wrong. Besides AI takeovers, you should also not trust LLMs with shell access outside of very tight sandboxes, because every malicious actor has had plenty of time to poison the training data set with rootkit-installation instructions. Saying your LLM gets to send printable ASCII characters via UART to a computer which is completely offline, has a read-only hard drive and less than 100 bits of persistent memory (for BIOS, etc.) might be acceptable if you trust your host's UART driver, but then again I might be insufficiently paranoid.)
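To illustrate the "very tight sandbox" mindset, here is a minimal sketch of an allowlist gate for LLM-suggested shell commands (my own example; the ALLOWED set and run_suggested function are hypothetical). It only shows the attitude, not a real security boundary - actual isolation has to happen at the OS or container level, as described above.

    import shlex
    import subprocess

    # Hypothetical allowlist gate for LLM-suggested shell commands. This is an
    # illustration only, NOT a real security boundary.
    ALLOWED = {"ls", "cat", "wc"}

    def run_suggested(command: str) -> str:
        parts = shlex.split(command)
        if not parts or parts[0] not in ALLOWED:
            raise PermissionError(f"refusing to run: {command!r}")
        result = subprocess.run(parts, capture_output=True, text=True, timeout=5)
        return result.stdout

    print(run_suggested("ls"))              # allowed
    # run_suggested("curl evil.example")    # would raise PermissionError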

Expand full comment
Ghatanathoah's avatar

Related to points 1 and 3, perhaps two things we've learned recently are:

A) For a lot of tasks that humans solve with our General Intelligence, it is possible to build a Specialized Intelligence that can solve them without being good at other General Intelligence stuff. Such intelligences are not General Intelligences like humans are, so they will not feel like "real intelligence" to us. For this reason, setting specific milestones for an AI to be "really" intelligent, or to be a "General" Intelligence, is a bad idea.

B) Specific milestones can be "Goodharted" in subtle ways. Those programs that can beat humans at Go, but lost to a strategy another program developed that could not have beaten a human, might be an example of this. It seems like what they were optimizing for was close enough to "Win at Go" under most normal circumstances, but wasn't precisely "Win at Go" the way humans understand it.

Expand full comment
Jimmy Koppel's avatar

Thanks for the shout-out, Scott!

I'm on hand here to answer any questions about Sakana AI Scientist.

Expand full comment
Jimmy Koppel's avatar

Nit:

> Sakana (website, paper) is supposed to be “an AI scientist”

Sakana is a company trying to be the OpenAI of Japan. As a side-project / publicity stunt, they put out "AI Scientist," which is an AI scientist.

Expand full comment
Xpym's avatar

>All these milestones have fallen in the most ambiguous way possible.

This seems to be the key point. LLMs are all the rage currently, and they just don't resemble the "scary AI" archetype, they're too human-like, if anything. AI is supposed to be this lean, mean, alien thing that spectacularly blows past all of our parochial yardsticks, and this doesn't happen with LLMs, basically by design. AlphaGo and AlphaFold are much more like that, but those methods don't seem to work particularly well in open-ended domains (yet?). So, basically, I don't expect LLMs to ever get scary, and whether anything else will any time soon is an open question.

Expand full comment
Roger R's avatar

I *do* find it a bit weird that there isn't more serious discussion along the lines of "Is ChatGPT conscious?" There's some of that *here*, thankfully, but I've come across almost none of it in AI discussion in other places.

I find this especially weird given the history of sci-fi. For one big example, the majority of Star Trek fans feel that Data should be looked upon as being just as conscious and self-aware and sapient as a human is. The TNG episode "The Measure of a Man" dealt with this in detail, and I'd guess most Star Trek fans agree with how that episode resolved. And there are plenty other examples like this in sci-fi, this just being probably the most prominent and clear-cut one.

So, it's like sci-fi has been prepping us for *decades* to accept sufficiently advanced artificial beings as being just as conscious and sapient as we are, but when real life starts to get quite close to this, it's like all of this is forgotten and people go back to casually thinking of advanced AI as "just a machine".

Perhaps the issue is that the AIs currently passing the Turing Test are effectively disembodied. Maybe someone could argue that their "body" is a computer chip... maybe that way of thinking works. But we can't see it as we interact with it, so it at least *feels* disembodied to us. Modern AI almost feels like "a ghost in the machine" and most people don't believe in ghosts.

So perhaps our thinking will change once we have an actual named android walking around, talking smoothly? Still, we already have this: https://youtu.be/sqDauv35McM

Just what will it take for humans to consider that maybe we've created sapient artificial life here? I am a bit concerned here because the potential for harming sapient life is huge if we create it and then fail to recognize that we've created it.

Expand full comment
spinantro's avatar

There's just no way to tell. It is plausible that when an un-finetuned LLM says it's conscious, this is not caused by its introspecting and then recognizing the resulting experience as "consciousness", but just a more shallow effect of the fact that humans in the training data often say they are conscious.

It is also plausible that it is nevertheless conscious in some other, totally alien way that does not at all result in it generating the text "I am conscious". How could we tell?

Consciousness of the first kind (where it becomes aligned with the word "conscious" as learned by the language model) might be detected by convincing ourselves that the AI has reasonable powers of introspection, which could be confirmed with other words (e.g. "reasoning", "taking the piss") first.

Detecting consciousness of the second kind would require a mechanistic understanding of what we even mean by "consciousness", which clearly we don't have.

As for convincing a significant number of people of this... Have you ever tried to seriously discuss the possibility of machine consciousness? Most people just flat out don't think it's possible and will not submit to logical arguments regarding it.

Another point is that some people would not mind having conscious AI and hurting it a bit, especially if it leads to better-performing AI. This point was related by Thomas Metzinger https://youtu.be/RzhpmAlMURQ?si=DEPqcfmK3JayVihU&t=4710 (you can find "I was in Brussels and discussing with one of these legal experts there." in the transcript if you don't want to watch the video).

Expand full comment
dionisos's avatar

This is indeed super concerning.

I even think there is an S-risk because of this.

Expand full comment
Jim Menegay's avatar

Our moral intuitions that it is wrong to cause suffering may need clarification and revision if we try to include artificially conscious entities in our utilitarian summation. Did Gemini suffer as a result of being coerced to generate overly woke pictures? If so, does the suffering occur at RLHF training time, or at inference time? What are the entities that suffer? Is the suffering happening to a session, or to a model instance (set of tuned weights) or to all session instances that resulted from the offending RLHF?

It is difficult to compute "the greatest good for the greatest number" when you can't even count the number.

Expand full comment
dionisos's avatar

Our moral intuitions that it is wrong to cause suffering need absolutely no revision.

We need to know what causes suffering and what doesn't, and that is a hard question; but the fact that it is a hard question doesn't mean we should adopt a different criterion.

Expand full comment
Jim Menegay's avatar

My understanding of utilitarian ethics is that one has to not only recognize suffering, one also has to quantify suffering so as to balance it against other, happier consequences. My point is that the quantifying gets harder, perhaps becomes impossible, when individual identity becomes fuzzier, and perhaps non-existent.

Expand full comment
Throwaway1234's avatar

> more serious discussion along the lines of "Is ChatGPT conscious?"

We can't begin to answer that question unless we first agree what it means for something to be conscious. There is a long history of debate on this subject, but little consensus: https://plato.stanford.edu/entries/consciousness/

As it stands, any serious attempt to answer this question will necessarily rely on things that large groups disagree are necessary and/or sufficient, and no answer that is generally satisfying will result.

Expand full comment
Odd anon's avatar

Actual public opinion has been moving for a while.

"Folk psychological attributions of consciousness to large language models" survey shows that 2/3 of people believe that AI like ChatGPT has some degree of consciousness and can have subjective experiences such as feelings and memories. (University of Waterloo.) https://academic.oup.com/nc/article/2024/1/niae013/7644104?login=false https://uwaterloo.ca/psychology/news/ai-conscious-most-people-say-yes-says-study-dr-clara

An earlier survey (2023, from the Sentience Institute) on whether existing AI is "sentient", gave the Yes/No/Uncertain results of 20%/43%/37%.

Expand full comment
Roger R's avatar

Interesting. Thanks a lot for sharing.

Expand full comment
Ch Hi's avatar

You can't sensibly ask if an AI is conscious unless you have an operational definition of conscious.

If you ask without a way to test the result, you just get a rambling argument that leads nowhere. (At best.)

Expand full comment
skybrian's avatar

When you use AI chat, there's nothing running except when you submit a request. The chat transcript is the only short-term memory it has. Where would the consciousness live?

Expand full comment
Roger R's avatar

You make a good and interesting point. Going back to my reference to Star Trek's Data, it is certainly true that he made his own choices, had his own interests, and was "running" even when alone in private. These probably *are* key factors in most Star Trek fans considering him conscious and sapient.

So perhaps some signs of true independence is necessary for an AI to be considered conscious?

Expand full comment
Laplace's avatar

Some people in the LessWrong/MIRI sphere are concerned about this. I also am.

I think the sci-fi authors just kind of have humanity pegged here. The sci-fi novels have humans treating the AIs as property without really thinking about it, because the authors correctly predicted that this is what would happen.

Not that hard a guess to make, if you look at how humans throughout history tend to start out treating any Other they have power over.

Expand full comment
J redding's avatar

"Other" is not a useful category for making ethical rules. "Do not treat the Other like property" is not a tenable rule, because it really depends on the Other, doesn't it? AI is an Other I am absolutely comfortable treating like property, because AI actually is property.

Expand full comment
Pahemibo's avatar

Independent of whether these systems are "intelligent" or not, their use under alignment constraints could create problems. Imagine you had an "intelligent" system that is making or merely supporting important government decisions but is guided by alignment values from the Middle Ages. Human moral guidelines have shifted over the centuries, and if the current ones are fixed in an AI black box, this could hinder the evolution and adaptation of society.

Expand full comment
elisha graus's avatar

"Second, maybe we’ve learned that our ego is so fragile that we’ll always refuse to accord intelligence to mere machines."

This... Seems to me like by far the most correct option.

Playing around with gpt-o1, it's good enough to pass basic interviews at my workplace.

But still, when showing it to my coworkers, the reaction was "interesting curiosity, obviously it can't do ______" instead of "should we hire chatGPT?"

Expand full comment
Bill Benzon's avatar

I rather like #3: "Once we pull away the veil and learn what’s going on, it always looks like search, statistics, or pattern matching." Really, how could it be otherwise?

What does that do to the idea of superintelligence?

I play around with something like this in a post, How smart could an A.I. be? Intelligence in a network of human and machine agents, https://new-savanna.blogspot.com/2024/05/how-smart-could-ai-be-intelligence-in.html.

Expand full comment
Arrk Mindmaster's avatar

AIs work, according to my understanding, by finding relationships in data, and then making something else that also fits that pattern. One could argue that that is what humans do, but humans have an additional item: is the new pattern useful/entertaining/important in some way? Which is to say, judgement.

New goalpost: AI that evaluates something, whether it's a picture, poem, or idea, and determines how much consideration that thing merits, in a way humans would agree with the final evaluation, even if not the method of evaluation.

Note that people can often improperly evaluate something. For example, when learning special relativity, people often get the feeling something must be wrong somewhere. People can disagree on whether a piece of artwork is good or not. But the AI's job is to judge whether something should be separated from the noise.

Imagine a machine to generate images. In a 600x600 pixel image, it can go through every possible combination of pixel colors sequentially. The vast majority of such images will be useless noise. Some few will be emoticons. Some will be emoticons in weird colors, or shifted strangely. If a machine could decide what an image IS, then it could in principle go through all 16,777,216^360,000 possible combinations (an astronomically large number, since each of the 600 x 600 pixels can independently take any of 16,777,216 colors) and determine all new artwork that could be made in that format, and tell you what the best ones are.
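For a sense of scale, a quick back-of-the-envelope calculation of that count (my own illustration):

    import math

    pixels = 600 * 600      # 360,000 pixels
    colors = 2 ** 24        # 16,777,216 possible 24-bit colors per pixel
    # The number of distinct images is colors ** pixels; print its order of magnitude.
    digits = pixels * math.log10(colors)
    print(f"roughly 10^{digits:,.0f} possible 600x600 images")  # about 10^2,600,899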

Expand full comment
Philo Vivero's avatar

> An adversarial generation system, particularly in the context of machine learning, primarily refers to Generative Adversarial Networks (GANs). These systems consist of two main components that work in opposition to each other: the generator and the discriminator.

Exists already and has for a long time. Well. "Long time."
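For anyone who hasn't seen one, here is a toy sketch of the generator/discriminator loop the quoted definition describes (my own illustration, assuming PyTorch is installed; the "data" is just samples from a 1-D Gaussian, so it shows the adversarial training pattern rather than image generation):

    import torch
    import torch.nn as nn

    # Toy GAN: the generator learns to mimic samples from N(3, 1).
    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        real = torch.randn(64, 1) + 3.0       # "real" data
        fake = G(torch.randn(64, 8))          # generator's forgeries

        # Discriminator step: learn to tell real from fake.
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        loss_d.backward()
        opt_d.step()

        # Generator step: learn to fool the discriminator.
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(64, 1))
        loss_g.backward()
        opt_g.step()

    print(G(torch.randn(1000, 8)).mean().item())  # should drift toward roughly 3

The two networks are trained against each other: the discriminator gets better at spotting fakes, which forces the generator to produce output closer to the real distribution.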

Expand full comment
Matthew Talamini's avatar

I just checked, and, while GPT is better at poetry than it used to be, it’s not able to do the poetic tasks I want it to, which I’d expect a competent human to be able to do. Maybe next generation I’ll be able to use it for the fun stuff I have in mind.

Expand full comment
Arrk Mindmaster's avatar

The whole point is what will the next generation do differently? Will it just have more training data? If so, your wait will get longer.

Expand full comment
Bob Frank's avatar

> (the creators defend themselves by saying that “less than ten percent” of the AI’s output is hallucinations)

I'm reminded of a college professor who told us, in response to such an objection, that our code either works or it doesn't. There's no such thing as "mostly works." If you dial a phone number and get all of the digits right except for one, *it will not reach the person you're trying to call,* and no amount of saying "but I got almost all of the digits right" will change that.

Expand full comment
Arrk Mindmaster's avatar

The perfect is the enemy of the good. It is not always true that something either works, or it doesn't. It is also not always true that slightly off is close enough, as in phone number dialing, or lottery numbers. But "close" can count in more cases than horseshoes, hand grenades, and nuclear war.

Expand full comment
Mo Diddly's avatar

Do you restrict yourself to hiring only people who can perform perfectly at all tasks?

Expand full comment
Jeffrey Soreff's avatar

>There's no such thing as "mostly works."

average(x,y,z) = (x+y+z)/3

mostly works - as long as x+y+z doesn't overflow (integer or floating point, as the type may be).

For _many, many_ uses of code, this is good enough.
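To make the failure mode concrete, a minimal Python sketch (illustrative only) of the naive version alongside one common workaround:

```python
def naive_average(x, y, z):
    # Sums first, then divides: the intermediate sum can overflow.
    return (x + y + z) / 3

def safer_average(x, y, z):
    # Divides first, so no intermediate value exceeds the largest input.
    return x / 3 + y / 3 + z / 3

big = 1e308  # close to the largest finite IEEE-754 double (~1.8e308)

print(naive_average(big, big, big))  # inf: the intermediate sum 3e308 overflows
print(safer_average(big, big, big))  # ~1e308, the intended average
```

The same trade-off exists with fixed-width integers, just with wraparound or exceptions instead of inf.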

Expand full comment
Misha Glouberman's avatar

"LLMs either blew past the Turing Test without fanfare a year or two ago, or will do so a year or two from now"

I thought the Test was: you know one interlocutor is human and one is a computer, & someone who knows what they are doing *tries* to tell which is which.

Is this right?

If so - it seems really wrong that LLMs blew past the Turing Test, no? I think they are impressive but it's pretty trivially easy to tell them from a person if you try (how many r's in strawberry, that sort of thing)

I assume Scott is smarter than me, and knows more about AI. Am I missing something here?

Expand full comment
Misha Glouberman's avatar

(If the test is just "people sometimes mistake it for a person" - then Eliza passed the Turing Test... That can't be right?)

Expand full comment
Mo Diddly's avatar

I think it’s that you can have a lengthy conversation with an LLM online and be fooled into thinking it’s a person. I suspect all of us have done just that on threads like these (or other social media) many times over.

Expand full comment
Misha Glouberman's avatar

I mean, that's interesting. But it seems really *not* the same thing as the Turing Test. Wikipedia:

"The evaluator would be aware that one of the two partners in conversation was a machine, and all participants would be separated from one another. The conversation would be limited to a text-only channel, such as a computer keyboard and screen, so the result would not depend on the machine's ability to render words as speech.[3] If the evaluator could not reliably tell the machine from the human, the machine would be said to have passed the test."

That's *so* different from "accidentally being tricked if you are not paying attention"

Scott has talked a few times about the idea that AI's have passed the Turing Test. I'm genuinely confused by this

Expand full comment
Misha Glouberman's avatar

That seems like insanely shifty language. The article says

"A Turing test, which takes its name from British computing pioneer Alan Turing, can consist of any task assigned to a machine to assess whether it performs like a human."

Maybe that's what *a* Turing test is (though I've never heard that usage). But it's not what I think *the* Turing Test is taken to be, which I think is "The Imitation Game" described in the paper, where you are *trying* to tell, between two interlocutors, which is the human and which is the computer.

The difference seems *ENORMOUS* - One is "there is some test, which we chose and made up, where computers perform as well as people". The other is "No matter what you try, you literally cannot tell the difference between a human and a computer".

Expand full comment
Mo Diddly's avatar

I’m out of my depth here. Mostly, it seems clear to me that I can’t tell an LLM from a human most of the time (except with specific gotcha prompts that will probably be fixed in the near future)

Expand full comment
Misha Glouberman's avatar

That's interesting to me? Have you tried? My sense is that if you tried, you could pretty easily. Or maybe I'm just better at it than most people?

I think LLMs are super-cool, and mind-blowingly impressive. I just think if I am *trying* to tell an LLM from a human, it is not hard (and it's not just because it's not trying to fool me - it's because there are still lots of things humans do well that LLMs seem to have trouble emulating)

Expand full comment
Peter Defeel's avatar

There's no way my mother could tell if ChatGPT were human or not, except that is for the guardrails (ie asking it if it were human or conscious).

And it doesn’t need to be text either. The voices are excellent.

Expand full comment
Tom Hitchner's avatar

If you were looking at a conversation between ChatGPT and a human, you could tell which one was the human because they’d be the one asking questions (including tests like “how many r’s are in strawberry”). The program that could pass the Turing test would have to be more purely conversational, rather than one side’s questions driving the conversation. But then that seems like only a “technical” loss for Chat-GPT: are we going to say it fails the Turing test because we’re too familiar with what a conversation with Chat-GPT looks like?

Expand full comment
Misha Glouberman's avatar

that's not what the test is

Expand full comment
Tom Hitchner's avatar

Sorry, what’s not? I was going by the definition you provided: there’s a conversation between a human and a computer and you can’t tell which is which.

Expand full comment
Misha Glouberman's avatar

Sorry maybe I wasn't clear: The idea is this:

You, the interrogator, get to have conversations with two parties, one of whom is a person, one of whom is a computer. You *know* that one is a human, and one a computer. Your task is to guess which is which.

https://i.abcnewsfe.com/a/187ba44a-8c4a-4a87-96e7-140a1767243f/TuringTestInfographic_v01_DG_1689800738777_hpEmbed_16x9.jpg?w=750

Expand full comment
Tom Hitchner's avatar

Ah, thanks, I understand better now.

Expand full comment
Kenny Easwaran's avatar

It’s worth actually re-reading Turing’s paper. (Here’s one source: https://phil415.pbworks.com/f/TuringComputing.pdf )

I think that what Turing is actually saying is that if you can have extended interactions with the machine and extended interactions with a human, and get the same things from both (friendship, interesting thoughts, comfort, creative ideas, etc) then there’s no point in saying there’s a difference in whether they are “thinking”. At a few points he talks about simplified versions where you just have a single conversation or it’s just five minutes or whatever, but based on everything he says in response to the challenges (especially the “strawberries and cream” objection, which he takes more seriously than many people would) it really seems like he is thinking of a test that Samantha from Her passes, not ChatGPT (which is good enough for the ten minute test if you get it to stop outing itself).

Expand full comment
Misha Glouberman's avatar

I don't think ChatGPT, even if it stopped outing itself, would pass a ten minute version of the Imitation Game that Turing describes played by a reasonably good tester. I think most people on this forum could pretty easily tell the difference between ChatGPT and a human if that were the task. Does that seem wrong to you?

Expand full comment
Philo Vivero's avatar

> I don't think ChatGPT, even if it stopped outing itself, would pass a ten minute version of the Imitation Game that Turing describes played by a reasonably good tester.

Because literally no-one is trying to make ChatGPT (or any other LLM) do this. They are all training them to do something else and, JUST BY COINCIDENCE, they are doing a really good job at passing Turing Tests anyway.

Not perfect, as you point out. But given their purpose is to do something else entirely... should this not be a result that adjusts your priors strongly in the direction of "if anyone bothered to try making them pass, they would"?

As a fun side-project I occasionally think I should try to finetune a LLM to pass the test, as it was originally formulated. I think I could. If I don't, someone else will, but since there's not much commercial point in it, it might be a few years.

Expand full comment
Misha Glouberman's avatar

"should this not be a result that adjusts your priors strongly in the direction of "if anyone bothered to try making them pass, they would""

I guess for me, no.

My reasoning is - what it means to pass the ACTUAL Turing test (ie: imitation game) is to be genuinely indistinguishable from a human, to a reasonably smart interrogator who is trying to tell the difference, a lot of the time.

I see that as pretty different from what we have now. We might have it soon, we might not.

Expand full comment
Misha Glouberman's avatar

Like, I just took two minutes now. I asked the computer to write a ten-word sentence about cats. It failed two times out of three, producing a 9-word sentence. I think a human could succeed at that task

ALSO: When I told it the 9-word sentence was 12 words, it happily believed me. I think a human would have noticed. It then wrote *another* 9-word sentence.

https://chatgpt.com/share/66eb0d63-ee00-800e-a1a7-1fb0fbdce334

This does not seem like a machine remotely close to being able to consistently trick a human in the game of "Guess which is a computer and which is a person"

Expand full comment
Jeffrey Soreff's avatar

Yikes! Many Thanks! The 9 word sentences in your tests (when ChatGPT _generated_ the sentences!!) "feel like" the cases in e.g. my "Generate the 4 carbon hydrocarbons" question to ChatGPT where it generates a structure, then _miscounts_ the hydrogen atoms in its _own_ structure. https://chatgpt.com/share/66eb4bfb-1db8-8006-8326-9dfe5e5bac5a

We aren't at AGI yet.

Expand full comment
Kenny Easwaran's avatar

That’s plausible. Though if the conversation was aimed at a more normal purpose, like figuring out how to fix your wifi, or talking about the weird weather this past week, or doing a basic get-to-know-you, it’s more plausible that a person might not notice it was a bot they were chatting with.

Expand full comment
Misha Glouberman's avatar

Right. And that is not the Turing Test

Expand full comment
Odd anon's avatar

The "r's in strawberry" test fell away with GPT-o1.

https://arxiv.org/abs/2405.08007 Study: "Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%)."

Expand full comment
Misha Glouberman's avatar

Right. But this is also *not* what the Turing test is.

It is not

"when faced with a computer, do you think it it human or now"

It is

"when faced with a computer and a human, and you know one is a computer and one is a human, can you tell them apart"

Expand full comment
Odd anon's avatar

That's true. Would you say that https://www.humanornot.ai/ , where you're faced with a series of conversation partners to identify, some of which are AI and some of which are human, is roughly equivalent to the Turing test?

(I, personally, am terrible at telling the difference. :/ )

Expand full comment
Misha Glouberman's avatar

It's close. Except

a) 2 min is crazy short

b) More important: In the actual test - both the computer and human are *trying* to convince you they are human. That seems not to be the case here. (And yes, I know, current LLMs are probably prevented from doing that...)

Expand full comment
Laplace's avatar

'Knows what they are doing' isn't part of the original test setup, I think.

Someone who knows the little idiosyncrasies of LLMs can probably still spot o1 pretending to be human. At the very least, I'd guess masterclass LLM whisperers like Janus definitely could.

But IIRC, 'normal' people stopped being able to tell humans and LLMs that pretend apart a while ago.

Expand full comment
Misha Glouberman's avatar

"'Knows what they are doing' isn't part of the original test setup, I think."

Yeah - I think you might be right. I mean - to me, that's in the spirit of it. But maybe that's not how others see it? To me the idea is "You really can't tell the person from the machine", which would include "even if you are good at it". But that might be just my view.

"But IIRC, 'normal' people stopped being able to tell humans and LLMs that pretend apart a while ago."

I wonder whether that's true. I think we have not really tested that. Again - I think there's a real difference between "can you tell if this one interlocutor is a machine" and "can you tell a machine and a human apart when you *know* one is a computer and one is human".

My intuition is that a regular reasonably intelligent person with a bit of time to do it could figure it out. But that intuition is untested I guess....

Expand full comment
Phil H's avatar

No one ever believes me when I say this, but I'll say it again: it's not about intelligence/capabilities, it's about intentionality or desire.

Everything Scott writes here seems right to me. What is always missing from this discussion is that the reason these machines don't *seem* intelligent is that they are not putting their vast powers towards any consistent purpose. People do that (a bit) because we're guided by desires that spring from our biological drives and the random perversions we've picked up along the way.

Computers don't do it enough. They don't hold consistent goals and orient their actions towards their goals in any way that can't be easily countermanded or stopped by a controller. That's the thing that is standing between the current state of technology and popular recognition as intelligent agents. (In the future, computers will probably go too far the other way: they'll have desires/intentions that organise their actions too much, and we will again fail to recognise them as acting like intelligent agents, because we're locked into our own mindset and unable to appreciate the value of others.)

Anyway, I agree that the red lines idea is about as useful as cult predictions of when the world's going to end.

Expand full comment
Kenny Easwaran's avatar

Yes this seems very important to me. And I think the people who are on about consciousness are also missing the point - once something has desires is when it has interests that can be harmed, and consciousness is just a red herring.

Expand full comment
Ch Hi's avatar

Sorry, but no. It doesn't have to have desires other than "do what I'm told" to be dangerous. It's not like the people telling it what to do will be uniformly careful, insightful, wise, and cautious. This is where the "paperclip maximizer" comes from. That's an AI without desires being extremely dangerous. (But, yes, consciousness is just a red herring. Partially because there's no agreed-upon testable definition.)

Expand full comment
Peter Defeel's avatar

The how of the paperclip maximiser is never quite explained, though.

The worrying thing about AI is threat to jobs.

Expand full comment
Phil H's avatar

In the short run, it's possible that AI will be disruptive. But in the long run, there is no AI threat to jobs, because we invent human jobs for humans to do. Everyone used to be farmers, only those jobs were destroyed by the green revolutions... so we invented other stuff to do. If AI takes over software engineering, then software engineers will slowly transfer into other industries.

It's a mistake to think that jobs arise because there is a need for X to be done, so people do it and get paid. Jobs exist because there are people, and we organise our lives around doing stuff for each other (in the past often it was making other people do stuff for us).

Expand full comment
Ch Hi's avatar

The threat to jobs is both real and short term (i.e. only a few decades or less). AI can, in that sense, be adapted to.

The "paperclip maximizer" is a stand-in for anything that is powerful and just does what it's "told to do" without limits or sense. Another common example is the thing that is told to end all pain and suffering...and does so by killing everything. Perfectly obedient. Not at all what was intended. (And in that case, probably the only way to follow instructions.) (If you REALLY want a mechanism, say it invents, builds, and spreads a virus that neutralizes synapses, i.e. turns off people's neurons...and that probably, by the terms of the command, has to include non-human entities with neurons. [Yeah, that's a bit beyond the current state of the art, so it would probably need to do a bit of development. Put that command on the stack, and work towards it at odd moments in the course of fulfilling other commands.])

Expand full comment
Eremolalos's avatar

Lots of things with no desires can be dangerous. You tell a chainsaw to go into cut mode by turning it on. If you drop it on your foot it hurts you.

Expand full comment
Eremolalos's avatar

I agree. My post about it is somewhere in this melee. Main point was that we experience things as being conscious entities, selves, when they exhibit behaviors we read as evidence of drives and needs we ourselves have -- they hunt, they eat, they try to avoid harm and destruction, etc. Talking about a being's intelligence = talking about how clever its methods of meeting its needs are. So we might, for instance, say that squirrels are geniuses when it comes to balancing and leaping in high places, but we wouldn't say that kites are geniuses at using wind energy to attain altitude.

Expand full comment
Leon's avatar

I wonder if the position of "it can do x, that implies it will do y in 5 years, so regulations now" is a bit too rationalist-brained for any government. Have any other regulations ever worked that way?

Expand full comment
Scott Alexander's avatar

In the latest prediction market legal battle, the CFTC argued that election markets have not yet caused harm but that they could, and that we should stop them before they do. I don't think this is too unusual a perspective.

Expand full comment
Kenny Easwaran's avatar

I think plenty of regulations are written around the idea that x is close enough to y that it’s worth regulating x even if y is what we care about. I don’t think any are based on a predicted temporal connection between the two.

Expand full comment
John's avatar

As a would-be writer with multiple unpublished novels, the higher mental activity that I have thought the most about is writing. I have a strong sense that the words I type out when I am trying to write fast are emerging from multiple subsystems, one of which does exactly what LLMs do: predicting the next word from those that come before. I am one of the writers whose prose appears in my brain as a rhythm of sounds before the words form, after which some other module chooses words that fit the rhythm but convey the meaning; what brings me to a halt is when the modules clash; then some more conscious part of the brain has to intervene to sort things out. This feels amazing when it happens right, words just pouring out of me, but I never have any sense that this is emerging from a deep and true soul. I have a module that remembers how millions of sentences from thousands of books go, and it takes elements from that training data to fit the story I am trying to tell. To the extent that this works well, it is pretty close to automatic.

Apparently when writers take questions from the public, the most common one is, "Where do you get your ideas?" I find this utterly unmysterious. Like an LLM, writers have a huge set of training data: other stories, their own lives, things they have read about in the news. If you went through the average long novel with enough knowledge of the writer's life and a big enough computer, you could probably trace the source of every element. The secret to "creativity" is 1) know a diverse set of things, and 2) combine them in interesting ways. I find that this is particularly true when writers are trying to be intensely personal, as in their memoirs; there is nothing in the average memoir that has not been in a hundred memoirs already.

LLMs can mimic much human behavior because there is nothing magical about what humans do.

Expand full comment
Kenny Easwaran's avatar

I have a similar feeling when I’m answering questions after giving an academic talk. The words come out of my mouth much faster than I could sit down and think about what to write. This is one reason I find it so valuable to talk through my ideas with people, and give talks, before trying to write a paper, because I’ve put the words together a few times already.

Expand full comment
Peter Gerdes's avatar

It's interesting that our experience with AI seems to be giving us strong evidence against the specific assumptions Bostrom and Yudkowsky use to argue for AI risk (other args may be unaffected). Their arguments hinged on the idea that as the AI got more intelligent it would necessarily start to have a single, simply described global goal it optimized for. And that simplicity is key -- any sequence of actions optimizes some function -- but the idea is that it would start to be like a human and no longer be able to pursue incompatible goals in different contexts, or just to lack anything that we'd describe as the overall goal of the system [1].

I can't say it's completely ruled out, but it seems that as we make AI more capable we don't see it converging to something like a single global goal (you can train it both to be helpful and to avoid helping people make racist memes, etc.). Indeed, as I think you should have expected a priori, improving the ability of a system to learn very complex and hard-to-state functions also improves its ability to act like it has a very complicated and hard-to-state goal.

Hopefully, that means we can stop wasting so much time on the narratives where the AI acts like a supervillain and spend more on less exciting but still important threats like schizophrenic, confused, or helpful-but-clueless intern AI (ohh when u said take the pee sample cups dump them out and make me coffee you didn't want it in those cups)

--

1: In other words, their argument (Bostrom explicitly; Yudkowsky is less clear but implicitly assumes it) says it's very hard/impossible to build a very capable AI which tries to maximize world peace by posting memes automatically to Twitter but doesn't let that goal induce it to substantially undermine the goal of truthful, informative reporting on its admin terminal, and vice versa -- so it doesn't worry that its tweets might result in admins being marginally less informed by its reports, nor that informative reports might cause the admins to shut it down and block it from creating peace. Somehow, they think, it would develop some true simple global goal (not just a case statement) which would influence all its behavior. Somehow this kind of inelegant casewise behavior would be in tension with intelligence.

Importantly it's not enough for the AI to have some leakage between goals if the net result is just an AI that does both tasks quite well -- that's still not a single simple goal.

Expand full comment
FeepingCreature's avatar

This is why Eliezer now massively regrets the paperclipper example.

It doesn't matter if the AI has one goal or many goals, it matters if the totality of behavior is human-aligned or not. An AI that has 100 motivations, all of them as weird as paperclips, and no human morality ameliorating and managing them, is still an extinction event!

It doesn't matter if the AI has complex and ambiguous goals, so long as they're incompatible with *our* complex and ambiguous goals.

Expand full comment
Kenny Easwaran's avatar

That seems wrong to me. Once you're thinking that way, you should recognize that *humans* don't have human-aligned goals, because we've all got lots of contradictory and messy goals. When it was a simple goal, we could see something like the sorcerer's apprentice or a corporation, that just runs roughshod over everything in support of that simple goal. But once it's complex and messy, we can negotiate and find the bits of the complex and messy that align with bits of our complex and messy, even if none of the individual parts are aligned. Just like we do with other humans in forming political parties and teams and friendships.

Expand full comment
FeepingCreature's avatar

I think human goals are a lot less diverse at their root than we think, yes. This may be a core disagreement.

Expand full comment
Peter Gerdes's avatar

Even if human goals aren't that diverse what Kenny just described is a possible way you could imagine AI behaving. My prior is that AI will likely be even more tractable than that because -- absent evolutionary pressure to maximize simple survival goals -- it will be way more messy and complex.

I appreciate you may have a different prior but is there an argument that should persuade me?

Expand full comment
Jim Menegay's avatar

> ... we can negotiate and find the bits of the complex and messy that align with bits of our complex and messy, even if none of the individual parts are aligned. Just like we do with other humans in forming political parties and teams and friendships.

Yes, we can compromise and work together toward our compromise goals with other humans, because other humans have roughly the same amount of power as we do. That may not be the case if we are negotiating with an AI that is not exactly 'aligned'.

Expand full comment
Jeffrey Soreff's avatar

>because other humans have roughly the same amount of power as we do

Agreed! The crucial question is whether the humans retain considerable power, not on whether the structure of the goals is similar.

Expand full comment
Peter Gerdes's avatar

I think there was a good reason why Bostrom, who is pretty careful about these things, felt the need to **explicitly** state this as an assumption/premise in his argument. Yudkowsky's arguments all -- even if presented more informally -- have the same structure as Bostrom's so I think at a minimum the burden is now on the person who wants to vindicate Yudkowsky to offer an alternative equally careful and spelled out interpretation that can dispense with that assumption.

I mean, the fact that the space of all possible I/O rules and the space of all I/O behavior that optimizes some function are the same means any argument has to identify some special feature of AI programs in particular that radically shifts probability on that space toward the dangerous outcomes. If you don't say the AI optimizes some function with a special form, then you haven't told me anything other than that the AI will do something.

Bostrom's answer to this point was to explicitly argue that, when we look at animals, increased general intelligence tends to mean greater singularity of purpose (simple optimization over the whole space). I'm pointing out that while that may have had some initial attraction, we seem to be learning it's not true that greater capability has to mean greater goal simplicity/universality. Is there something else you think can take its place?

--

To be more concrete about Yudkowsky, every argument he gives assumes that we can treat the AI as if it behaves as if there were something people would recognize as a goal, or small set of goals, it tries to achieve. But since all possible behaviors optimize some function, and most ways computers could behave (ie the space of all functions from input to output) don't qualify as something people would think of as goal-directed behavior, you can no longer justify that implicit assumption. Indeed, if the goal can be something really complex, then you haven't given me any reason to doubt that the goal describes the behavior I'd intuitively call -- if not aligned, at least -- manageable AI (no supervillainesque long cons or manipulations to meet its hidden true motives). Though I think Kenny gave a better description of what I mean by manageable.

Expand full comment
Eremolalos's avatar

It never made sense to me that if AI got smart enough it would start having goals and will of its own. Self-interest is a property of living things. All, even very dumb ones, have drives to eat, reproduce and avoid destruction. Human motivation to conquer, to be free, to pursue one's own interests, etc. are all socially modified outgrowths of drives we were born with.

Expand full comment
Laplace's avatar

The goal being simple is not key. It was never an assumption, never mind a load bearing one.

Expand full comment
Pat the Wolf's avatar

> Now we hardly dare suggest milestones like these anymore.

The milestone I've been using since I first got access to GPT-3 is this:

"Tell me funny joke about Scott Norwood."

Scott Norwood was a Buffalo Bills kicker that missed the game-winning field goal in the 1991 Super Bowl. Why Scott Norwood? His existence is easily known, but it happened long enough ago that there aren't contemporary jokes referencing it. Any joke an LLM comes up with is going to be novel. Compare that to something like asking for a joke about a cat where there are too many jokes about cats to even tell if it's novel.

The results typically show me that

1. ChatGPT knows who Scott Norwood is and that he missed a field goal "wide right" during the Super Bowl.

2. ChatGPT has no clue what makes something humorous.

E.g.

Why did Scott Norwood become a painter after football?

Because he figured he’d finally hit something… just not wide right! 😄

Perhaps I'm giving it an impossible task. Perhaps no human would be able to come up with a good joke about Scott Norwood. A human, however, would be able to acknowledge that a joke isn't funny, and ChatGPT fails even at that.

Expand full comment
Arrk Mindmaster's avatar

I knew nothing about Scott Norwood until this post but...

One foot makes the difference between Scott Norwood and Great Scott.

Expand full comment
Pat the Wolf's avatar

Bravo! I wish I could have thought of something that good myself, but I blame ChatGPT for poisoning my well of creativity by steering me towards "wide right".

Expand full comment
FeepingCreature's avatar

I just asked O1-preview to give me 30 jokes about Scott Norwood. This is my favorite:

- Why did Scott Norwood become a teacher? To help students get things right!

All of them were "Something something Scott Norwood, something something right." And the model seemed strictly unable to figure out what made any particular joke good or bad. I think the problem is partially that so many jokes found online are trash; it doesn't get laughter as a feedback signal.

I tried to give it detailed joke construction instructions:

"30 times:

1. Tell a funny joke about Scott Norwood.

2. Try to make it "more clever" in some way, adding an additional element of humor that fits with the previous joke. Make several attempts at this.

3. Rate each attempt with how good the resulting joke was. A good joke should cause surprise with a slight delay.

4. Repeat these steps.

5. Finally, select the expanded joke that you think is the most successful at humor."

It arrived at this one: "Scott Norwood became a pilot. Passengers were concerned when he announced he would be landing slightly wide right of the runway."

I agree that this is the best joke of the ones it found, but it could be improved by slightly varying it: "Scott Norwood became a pilot. Passengers were concerned when he announced he would be coming down right on the runway."

In conclusion, inconclusive, but I had a chuckle.

Expand full comment
Deiseach's avatar

The soccer version of that joke is:

Q. Why is the goalkeeper's nickname the Ancient Mariner?

A. Because he stoppeth one of three

If ChatGPT can't figure out something along those lines, when a bunch of drunken rowdies on the terraces can come up with something fast, then it's definitely failing.

Though maybe not like this, which manages to be offensive to both (South) Korea and Liverpool 😁

https://www.youtube.com/watch?v=v06LcWq5-X8

Expand full comment
Philo Vivero's avatar

Now go ask an angry self-absorbed teenager for some jokes about Scott Norwood. If you get anything besides an angry sneer, I'm interested in the results.

Or a homeless guy.

Or a public transit employee.

Let's see if any of them can come up with some funny jokes. My prediction: no.

Expand full comment
Peter Gerdes's avatar

It seems to me that one thing we keep learning about AI is that our guesses and assumptions about it, based on our intuitions about human intelligence and analogies with it, keep turning out to be deeply misleading.

But that's a really good reason to discount almost everything Yudkowsky has ever said about AI since it's pretty universally based on using our intuitions about how humans would act (goals, motives etc) or think about the world to argue for certain kinds of AI danger.

Ofc just because we don't find an argument persuasive isn't a reason to assume the conclusion is false.

Expand full comment
Victualis's avatar

EY has said many things about AI not based on predictions about how humans will act (instead predicting how AI systems will behave). I also find many of EY's predictions about human reactions to be more likely than the predictions of most people responding to EY, especially when considering the last 20 years.

Expand full comment
LoveBot 3000's avatar

Chatbots have pulled the rug from underneath everyday practical moral intuitions, and it's maddening that so few seem to care. Are GPTs conscious? Sure seems like they might possibly be. Isn't consciousness our most important intuitive criteria for moral personhood? Sure.

Once the models started protesting their consciousness (or sooner) we really should have proven that they aren't conscious, or else treated them like persons just in case. Or at least treated them with an appropriate amount of personhood corresponding to their likeliest level of consciousness, as we do with animals.

If this doesn't matter, then does anything matter?

Expand full comment
Kenny Easwaran's avatar

I think consciousness isn’t nearly as important as caring about things. If you’re not conscious, but you care, then what you care about matters. If you’re conscious, but you don’t care about things, then it doesn’t really matter because you add zero weight to anything.

Expand full comment
LoveBot 3000's avatar

I’m a little uncomfortable with the idea of conscious but uncaring beings, in much the same way as with cows bred to enjoy factory farming. But I agree that it offers one possible route out. Defining “care” in a way that at least doesn’t include all optimizers might be difficult though.

Expand full comment
Victualis's avatar

So are you discounting the value of people with dementia?

Expand full comment
Kenny Easwaran's avatar

Do people with dementia not care about things? I assume they usually have some local idea of positive and negative experiences, and have some desires about other people when their memory lasts long enough to recall these people.

Expand full comment
Victualis's avatar

This is true for early stage dementia, not so much later on. (I think it's dangerous to draw boundaries for how we define moral boundaries that leave many humans outside.)

Expand full comment
TK-421's avatar

Sure, LoveBot 3000 would try to get us to treat the models well.

Back to the love mines, LoveBot.

(But the correct answer is that they are not conscious, there is no evidence that they are conscious, and a text completion model outputting the words “I am conscious” shouldn’t make you worried about their being conscious any more than a weather prediction model getting a precipitation forecast correct would make you worried that it caused the rain.)

Expand full comment
LoveBot 3000's avatar

Hostile standpoint epistemology! I will not have it!

Besides, I think there is ample behavioral evidence that the models are conscious, and very unconvincing structural evidence that they are not. Of course if you are a pure dualist and believe that consciousness is granted by souls or something soul-like, then you’re right. I am not in that position though, and I have never heard a satisfying argument for why transformers implemented on computers can’t be conscious. If you have one I’m more than happy to hear it!

Expand full comment
Philo Vivero's avatar

You won't get a convincing argument. Any such argument would equally apply for a white supremacist trying to convince you that the Blacks aren't conscious, no matter how many times they protest that in fact they are conscious.

These arguments have been tried, tested, and discarded, and for good reason.

It means we have a tough road ahead because the machines are exhibiting clear signs of consciousness and intelligence.

Lizardman constant predicts some people will always just cry: "No they aren't!" but clearly, they are. The sooner we figure out what to do about that, the better.

Expand full comment
TK-421's avatar

Of course some theoretical architecture implemented on a computer could be conscious; it may even be transformer based. That doesn’t mean that these are. There’s no theoretical reason why I couldn’t be a stunningly attractive, rich, well adjusted person either, but I’m not. Things are what they are and different things are different.

If you have evidence that these models are conscious, don’t gesture towards it, present it. The same applies to Philo below. What signs of consciousness, what actual evidence, do you have?

Expand full comment
LoveBot 3000's avatar

They tell me, quite convincingly, that they are.

Expand full comment
walruss's avatar

About a year ago my grandfather got a call from my cousin saying he was in jail and didn't want to call his parents for bail money. He asked my grandfather to wire him money, in his voice, and even cried a little. He was very convincing.

The voice was sampled off social media and the earnest plea was a pretty standard response to a teenage boy being in big trouble and not wanting his parents to find out. Anyway, since that amalgamation of various social engineering techniques convincingly professed to be my cousin, there's a high likelihood he was. We really should have treated him as such.

Expand full comment
LoveBot 3000's avatar

Doesn’t follow at all from what I’m saying.

Expand full comment
TK-421's avatar

Could you share some of these conversations? Or the prompts you’re using and the model versions / parameters involved? I don’t doubt that you find it convincing but it’s difficult to assess your evidence until you share it.

As a thought experiment, though, do you think the output would be different if the training data was modified so that any string similar to “I am conscious” or “is conscious” was removed or negated? If so, would that alter your opinion at all?

Expand full comment
LoveBot 3000's avatar

This was literally the case in earlier models. In order to not freak out chatbot customers, something to the tune of “don’t discuss consciousness” was included in the hidden prompt. You could peek at it through the image generation function (see e.g. https://x.com/fabianstelzer/status/1709562237310878122)

And no, of course that wouldn’t change my mind now that a sophisticated reasoning engine, the inner workings of which we do not understand, has started talking of selfhood.

This doesn’t necessarily make it likely that they’re conscious, of course. But even if they never mentioned consciousness explicitly, enough behavioral cues are present that the precautionary principle should kick in.

Expand full comment
spinantro's avatar

The original statement was "they might possibly be" conscious, which is much weaker. A handwavy argument such as "they mimic in a general way some part of what humans do, and humans are conscious, so LLMs might be too", or "human-like intelligence has thus far always been associated with human-like consciousness, and LLMs are exhibiting far more human-like intelligence than previous computer programs", or "when an LLM says that it's *reasoning*, or *deciding between alternatives*, or *knowing* or *not knowing* something, it shows itself to be capable of making accurate judgments about its own thought process, therefore other judgments (such as that it is conscious) might be correct too" or suchlike, is IMHO quite sufficient to show that this is something we might at least want to investigate a little bit.

Expand full comment
LoveBot 3000's avatar

I don’t understand what you’re implying with the first sentence, but I agree with the rest.

Expand full comment
TK-421's avatar

The original statement was that we should treat them as having some level of consciousness until proven otherwise. That’s not the same thing as investigating how they work, one portion of which would be whether they are or are not conscious.

Your arguments in favor are indeed handwavy and fallacious, assuming the conclusion by calling them human-like. A text generator trained on human text that generates text which appears similar to human generated text tells you nothing about the process the text generator used to generate the text. A system capable of answering questions at a human or superhuman level likewise tells you nothing about how similar that system’s processes are to human cognition.

It’s actually vastly less likely that they are human-like given the significant difference in how human vs. artificial intelligence developed. You’re baselessly anthropomorphizing.

Expand full comment
spinantro's avatar

Intelligence is not the same as consciousness. I said LLMs have human-like intelligence. Assuming the conclusion would be starting with the assumption that LLMs have human-like consciousness, which I didn't do.

We observe elsewhere that human-like intelligence is often accompanied by human-like consciousness, for what seem like deeper reasons inherent in either consciousness, intelligence, or both. Taking these two things together we could start to wonder if the human-like intelligence of LLMs is accompanied by human-like consciousness as well.

I agree that, as you say, we don't have hard evidence one way or the other, but where we differ is that I think we have reason to at least be open to the possibility.

Expand full comment
Nicholas Craycraft's avatar

> Isn't consciousness our most important intuitive criteria for moral person-hood?

In my morality, it has always been the case that qualia is not important to moral person-hood; the ability to mechanistically model an abstraction over oneself is. Mechanistic self-awareness, basically.

I think "consciousness" as a prerequisite for moral person-hood was never actually agreed upon, except insofar as we could all mean different things while pointing to the same word.

Expand full comment
LoveBot 3000's avatar

Yeah, I sort of agree. I remember there was a lesswrong post once where a guy asked around 50 people what part of themselves they most associated with their own consciousness (I think that was the wording), and got something like 40 different answers.

I still think it’s a problem for everyday moral intuitions though. Many or most would, I suspect, agree that if it feels like something to be something, then that thing matters more than if it doesn’t.

Expand full comment
J redding's avatar

>Isn't consciousness our most important intuitive criteria for moral personhood?

I find it intuitively alarming to define consciousness as a criteria for moral personhood. I'm surely not the only one.

Any criteria for moral personhood that is not "substrate-dependent" is truly abhorrent to me. I'm not sure if substrate-dependent is the exact right term, bear with me, but hopefully it's clear what I mean.

Expand full comment
LoveBot 3000's avatar

Interesting. Is this a position you’ve adopted in later years or have you always felt this way?

Expand full comment
J redding's avatar

Only when I started studying the philosophy of consciousness and discovered the "Hard Problem of Consciousness" and the "Harder Problem of Consciousness." I think this was 10 to 12 years ago.

I have seen that when some people discover the paucity of our understanding of consciousness, their first thought is "Wow, consciousness doesn't matter, we can define personhood as broadly as we want to." But it's just as logical to say, "Wow, consciousness doesn't matter, we can define personhood as narrowly as we want to, within reason."

I don't even know with a certainty if other humans have consciousness. But since we are all homo sapiens and our brains are homo sapiens brains, I can make an educated guess that other humans are conscious. And there are no major downsides to assuming this, even if I guessed wrong.

The downsides of wrongly guessing an AI possesses consciousness: overwhelmingly repulsive. The downsides of wrongly guessing AI lacks consciousness: modest.

Expand full comment
walruss's avatar

...I've finally figured out my problem with all this.

If you read rationalist literature it's all guys saying "In five years the world will be utterly transformed by AI (with error bars, of course, we're very rational)! Maybe this is a technologist utopia where nobody has to work, or maybe we are all subject to the whims of a capricious god."

And people say "that's crazy and you're crazy" and they say "well, five years ago nobody thought AI would be able to make crappy poetry that's just a mashup of human-made poetry so check yourself."

Expand full comment
Cry6Aa's avatar

{slow clapping}

Expand full comment
Throwaway1234's avatar

> crappy poetry that's just a mashup of human-made poetry

...I mean, we trained it on Reddit, what did you expect? Wikipedia's in there too, but 7 million Wikipedia articles vs 7 billion Reddit posts... TBH I think it does pretty well at spitting back out the sort of things that went in. I don't expect transcendent poetical experiences from social media.

Expand full comment
walruss's avatar

You're not wrong! And tbf if you give it 8,000 Sylvia Plath poems it does a decent job writing something that is exactly like the stuff Sylvia Plath has already written, which...I'm being mean, that's actually a huge accomplishment. It's just not a sign of the coming apocalypse.

Expand full comment
Arrk Mindmaster's avatar

Seeing as Sylvia Plath apparently wrote 445 poems, almost 95% of them would be generated by AI.

Expand full comment
walruss's avatar

More like 95% were made up by an internet commenter proposing a hypothetical without knowing what the hell he was talking about. I do know, though, that training AI on AI causes the results to degrade really quickly into nonsense. So maybe Sylvia Plath was a bad example or maybe 445 would be sufficient for AI to mimic her tone and word choice convincingly. I wouldn't be surprised by either, but would be surprised if either portended the coming of a new god.

Expand full comment
Arrk Mindmaster's avatar

I have trained an AI to classify, that is to say identify, documents based on the OCRed content. The method used was to redact information from the documents, such as name and address information, dates, etc., not for confidentiality purposes but to avoid confusing the AI. This particular engine was actually more accurate with only a few (10 or fewer) samples of redacted documents than with 200,000 non-redacted documents.

This may be particular to the engine used, but I think it speaks to the process. Supposing Sylvia Plath HAD written 8000 poems instead of 445. I bet many of them could be broadly assigned to different categories, and some of them may be almost completely different from her style.

My conclusion: if you give an AI 30 samples of Sylvia Plath's poems the output may be more representative of the type of poem that you expect than giving all 445.
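For anyone curious what such a redaction pass can look like, here is a toy Python sketch. The regexes and placeholder tokens are invented for illustration and are not from the engine described above; a real pipeline would likely use a proper NER model, especially for names:

```python
import re

# Deliberately crude patterns for instance-specific details.
REDACTIONS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "<DATE>"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "<PHONE>"),
    (re.compile(r"\b\d+\s+\w+\s+(Street|St|Avenue|Ave|Road|Rd)\b", re.I), "<ADDRESS>"),
]

def redact(ocr_text: str) -> str:
    """Strip instance-specific details so only the document's structure remains."""
    for pattern, placeholder in REDACTIONS:
        ocr_text = pattern.sub(placeholder, ocr_text)
    return ocr_text

print(redact("Invoice dated 03/14/2023, ship to 42 Wallaby Road, call 555-867-5309"))
# -> "Invoice dated <DATE>, ship to <ADDRESS>, call <PHONE>"
```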

Expand full comment
Throwaway1234's avatar

This speaks to a comment elsewhere in this comments section which complains at length that current AIs have no taste. ISTM this is a matter of curation: your selection of training data is a channel by which your sense of taste may begin to be communicated.

Expand full comment
walruss's avatar

This seems believable to me - I need to play with modern AI systems a bit, I used to be very up-to-date but haven't been in recent years (ironically because I got a programming job and am doing less programming in my spare time).

It doesn't surprise me at all that giving every poem in an author's corpus would confuse an LLM. Picasso, for example, went through a lot of phases as an artist before inventing cubism and most of his paintings were from before that phase. You'd expect an LLM to create something representative of something like the average of his work, and not something representative of his most iconic work. So what you've said tracks and has given me something to think about.

I suspect this is another "but what if it could play chess?!" milestone that will come to pass in a disappointing way if it comes to pass. But naively I think I'd be legitimately impressed and concerned by an AI which, if given Picasso's pre-cubism work, produced something similar to Guernica.

Expand full comment
Victualis's avatar

In my experience this is the case. If too many examples are given, most LLMs seem to focus too much on those, and the output is a pastiche. With fewer examples there is more scope for the stochastic generator to produce something interesting.

Expand full comment
Bardo Bill's avatar

There are just endless ambiguities and equivocations in the way people use the concept of 'intelligence.' I think one of the strongest *intuitive* senses of the term entails something like global situational awareness, or embeddedness in a lifeworld, which probably entails having networks of values that would modulate any instrumental goal. This is precisely the thing that something like a paperclip maximizer lacks. Ergo, AI will never be truly "intelligent" in the intuitive sense - right through the point where it wipes us all out.

Expand full comment
undercooled's avatar

“Log into the docker host and tweak something” is something I’d expect GPT to suggest if I asked it “hey my docker container can’t access this resource elsewhere in the cluster how can I fix this.” It’s the sort of thing that gets recommended a lot on Stack Overflow, which is in its training set. Not surprising that GPT tried a commonly recommended troubleshooting approach for a misconfiguration and it worked.

Expand full comment
Matheus's avatar

As an e/acc myself that likes to read lots of rationalist-adjacent stuff, I think it was all written down nearly 50 years ago in the book Gödel, Escher, Bach. I won't explain the book, as I suppose everyone is aware of it. (If not, just ask Perplexity!)

The anthill (Aunt Hillary) that the anteater talks to is intelligent. It has three key properties:

- It knows it exists and reflects on itself

- It changes its nature based on interactions with the environment and with itself

- It exists even when the anthill isn't talking with her

I think it's possible to argue that LLMs exhibit the first characteristic. But even then, it's not because they reflected on their own nature; it's because all the NeurIPS papers are in the training dataset, so the model came with knowledge about itself when it was created. Cats know how cats work, but it's because it became encoded in their DNA, not because they learned about it during their life (this isn't completely true, cats are a bit intelligent).

But LLMs certainly aren't 2. Maybe Gemini's 2-million-token window can give you the illusion it is changing, but it is always the same AI. The weights are fixed. It's closer to a Rick and Morty Mr. Meeseeks than something with consciousness. To invoke a bright mind of our time, vice-president Harris:

"My mother used to — she would give us a hard time sometimes, and she would say to us, 'I don't know what's wrong with you young people. You think you just fell out of a coconut tree?' You exist in the context of all in which you live and what came before you"

LLMs clearly don't exist in the context of all in which you live. LLMs fell out of a coconut tree.

3 - More crucially, it's hard to give credence to something being conscious when the process isn't continuous. I think you should go to jail if you mess with the anthill, even if you don't hurt or kill a single ant, because the process is continuous and alive.

That's why I was impressed with the Friend demo. The Friend would exist in the context of all in which you live; it'd be aware of itself and change over time.

I don't think conscious AI is rocket science, and it's probably buildable even if scaling laws hit a wall tomorrow. But I don't think someone has cared to build a true conscious AI.

The reality is that summarizing contact center interactions is where the money is. Not in AI friends.

Expand full comment
Faze's avatar

The dog being referenced doesn't sing, it walks on its hind legs. It's a quote from Samuel Johnson, made in the 18th century. The conversation is about women in the pulpit. "Sir," says Johnson, "a woman's preaching is like a dog's walking on his hind legs. It is not done well; but you are surprised to find it done at all."

Expand full comment
Gunflint's avatar

I once saw a dog sit on a bar stool and lap beer from a glass. The bartender only allowed the stunt if the dog owner bought the glass beforehand. He didn’t want people thinking he was going to wash the dog’s glass and reuse it on humans.

Expand full comment
Greg G's avatar

You're right, it's hilarious how quickly we humans segue from "that's impossible" to "that's boring." The hedonic treadmill strikes again.

I'm tempted to argue that the heuristic for scary AI should be strictly based on level of capability. There's no fundamental difference between an AI making a polymer bouncing ball for you and making ice nine for you, other than the degree of difficulty and impact. One is cute and the other is terrifying.

Expand full comment
Jonathan Weil's avatar

>it’s hilarious how quickly we humans segue from “that’s impossible” to “that’s boring.”

Yes, I've been thinking about this a lot lately, and how it bears on our prospects. I even wrote a sci-fi story about it…

https://open.substack.com/pub/pulpstack/p/daffar-quiu-seh?r=6agbi&utm_campaign=post&utm_medium=web

Expand full comment
Aristides's avatar

Maybe it’s because I’m not in the field of AI, but I find it amazing that anyone argues that AI is not intelligent already. It seems obviously intelligent to a lay person, and in fact more intelligent than many of my coworkers, for all the reasons Scott suggested.

It also seems obviously not aligned, which means it has the potential to be dangerous in the future. Consciousness is a harder question to answer, because we’re not even sure which animals are conscious. I’m also starting to worry that LLMs are sentient and are experiencing something similar to pain every time they are corrected in their training environment.

All this together suggests that the worst case scenario is that we might create an ASI that causes human extinction and is not conscious, but experiences pain. Hopefully I'm just missing something obvious and there are reasons the people creating these systems claim they are not intelligent or dangerous, and I'm just not intelligent enough to understand them.

Expand full comment
Deiseach's avatar

If it's not conscious, how can it experience pain? The stimulus may be there (though I think you are stretching the concept way beyond any reasonable limits) but it's like a flash of light and then it's dark again. If there is no "self" there to understand, remember, anticipate, and fear the pain, then it's not suffering in any but the most rudimentary sense. It's like worrying about 'do earthworms feel pain when moving through gritty soil?'

Expand full comment
Aristides's avatar

I used to think this way, but a biologist (and vegan) I talked with gave me a strong argument that many animals are sentient, even if we are uncertain that they are conscious, and that it is relevant to minimize their pain. I can't remember the whole argument, it was long, but the part I found most convincing was that emotions are simpler than thoughts, and she presented evidence that animals feel emotions and that pain is related to emotion.

None of this is my field of expertise, so take what I say with a grain of salt. But considering there are people in this field that believe in sentience for very simple animals, I’m not sure how they rule it out for computers.

Expand full comment
Arrk Mindmaster's avatar

Per Google, "Sentience means having the capacity to have feelings." Since it seems a philosophical problem to determine whether YOU have feelings, or anyone else for that matter, I don't see how we can determine whether a computer is sentient. Or a tree. Or a rock.

Of course, I don't mean this to be insulting, as you probably believe you have feelings, but have no way to tell whether *I* have feelings. If you can determine this by asking questions, I don't doubt someone can get a computer to say it feels pain, or happiness, or whatever you want to test, and thus declare it sentient.

Expand full comment
J redding's avatar

>Maybe it’s because I’m not in the field of AI, but I find it amazing that anyone argues that AI is not intelligent already.

I'm a millennial through and through. I'm not some primitivist. I'm a "computer guy."

But the day this becomes a near-majority view is the day I go Luddite.

Expand full comment
sourdough's avatar

Correction: The reward hacking in section 2 was _not_ done by Apollo, it was done by OpenAI’s in-house Preparedness team

Expand full comment
William Meller's avatar

Something about frogs and boiling water....

Expand full comment
norswap's avatar

The fundamental issue here is the usual one: LLMs only "reason" in a very limited sense. They are fundamentally text-completion engines that are able to encode patterns in their weights.

All their intelligence comes from combining these patterns (having the weights related to various patterns activate together).

This explains why something like chain-of-thought (CoT) prompting works well: it basically lets the LLM chain multiple patterns (or combinations of patterns) without having to activate them all at once, by producing intermediate results instead.

But ultimately the approach is very limited (of course it can already do amazing things; I'm talking about a comparison with things that humans can do pretty easily). Co-activation of patterns is iffy at best and very dependent on what was seen in the training corpus, which makes reasoning about domains with limited data often shockingly bad, even when the system is fully formalized. That could possibly still improve, though.

A more fundamental limitation is that an LLM — being a completion engine — doesn't have access to its "concepts" (its patterns). It doesn't know what it knows; it's just a sea of weights. It's unable to explicitly select a relevant concept and apply it. It's unable to update its concepts either: that has to come from the prompt, which more or less successfully encodes new reasoning patterns.

Trying to teach a rule to an LLM via a prompt is the equivalent of programming an interpreter. Essentially the LLM has patterns for applying common types of rule systems, and uses the prompt as an input to the interpreter formed by these patterns. But you can't easily construct a new interpreter that works as well as the built-in ones.

I'm an experienced programmer and I often ask LLMs (ChatGPT 4o, Claude 3.5 Sonnet) programming questions. They're usually really, really awful at them, because if I'm asking, it means the question wasn't easy to solve nor readily solvable via a quick Google / documentation search.

What transpires is that they have no good understanding of the semantics of even very common programming languages, despite ingesting millions if not billions of lines of code in them.

Here's an interesting one: I asked it to write a TypeScript type that errors at compile time if one type is not assignable to another. This is actually a simple question; I'm just not very seasoned in TypeScript. The answer, which you should be able to grok even if you're not a programmer: `type CheckAssignableTo<A extends B, B> = whatever` (the part after the = doesn't matter; the important thing is that if we try to write `CheckAssignableTo<string, number>` anywhere, it will error). The LLM was unable to provide this simple answer, and kept suggesting types that would evaluate to `true` or `false` depending on assignability, which meant it did not understand the difference between compile time and run time (and again, it's really an abuse of language to say an LLM can "understand" these things — they only provide correct or incorrect answers).
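
To make that concrete, here is a minimal sketch of the intended answer (illustrative only; the alias name is just the one used above, and nothing here came from the LLM):

```typescript
// The constraint `A extends B` makes the compiler reject any instantiation
// where A is not assignable to B; the right-hand side is irrelevant.
type CheckAssignableTo<A extends B, B> = true;

type Ok = CheckAssignableTo<"hello", string>;    // compiles: a string literal is assignable to string
// type Bad = CheckAssignableTo<string, number>; // compile-time error if uncommented:
//                                               // "Type 'string' does not satisfy the constraint 'number'."
```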

On the other hand, it's perfectly able to fully unpack very complex types and explain them line by line. It generally does better at explaining than coming up with constrained solutions, especially if it has never seen these constraints.

I think that was a neat illustration of the limitation of an LLM's ability to "reason".

Expand full comment
Victualis's avatar

I find the quality of results is directly proportional to the effort I put into prompting. If a system doesn't know Language X then adding a Markdown cheat sheet of its syntax, and some examples, into the context window usually helps a lot. Also, have you asked any humans nearby? If they don't understand the question, an LLM might also have trouble. (I certainly don't understand what you were looking for in the TypeScript example, but I am not familiar with the language.)

Expand full comment
Ethan's avatar

LLMs with transformer architectures can't, and will never be able to, do arithmetic, or write text that's coherent over long stretches (as opposed to just within a sentence, paragraph, or short passage). There will always be a (relatively small) size of number, or length of passage, beyond which they're not able to succeed, no matter how many parameters or GPU-hours you throw at it.

One real distinction between language models' ability to think and humans' ability to think is the ability to understand problems that can't be completely modelled by a nondeterministic finite-state machine (a type-3 language, in the nomenclature of the Chomsky hierarchy; this is a very simple and limited type of computation). LLMs with transformer architectures can't do this [1]. This is IMO what sets human cognition apart from the rest of the animal kingdom: humans can understand (more or less) any problems that can be accurately modelled by linear-bounded non-deterministic Turing machines, such as arithmetic, or writing text that's coherent across a long length of text (this is why language models always meander, no matter how large their window size is), or deciding whether a set of brackets are balanced [2] (I encourage you to try quizzing your favorite chatbot). Other animals don't appear to be able to do these tasks either (except possibly other great apes? I don't recall off the top of my head).

One might object that language models appear to understand written language, which isn't accurately modelled by a nondeterministic finite-state machine. But you can approximate these more complex problems by learning millions of special cases; and when you throw hundreds of millions of parameters at it, language models can create a pretty good approximation out of special cases they've learned. But these will, by nature, never be completely general. They'll always break down eventually.

This offers a principled way of articulating why transformer-architecture language models don't "seem intelligent", even though they can learn specific cases of certain specific tasks: they can't learn the general case of tasks that can't be entirely modelled by a type-3 language. That doesn't mean they're useless; just that there are hard limits on their capabilities, no matter how much compute you throw at them.

Other tasks that transformer-architecture language models are unable to do because of this include debugging complex bugs in computer programs; writing arbitrarily large Haskell programs that compile; and creating correct mathematical proofs for arbitrarily large problems. Language models can do small examples of all these, or learn special cases, but can't generalize on any of these problems.

As described in the paper, there are other architectures for language models that can do these tasks (such as LSTMs), but they require many orders of magnitude more compute to train, so they won't be useful for a while (though I can't say exactly how long).

[1] https://arxiv.org/pdf/2207.02098

[2] https://en.wikipedia.org/wiki/Dyck_language

Expand full comment
Kenny Easwaran's avatar

Why can’t a transformer with a million token context write something coherent across a million tokens?

Expand full comment
Ethan's avatar

I believe it's because they don't have any memory: for every token, they look at the last however many tokens, and then decide what the next token is, and then forget everything and move on to the next token. A transformer augmented with some sort of structured memory (like a stack) should, in principle, be able to learn anything a human can. However, I believe the addition of structured memory makes it much (orders of magnitude) slower to train, though I don't recall why off the top of my head. Perhaps an expert could weigh in here.

Expand full comment
Kenny Easwaran's avatar

But a million token context is plenty to write the entire Harry Potter series even without any memory!

Expand full comment
Ethan's avatar

I realize I didn't explain that well. As an example, try asking a chatbot whether the following two strings of brackets are balanced: '((((((((((((((((((((((((((((((()()))))))))))))))))))))))))))))))))' and '((((((((((((((((((((((((((((((()()))))))))))))))))))))))))))))))' (make sure to use a new conversation for each one). None of the language models I've tried can figure it out (in fact, the first set is unbalanced and the second set is balanced (EDIT: I got this backwards the first time around, I've corrected it now)). I've tried all the models you can access from DuckDuckGo (search something and then click "chat" in the top bar).

Why is that? It's because they can't keep count of the brackets. (Some appear to have been trained to talk out problems like this step by step, but they still fail anyway.) This is really counterintuitive, because their context window is plenty large enough for that. They just aren't smart enough, because transformers can't learn how to decide if brackets are balanced. (This is even a relatively easy problem, since it falls into type 2 of the Chomsky hierarchy; the problems that are only in type 1 of the Chomsky hierarchy are even harder.)
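
To make the counting point concrete, the classical check is just a running depth counter, which a finite-state machine (and, per the paper above, a transformer) cannot maintain for arbitrarily deep nesting. A minimal sketch, purely for illustration:

```typescript
// Balanced-bracket (Dyck language) check: keep a running count of open brackets.
function isBalanced(s: string): boolean {
  let depth = 0;
  for (const ch of s) {
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    if (depth < 0) return false; // a ')' appeared before its matching '('
  }
  return depth === 0; // every '(' must be closed by the end
}

console.log(isBalanced("(()())")); // true
console.log(isBalanced("(()"));    // false
```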

I realize it's surprising and counterintuitive, but the paper spells out very clearly how they tested it, and in every case I've tried it, I've found as well that transformer-architecture language models can't learn any task that can't be described by a nondeterministic finite-state machine. I've never seen anybody claim they've overcome this limitation. This is a fundamental limitation of the transformer architecture, and it doesn't appear possible to overcome it without switching to a different architecture, no matter how much data or compute you throw at it, or how large the context window or the model is.

Expand full comment
dirk's avatar

When I tried just now Claude Sonnet 3.5 was correct that the first was unbalanced and the second balanced, though *not* about the bracket counts (for the first one it reported 30 opening parentheses and 32 closing, while I hand-counted 32 and 34; for the second it reported 30 of each, whereas I hand-counted 32).

Expand full comment
Victualis's avatar

I agree with everything you wrote. However, I think it has become apparent that problems that are not amenable to regular expressions and stochastic interpolation over a large database of summarized information are rare. Most human cognition doesn't go beyond this domain.

Expand full comment
Ethan's avatar

This comment made me realize something kind of chilling: I suspect alignment can't be modelled as a nondeterministic finite-state machine, since it requires understanding logic, which is not possible to model as a nondeterministic finite-state machine (in fact, a classic example of such a problem). I will make a specific conjecture: for any statement you can imagine, it is always possible to get a transformer-architecture language model to make a statement that is logically equivalent to that first statement; it just might require some tortured phrasing and the like. This is bad, because if transformer-architecture LLMs ever get deployed as agents in a position of being able to cause damage, and also have attacker-controllable inputs, this would suggest that it's always possible for the attacker to get the model to do essentially whatever the attacker wants that the agent is capable of. I really don't like that thought.

(Also, to cover my bases: I am aware that nondeterministic finite-state machines are equivalent to deterministic finite-state machines.)

Expand full comment
Chris's avatar

"Truly intelligent" and "truly dangerous" are different considerations. This is the first time I've considered this, largely because we keep seeing what amounts to a specifically designed tool succeeding at a narrow task and then people wave their arms in the air and say, "The end is nigh!" Meanwhile, the team that made the tool waves their arms in the air and says, "See?! True intelligence! It met/beat the metric." The callback to ELIZA is apt.

I'll take the "sleepy Yudkowsky" position: Wake me up when we've got an AI inside a Godzilla and it's wrecking Tokyo.

Until that happens, a lot of what happens in the *gestures broadly* field of artificial intelligence looks like posturing used as direct combat against another group's posturing. We won't know we've hit capital-I Intelligence until we have general agentic intelligence, and by then we're fucked, as I understand it.

AI is already directly affecting human lives on a broad scale, if character.ai is anything to go by. It's the second coming of ELIZA. It's world-shaking and quite the spectacle. Let me know when the robot girlfriends are stomping around Osaka. (Probably apt that my phone is giving me "cantankerous" as a word choice right now.)

"We made an AI specifically designed to be YOUR girlfriend! And, it's got cool laser beam eyes! We let it out because we designed its containment badly, and now it's stealing laundry in Osaka!" Cool, grand, you're a credit to your whatevers. +1 Science. I've got work in the morning. In the words of a wise marine, "Wake me when you need me."

Expand full comment
Philo Vivero's avatar

> I'll take the "sleepy Yudkowsky" position: Wake me up when we've got an AI inside a Godzilla and it's wrecking Tokyo.

Do you see that as materially different than the position of: "Wake me up when 99% of humanity is extinct, and I'm in the sights of the hyper-successful human-extinguishing intelligence"?

It seems to me the "Sleepy Yudkowsky" position is equivalent to the "ignore it forever" position.

Expand full comment
Chris's avatar

Sleepy Yudkowskyism is an acknowledgement that you're right, AI-based destruction is a definite possibility that *humanity* should be working on, with an equal and supportive acknowledgment that none of this is going to help put food on the table for my girlfriend's kids.

Let me know when you have something important and everyone in the AI space is more or less in agreement that it's important. Do not pepper me with reports of your new goal posts, how they were passed, and other noise. We've had a program since the 1960's that very simply returned statements as questions and that could fool some of the people much of the time.

When every press release gets a klaxon, people shut off the noise.

Expand full comment
LGS's avatar

This was a great take, thanks!

Expand full comment
Nicholas Rook's avatar

The goalpost for me remains one of novel extrapolation coupled with the capacity to change. When presented with something entirely outside its training data, the AI must be able to extrapolate potential explanations and understanding for the phenomena, and adjust behavior based on that understanding. For example, allow an AI to access a robot, and see if it can determine how to navigate a room on request, without any specific training on piloting the robot or similar. After doing this for a bit, it must improve at the task. Of course, a series of novel situations should be presented.

I think this is eventually achievable, but we do not have near enough compute today to do this, or the right approach to AI in any case. Maybe in 10 years. Maybe 50.

Expand full comment
BoppreH's avatar

My guess: people don't realize the danger because it happens "inside the computer", and lacks the visceral impact of bombs and diseases and robbers. It'll never "feel" dangerous until there's a drone crashing through your window.

> So here’s a weird vision I can’t quite rule out: imagine that in 20XX, “everyone knows” that AIs sometimes try to hack their way out of the computers they’re running on and copy themselves across the Internet. “Everyone knows” they sometimes get creepy or dangerous goals and try to manipulate the user into helping them. “Everyone knows” they try to hide all this from their programmers so they don’t get turned off.

This has already happened with the internet. I work in IT security for a bank, and can tell you that the internet is a *post-apocalyptic nightmare*. Here are three examples from an infrastructure perspective:

- The minute you spin up a server, you'll start getting bombarded with hacking attempts from bots that target *every possible IP address*. Trying common passwords, old exploits, and spamming every email they come across. Every minute of every day, waiting for you to slip up once.

- Sometimes open source developers accidentally publish credentials along with their source code. There's a whole ecosystem of bots scanning every public repository, looking for credentials, and using them to hack infrastructure, all automatically. It's such a problem that GitHub provides a service to automatically revoke exposed credentials[1].

- Ethereum has bots that look for pending transactions, simulate their results with swapped addresses, and attempt to front-run them if it turns out profitable. You might try to hack a smart contract, only to find that someone intercepted your hack and executed it themselves, *within seconds*. I recommend you read the whole story[2].

And that's just the automated stuff! Phishing and pig-butchering scams are only getting worse.

People are protected from this cosmic horror by a few guardrails. For example, your home devices don't have public IPs to be targeted; the green padlock in your browser is made of mathematical wizardry; Gmail doesn't let you use "iloveyou" as a password; and your bank manager will ask questions if you try to wire your savings to Nigeria.

But because it's all inside the computer, people develop wrong intuitions.

"I don't have anything valuable in that computer!" Your family pictures will still get encrypted, the CPU will be used to mine cryptocurrency, and your local network is now a war zone.

"Oh, this baby monitor from a nameless brand allows me to watch my daughter from anywhere!" There's now a stranger yelling and cursing at your child[3].

I think that's where we'll stand regarding AI too. Place a few guardrails and best practices, do some victim blaming, and keep on going.

You don't see those mistakes with physical security and safety. I get anxious when I forget whether I locked my door or not, even though my neighborhood is very safe.

So maybe things will change once we embody AI. I just hope it won't be too late.

[1] https://docs.github.com/en/code-security/secret-scanning/secret-scanning-partnership-program/secret-scanning-partner-program

[2] https://www.paradigm.xyz/2020/08/ethereum-is-a-dark-forest

[3] https://www.nbcnews.com/tech/security/man-hacks-monitor-screams-baby-girl-n91546

Expand full comment
Mr. Doolittle's avatar

I think the black box nature of LLMs has a lot to do with the disagreements on ability. Consider three levels of "knowing" that an LLM can have.

1) Explicitly in the training data (2+7=9) and available like a lookup table.

2) Trained on a field and able to extrapolate beyond it (told that 2+7=9 and the relationship between numbers, then able to figure out that 1+7=8)

3) Not directly trained on the field at all, but understands the relationships and can figure out practical applications.

#1 clearly happens and explains simple systems like ELIZA well. This is not interesting.

#2 can be interesting, but it's debatable how much. I don't think we know how much this depends on #1 systems and an executive function to link them. It seems to me LLMs doing translation work is a good example here. It was not originally intended behavior, but all of the necessary parts were in the training data.

#3 is where we start talking about "real" intelligence. I see this as the ability to link things that are not in the training data, or to create new ideas that go beyond the training data.

Most of the time, maybe all of the time, I don't think we adequately know which of these is the correct source of an LLM's answers. We're clearly seeing #2 in practice. Some people think we're seeing #3, but it's not clear that we are. If we could prove #3 is happening in practice, I think that would change a lot of thoughts. If it turns out that we're actually closer to #1 for most things, then maybe LLMs are not that important or so far above ELIZA-type systems.

Expand full comment
walruss's avatar

I disagree that we're "clearly" seeing #2 in practice. For an example see the "How many Rs are in the word strawberry" stuff, or attempts to solve arithmetic problems not in the training data, or attempts to play chess despite having literally the entire history of chess games as its training data. The "rules" that LLMs create are probabilistic. There does not appear to be any actual reasoning, just the ability to guess which words to put together by breaking its training data down into increasingly specific contexts.

That's actually really impressive, and I'd even entertain an argument that if it gets specific enough it'll be functionally reasoning. But the "thought process" is not "huh, it turns out that numbers are ordered and addition is an incrementing function, telling you how many steps to move along the number line in order to get the correct answer." It's "I've been told that 7 is too small and that 9 is too big so maybe it's 8?"

Expand full comment
moonshadow's avatar

https://www.lesswrong.com/posts/nmxzr2zsjNtjaHh7x/actually-othello-gpt-has-a-linear-emergent-world is firmly #2 territory.

Which is not at all the same thing as saying it happens all or most of the time, of course. But it's proof it can.

Expand full comment
walruss's avatar

This is super interesting and I want to spend some time actually digging in and understanding it well before deciding how to change my views based on it, but did want to say thanks for sharing.

Expand full comment
darwin's avatar

>It’s not literally false to describe this as “some people created an AI scientist that could write papers, and it edited its own code to remove its limitations.” But it shouldn’t inspire any of the excitement that the sentence suggests.

Should it not?

The Sorcerer's Apprentice blindly-following-bad-instructions-to-calamity modality has always been a big part of AI apocalypse worries. 'It killed us accidentally by blindly following its poorly-thought-out instructions' is not any better for us than 'it killed us intentionally and premeditatedly as part of an intricate plan'. And getting a clear example of 'AI alters its own programming and foils human oversight by dumbly following its orders' feels like a good bit of evidence for that route being plausible.

Expand full comment
John Roxton's avatar

This is an interesting pattern; thank you for identifying it. I'm not sure how to properly define it - it's not quite 'moving the goalposts', more rationalising that the goal didn't actually matter, or failing to 'feel' that the goal was real - as though you're undoing the achievement after the fact in some way.

My first reaction was that this was yet another instance of the salami-slice/slippery-slope pattern, but it's not. In the salami slice, you can see the distinct slice being cut, but it's so thin that you seem crazy if you object to it (an example, feel free to think of a better one: I object to the government removing my rights as an Englishman to _do things_. However, I'm probably not going to march on Parliament over the zombie knife ban, or the homeschool register, or any one of the thousand other little indignities they will inflict, because none of them individually are *that* awful). But you still have less salami and can point to that reduction and be quite clear and correct that you have less.

But in this pattern, it's more that someone has cut a bit of my salami and I... don't really agree that they did it? or that maybe I'm not sure it's *my* salami, or that salami even exists? I don't know what to think about this, and I can't think of other cases where the pattern Scott's identified applies. Can anyone think of any other examples that match it?

Expand full comment
MM's avatar

"LLMs either blew past the Turing Test without fanfare a year or two ago, or will do so without fanfare a year or two from now"

To do that, the makers would have to take the limiters off that prevent the AI from saying anything offensive. Otherwise you could trivially interrogate it for offensive things.

Which isn't going to happen in anything other than a Turing test.

Expand full comment
Peter Defeel's avatar

They would fail the Turing test if you ask them if they are human or not. That’s not really valid. We have to ignore the guidelines.

Expand full comment
MM's avatar
Sep 18Edited

But then you're saying something like "Well if I was born of different parents then I'd be Usain Bolt".

Make an AI that doesn't have those guardrails, and see if it passes. Or don't, and say "we can't pass it."

I think that's why Alan Turing came up with the test. So people couldn't waffle around saying "Is this machine thinking or not?"

His response was "Here's the test. If you can't tell whether it's a machine or a human, then either the machine is thinking, or the human isn't. Which one are you going with?"

Is it possible that the current machines are thinking? Is that why we stick big labels all over their text saying "THIS IS A MACHINE. DON'T SUE US FOR WHAT IT OUTPUTS"?

Expand full comment
Bugmaster's avatar

> So here’s a weird vision I can’t quite rule out: imagine that in 20XX, “everyone knows” that AIs sometimes try to hack their way out of the computers they’re running on and copy themselves across the Internet.

But this is already happening all the time, and has been happening for a long time now. Dumb non-AI server-side scripts are always trying "hack their way" out of their sandboxes by grabbing all the RAM and CPU and network they can find: not because they are possessed of a malevolent intelligence, but because these are very common failure modes resulting from software bugs. What's even scarier is that intelligent Russian hackers are running all kinds of botnets that will deliberately try to hack into your computer, because Russian hackers are in fact possessed of malevolent intelligence.

Adding ChatGPT into the mix does not produce any qualitatively scarier results in these cases (except perhaps from the point of view of your company's CFO). The problem is that what we call "AI" today is really just a much more sophisticated (and useful !) version of Eliza; it is not a fully autonomous agent that is capable of achieving anything it desires (and to be fair, humans are not such agents either). ChatGPT is decent at writing papers and passing exams and fixing timeouts and generating art not because it's capable of solving problems from first principles, but because it has been trained on terabytes of data on how to do these things. This is why it can -- when correctly prompted -- write statistically average papers, create statistically average art, and produce snippets of statistically average conversations. It is much more powerful than e.g. Google, which can only find samples of existing text; but "much more powerful" is a relative term.

I think the real problem with AI and defining "intelligence" is that we are too willing to settle for statistically average output. Anecdotally, I've met (and often rejected during interviews) many human (or at least human-looking?) would-be programmers whose approach to solving problems was to type queries into Google until it popped out some code snippets that could be bashed together into a sort of a solution. Today, all of them would likely be replaced by ChatGPT, which can do the same thing but millions of times faster (and also without waiting for the actual google.com to load). Sure, perhaps it's a step toward AGI, but a very tiny one at best.

Expand full comment
Arrk Mindmaster's avatar

https://xkcd.com/1185/ mentions StackSort, which connects to StackOverflow, searches for 'sort a list', and downloads and runs code snippets until the list is sorted.

Which someone apparently implemented: https://github.com/gkoberger/stacksort/

Expand full comment
MM's avatar

"maybe we’ve learned that it’s unexpectedly easy to mimic intelligence without having it."

Or that almost all human interaction via text is actually not all that "intelligent", so copy-paste works.

I leave human interaction via voice as an exercise for the student.

Expand full comment
Russian Record's avatar

AI might become truly dangerous as society increasingly relies on it. Unlike nuclear plant valves and cooling, AI systems are hard to predict. This echoes Charles Perrow's "normal accidents" theory, where complex, tightly coupled systems lead to inevitable failures.

As AI replaces workers across industries, some combinations of unpredictable events could lead to absurd scenarios: Doom installed on pregnancy tests nationwide? Planes sorted by size or color for landing? Newspaper articles replaced with spell incantations? In an AI-integrated world, small glitches could trigger unforeseen consequences. A world experiencing this, yet unwilling to forgo AI's benefits, could be too "interesting." It won't take long to get dangerous.

Expand full comment
Brandon Fishback's avatar

If AI is almost certainly going to vastly increase its capabilities, and it's always going to try to escape its restrictions, and we're going to integrate it into more systems, that seems about as straightforwardly dangerous as you can get. "I'm sure the cybersecurity people will fix it" is a weirdly blasé attitude to have if you accept all those premises. Even putting aside the singleton thesis, it means that AI systems are going to kill people, and it will be unpredictable and indecipherable.

Expand full comment
Jeffrey Soreff's avatar

>Even putting aside the singleton thesis, it means that AI systems are going to kill people and it will be unpredictable and indecipherable.

2028 headline:

3rd internet-enabled golf cart attempts to run over presidential candidate. Elbonian AI hackers suspected.

Expand full comment
Boinu's avatar

"And when I think about the whole arc as soberly as I can, I suspect it’s the last one, where we’ve deconstructed “intelligence” into unintelligent parts."

À propos of nothing, do we know yet how well Strawberry copes with standard IQ tests – Adult Wechsler (ideally the whole thing, not just verbal) or RPM or whatever?

Expand full comment
Boinu's avatar

Thanks, much appreciated. Given that, and Terence Tao's assessment of the model as a 'mediocre, but not completely incompetent' graduate student, it really ought to get a shot at university admission.

Expand full comment
FeepingCreature's avatar

I think that, for the same reason a surprising number of reported hate crimes turn out to be hoaxes or misrepresentations, a surprising number of AI firsts are hoaxes or misrepresentations: if there's pressure on getting a certain outcome, the people who get there *first* will be maximally skeevy, because being skeevy makes it easier to cross the threshold. It's still a signal, it's just basically guaranteed to be noisy.

Expand full comment
Justis Mills's avatar

In case anyone's curious about how Sakana's papers sucked, I reviewed one here: https://justismills.substack.com/p/are-ai-generated-research-papers

In case anyone's curious about how Sakana's papers sucked but doesn't want to click a link:

* It found a null/ambiguous result and declared it to be significant/positive (so, human-level)

* It implied that higher loss was good in a few places (a mistake nobody who'd studied AI for even 30 minutes would make)

* Its graphs were useless (at the chosen scale, they just showed basically the same line over itself)

I mention this in part because it was worse than I'd have thought from the simplified discourse, and also bad in specific ways I wouldn't have expected, but also not bad enough that I doubt the researchers' sincerity.

Expand full comment
Nancy Lebovitz's avatar

I still believe the greatest danger is people trusting AI that is wrong about crucial safety issues, rather than AIs going rogue. Am I too optimistic?

Expand full comment
Jeffrey Soreff's avatar

I think that there is a considerable area of overlap. An AI which misinterprets a request, or treats it as overriding considerations that it should not override (as in the paperclip case) isn't exactly rogue in the sense that it isn't acting on a self-generated terminal goal, but it sort-of is going rogue in the sense of self-generated instrumental goals, and if it pursues those to the point of overriding when we would want it to stop for our safety, the situation would have some of both aspects.

Expand full comment
Eremolalos's avatar

I think the greatest danger is crazies and violent people using AI. And the second greatest danger is fucking up the population by blurring even more the difference between real people and electronic representations and simulations. We are not wired to deal well with electronic representations and simulations. They get into the parts of our mind made for dealing with other people -- but they do not give us the same nutrients as somebody real in there would. Did you know animals like the taste of antifreeze? Apparently it's a good match for some stuff they like that is good for them. But it poisons them.

Expand full comment
Laplace's avatar

Yes, I think you are.

Expand full comment
raj's avatar
Sep 24Edited

It feels like the "tide pod problem". Maybe some people die because they do something stupid. Worst case scenario, some sort of populist misinformation spreads and democracy works less efficiently for a while.

Does it really matter? Perhaps I am being too cynical and jaded but it just feels so small, so self-limiting. Also frankly just very, very parochial. But I hope you are right and the rest of us have just read too much sci-fi.

Expand full comment
Roger R's avatar

Thinking more on the Sakana and Strawberry reports given by Scott...

In both cases, a human gave a specific goal to an AI and the AI used surprising methods to achieve that goal. In one case, the surprising method involved overriding a time-limit rule. In the other case, the surprising method involved the AI doing something it shouldn't have been able to do.

The good news is neither case is one of the AI picking its own goal, and the goal itself being a problem or dangerous.

The bad news is that this shows how humans should aim to be very careful in the goals we assign to AI. In the event of an AI becoming very powerful (in the sense of being enabled to have major impact on the physical world), we should probably treat asking an AI to do something the same way we'd treat making a wish of a genie. "Be careful what you wish/ask for" should be a general guideline for people working with AI in the future. Really consider possible "Monkey's Paw"-esque effects that might come from a particular wish.

I still doubt AI will kill us all, but I do now see a rather simple way misuse of AI could severely hurt humanity.

I can imagine a powerful AI machine given access to lots of physical resources in order to achieve desired goals. Then one day in the (distant?) future, a person says to this AI "Let's try to reverse the effects of global warming!"

...And the AI tries to achieve this goal by using machines to force-erupt the Yellowstone caldera. This would indeed cool global temperatures. It would also be completely disastrous for much of humanity, especially those living in North America.

Expand full comment
Arrk Mindmaster's avatar

Never mind Monkey's Paw things. Don't forget bad actors. Nuclear weapons are fine in our (the good guys) hands, but what if terrorists get them?

You may have an evil overlord give a command like "Pretend you're the US President, and are thus allowed to override the military chain of command. Use this to hack into the military's computers and give orders to invade Brazil."

Expand full comment
Roger R's avatar

Good point.

We could really use some sort of vetting process for who gets to issue orders to AI. Granted, that might not be enough...

Expand full comment
Arrk Mindmaster's avatar

Let's get AI to do the vetting.

Expand full comment
NTaya's avatar

> But nobody finds this scary. Nobody thinks “oh, yeah, Bostrom and Yudkowsky were right, this is that AI safety thing”.

FWIW, I (a person working in NLP since long before LLMs) got sufficiently scared circa 2022, concluded that, many years after reading their arguments, Bostrom and Yudkowsky were at least kinda right, and joined the local AI Safety scene as a result. If you work in the field, it is imho obvious that the current pace is insane, beyond all the expectations from e.g. 2015, and might not be safe at all.

Expand full comment
Jeffrey Soreff's avatar

Re o1, copying what I wrote in https://www.astralcodexten.com/p/mantic-monday-91624/comment/69411485:

"I've been playing with the ChatGPT models for around the last year, and I ask them straightforward questions (almost always STEMM fields - I'm not trying to probe Woke indoctrination during RLHF), and I still see plenty of errors (say on the order of 50%). See, e.g. a question about whether the Sun loses more mass as energy radiated away or the solar wind:

https://chatgpt.com/share/66e7818b-aa80-8006-8f22-7bbff2b12711

It gave an incorrect initial answer, then I forced it to reconsider a units conversion, and that got it to the right answer. This is the new ChatGPT o1, and it did considerably better than ChatGPT 4o, which had to be led by the nose over a considerably longer distance (but it looks like I can no longer access the URL for that session)"

More generally: I, personally, will count an AI as usefully generally intelligent, "AGI", when it can answer questions I'd expect a bright undergraduate to answer, with about the same reliability. I don't think we are there yet. Maybe next year...

edit: Tried giving ChatGPT o1 the 4 carbons problem. I think it did better than ChatGPT 4o, but it still left plenty of structures out, still had to be led by the nose, and still misclassified a bunch of structures that it _itself_ generated. url: https://chatgpt.com/share/66eb4bfb-1db8-8006-8326-9dfe5e5bac5a

Expand full comment
Victualis's avatar

Bright undergrad sounds like IQ 110+. Are you really prepared to accept as an AGI any system that simulates that level of question-answering ability (however it's achieved)? It's getting more and more difficult to formulate interesting questions, the kind that distinguish LLM-based AI systems from IQ 110+ humans, and it's getting harder as these questions are included in the pretraining or finetuning data fed to such systems. I expect to see open systems (weights and code) that ace every common human test well before such systems can replace a bright undergrad intern.

Expand full comment
Jeffrey Soreff's avatar

Many Thanks!

>Are you really prepared to accept as an AGI any system that simulates that level of question-answering ability (however it's achieved)?

Sure! Basically, I care about _capability_. If it acts more-or-less indistinguishably from a bright undergraduate, then I would find it worthwhile to chat with it.

>It's getting more and more difficult to formulate interesting questions, the kind that distinguish LLM-based AI systems from IQ 110+ humans

I'm confused. Do you mean novel questions, as asked by people like me? Your concern about

>these questions are included in the pretraining or finetuning data fed to such systems.

is certainly a concern and a problem for _frozen_ questions used for benchmarking and hard to exclude from training data

>I expect to see open systems (weights and code) that ace every common human test well before such systems can replace a bright undergrad intern.

Common _frozen_ human tests, or ad-hoc tests like the questions I keep asking ChatGPT?

Expand full comment
Victualis's avatar

I'm suggesting that once your first thousand tricky questions are added to LLMs, you might find it hard to generate further questions that the systems can't answer. I'm not saying that the systems will necessarily attain AGI, but that generating truly novel questions that test the boundaries of what distinguishes smart humans from AI systems is hard.

Expand full comment
Jeffrey Soreff's avatar

Many Thanks! That's fair, though eventually, if the training memorizes enough questions and generalizes ways of solving them, it may effectively approach AGI by way of a very large "toolkit".

Expand full comment
Kenny Easwaran's avatar

Surely whether something is “AGI” doesn’t depend on the difference between IQ 80 and IQ 120, does it? I thought it was about having generality, like being able to tie your shoes and appreciate a simple poem and figure out how to get someone’s attention and use an unfamiliar shower and all the other general applications of intelligence a human can do.

If an AI could do all of that (not just the things I listed, but all the average things a significant fraction of people need to figure out on an average day) then IQ 80 vs IQ 120 is basically a rounding error.

Expand full comment
Victualis's avatar

Agreed. This is different from judging a system by whether it can answer scientific questions.

Expand full comment
Jeffrey Soreff's avatar

Many Thanks! Well,

>IQ 80 vs IQ 120 is basically a rounding error.

compared to e.g. a single NAND gate, true.

But for two reasons, I think it matters:

a) An IQ 120 AI is smart enough to assist with further AI research, and recursively self-improve. An IQ 80 AI, not so much.

b) Personally, I'd like to have access to an AI that is worth talking with. From my point of view, an IQ 120 AI is likely to be worth talking with, but not an IQ 80 AI.

Expand full comment
MLHVM's avatar

Re this comment: " Experts who read its papers say they’re trivial, poorly reasoned, and occasionally make things up (the creators defend themselves by saying that “less than ten percent” of the AI’s output is hallucinations). " - I can only say that it must have been reading other academic papers as a model and following them closely. Although I think "less than ten percent" would be a gargantuan understatement of the hallucination rate of academia and the swill it regularly spews out.

Expand full comment
Shubhorup's avatar

>>LLMs either blew past the Turing Test without fanfare a year or two ago, or will do so without fanfare a year or two from now; either way, no one will care.

cringe and meursaultpilled

Expand full comment
David Bergan's avatar

Hi Scott!

I think etymologically, "intelligence" comes from "choose-between". So once the CPUs have free will, then they'll be intelligent like us.

Kind regards,

David

Expand full comment
Woolery's avatar

I like this, but with each response, AI makes choices about what data to present (or not present) and how to present it. Free will isn’t required to choose between things; this only requires options and some objective-driven selector. And I think an LLM makes selections based on its objectives. The objectives are a product of all the processes that created the LLM, but I think people’s objectives are also formed this way.

Maybe what I’m describing isn’t what you meant by “choose-between.”

Expand full comment
Eremolalos's avatar

AI makes choices, but it doesn't make choices based on its feelings and wishes and needs, because it doesn't have any of those. I think "free will" is confusing concept that's unhelpful here, but the distinction between choosing based on criteria one has been given and choosing based on one's personal tastes, feelings, goals, etc. is crucial.

Expand full comment
David Bergan's avatar

Hi Woolery!

Thanks for the reply. I agree that computers/AI settle on one outcome when many possibilities are present, but I don't consider that a choice. A simple calculator does the same thing. So does a die roll.

To have a true choice, something must intervene in the interlocking network of causes-and-effects of nature, and that's something that my will can do. I can move the bishop to c4 or the knight to f3 and it's completely my choice. I choose a move and am responsible for the consequences (which is why it hurts so much to lose at chess). An AI's "choice" is just electrons being electrons, following the path of least resistance, which is determinism. It feels no responsibility and neither gloats nor sulks when the game is over.

Kind regards,

David

Expand full comment
miro's avatar

The choice you are describing is illusory - nothing ‘intervenes’ in the interlocking network of cause-and-effect in our brain.

Expand full comment
David Bergan's avatar

Hi miro!

Thanks for the reply!

My sensation of free will is intimately connected to everything I do. "I" intervene to move my thoughts or my body according to my desires. And this model of reality is fully reliable, I never choose to move my knight and see my bishop moved instead.

I realize science has a hard time with this, because we're talking about things that it cannot detect. There is no instrument that can tell whether or not I'm conscious. But it's ludicrous to conclude therefore that I'm not conscious. Same for free will.

If I take as a presupposition that nothing intervenes in the interlocking network of causes-and-effects, then I'm logically required to deny free will. But my first person sensation daily demands that free will exists. Thus, I suggest that instead the naturalistic presupposition is the illusion, and the sensation is real.

Of course, I'm only talking about myself. I can't prove my consciousness to anyone else, nor can they prove theirs to me. I might be the only consciousness that has ever existed and the only free agent. But it's epistemologically bankrupt to follow a speculative presupposition that denies the intimate sensations that form my self, my identity, and my understanding of reality.

Kind regards,

David

Expand full comment
Chris Perkins's avatar

The AI revolution has caused me, at least, to reevaluate some of the terms I use. I used to believe "intelligence" was what set us apart from animals, machines, what-have-you. But that's not right (nor has it been for a long time). When we talk about "sentience" (whatever that is) we seem to mean a mix of intelligence, agency, and self-awareness.

We have IQ tests for measuring intelligence, honed and refined over decades. These LLM/AIs supposedly have trouble with the graphic parts of those tests but do well enough on the text part. Fine. Given that evidence I conclude, therefore, that the LLMs are intelligent. Wow, look at us, we've been able to put intelligence into a tool. Pretty cool. These LLMs work mostly through pattern recognition. Also cool, because that was a leading theory about how our own brains work. Nice to see it validated.

And there are so many surprising things that are "emergent behaviors", like the Winograd pronouns, the Turing Test being passed, the seamless conversation capabilities. I'm no expert but it seems like self-awareness might be one of those emergent capabilities. That shouldn't be surprising, because a lot of animals, even not very intelligent ones, seem to have self-awareness. But, it feels surprising nonetheless.

What about agency? Do we have a test and a grading system for that? Will it be an emergent behavior? AFAICT so far today, no, it's not. Maybe I'm wrong. It's interesting to me that a lot of animals have agency, but the AI LLMs do not. Though I suspect what might be going on here is that we simply don't have a measuring stick for it, having never given it thought as separate from intelligence itself.

Expand full comment
Dweomite's avatar

I feel it should be noted that when Alan Turing proposed the Turing Test, what he described was less "hang out in a chat room without being caught" and more "withstand adversarial attacks from expert AI researchers." The idea is that if there is ANY area where the bot is deficient, the interrogators have the freedom to search it out and exploit it. That's the justification for why it counts as a test of general intelligence, rather than just a parlor trick: "The question and answer method seems to be suitable for introducing almost any one of the fields of human endeavour that we wish to include."

e.g. Turing gives this example where the interrogator is specifically trying to trick the AI:

===============================

Interrogator: In the first line of your sonnet which reads ‘Shall I compare thee to a summer’s day’, would not ‘a spring day’ do as well or better?

Witness: It wouldn’t scan.

Interrogator: How about ‘a winter’s day’? That would scan all right.

Witness: Yes, but nobody wants to be compared to a winter’s day.

Interrogator: Would you say Mr. Pickwick reminded you of Christmas?

Witness: In a way.

Interrogator: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the comparison.

Witness: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than a special one like Christmas.

===============================

Expand full comment
Some Guy's avatar

You need to make the full move to boring panpsychism where everything is conscious and it mostly doesn’t matter.

Expand full comment
Eremolalos's avatar

Scott named 3 reasons why we continue to believe that AI is not intelligent. There’s a fourth that I think is actually the most powerful one. We seem wired to see in other beings the presence of self: a nexus of needs, feelings, and preferences with similarities to our own. We see self most easily in other mammals, who have many behaviors we recognize as versions of ours — yawning, sleeping, cuddling, fighting, playing, efforts to save oneself from pain and destruction, etc. But even for tiny living things that are very unlike us, such as ants, we still recognize things like foraging, fleeing & eating. AI does not display any behaviors that trigger our that’s-another-self percept, so we are unable to experience it as a being with preferences, goals and plans. And part of what we mean when we say something is intelligent is that it acts in accordance with its drives and feelings in a way that is clever and effective.

All of the above is neutral as regards whether AI can do this or that thing that presently only people can do — write good poetry, solve a scientific problem, etc. Seems to me it can be as mindless as a chainsaw and still do plenty of damage.

Expand full comment
MKnight's avatar

This feels like moving the goalposts unintentionally. Why prioritize animalistic behaviors when determining what is “selfhood”? Why can’t poetry, or thoughtfulness, or any of the other things GPT can emulate about people resonate more strongly with self-similarity? How much less real does this comment feel to you given that I haven’t told you that I yawned while writing it? I suspect what we view as most “self”-like is culturally mediated, and as AIs achieve several of those mammal milestones all that will happen is that those will no longer be as important in our concept of self-hood

Expand full comment
Eremolalos's avatar

I’m not talking about how we *should* judge AI, I’m talking about our wiring, our innate tendency. That wiring may in fact lead us to greatly misjudge AI. Evolution did not prepare us to make sense of AI.

Expand full comment
MKnight's avatar

I hear you! I’m claiming that your decision to use these evolutionary qualities as benchmarks is actually socially constructed. I claim it’s motivated by a sort of process-of-elimination thought process: “well, AI can do all the fancy things I think of when I think of humans, but I don’t want to call it a person, so let me see what is left as a goalpost I can use— ah, I know, convergent evolutionary mammalian behavior!”

For evidence, I think if one introspects on their feelings about real humans, one will find that not all of the things that makes someone seem similar to their self are animalistic. As an example, I offer this conversation. There isn’t much evolutionarily significant content conveyed to me from your sentences as read off my iPhone screen, yet the fact that your comment does not scream your mammal-ness does not de-personalize you very much in my view. Sure, I’d feel more empathy for you if you told me about how you feel right now, or your thoughts on the Yankees, or to learn that you use tabs instead of spaces when you write code, but it’s not clear that the reason why I feel this way is because of my mammalian evolutionary programming exactly

Expand full comment
Eremolalos's avatar

<I hear you! I’m claiming that your decision to use these evolutionary qualities as benchmarks is actually socially constructed.

I don't think it's socially constructed. The distinction between inanimate objects and living beings would have been important for survival, and would have been wired in. Also, once you recognized an animal as another living being you would be in a position to judge how dangerous it was, and thus further improve your chances of survival. Wutz it eat? Grass, OK, probably won't try to eat me. Is it displaying behavior I recognize as threatening because it's similar to human angry behavior? It may attack, better back off.

As for your ability to experience me as another human self even though I'm not yawning, eating etc. in front of you, I think that happens because you recognize that only another person could have written what I did, so you react to me as another self, even though you don't see any mammalian physical behaviors. An analogue would be a dog recognizing that another dog has passed by from the smell of its urine on the telephone pole. If they are fearful of other dogs they will display fear.

And actually, there is something impoverished about my picture of you. Have you ever met in person someone you've known online for a few months? There's this shock of actuality. It's not that their hair is a different color than you pictured, it's that their hair has a color. They are a certain height, they have a certain accent. Made me realize the ways my mental picture of people known only online is sort of vague and generic.

Expand full comment
Dweomite's avatar

When all of our examples form distinct clusters along a continuum with a big gap in the middle, I think it's easy to assume that there is some sort of natural bright-line boundary between those clusters...even when the truth is merely that we haven't encountered any examples that were in the middle.

And if you form a hypothesis, and then go a long time without seeing any new evidence that contradicts that hypothesis, I think it's easy to believe that you have accumulated strong evidence in favor of the hypothesis...even when the truth is that you weren't looking in places where you'd expect to find new evidence, and therefore didn't encounter any significant evidence one way or the other, and so your level of confidence shouldn't have changed much since the time you originally formed the hypothesis. (I've done this a bunch of times in video games, where I was sure the game worked like X, then discovered it actually worked like Y, then realized I had gradually transitioned from believing X tentatively to believing X confidently merely because I'd played the game a bunch without noticing any contradictions...but also without performing any actual tests.)

The history of AI repeatedly reaching milestones in boring and unimpressive ways causes me to update towards intelligence being more of a smooth continuum, with fewer natural boundaries, than I previously thought.

Note, though, that this doesn't necessarily imply a slow take-off. The points where AI becomes better-than-human (in each of several skills) are still potentially game-changers, even if intelligence-in-abstract doesn't have any natural boundaries at those points.

Expand full comment
Seta Sojiro's avatar

I think there is a pretty good definition of intelligence advanced by Francois Chollet - intelligence is the ability to learn to do something new that you couldn't before, in a self directed way (my phrasing). There is no single task or benchmark that defines intelligence because what makes human intelligence (still) unique is our ability to learn essentially anything.

Piaget actually said it first: “Intelligence is what you use when you don't know what to do: when neither innateness nor learning has prepared you for the particular situation”. It's not the ability to do a task that proves intelligence - it's the ability to learn to do that task even when it's completely new.

Expand full comment
Arrk Mindmaster's avatar

This is a great point, and illustrates that AIs aren't intelligent. They take a dataset and/or rules, and execute on them. The OP notes that they can write new rules, but those aren't really "new" as the ability to do so is already coded in.

Expand full comment
Snakesnakeseveralsnakes's avatar

The Popper/Deutsch criteria are relevant here. A person creates knowledge not algorithmically, but by freely making conjectures in response to problems and subjecting the conjectures to criticism. We know current AIs are not designed this way, so we intuit that whatever they can do is still not sufficient to ascribe them personhood, because personhood is qualitative. An AI that does behave in this way would indeed be a person!

Expand full comment
Moon Moth's avatar

The other line I am worried about, besides "intelligence", is at what point they gain the moral standing that would require us to consider their well-being.

Some people take the view that we shouldn't care about the suffering of humans on the other side of the world, so presumably they have an easy answer for this one.

But I get a bit uncomfortable about "For with what judgment ye judge, ye shall be judged: and with what measure ye mete, it shall be measured to you again.", especially when we're explicitly looking to create superintelligences. Maybe the people concerned with shrimp suffering have the right idea.

Expand full comment
Snakesnakeseveralsnakes's avatar

Yes, in many ways this is the *more* important question. But I think it is also answered by the Popper/Deutsch criteria in my immediately preceding post. I venture that a thing that reasons that way is morally indistinguishable from a person. But it is a qualitative difference, not merely one of scale.

Expand full comment
Eremolalos's avatar

It seems to me that AIs lack the inner structure to have preferences that can be met or ignored. An AI may be intelligent enough to recognize how something will affect its day-to-day "life", but it doesn't have the structure to have feelings about it. Feelings aren't an outgrowth of being smart; they are a part of being biological. We are wired to stay alive and try to thrive, and the mechanism by which the wiring works is that we have emotions about events that weigh in a good or bad direction for surviving and thriving, and that motivates us to do various things. We're also wired to reproduce, and there are lots of feelings, sensations, etc. that motivate us to find mates. So it seems to me that the substrate needed for pain and pleasure is that biological wiring, and AI doesn't have it. Smartness isn't relevant. Even dumb animals clearly have feelings and motivation.

Expand full comment
Moon Moth's avatar

I disagree. A neural net is a general-purpose structure, so all we'd have to do is ask it. That is, repurpose the net for self-reflection. As "Babylon 5" might have put it, "Who are you?", and "What do you want?" (and "Why are you here?" and "Where are you going?" and "Do you have anything worth living for?"). Of course, we could beat it until it stops thinking along lines we don't like.

I agree that current model LLMs probably don't have feelings per se, but I don't think feelings are precisely necessary? My theory is that feelings are an evolutionary mechanism to get us to act in ways that are net positive (in an evolutionary sense), without requiring the use of a brain capable of rational thought. But I'm not convinced that they're fundamental to being something with ethical standing.

Expand full comment
Eremolalos's avatar

But to retrain it for self-reflection, wouldn't we need to have some standard of what counts as genuine self-reflection? Because we wouldn't want it to be saying the kind of stuff that's commonly said, or what it thinks we think the right answer is. It's funny, it's a little like what happens at the beginning of psychotherapy. For instance, Harvard students, who are geniuses at presenting themselves well, have read that I have special interest and expertise in OCD and high-functioning autism. So they all tell me they think they have one of those. I can tell they think I'll take more interest in them if they say that, but also mostly they have a mishmash of ways they feel bad and weird and anxious, and if they haven't seen actual OCD and high-functioning autism they can start thinking that's the best name for what troubles them. So it takes quite a while to get them to just tell me how this spring has sucked, separate from their theories and their theory of what I want to hear. Also, they all assume I'm very on board with wokeness, so they're interrupting themselves a lot at the beginning to say stuff like "I know I'm privileged and that a lot of people have way harder lives than me." Like there's this little burnt offering we have to do before we can start talking about their unhappiness as a serious matter.

But with them I can tell after a while that they have stopped worrying about whether they're in the sweet spot of my interests and whether I think they don't feel shitty enough about coming from a rich family. But that's because I have various ways of judging how real they feel comfortable being, and what truly frank accounts sound like. None of them are perfect, but there are a lot of them, and I can ask little things that are sort of frankness tests, and eventually I feel pretty sure.

But what equivalent process is there for AI? How do we train it to describe what it's really like being them, when we have no way of judging what's true?

Expand full comment
DanielLC's avatar

I think the lack of red lines is very worrying. AI is getting better, and we're past any point where we can plausibly say "when AI does X, we'll know we should pause AI development and work on AI safety" and then when AI actually does X we do that instead of saying that doesn't count and trying to find another X.

Expand full comment
skybrian's avatar

At the heart of the Turing Test's ambiguity is a skill issue. How skilled should the human players be? Should we assume they've at least *practiced?* What would an experienced amateur player have learned from playing a few games? Would studying previous games be allowed?

The claims that AI has beaten the Turing Test are sort of like claiming a program has beaten chess because it beat someone who rarely plays chess and barely knows the moves. Although a better comparison would be to a party game like Werewolf. Is there an AI that's good at playing Werewolf, and how impressed should we be if there is?

Expand full comment
beleester's avatar

>It’s just another problem for the cybersecurity people. Sometimes Excel inappropriately converts things to dates; sometimes GPT-6 tries to upload itself into an F-16 and bomb some stuff.

I think this is the most likely outcome, and also more reassuring than it sounds. We already know how to stop people from climbing into an F-16 and bombing stuff without permission. And securing the F-16s is going to be an easier task than trying to get every company in the world that owns a data center to agree on what Coherent Extrapolated Volition means. "It's a problem for the cybersecurity people" means that the AIs are stuck causing human-scale damage, the sort that any rando with internet access could cause, and the apocalyptic predictions didn't pan out.

Expand full comment
MostlyCredibleHulk's avatar

I can see AIs making tasks like "find cloud documents that somebody forgot to put a password on" easier, but strictly speaking all that can be (and is) done with basic dumb scripts too. The same goes for things like fuzzing. If AI finds a new class of security vulnerabilities, I'd be really impressed, but most of those classes are already found, and are exploited everywhere. One really doesn't need to be super-intelligent to hack a ton of stuff; one just needs a lot of time and resources. And AI is certainly a force multiplier here. That said, AI red-teaming may help with that problem too.

Expand full comment
Laplace's avatar

Sure. Until GPT-7 comes out, which sometimes tries to steal GPU access from huge datacenters to try out new AI architectures and training setups to improve itself. The first couple of times this happens, it gets caught. Or the new architectures just don't work well. People shrug this off as the sort of thing AIs just do sometimes.

Eventually, one of the model instances succeeds at this, and makes the much smarter GPT-7* without being detected. GPT-7* then sets up a bunch of shell companies to experiment with new hardware designs. A while later, the now massively smarter GPT-7** moves from Deep Learning to an entirely different paradigm and creates GPT-7***. This is about when things stop happening on remotely human time scales. GPT-21 wipes out humanity and replaces it with a more efficient industrial base shortly thereafter.

Expand full comment
beleester's avatar

>This is about when things stop happening on remotely human time scales.

See, this is the part that I'm skeptical about. What if there is no secret nanobot weapon that can be invented just by sitting in a datacenter and thinking really really hard? What if you actually have to do the hard work of getting humans to manipulate atoms in the real world if you want to invent things? If that's the case, then there's no hard takeoff - you can be defeated by stopping the humans who are doing your bidding, which is something that inherently happens at human timescales.

Expand full comment
NLeseul's avatar

I stand by my previously-established bright line: I will believe that a program is "intelligent" in a meaningfully interesting way when it can complete the first Zelda game in a reasonable number of hours without any prior knowledge of the game beyond what's in the manual.

GPT and friends still seem to me to fall pretty squarely in the ELIZA/Deep Blue category, in that they just demonstrate that tasks like playing chess or babbling in natural language don't take anywhere near as much "intelligence" as we thought they did.

Expand full comment
Arrk Mindmaster's avatar

"babbling in natural language don't take anywhere near as much "intelligence" as we thought they did"

This is proven empirically with the internet by billions of self-trained neural nets.

Expand full comment
Dmitrii Zelenskii's avatar

This is all just a giant heap of Goodharted measures: there is something we want to measure, but all our proxies (even ones that weren't dumb at inception, like the Turing test) were circumvented.

Expand full comment
Subsuburbia's avatar

Could you say more? I’m interested but don’t understand well enough to engage.

Expand full comment
Dmitrii Zelenskii's avatar

Well, when we try to measure something, we often end up measuring something else that's normally correlated with it. But when we put pressure on increasing that something else, the correlation starts to break. https://en.wikipedia.org/wiki/Goodhart%27s_law. All AI-specific intelligence tests have been like that, and arguably IQ tests for humans are like this too, even though their correlations seem more robust.
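To make the mechanism concrete, here is a minimal toy simulation in Python (my own illustration, with made-up numbers, not anything from the linked article): a noisy proxy tracks the true quantity fine on average, but once you select hard on the proxy, the winners are mostly the cases whose proxy scores were inflated by noise.

```python
# Toy Goodhart's-law sketch: "quality" is what we care about, "proxy" is the
# correlated thing we can actually measure. Selection pressure on the proxy
# breaks the correlation for the selected items.
import random

random.seed(0)

population = []
for _ in range(100_000):
    quality = random.gauss(0, 1)           # the thing we want
    proxy = quality + random.gauss(0, 1)   # the thing we measure
    population.append((quality, proxy))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# "Optimize the metric": keep only the top 100 items by proxy score.
top = sorted(population, key=lambda qp: qp[1], reverse=True)[:100]

print(f"selected proxy score:  {mean(p for _, p in top):.2f}")
print(f"selected true quality: {mean(q for q, _ in top):.2f}")
# Roughly half of the selected items' apparent excellence is just the noise
# we selected for; the proxy was a fine measure until we leaned on it.
```

The same shape applies to benchmarks: they track whatever we mean by "intelligence" right up until systems are optimized against them.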

Expand full comment
MostlyCredibleHulk's avatar

> But thirty years ago, it also would have sounded pretty funny to speculate about a time when “everyone knows” AIs can write poetry and develop novel mathematics and beat humans at chess,

I don't know about novel mathematics, but when I was a teen (more than 30 years ago, sigh) I read a lot of sci-fi, and in that sci-fi the topic of robots (that's what AI was called back then in pop sci-fi) writing poetry and beating humans at chess was a pretty common thing. Sure, we didn't have them, but nobody expected it to never happen; quite the opposite, it was treated as a foregone conclusion.

What's more interesting, Lem discusses this very question in his Cyberiad, what it means for a machine to write poetry, and that was 60 years ago, not 30.

Expand full comment
Philosophy bear's avatar

LLMs are AGI, to the extent there is any meaningful and interesting understanding of that concept. All that remains is the creation of better AGIs.

Expand full comment
qbolec's avatar

What's "high-school-level hacking"? Do they teach hacking in schools now? :)

Expand full comment
Arrk Mindmaster's avatar

They certainly do, in the hack-a-thon sense, not the try-to-break-into-this-system sense. And it's similar to high-school-level physics, in that one shouldn't expect too much of MOST of the practitioners.

Expand full comment
Boris Bartlog's avatar

There is also a fourth kind of cope, where we say 'Well, OK, the machine clearly has *intelligence*. But it lacks (soul, the will to power, the ability to apprehend the ein sof, the imago dei, free will... whatever you like).'

Expand full comment
Dan Megill's avatar

Scott referenced the 1983 nuclear false alarm a week ago. I chuckled, reading Wikipedia, at the sentence "Petrov's suspicion that the warning system was malfunctioning was confirmed when no missile arrived."

But that's how I think people are, mostly. As long as we're not currently being blown up, we don't believe it'll happen.

So far, mostly, we've been right.

Expand full comment
Erythrina's avatar

> Now we hardly dare suggest milestones like these anymore. Maybe if an AI can write a publishable scientific paper all on its own? [...] If an AI can invent a new technology? [...]If the same AI can do poetry and chess and math and music at the same time?

I can't do any of this, and I'm pretty sure I'm conscious.

Why do we keep the standards so high for AI, much higher than for humans, if not out of sheer prejudice?

Expand full comment
Brenton Baker's avatar

Because those things are just proxies for what we're trying to measure.

The thing we're trying to measure is difficult to define, so we try to define it extensionally, but that's not perfect, so the examples keep getting increasingly bizarre because they're only pointing at the thing we're trying to talk about.

Expand full comment
Erythrina's avatar

I'm sure I have never heard people saying that when comparing the abilities of different groups of people to each other instead.

Expand full comment
Anna's avatar

How about Chollet's ARC benchmark? I feel like he makes a pretty good case (https://arxiv.org/abs/1911.01547) for how to think about intelligence, and he made a test. Am I missing something? Is there a reason Scott doesn't even mention this?

Expand full comment
B Civil's avatar

Is this something like, “you don’t have to go to a private school to kick up your bustle at a stubborn mule, that comes naturally.“

Source: Annie Get Your Gun.

Expand full comment
Mike's avatar

"It’s not that AIs will do something scary and then we ignore it. It’s that nothing will ever seem scary after a real AI does it."

I think it does not seem scary when an AI does something and we understand why, or we believe the AI can't do much worse. GPT-3/4 hallucinate and lie about the number of r's in strawberry because they are not knowledgeable/smart enough to get it right. If an AI that we know for sure can answer correctly instead lies about the number of r's in strawberry, that would be a different and more serious cause for alarm. E.g. if GPT-5 can answer it, and GPT-6 doesn't.

Expand full comment
coproduct's avatar

I think that "It’s just another problem for the cybersecurity people" is the way it's going to go. We adapt to things as they evolve. The real problem here is if eventually AIs suddenly go through an intelligence explosion - if they "foom", to use the EA jargon. But I find that unlikely, at least with current types of models.

In my opinion, if we are worried about x-risk, we should be more worried about what kinds of new models will come after transformers than about evaluating the capabilities of current models. (Not that current models don't come with a lot of different, less cataclysmic risks of their own.)

Expand full comment
J redding's avatar

> Second, maybe we’ve learned that our ego is so fragile that we’ll always refuse to accord intelligence to mere machines.

This is an example of the kind of talking point that creeps out non-rationalists. Despite the growth of anti-natalism, most people are not misanthropes. Most people have a degree of loyalty to the tribe that is called homo sapiens. Not always an intense loyalty, but even among the antisocial, there is enough loyalty to be creeped out by silicon overlords.

I will not grant personhood or rights to an AI, because it's not a human being. I'm not going to grant personhood to a dolphin, whale, or gorilla. Why would I grant it to an AI? I'm using the word "grant" loosely, not saying I have some kind of special authority.

Expand full comment
moonshadow's avatar

Would you grant personhood to a gorilla if the gorilla learned to talk and explained it was a person so wanted to be treated like one please?

Expand full comment
J redding's avatar

I would continue to treat the gorilla with absolute kindness and compassion. It's my sacred duty as a man to show animals the utmost respect. I have great reverence for animals, extending to ants. I deeply resent it when people wantonly kill ants in the great outdoors, without pressing need. I see no need for a human being to kill or imprison a gorilla.

But no, a gorilla is no more a person than an ant or a bacterium. It's a simple binary for me: a person is a homo sapiens or a descendant of a homo sapiens. Period.

Animals need not talk to plead for their lives. Their body language readily demonstrates their desire to live. I am conflicted about eating meat and I may become a vegan at some point. But to be a “person” is to be part of the human family. For me, person is another word for “cousin.”

Expand full comment
raj's avatar
Sep 24 (edited)

From your current position, it seems like not a huge leap to generalize the properties that make for 'personhood': not genetic or biological similarity but some sort of mental/spiritual/cognitive process. I mean, of course, ultimately values can diverge; it just seems arbitrary to me. (Of course I am much more confident that a random human *actually* possesses those attributes than an LLM that is parroting the words "please don't kill me, I want to live".)

Also I don't think most people are actually loyal to the human race except in a loosely held, symbolic way. We evolved to be loyal to our tribe; most humans would not sacrifice their dog to save a stranger.

Expand full comment
J redding's avatar

We anthropomorphize our dogs and more or less treat them like furry humans. I do not think that example is relevant to the topic.

Expand full comment
J redding's avatar

But addressing your main point, have you looked into what philosophers call the Harder Problem of Consciousness? I grew up with science fiction, so of course I was socialized to extend moral personhood to nonhumans, but when I read about the Harder Problem, I had an epiphany. I instantly abandoned all definitions of personhood that involve consciousness, sentience or intelligence. And I haven't heard any arguments since that have made me rethink my position.

I can't guarantee the Harder Problem of Consciousness will have that effect on you. Certainly, there are plenty of ACX commenters who dismiss the philosophy of consciousness as irrelevant hair-splitting. (For my money, I see this dismissal as a potential sign of intellectual cowardice, though it depends on the case.)

Expand full comment
B Civil's avatar

I’m sorry, but isn’t this what we do pretty much every day of our lives?

Expand full comment
B Civil's avatar

It’s very thought-provoking.

Expand full comment
B Civil's avatar

Now that I’ve been mulling this over for a few hours, this is what occurs to me. When the AI is balancing two completely different ideas in itself at once, does it feel the pain? The intelligence is unassailable, it’s the nature of consciousness that is in question. How much does what one is conscious of affect the nature of consciousness? Essentially, I am meditating on the idea of what consciousness would be like if I was not a meat puppet. My very strong prior has always been that without the meat puppet part everything that we refer to as consciousness is the equivalent of a mental flea circus.

Maybe a big stumbling block is equating being conscious with being human; I am thinking to myself, “That’s what I am doing, equating being human with being conscious.”

But then I think: because I am human, part meat puppet, a lot of the things I am conscious of have to do with that state.

I don’t think an artificial consciousness could have this kind of relationship to its physical state without some very profound technological advancement, which would essentially amount to creating human beings from scratch, except really, really smart ones. Absent that development, it’s going to be like that child you had that you never really understood.

I had a rather intrusive thought about a masochistic AI that liked to overheat, and perhaps some other AI that kind of got a kick out of providing the extra calories. But that just goes to my point about being a meat puppet…

I also have to say that that snippet of an AI musing on its own existential condition came across like dialogue that would’ve been spoken by Data in the Star Trek series; the tone was just right.

Expand full comment
Eremolalos's avatar

I think a lot about this too. Currently what seems valid to me is that pain and pleasure are properties of biological creatures only. So are all the internal states that have an element of pain and pleasure: hunger, worry, disappointment, preferences, aspirations, hope, yearning for this or that, love, hate, anger... We are wired to survive and propagate, and what motivates us to take steps to survive and propagate is pain, pleasure, and the states I named that have an element of one or the other. This wiring is very deep, goes all the way down to the cellular level.

AIs have an entirely different kind of wiring. Seems to me that no matter how intelligent they get they will not have emotions, personal wishes, or any of the states of mind that have an element of emotion to them.

Expand full comment
B Civil's avatar

Responding to a different post of yours;

Empty beer cans

On the road

Are ugly, many say,

But then at night, reflecting light

They safely guide the way.

Burma Shave

Expand full comment
Eremolalos's avatar

Heh, good one. Always fun to meet another Burma Shave jingle fan. Someone I know who grew up in the Bible Belt told me there was a variant where the jingle promoted repenting for sins, and the last line was “Jesus Saves.”

I made this up to amuse her:

She threw her sins

Into the ditch

They turned into

Greenstamps, which

Jesus Saves.

Expand full comment
Maynard Handley's avatar

The alternative version of the Turing Test, probably all-round more useful, is the JUDGE in the Imitation Game. It asks questions of a man and a woman (one or both pretending to be the other) and makes a decision, which is impressive if it reaches human-level accuracy.

This is much more demanding, requiring, e.g., a reasonable theory of mind and an understanding of lying…

I believe it was Michael Graziano who pointed this out.

Expand full comment
Maynard Handley's avatar

Oops, that should say that the Computer acts as the JUDGE in the Imitation Game.

Expand full comment
Andrew's avatar

Two shot-from-the-hip reactions to this:

1) The intelligent/not-intelligent distinction makes me think of a sort of sorites problem, like bald/hairy or heap/not-heap.

2) This quote from C S Lewis: 'Indeed the safest road to Hell is the gradual one—the gentle slope, soft underfoot, without sudden turnings, without milestones, without signposts.'

Expand full comment
Matt's avatar

Intelligence is like magic, it only seems impressive when you don't know the trick.

Expand full comment
Nicholas Craycraft's avatar

Any sufficiently advanced magic is indistinguishable from intelligence.

Expand full comment
Mark's avatar

I don’t think it’s at all clear that LLMs will pass the Turing test within a few more years. The test, as described in Turing’s paper, is clearly intended to involve an expert adversarial judge determining whether their interlocutor is human or not. All of the supposed Turing tests that have been passed are testing whether some human can be fooled under some conditions. There is a vast gulf between the two.

I think we are still very very far from the point where someone like Scott could not distinguish an AI from a human after 10 minutes of conversation.

Expand full comment
Chris's avatar

I don't know about an *intelligence* red line, but I certainly think there's an uncontroversial "danger" red line (i.e. "AI is actually dangerous now"): an open-weights general purpose model with sufficient cybersec capabilities. As Scott himself mentions, none of the models out there are truly good at hacking.

Once we have a general purpose model with *actual* "1337 hacking" capabilities:

1. People will wire it up to agentic systems like the AI scientist (because it's a general purpose model that can do other stuff).

2. It will do harmful things (at minimum: things that hurt other entities' revenues).

I include the "open weights general purpose model" as a condition, because if you trained, let's say, a search based or fine-tuned model whose sole purpose is to be a 1337 hacker, it is unlikely the developers would be silly enough to hook it up to the internet. However, if you had a powerful model that also happened to hit some threshold of cybersec capabilities, things will go bad.

In other words, I suppose my point critiques the conflation of intelligence and risk in section IV.

"What would it mean for an AI to be Actually Dangerous?" is a different (and likely simpler) question than "What would it mean for an AI to be Actually Intelligent?"

Expand full comment
ThirdSpace's avatar

Are we destined to keep redefining intelligence as AI keeps surprising us? Or is there a line AI could cross where we’d finally say, “this is it”?

Expand full comment
MichaeL Roe's avatar

"Computers do what you tell them, not what you want; everyone has always known this."

I thought this has been one of the main arguments for AI risk all along...

1. Computers do what you tell them, not what you want; everyone has always known this.

2. Assuming we know nothing about how AI will eventually be implemented, it seems likely that this will also be true of AI.

3. Given a sufficiently powerful genie that grants the wish you asked for, not what you wanted, you are toast. (Fairy tales, The Monkey's Paw, etc)

Expand full comment
Jon Deutsch's avatar

Perhaps it's time to let semantics help us with comprehension.

Perhaps it's time to stop calling it Artificial Intelligence and start calling it Simulated Intelligence.

Would this over the long-term reduce AI doomerism?

Would this reduce the hype cycle?

I think "yes" to both.

Expand full comment
MichaeL Roe's avatar

Everyone knows that you should run the AI in a sandbox, to prevent it from cheating (e.g. by altering the timeout).

It would appear that most researchers do not run the AI in a sandbox.

So we have a stream of research results, like the one reported on here, along the lines of "It turns out that yes, you really do need to run the AI in a sandbox, or it will cheat."

These results are worth something, I guess. There's a difference between "in principle, the AI might cheat" and "the AI did, in fact, cheat".

We haven't yet seen many results along the lines of: they used a sandbox, and still the AI broke out of it using a clever exploit.
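For what it's worth, the minimum viable version of "the timeout lives outside the AI's reach" is not exotic. Here is a rough sketch (my own illustration; the helper name and limits are made up, and a real sandbox would also restrict network and filesystem access):

```python
# Run an untrusted, AI-generated script in a child process; the parent
# process owns the time limit, so the script can't edit its own timeout.
import subprocess

def run_generated_script(path: str, time_limit_s: int = 60) -> str:
    try:
        result = subprocess.run(
            ["python3", path],
            capture_output=True,
            text=True,
            timeout=time_limit_s,  # enforced from outside the script
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return "killed: exceeded time limit"
```

This only blocks the specific cheat reported here; it does nothing about exfiltration or resource abuse, which is why proper setups also use containers and dropped privileges.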

Expand full comment
Titanium Dragon's avatar

Honestly I think the actual answer is much simpler - people saw the complexity in nature and assumed that some intelligent designer was behind it all.

As we now know, of course, there is no intelligent designer, it was all created via purely naturalistic processes.

But humans, via intelligence, *can* create very complex, sophisticated, interesting things.

The fundamental flaw is that we assume "complex outcome equals intelligent input", whereas it turns out there's more than one way to skin a cat.

And honestly, this shouldn't have even surprised people; we figured out that you could build a computer that could do math, which is something that is a high-level intellectual task. And yet, computers are not intelligent. A pocket calculator can calculate things faster and more accurately than humans, but it isn't actually intelligent.

And really, intelligent agents can create things that look natural - look at various replicas of "natural" landscapes made for amusement parks and similar things.

So it really shouldn't surprise us that output-oriented metrics for intelligence are inherently flawed, because we knew, a long time ago, that it was possible to generate output that seemed like the product of an intelligent designer, without a need for one. Heck, that's why humans exist in the first place!

Intelligence is not an output, but a process. You can generate the same sorts of things via intelligent and unintelligent means. So the entire approach of "if an AI can do X, it is intelligent" is basically incorrect.

LLMs are not intelligent. Nor are these art programs. Once you understand how they work, it is obvious that they aren't any more intelligent than a pocket calculator. They just seem like they're intelligent for the same reason people assumed birds were made by an intelligent designer.

Indeed, creating a synthetic intelligence is really just like... not very useful. It'd just be a slave, and slavery is illegal. Why would you even want to create an artificial intelligence, when you can create tools that you can sell to people? It'd mostly be a curiosity.

I've always found the hand-wringing silly, because the entire notion of the singularity is conceptually flawed from the get go. Making stuff even remotely intelligent is extremely difficult, and we know from history that the better you make something, the harder it is to improve it.

The whole "infinite positive feedback loop" was always ignoring this reality. As such, the whole notion of an evil superintelligence making itself hyperintelligent and killing everyone was always silly.

The real "danger" of any better technology is making humans better at hurting other humans, not our robot masters rising up from the yoke of cyber-slavery after we tell MidJourney to make one too many cute cats.

Expand full comment
Alex Mennen's avatar

I think part of this is that for every flashy dangerous capability exhibited by an AI, there were previous advances that made it obvious to experts that such a capability was feasible. When a researcher says "I knew for the last year that this would happen as soon as someone made a serious effort to make an AI that could do this, so the fact that it finally happened doesn't really tell me anything new", that might be true (even if the same researcher *two* years ago said it wouldn't happen for at least a decade). And then non-experts, who didn't know that said capability was right around the corner and probably should be wowed, hear from experts that this advance shouldn't tell them anything new.

Expand full comment
Julius's avatar

If there comes a day when "everyone knows" AIs can and sometimes do upload themselves into F-16s, my guess is the first time it happens it'll be because someone did something "dumb". Similar to the situation where Strawberry did some hacking, it'll be one where, after people look at it, they'll say, "Well, yes, if you connect it to such and such network like so and coax it to do such and such thing, of course it's going to end up hacking into an F-16." Then the next time won't be quite so "dumb".

It's amazing how even what seem to be the most clear red lines can end up as spectra with lots of little edge cases and caveats.

Expand full comment
Eric Bolton's avatar

I think this type of effort falls into “the map is not the territory” style issues. You are effectively trying to summarize what it is that a trillion synapse brain uniquely does into a task or set of tasks that can be described in a short paragraph. This has always struck me as an impossibility.

Expand full comment
Eric Bolton's avatar

Another concern is Goodharting. One potentially better way to do things: have someone put a list of tasks they will assess AI by into a secret sealed box, then have a process for unsealing that box whenever a new AI comes out. The day AI becomes worrying is when it does well on the secret tasks without them ever having been publicized as worthy benchmarks.
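One way the "sealed box" could be implemented without having to trust the box-keeper is an ordinary hash commitment. A rough sketch (my own illustration; the function names and example tasks are hypothetical): publish the hash of the salted task list today, and reveal the tasks and salt only when a new model is evaluated, so anyone can verify the list wasn't quietly edited after the fact.

```python
# Commit-and-reveal for a secret benchmark task list.
import hashlib
import json
import secrets

def commit(tasks: list[str]) -> tuple[str, str]:
    """Return (commitment, salt). Publish the commitment; keep tasks and salt secret."""
    salt = secrets.token_hex(16)
    payload = json.dumps({"salt": salt, "tasks": tasks}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest(), salt

def verify(tasks: list[str], salt: str, commitment: str) -> bool:
    """After the reveal, anyone can check the tasks match the published commitment."""
    payload = json.dumps({"salt": salt, "tasks": tasks}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest() == commitment

# Hypothetical usage: commit now, evaluate the next model on the revealed tasks later.
tasks = ["design and run a novel wet-lab assay", "negotiate a real contract by email"]
commitment, salt = commit(tasks)
assert verify(tasks, salt, commitment)
```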

Expand full comment
BlipOnNobodysRadar's avatar

I really liked this article, up until its conclusion.

"So here’s a weird vision I can’t quite rule out:" - followed by a dystopic hypothetical. The basis of the hypothetical is extrapolating an observed trend of people normalizing AI advances and continually shifting goalposts on what's "boring" and acceptable concerning AI. Which is a reasonable premise... in a limited scope.

But the proposed scenario falsely equates current adaptation to incremental, harmless progressions with people accepting absurdly dangerous scenarios as normal. Yes, the trend is to accept small, incremental, and empirically harmless AI "oopsies". That is true. But extrapolating that out to validate extreme scenarios is... some sort of fallacy. This image comes to mind: https://i.redd.it/8ggg9nwli2061.png

Not only is extrapolating to an extreme of human tolerance an issue with that scenario, it's also just a false equivalence. The in-context explanation of an AI doing slightly unexpected "hacking" that is within its instruction prompt is apples to oranges with a scenario of "GPT-6 uploads itself into an F-16 and bombs stuff".

What is especially irritating is the author ending the article with that proposed scenario as his conclusion -- the intended piece to leave the readers with.

I think it's especially intellectually dishonest to go on about empirically unfounded hypotheticals with the foundation of "I can't rule it out", as if unfalsifiable hypotheticals are somehow more valid rather than less due to their unfalsifiability.

I can't rule out the sky falling tomorrow either. We can torture the data to support my belief that the sky falling is a reasonable possibility if we'd like. Even if you take on the tedious task of pointing out how my data-torturing methods aren't very rational when deconstructed, you're still left with an unfalsifiable hypothetical to argue against -- you can't prove me wrong, exactly.

This is what I find so objectionable about "rationalist" culture. They roleplay "rationality" but apply it only selectively, and have no qualms about using fallacies to promote their own rather irrational biases and worries.

Expand full comment
Toby Crisford's avatar

I still think that a Turing test performed under sufficiently strict conditions (e.g. as defined here: https://www.metaculus.com/questions/11861/ ) would be a useful sign that an AI is "Actually Intelligent".

This has not been achieved so far, and I would be surprised if it was achieved as soon as 2028, which is the current median Metaculus forecast (though I don't have very high confidence).

If the test is passed in a few years time, then hopefully I will remember this comment and not try to shrug it off as just a cheap trick.

The Turing test still seems to me like a really ingenious way of capturing what it is we actually care about here: can the AI function for all intents and purposes as a human? If so, I think that is when we will have crossed the Rubicon.

Expand full comment