406 Comments

"Whatever you’re expecting you 'need self-awareness' in order to do, I bet non-self-aware computers can do it too. "

From very early in Vernor Vinge's "A Fire Upon the Deep":

" ... The omniscient view. Not self-aware really. Self-awareness is much overrated. Most automation works far better as part of a whole, and even if human powerful, it does not need to self-know. ..."

Vinge is writing science fiction, but still ...

Expand full comment

I would posit that humans largely aren't self-aware in an absolute (not relative) sense - they can observe some of their own behavior, and store certain verbalizations in memory without speaking them out loud, but this awareness covers only a tiny percentage of the mental processes that make up their "self". If anyone were truly self-aware, implementing AGI would not be such a challenge.

Expand full comment

The full-novel version of "consciousness is incidental to, and maybe even parasitic on, intelligence" is Peter Watts' Blindsight, for my money one of the best hard SF novels of the 2000s. Full version: https://www.rifters.com/real/Blindsight.htm

Expand full comment

I hate to complain about a recommendation I agree with, but that's a bit of a spoiler. :-(

Also, he's right, Blindsight is awesome.

Expand full comment

Isn't the title pretty explicitly stating that the book is about that?

Expand full comment

Are people here and Scott in the article using the term self-awareness interchangeably with consciousness?

It seems to me like they are, but those two concepts are quite distinct to me.

But still, neither of them is required for making an AI dangerous.

Expand full comment

Exactly! Discussing consciousness is a complete red herring that just shows that the correspondent has no idea what they are talking about.

Expand full comment

Scott, this is kind of a big ask, but I think the only real way to understand the limitations and powers of current-gen AI (and, by extension, whether there is a reasonable path forward from them to AGI) is to do a basic project with one, or some programming with one, yourself. Hands-on work, IMO, reveals much more information than any paper can convey. Seeing the algorithms and learning systems up close and poking at them directly explains and makes concrete what papers struggle to explain in the abstract.

You talk about current AI systems as having "a blob of learning-ability", which is true, but not complete. We (and by we I mean AI developers) have a much deeper understanding of the specifics and nuances of what that learning ability is, in the same way rocket engineers have a much deeper and more nuanced understanding of how a rocket system has "a blob of propulsive material". In my experience (which isn't cutting edge, to be fair, but was actual research in higher-level classes in college), the current blob of learning ability we can simulate has a large number of fundamental limitations on the things it can learn and understand, in the same way a compressed chunk of gunpowder has fundamental limitations on its ability as a propulsive material. A compressed tube of gunpowder will never get you to space; you need a much more complex fuel-mixing system and oxygen and stuff for that (I am not a rocket scientist).

In the same way, our current learning algorithms have strong limitations. They are rather brute force, relying on raw memory and compression to fit information and understanding. They can only learn in a highly specific way, and are brittle in training. Often, generalization abilities are not really a result of the learning system getting more capable, but of being able to jam more info in a highly specific way (pre-tuned by humans for the task type - seriously, a lot of AI work is just people tweaking the AI system until it works and not reporting the failures). Which can accomplish amazing things, but will not get you to AGI. Proving this, of course, is impossible. I could be wrong! But that's why I suggest the hands-on learning process. I believe that will allow you to feel, directly, how the learning system is more a tube of gunpowder than a means to space.

Expand full comment

The "fire alarm" post that Scott linked to at the very end of this post responds to this argument. (If you don't have time to read the whole thing, search for "Four: The future uses different tools".) Do you have a response to that?

Expand full comment

I think that's a good argument for making things easier to do, but not for achieving the breakthroughs required to get further, if I'm understanding it right. The future may have different tools that make training/setting up/creating an AI easier or even trivial. But that's still working within the "bounds" of what is possible, in the sense that future tools will make things that are currently possible but very hard/very specific trivial and easily generalizable, but they won't make things that are impossible possible.

Which is a good counterargument to my claim about pre-tuning, but I don't think it addresses the larger point about how you can't use gunpowder to get into space, and about the current weaknesses of our learning algorithms. This is more like increasing the amount of gunpowder you have, or finding more efficient methods for compacting it. You can make the learning algorithm maximally efficient with future tools, but that still won't be enough. Or at least, that is what I believe, given hands-on experience.

Expand full comment

One response is that Point Four is simply not very true. There are any number of things that were immensely hard for entire teams in 2012 - and are still definitely beyond the reach of a single comp-sci grad with a laptop working for a week.

Adam will not collect millions of samples for you, and automating that component has not been an easily solved problem. Sure, you can try to scrape the net - but high-quality labeling is an entire branch of the economy for a reason. As they say, ML is not Kaggle. Other tools mentioned there, such as batch norm, are obviously great - but the main reason for their very existence is scaling. They were developed to support ever-larger models - so they can't be used to make the point that tools are, by themselves, such wonderful enablers. Tools have improved, but not as dramatically as that post indicates.
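
For concreteness, the "tools" in question really are one-liners in today's frameworks - a minimal sketch, assuming PyTorch (which itself postdates 2012); the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256),
    nn.BatchNorm1d(256),   # batch norm: one line
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam: one line
# What no library hands you: the millions of labeled samples to feed them.
```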

While we're there - I have *no* idea why those luminaries couldn't just say "no idea what's least impressive, but I'm fairly confident self-driving cars as a widely available service won't be a thing in two years, though they might well be in ten". No idea when that conference was, but I've worked on self-driving ML at a world-class place for a while, and I'm fairly confident even now that fully autonomous self-driving cars won't be a widespread phenomenon in 2023. Yes, I know about that impressive demo. And about that other one. And about that third one. FWIW, I'm willing to say that we will have *some* autonomous driving in *some* contexts by then.

Expand full comment

Yeh, but try using the AI on a wet November evening in Donegal where the “slow - road works ahead” sign has fallen over and the new one-way system hasn't been updated in Google or Apple Maps.

There are probably thousands of scenarios not envisaged by the software writers of the automatic cars, nor machine-learned either, because that needs real-world usage.

Anyway, I've never seen anybody talk about the legal issues here. If a car goes off the road because of a faulty brake, the manufacturer is sued. If the car goes off the road because of human error, then the manufacturer is fine. With autonomous cars it's all on the manufacturer.

Expand full comment

That is precisely in agreement with what I wrote, right?

Anyway, these kinds of scenarios are very much envisaged - just hard to address, and exactly why I don't think we'll get to widespread autonomous driving in the wild in 2023. That and the legal aspect, sure.

Expand full comment

The “yes, but” was in response to the perfect demos you mentioned.

It's good that they are thinking outside the perfect Californian weather - remember that Apple Maps worked great in Cupertino.

I think that the experience gained will very much improve driving safety via software. Full autonomy I am dubious about.

Expand full comment

I'm willing to 'bite the bullet' and argue that 'fully autonomous' self-driving cars, while FAR from perfect, might still be _better_ (on average) than the existing drivers of cars, i.e. humans.

Your example:

> Yeh but try use the AI it on a wet November evening in Donegal where the “slow - road works ahead sign” has fallen over and the new one way system hasn’t been updated to google or Apple maps.

Existing drivers – humans – _already_ make incredibly terrible (and dangerous, often fatally) mistakes.

I don't think the right comparison is to some hypothetical perfect driver under any possible conditions. It might be perfectly fine (and reasonable) for 'fully autonomous' self-driving cars to just refuse to drive under some conditions.

But it would still be a win – for humanity – for us to replace _worse_ human drivers with better AIs now, in the circumstances in which they would be better.

Expand full comment

Dear Lord, I think I would pay actual real money to watch real-time footage of the best self-driving car going up a boreen on a typical June afternoon where nobody has scarted the ditches because the council doesn't do that anymore and the farmers aren't going to pay for it until they have to, with the very high likelihood of the creamery lorry coming against you and the silage trailer zooming around the corner and once you get past those, the thing peters out into a sheep-track and you have a very unimpressed black-faced mountain ram sitting in the middle of the road looking at you.

Oh please please please Uber or Google or whoever, do it! Please! 🤣

No boreens, but nice views:

https://www.youtube.com/watch?v=HDDX3SrlvjQ

Expand full comment

> A compressed tube of gunpowder will never get you to space

It kind of does: https://en.wikipedia.org/wiki/Paris_Gun

(though I'm not sure if it's propelled by gunpowder or some other propellant that was available during WWI)

Expand full comment

You're looking for https://en.wikipedia.org/wiki/Space_gun

BTW, if 'a compressed tube of gunpowder' has never gotten anything 'to space' (https://en.wikipedia.org/wiki/Project_HARP ~57 years ago), then neither has Jeff Bezos or Richard Branson.

Expand full comment

Crap, yeah, you're right - I was thinking in terms of rockets; I completely forgot about these things. Fair play, my metaphor was wrong (although maybe it works if you change "in space" to "in orbit"?). Also, I thought Jeff Bezos's and Richard Branson's rockets used, like, normal rocket propellant that requires an oxidizer and other fancy stuff, not just gunpowder?

Expand full comment

Bezos & Branson did use conventional rockets, but only achieved a suborbital trajectory; you can do the same with a single powder charge. "HARP didn't go to space" requires using an unconventionally strict definition that precludes the first two as well.

In theory you could use a multiple-charge system to circularize an orbit, but at that point you're just using your powder as an inefficient propellant and the difference is a matter of engineering.

Expand full comment

Dammit, OK, my metaphor is dead in the water - I did say I wasn't a rocket scientist. I'll have to figure out a different one.

Expand full comment

As someone who has done many projects in the deep learning space, I have to say that I come to a very different conclusion than you. It is often evident that you are indeed *only* constrained by model size, and that you can empirically capture bigger and more nuanced abstractions with bigger model size. I make a toy model which fails to do a job; I make it ten times bigger, and it can now do the job. I regularly run up against the limitations of hardware when trying to build models.

We can already do amazing things with deep learning. Every year, we see more proof of the thesis that the main bottleneck to doing even more amazing things is simply model size. Every once in a while, we see a whole new concept like policy learning, or transformers, or GANs, which gives you yet another tool. You connect the tools together and get a step change in capability. You can see in your mind how you could connect various pieces together and get AGI, if only your model were big enough. Another way of saying this would be: we have all the pieces of AGI lying around already. They haven't yet been put together, and if they were, we probably don't have the compute to power the "learning blob." (But we may! I'm agnostic about how much compute is truly required.)
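
For what it's worth, the "make it ten times bigger and it can now do the job" experiment is cheap to state concretely. A minimal sketch, assuming PyTorch; the input size, widths, and task are made up for illustration:

```python
import torch.nn as nn

def make_mlp(width: int) -> nn.Sequential:
    """Same architecture throughout; only the hidden width changes."""
    return nn.Sequential(
        nn.Linear(32, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 1),
    )

small = make_mlp(64)   # the toy model that fails at the job
big = make_mlp(640)    # the same model, ten times wider

# The only thing that changed between the two is parameter count.
for name, model in (("small", small), ("big", big)):
    print(name, sum(p.numel() for p in model.parameters()), "parameters")
```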

Expand full comment

You really think so? I mean, I have encountered problems that smaller models captured very poorly and larger models were able to capture with seemingly high accuracy, but they were the kinds of problems where you could still see the smaller models exhibit the basic level of behavior. Like, do you think that, given infinite compute power and infinite memory, with current-day models, it would be possible to have an AI generate a functional and correct moderate-to-large-size Python program from a set of clear and consistent requirements?

I don't think it is! I mean, the best attempt we have, GitHub Copilot, doesn't even exhibit the smallest spark of the sort of understanding and knowledge required for that (and trust me, I tested it out a good bit). I mean, I can't really prove it isn't possible, but I can't really prove that P≠NP either - just that experience and knowledge have shown that it really, REALLY probably isn't likely.

I think that the learning-blob algorithms we have right now just are not capturing and storing information in a coherent enough way for what AGI would require. Which, I do acknowledge, is just a feeling, and I could be wrong about it. But I would be shocked, on the same level as a proof that P=NP, if it were otherwise.

Expand full comment

I do think so. But I am not saying that you can build an AGI if you just build a big enough transformer. I would instead build something more complex, in the specific sense of having many more distinct elements. Rather than write a long post nobody will read, I'll just say: start by looking at animal brains, and replace each piece with an existing algorithm that sort of does a similar thing. You would end up building the sort of thing that would teach itself how to generate or discover its own training data based on the sort of thing it guessed it was supposed to be trying to do, and following a Python code spec would be simply one implementation of that process. Something like that might not look like a big fat transformer anymore, but then again it might.

This sort of exchange has the risk of turning into burden-of-proof tennis. I don't have the spec for an AGI in hand, and you can't prove that a sufficiently big transformer isn't arbitrarily capable. What I can do is look at what animal brains do, note that we now have software architectures that do roughly all of the sorts of things you see in brains, and imagine how I might hook those algorithms together to build something brainlike. Brain mimicry is only one way of building an AGI, I suspect you could end up with an AGI architecture even simpler than that. (Brains are the way they are because of energy efficiency more than FLOPS maximization.)

I could also separately argue that maybe you can actually get a "giant blob" model to become an AGI. I didn't think something as architecturally simple as AlphaStar would be superhuman at StarCraft, but here we are. I lean more and more toward the perspective that our architectural inventiveness is going to be secondary to just adding more parameters and letting the model figure things out for itself, past a certain grain-size of problem.

Expand full comment

>But I am not saying that you can build an AGI if you just build a big enough transformer. I would instead build something more complex, in the specific sense of having many more distinct elements

Hmmm, I can see that, actually - that's a more compelling possibility to me than throwing more training data/size at the problem. That would require a good enough understanding of the generalized components of a general intelligence in order to replicate them, but I can see that being done in one fell swoop by someone in theory (I have thought of something similar myself, tbh). If someone can just conceptualize the base components that, when strung together, are necessary for a general intelligence, then that can happen. And while I'm sure there are a lot of base components, we probably know a lot of them already. That is highly compelling.

>I could also separately argue that maybe you can actually get a "giant blob" model to become an AGI.

Given the above, that is possible, I suppose. But I think the structure described above is so highly complex and requires such a vast search space (and additionally has no obvious gradient from not-working to working) that it doesn't seem as likely to emerge from a blind network alone. But it might be doable!

Expand full comment

You're right that "our current [AI] blob[s] of learning ability we can simulate [have] a large number of fundamental limitations on the things [they] can learn and understand", but the worry is that our near-term future 'blobs' won't be similarly limited. That seems very reasonable! People are making rapid progress on doing exactly that, i.e. creating more capable blobs.

Expand full comment

I mean, that's possible, true. But it depends on how far away you think we are from AGI with our current learning blobs. Are we pretty close, able to get right over with just one clever breakthrough? Or are we not even within 10 light years of it?

If the former, then yes, near-term future blobs probably won't be similarly limited and will get to AGI! If the latter, near-term future blobs won't be similarly limited, but they will still be limited in different but still highly rigid and very strong ways that prevent them from getting to AGI. It'll be a blob that is 1 light year ahead, but still 9 light years away. The question is: where are we? And my experience makes me think we're still 10 light years away. Not provable in any way, just my experience.

I also want to clarify that I am very much still in favor of researching AGI prevention and alignment measures. Nothing wrong with that! It's good to do. It's just that I wouldn't worry about AGI in the near future, from a personal standpoint.

Expand full comment

I would personally love to be able to work on AGI safety/alignment, but I'm pretty sure I wouldn't be able to contribute meaningfully – not directly – given everything else going on in my life (e.g. my other responsibilities).

And I'm unsure about whether it's useful to 'worry' about it – tho that depends on what I think is meant by 'worry'! But I too don't think it's quite the same kind of impending disaster as, e.g. a large asteroid on a definite collision course with the Earth. But then I'm not worried about climate change in the near future, from a personal standpoint. (I think it's pretty obvious that we all will mostly just adapt to whatever happens, however we can.)

Expand full comment

I've done the basic programming. I'm a full time programmer. I've taken graduate level courses in machine learning. I don't think that current AI have a fundamental limitation that prevents them from being very very dangerous in agenty ways. We'll probably have to find some tweaks, like how we went from shallow neural networks to deep convolutional neural networks for image recognition, or from RNNs to LSTMs to Attention for language prediction, and that might take a decade, but it's not a fundamental limitation of current AI.

Expand full comment

No, I very much agree - I just wouldn't classify those things as "tweaks". Going from shallow to deep was a pretty significant breakthrough, I think! And I think that we're going to need probably a dozen massive breakthroughs on an amazing level (like, on a higher level than those breakthroughs mentioned above) to get to a system that can scale to AGI. We will eventually get those breakthroughs, yes, in the same way we got breakthroughs in rocket propulsion technology. But I think each one is going to be hard fought and require a stroke of genius that will lead to a large-scale reworking of the entire field when it's found. (Like, we'll get one breakthrough, and the next 10 years will look like the last 10 years of AI scrambling and investigation and hype.)

Is this provable? No, much in the same way it wasn't provable back in the day that you couldn't get into orbit just by strapping more tubes of gunpowder to your rocket (this metaphor is becoming very belabored, I'm sorry). But that's just what my experience and knowledge have led me to believe.

Expand full comment

Also, I just realized you said "I don't think that current AI have a fundamental limitation that prevents them from being very very dangerous in agenty ways" and nothing about AGI. Whoops. Uh, then in that case, yeah, I totally agree. Current AI can act in very evil and agenty ways, no problem, no debate there. Just that that has its limits, and I don't think the worries about that match up with the worries about AGI Scott talked about here.

Expand full comment

I'd like to chime in as another person who's worked with modern machine learning. I am frustratingly familiar with the limitations (hello, RL).

And I have also seen tiny little tweaks, here and there, that do too much. Simple changes that aren't fully understood that make previously impossible things possible. Over and over, I see dense papers theorizing about the rigorous mathematical basis of something or other, and then someone goes "uhhh what if I clamp... this" and suddenly you've got a new SOTA.
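
One concrete, public example of how small such a tweak can be (not necessarily the one meant above) is the weight clipping used to enforce the Lipschitz constraint in the original Wasserstein GAN critic - essentially a clamp. A sketch, assuming PyTorch; the critic here is a made-up stand-in, and 0.01 is the clip value from the paper:

```python
import torch.nn as nn

# Stand-in critic network, just so the snippet runs.
critic = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# The entire "tweak": after each optimizer step, clamp every weight into a small box.
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)
```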

This is not what a mature field of study looks like. The fruit is so low hanging that we're face down in the dirt, blindly reaching backward, and *STILL* finding order of magnitude improvements.

Seeing the kinds of generality being achieved with incredibly simple approaches, seeing frankly silly tweaks making such sweeping improvements... I was forced to update my predictions to be more aggressive.

And then GPT3 came out, with capabilities years ahead of my expectations. Scott's directionally correct.

Expand full comment

Hmm, yeah that is true. The fruit is certainly low hanging and plentiful. That's a good point.

Expand full comment

I think the element that's missing from the gunpowder and rock analogies is recursive self improvement. A big pile of gunpowder isn't going to invent fission bombs. A big/sophisticated enough neural network could potentially invent an even better neural network, and so on. (This isn't sufficient to get AGI, but it's one component.)

More pragmatically, the gunpowder analogy is overlooking fuel-air exclusives. As the saying goes, quantity has a quality all its own.

Expand full comment

Why do you assume that a neural network that is capable of building improved neural networks is not sufficient to get to AGI?

Expand full comment

Right now, neural networks don't have agency. Maybe that agency could be developed by a recursive self improvement process, but that's far from a foregone conclusion.

Expand full comment

That's an argument that it might not be sufficient, not that it is not sufficient. Also, in this context I don't even understand what agency means or why you think it's important.

Expand full comment

Since NNs don't currently have agency, the recursive self improvement process needs a human (or some other agent) to get started. If we all just sat here and did nothing, it wouldn't happen.

There's quite a bit of discussion of agency elsewhere in the comments on this post.

Expand full comment

Once a self-improvement process begins I don't see what other human intervention is necessary.

Expand full comment

"Hello, yes? This is Sheila in Accounts. We've noticed a large amount of invoices submitted from Scoggins, Scoggins and Blayne, Civil Engineers. Can you tell me who authorised this new building expansion? Sorry, I don't have the paperwork, these can't be processed until I get the paperwork. No, I've already stopped the bank payments and instructed the bank not to proceed with anymore it gets through. Well, if Mr. Mackintosh doesn't sign off on the authorisation, no payments can be made".

Sure, your self-improving AI could probably manoeuvre around that eventually, but it will have to deal with institutional inertia first, and somehow if the entire Accounts department is bypassed when it comes to paying out millions, the auditors will have a word to say on that.

Expand full comment

Agency means being able to set your own (sub-)goals. And they do have limited agency.

When it comes to working in the material world, agency would mean being able to run your own build-bots and choose what tasks they operate on. (This could include contracting with outside contractors over the internet, of course.)

Expand full comment

Isn't it? An assumption that it's possible to set up a recursive self-improvement chain essentially requires that those iterations gain ever-increasing autonomy to improve their design by whatever means, with an expanding capacity for interaction with the outside world. If that isn't agency, then I don't know what is.

Expand full comment

Recursive self improvement can occur with access to only some means. Access to any possible means isn't necessarily required.

Expand full comment

But that's the whole meat of the AI x-risk paradigm. If the capacity growth starts occurring outside of your complete control and understanding, you can no longer be sure that access to other means would indefinitely remain off limits, especially if the AI deems them desirable in the context of "instrumental convergence".

Expand full comment

What do you mean by agency? Do you think AlphaStar and OpenAI Five have agency?

Expand full comment

Within certain constraints - absolutely. The whole question of AI risk boils down to are humans smart enough to define those constraints effectively enough to prevent catastrophic results when AI gets to the point that it can expand its own capabilities.

Expand full comment

Typo: exclusives -> explosives

Expand full comment

I think the idea of recursive self-improvement is one of the ideas most overrated by non-computer-scientists. The current advances have mostly come through more data, and assuming we scale in this fashion, a smarter AI is not necessarily going to be able to keep ingesting more data (e.g. assuming we train GPT-N on X% of all knowledge in Y hours using Z bytes of data, GPT-N is not going to have a way to build GPT-N+1 in less than Y hours or Z bytes of space, nor will GPT-N necessarily be able to even scale X efficiently (learning all of Wikipedia is much easier than learning everything on the internet, which is much easier than learning non-digital knowledge)). Even if we design a new approach to building a neural network, there is no reason to believe that there will be large areas of improvement available on that path.

Expand full comment

Reinforcement learning (or similar) would be the path used to achieve recursive self-improvement, GPT just sucks in data, it never evaluates whether it is 'good' or not.

Expand full comment

It also ignores pretty much everything known about complexity theory. Recursive self improvement is just God of the Gaps for nerds.

Expand full comment

You can't sail west to reach Asia, you'll just fall off the edge of the earth.

Expand full comment

It's one thing to just throw out low effort sarcasm instead of actually making an argument, but could you at least have the decency to not repeat popular myths about history when doing so?

Expand full comment

There is nothing in complexity theory that precludes self-improving algorithms. But go ahead and assert the contrary while adding nothing to the conversation.

Expand full comment

A tiny suggestion (or two) for fixing social media:

a) Have human moderators that zing "bad" content, and penalize the algorithm for getting zinged until it learns how not to get zinged (a rough sketch follows below).

b) Tax the ad revenue progressively by intensity of use. (Some difficulty in defining "intensity" - hours per day or per week? Does it count only "related" page views?)
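
A rough sketch of what penalizing the algorithm for zings could mean mechanically - assuming, hypothetically, that the feed is ranked by a model trained on a differentiable loss and that human moderators supply "zing" labels (this is not a description of any platform's actual system):

```python
import torch

def ranking_loss(scores, engagement, zinged, zing_penalty=5.0):
    """
    scores:     the model's promotion scores for a batch of items (higher = shown more)
    engagement: observed engagement for those items (what the feed normally optimizes)
    zinged:     1.0 where a human moderator zinged the item, 0.0 otherwise
    """
    # Ordinary objective: score items in line with observed engagement.
    engagement_term = torch.nn.functional.mse_loss(scores, engagement)
    # Suggestion (a): every zinged item the model scored highly costs it extra.
    zing_term = (torch.relu(scores) * zinged).mean()
    return engagement_term + zing_penalty * zing_term
```

Turning `zing_penalty` up is the "penalty parameter" knob mentioned a couple of comments below.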

Expand full comment

If you let human moderators zing "bad" content, then the algorithm will learn that vaccines are bad and Barack Obama was a Secret Muslim. If you pre-select the moderators and what can be learned, you've just recreated how algorithms are tweaked now.

Expand full comment

If Facebook is already zinging "bad" stuff, then they just need to turn up the penalty parameter in their AI-tweaking software.

Expand full comment

If Facebook had a way to ensure that the “bad” content they’re zinging is actually bad, then the problem would be solved already, but we are not in that world.

The only reason that we are in the situation that we are with respect to Facebook moderation is that “have moderators zing the Actually Bad stuff (and only that)” is not an option on the table, regardless of whether algorithms are involved, or they are just doing it manually.

Expand full comment

They don't even have to zing ONLY "bad stuff" to improve things.

Expand full comment

It seems like there's a potential crux here that Scott vaguely alluded to in a couple of these responses but didn't tackle quite as directly as I'd have liked: Can the current ML paradigm scale all the way up to AGI? (Or, more generally, to what Open Phil calls "transformative AI"?)

The response to Chris Thomas suggests that Scott thinks it can, since he sketches out a scenario where pretty much that happens. Meanwhile, pseudo-Dionysus seems to assume it can't, since he uses the relationship between gunpowder and nuclear weapons as a metaphor, and the techniques used to scale gunpowder weapons didn't in fact scale up to nukes; inventing nukes required multiple paradigm shifts and solving a lot of problems that the Byzantines were too confused to even begin to make progress on?

So is this the case for ML, or not? Seems hard to know with high confidence, since prediction is difficult, especially about the future. You can find plenty of really smart experts arguing both sides of this. It seems to be at least fashionable in safety-adjacent AI circles right now to claim that the current paradigm will indeed scale to transformative AI, and I do put some weight on that, and I think people (like me) who don't know what they're talking about should hesitate to dismiss that entirely.

On the other hand, just going by my own reasoning abilities, my guess is that the current paradigm will not scale to transformative AI, and it will require resolving some questions that we're still too confused about to make progress. My favorite argument for this position is https://srconstantin.wordpress.com/2017/02/21/strong-ai-isnt-here-yet/

I don't think people who believe this should rest easy, though! It seems to me that it's hard to predict in advance how many fundamental breakthrough insights might be needed, and they could happen at any time. The Lindy effect is not particularly on our side here since the field of AI is only about 70 years old; it would not be surprising to see a lot more fundamental breakthrough insights this century, and if they turn out to be the ones that enable transformative AI, and alignment turns out to be hard (a whole separate controversy that I don't want to get into here), and we didn't do the technical and strategic prep work to be ready to handle it, then we'll be in trouble.

(Disclaimer: I'm a rank amateur, and after nine years of reading blog posts about this subject I still don't have any defensible opinions at all.)

Expand full comment

Yes, it seems that there is a middle ground here: there seems to be some trick, or several tricks, that evolution has found but machine learning researchers haven't discovered yet. I think it's *not* scaling up something machine learning does already, but it's likely just a matter of inventing the right software architecture, and there's no reason to think that a group of smart ML researchers won't discover it, perhaps tomorrow or in fifty years. And once found, there's no reason to believe it won't scale.

I take GPT-3 demos as showing how existing techniques *don't* do this yet. It's an interesting imitation if you aren't paying attention, but falls apart if you really try to make sense of the output.

Expand full comment

Yeah but the worrying part about GPT-3 is:

I think GPT-3 is already more knowledgeable and "smart" (let's say 'one-step smart', the sort of reasoning you can do at a glance) than any human alive. I think this because it'd kind of have to be to reach the quality of output that it has while also being dumb as a rock in other ways. So we may consider that "smartness overhead". If true, that suggests that once we find the secret sauce, takeoff will be very rapid.

(My two leading candidates are online learning and reflectivity as a side product of explainability research.)

Expand full comment

From the fact that GPT-3 is very effective at some tasks while being dumb as a rock in other ways, I'd reach the opposite conclusion.

This sort of 'smart along just a few directions' intelligence is also exhibited by non-AI computer programs, and your explanation makes no sense there.

Computers are dumb as sand in many ways, yet WolframAlpha will solve many complicated math problems and give you some human-friendly steps that lead to this solution. It almost looks creative how it explains what it does, but most of it is 'just' an expert system, arguably a complete dead-end for general intelligence.

Expand full comment

Yeah, but - the saying used to be that what's easy for computers is hard for humans, and what's easy for humans is hard for computers. Now I think GPT is ranging into areas where, with herculean effort, it can fake things that are easy for humans surprisingly well. I don't know if, say, a human sleepwalker could fake being conversationally awake as well as GPT-3 does, and that suggests to me that whatever thing we have that makes easy things easy for us, GPT has more of.

Expand full comment

If there is one thing that makes easy things easy for us, that GPT has more of, how do you explain its failure modes?

For instance, GPT is still imperfect at world modeling. It can write things that violate physics (GPT-2 happily had fire underwater, among other funny and clearly unintentional mistakes).

It doesn't always know how many body parts animals have, or the relative sizes of objects. It's bad at math.

And my point is very much *not* that GPT is bad, or unimpressive, or unintelligent.

But, from looking at the architecture, from looking at the failure modes of scaled-down GPT, it's plain to see that its skills are *built around* language. Then, world modeling emerges, for the sole purpose of making perplexity go down. Number manipulation, ever so slowly, becomes less disastrous. Perplexity twitches downwards.

Humans, on the other hand, are not stuck in a cave allegory made of text. We have many more inputs that teach us about the world, let us manipulate objects, and so these things are made easy for us long before we grow a propensity for writing endless amounts of convincingly pointless prose.

In other words, I think GPT-3's proficiency at imitating human writing is not a herculean effort; it is precisely the most natural thing for GPT-3 to be doing. Math, for GPT, is a herculean effort. And it sucks at it.

So when GPT-3 has visible failure modes in its primary task, I think we should conclude it's exactly as smart as it's observed to be.

Expand full comment

I seem to remember watching lots of cartoons as a kid that were equally imperfect at world modeling - fire underwater, animals with the wrong number of body parts, objects with the wrong relative size. I don't think underwater fire would happen in a cartoon series primarily set on land, like Scooby Doo, but it would not be uncommon in a series set entirely underwater, like SpongeBob, the way that Jetsons and Flintstones had futuristic/dinosaur things that didn't make sense as futuristic/dinosaur, but just as thematic copies of ordinary 1960s life.

Expand full comment

As I said, I think GPT-3 as an intelligence is "missing parts", or rather, I think the way GPT-3 works matches it to a particular part of the human mind, which is unreflective reaction. In other words, the first thing that comes to mind on considering a topic, in maybe the first 100 milliseconds or so, without awareness or consideration, assuming no filters and verbalizing every thought immediately and completely. A purely instinctive reaction, akin maybe to AlphaGo without tree search. A "feeling about the topic."

Expand full comment

I agree we're still at least one huge breakthrough away. Scott also seems to kind of agree, per

>But it’s possible that humans have a lot of inbuilt structures that make this easier/more-natural, and without those AIs won’t feel “agentic”. This is the thing I think is likeliest to require real paradigm-shifting advances instead of just steady progress.

Some OpenAI people seem to think scaling might take us all the way there, but I think the (vast?) majority of AI researchers agree there needs to be at least one novel paradigm-shifting development.

My even more amateurish, speculative, low-confidence opinion is that it may be a little like the classic half-joke for programming projects: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."

------------------------

And to continue with the pattern of overly-labored metaphors:

I suspect scaling up neural networks and improving neural network techniques will get us very close to (or beyond) critical components of the human brain, but it may be like a thermonuclear fusion bomb. There's a critical balance required between the fission and fusion stages. No matter how powerful or sophisticated you make one stage, everything has to work in harmony, else you get something far weaker. The first stage is the first 90%, and the second (and potentially third) stage is the other 90%.

Alternatively, a super advanced AlphaAnything neural network (or ensemble of neural networks, or something) might be like a 50 ton antimatter hammer wielded by a gnat. The gnat's nimble and can slam it in the general direction of things and might easily level Australia, but it lacks complex self-direction and productive "agentic" attention beyond a few simple built-in reward/cost functions. (Find food, avoid obstacles/getting hit or eaten, reproduce.)

Or, instead of a hammer, it could be a neural network within a cyberbrain a la Ghost in the Shell, or some other kind of device that the gnat's brain somehow interfaces with, with very low latency, and that trains on the world around it and some of the gnat brain's signaling. It may become exponentially more effective at evading danger and finding food, perhaps by "intuitively" understanding physics and being able to predict outcomes much more accurately and quickly (similar to GitS "post-humans"), but that might be it. It has a very narrow set of motivations, so all of that immense predictive and analytical ability is never lifted to its true potential.

------------------------

Maybe this is where the nested prefrontal cortex layers and the attention and motivation stuff Scott talks about could come in. Some chunk of the powerful neural network could repurpose itself and take the role of directing the network and constructing a useful model of attention and motivation on top of the more base motivations. And the same might be true of humans. Especially right after birth; maybe those layers mostly start organizing and working in real-time rather than ahead of time, then eventually crystallize to a degree.

Open an adult human's skull and carefully remove certain parts of the prefrontal cortex (and maybe other areas?) and you might retain adult-like raw neural network-bequeathed skills, like being really good at throwing rocks or filing TPS reports or something, but with the attention and executive function of an infant or toddler. Something a little bit like this does seem to often happen when brain trauma damages the prefrontal cortex, I think.

------------------------

If Joscha Bach's hypotheses about consciousness and attention are right, then it's possible that the problem of artificially developing this system may overlap with the hard problem of consciousness. If so, that last 90% might take a very long time. Or maybe they could be right but it won't take that long to create, and the hard problem of consciousness will shockingly turn out to not be as hard as we thought. Or perhaps implementing consciousness and/or high-level generalizability will somehow be a lot easier than understanding it (e.g. make a blackbox AI do it), where you kind of "fake it till you make it" but can't show your work after you successfully make it.

(But in any case, the most likely answer is probably that his hypotheses simply aren't right, or are only a small piece of the puzzle.)

Expand full comment

FWIW, I think there are problems too complex for humans to understand. And that the boundary is pretty low (possibly as few as seven independent parameters). But also that *most* problems are simple enough to be handled within this bound.

OTOH, since we basically can't see the complex problems, we don't know whether they are important or not.

If this analysis is at all correct, then simply scaling up the AI, with increased "stack depth" (metaphor), *might* be transformative. And we couldn't know until it was running. At which point its goals would determine the result. So it's really important to get the goals correct NOW, which is basically impossible, because currently the AIs don't understand the existence of an external reality. But it can be worked on now, and multiple proposals evaluated and tried in toy systems. Perhaps a "good enough" result can be obtained.

Expand full comment

And why not fix spam calls by charging for incoming calls, just as some kinds of outgoing calls used to be charged? Say it's 10 cents per call, credited to the answerer's account. Not a big obstacle to normal personal calls, but it would make cold-call spam unaffordable.
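
Back-of-the-envelope, with a hypothetical robocall operation for scale:

```python
calls_per_day = 1_000_000    # hypothetical spam operation
charge_per_call = 0.10       # the proposed 10-cent charge
print(calls_per_day * charge_per_call)  # $100,000/day for the spammer; ten personal calls would cost $1
```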

Expand full comment

The problem with that is that you need someone on the spammer's end (in particular, their telephone company) to co-operate in order to actually extract the money from them. However, the spammers and spammees are not co-located, and the spammers' governments have no real reason to mandate such a scheme.

Expand full comment

First, my issue is not attempted fraud but unwanted cold calls; they make the telephone nearly useless. Second, it's just a different way to pay for your telephone service and does not require anyone's active cooperation.

Expand full comment

My point is that to make cold-call spam unaffordable, you have to find some way to make the spammer actually pay that 10c, which they naturally do not want to do (by assumption, it would make their business collapse).

Expand full comment

The way to do it is to charge for all calls. If a spammer or anyone else does not want to pay AT&T, Verizon, etc., those carriers do not have to allow them access to their circuits.

Expand full comment

It seems like this would have a very hard time getting a critical mass of phone companies agreeing in order to avoid the hassle of people with non-agreeing phone companies being unable to call you.

Expand full comment

How do you know whose account to charge? There's no authentication. The spam call from overseas pinky promises that it's coming from a number in your area and the system trusts it.

Expand full comment

A call originating overseas would be just like a call originating in the US. The caller pays a few cents for making the call. The carrier charges the caller up front.

Expand full comment

"AIs have to play zillions of games of chess to get good, but humans get good after only a few thousand games"

You can make an argument that for each chess position humans consider many possible moves that haven't actually happened in the games they play or analyze and that we can program a computer to also learn this way, without actually playing out zillions of games. It's just that it doesn't seem like the most straightforward way to go in order to get a high-rated program.
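
A minimal sketch of that "consider candidate moves without playing out whole games" idea - a toy lookahead where the program imagines a couple of moves ahead and asks a learned evaluation what it thinks of the resulting positions. Here `board` (with legal_moves(), apply(), is_game_over()) and `value_net` are hypothetical stand-ins, not any real engine's API:

```python
def pick_move(board, value_net, depth=2):
    """Choose a move by shallow negamax over imagined continuations;
    no games are played to completion, positions are just scored."""
    def negamax(b, d):
        if d == 0 or b.is_game_over():
            return value_net(b)   # learned score, from the side-to-move's perspective
        return max(-negamax(b.apply(m), d - 1) for m in b.legal_moves())
    return max(board.legal_moves(),
               key=lambda m: -negamax(board.apply(m), depth - 1))
```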

Expand full comment

This is essentially what the chess-playing AIs do.

Expand full comment

except they play out several scenarios all the way to the endgame.

Expand full comment

I think the evidence is that humans have a pretty general ability to 'chunk' things they're studying/analyzing/observing – something like a generic classification/categorization algorithm – and, AFAIK, there are no AI architectures that work like that.

I used to think that was an important, maybe crucial, missing ingredient in AI systems, but I'm becoming more and more skeptical that that's the case given the relentless advance of AI.

Expand full comment

> I don’t know, it would seem weird if this quickly-advancing technology being researched by incredibly smart people with billions of dollars in research funding from lots of megacorporations just reached some point and then stopped.

This wouldn't be weird, it's the normal course of all technologies. The usual course of technology development looks like a logistic curve: a long period in which we don't have it yet, a period of rapid growth (exponential-looking) as we learn about it and discoveries feed on each other, and then diminishing returns as we fully explore the problem space and reach the limits of what's possible in the domain. (The usual example here is aerospace. After sixty years in which we went from the Wright Flyer to jumbo jets, who would predict that another sixty years later the state of the art in aerospace would be basically identical to 1960s jets, but 30% more fuel-efficient?)

It seems like the 2010s have been in the high-growth period for AI/ML, just as the 40s were for aerospace and the 80s were for silicon. But it's still far too early to say where the asymptote of that particular logistic is. Perhaps it's somewhere above human-equivalent, or perhaps it's just a GPT-7 that can write newspaper articles but not much more. The latter outcome would not be especially surprising.
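
For reference, the S-curve being described is the standard logistic,

$$ f(t) = \frac{L}{1 + e^{-k (t - t_0)}} $$

where the ceiling L is exactly the unknown: well before t_0 the curve is indistinguishable from an exponential, and nothing in that early data tells you where it flattens out.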

Expand full comment

To strengthen this point, it might be worth considering GPT-2 vs GPT-3. The key difference between the models is, rather openly, "merely" size - the latter is ~x17 times larger than the former, with no other essential changes. Does GPT-3 perform 17 times better than GPT-2? I cheerfully acknowledge that the question is a bad one - but not utterly without meaning. My intuitive, hard-to-quantify gut feeling would be that "x4" would be a better guess at whatever that is than "x17". It wouldn't be inconsistent with the ratio of improvement to scaling experienced in other domains. Whatever the key is (or more reasonably, are) to obtaining better generalization, long-term memory, "common sense" in the sense of reasonable priors, and other open problems that GPT-3 is still very much struggling with - it likely won't be just scaling. And it might or might not exist.

Expand full comment

Minor correction: not 17x bigger, but 115x bigger. GPT-3 had 175 billion parameters, while the largest version of GPT-2 had 1.5 billion.

Expand full comment

Darn! Sorry! I remembered the 175B part, and the ~10 ratio part for 2 over 1, but wrongly recalled that GPT-1 was the one with 1.5B :/

Expand full comment

Thank you! I came looking for this point.

Actually the prior on AI just stopping should be really, really high. I'm not a good Bayesian so I can't say how high but maybe it should be 1? We haven't had all that many decades of AI research in our society because computers are still pretty new, but we already had an event famously known as the "AI winter" where everyone was super optimistic and assuming that giant planet-sized brains were an inevitability, and then one day the hype caught up with it, the whole field just ran out of steam and research slowed to a trickle.

Surely if we're doing Bayesian reasoning, the chance of a second AI winter must be rated pretty highly? Especially given that there are some suspicious similarities with the prior wave of AI research, namely, a lot of sky-high claims combined with relatively few shipping products or at least relatively limited impact. Google Brain has done by far the best job of product-izing AI and getting it into the hands of real people, and that's praise-worthy, but outside of the assistant product the AI upgrades all seem to be incremental. Google Translate gets X% better, Google Speech Recognition gets Y% better, search results get Z% better and so on. They're nice to have but they aren't Industry 4.0 or whatever today's buzzword is.

Even at Google, there is a risk of another AI winter. DeepMind is extraordinarily expensive and has a noticeable lack of interest in building products. They're also the only major lab doing anything even approximating AGI research. If the political winds shifted at Google and DeepMind had its funding drastically cut for some reason, AGI research could easily enter another AI winter. Not only would the place writing most of the best papers be gone but it would send a strong signal to everyone else that it's not worth spending time on. Even if other parts of the industry kept optimizing pattern recognition NNs, agent-based learning would be dead because basically nobody except DeepMind and a few low impact university teams cares about that.

Expand full comment

I think you're somewhat overstating your case, with "relatively few shipping products". You barely ever touch a product that AI wasn't involved in. Now, much of it is not necessarily deep learning - but why is that bad? The intel chips in your laptop are AI-designed to an extent. The recommender system that offered you that next youtube video is AI-based. The ad that played mid-video was chosen by AI. You didn't buy it because it was annoying and shopped on amazon instead? The recommended products and the review summary were generated by AI. The financial transaction was ascertained as low-risk to be a fraud by an AI. The product was moved and loaded by a combination of AI planning, an actual robot and people obeying AI-generated instructions. It was then shipped by vehicles whose safety features are AI-based, using an AI-based pricing system. It was then delivered to you by people recommended for this delivery based on AI. You got an e-mail about it that was correctly not sent to spam unlike a million others that were filtered out - correctly - by an AI. Oh, it was the most popular house safety tool, Amazon Ring? Guess what powers that. Ah, sorry, it was a good old-fashioned toaster? I wonder what's behind the technology to detect defective products and save money for the factory. And when they're building a new factory - full of robots, incidentally - what's watching over construction men to ensure safety, increasingly? I could go on and on and on and on - but to summarize, you're not the customer for AI, usually. But everybody you are a customer of, is.

Expand full comment

I think a lot of your examples are conflating AI with algorithms in general.

I've actually built risk analysis systems and worked on the Gmail spam filter in the past. You could describe them as "AI" because they make probability based decisions but they were mostly just hand-coded logic and some statistics. Even non-neural conventional ML was responsible for only 1% of classifications when I worked on Gmail spam, the rest was all ordinary code + data tables with some basic stats like moving averages. These days they use more AI than they used to of course, but the system worked great before that. Just like the rest, it's incremental.

Now, that's Google, where they're now putting neural nets into everything nearly as a moral imperative. It's an extreme outlier. In most organizations my experience has been that there's lots of talk about AI but relatively little usage. Also, the point at which "hand-written logic + stats" blurs into "AI" is highly elastic. Like, you cite products being loaded and moved using "AI planning". Well, the algorithms for pathfinding have been well known for decades and you can't really beat them using neural nets, so unless you mean something different from what I'm imagining when you say planning, I'd be very surprised by that. An Amazon warehouse with a bunch of Kivas running around is a highly controlled environment. The A* algorithm will kill that problem stone dead every single time, so why would anyone use an AI for that? It could only be slower and less reliable than conventional techniques.
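
For anyone who hasn't seen it, the A* in question really is this small - a minimal grid version (the grid, unit step costs, and 4-connectivity are illustrative choices, not anybody's actual warehouse system):

```python
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle; start/goal are (row, col) tuples."""
    def h(cell):  # Manhattan-distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    frontier = [(h(start), start)]
    came_from = {start: None}
    cost_so_far = {start: 0}
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            path = [current]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dr, current[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0):
                new_cost = cost_so_far[current] + 1
                if new_cost < cost_so_far.get(nxt, float("inf")):
                    cost_so_far[nxt] = new_cost
                    came_from[nxt] = current
                    heapq.heappush(frontier, (new_cost + h(nxt), nxt))
    return None  # no path exists

# e.g. astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0))
# -> [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```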

Expand full comment

To clarify - I used "AI" as a shortcut for "Algorithms for which I personally know there has been major progress in the last few years". "Personally know" means "have personally worked on" or "have close friends who personally worked on and told me". I emphasize the last point to make clear that I don't rely on hype for these.

Those Kivas might run into any number of surprises, and used to. Safety and efficiency were improved by camera usage. The planning itself isn't quite AI, but combining it with camera input is non-trivial. A friend was on that team.

But let's go over my examples. The e-mail and youtube stuff - you acknowledge (though calling it an outlier). Of course, I haven't really gotten into the ad industry - but it utilizes deep learning heavily, e.g. for feature extraction from images.

Amazon famously started the whole AWS business for its internal uses, so of course it is another extreme outlier. A close friend was on the review summary team.

Financial transactions verification - this mostly uses random forests. But vanilla CART RFs won't do - it takes SOTA xgboost/catboost to be competitive here. A family member is doing that.

Ring does most of what it does using deep networks. I don't actually know anybody there personally, but I did some work for competitors in the past, sorta.

Do I need to justify vehicle safety features? Worked in that domain for a few years. 95% DL-based.

Defect detection- a friend works in a successful startup doing that. Same for construction safety. Done 100% based on deep networks.

So no, I don't think it's accurate to call all that "limited" or purely incremental. And the list was very incomplete. I'd say that's how a major shift feels from the inside.

Expand full comment

Hmm. Well, OK, you're using quite a different definition of AI than I think the one Scott is using.

Expand full comment

Hmm, what "quite a different definition" could there be?

If you think Scott is essentially equating AI with, say, reinforcement learning, then this is an incredibly narrow view. I find it hard to imagine he wouldn’t consider driver assistance systems as AI.

If either you or Scott equate AI with deep learning, then either you or Scott should revisit the actual structure of AlphaGo :)

If the specific example of Kivas still bothers you, feel free to set it aside. Though how is a complex system combining cameras, other sensors, and, sure, A* not an AI?

More important than the exact definition is whether *progress* in these examples is indicative of progress in AI.

Expand full comment

> I think a lot of your examples are conflating AI with algorithms in general.

That's almost a meme at this point, i.e. that any new shiny algorithm is marketed as 'AI' until it's sufficiently understood well enough to be demoted to 'just an algorithm'.

Expand full comment

Let me repeat myself. The AI in question was *NOT* the A*. It was the robots detecting obstacles (humans, other robots, fallen objects etc.) using deep-learning powered computer vision and incorporating that into the planning. So no, it's not "stuff like A*". I don't want to quibble over definitions of AI, but if complex systems managing themselves using state-of-the-art computer vision isn't AI or progress in AI - what would count?!

Expand full comment

> The AI in question was *NOT* the A*.

I don't know what this is referring to.

I was replying in particular to this:

> I've actually built risk analysis systems and worked on the Gmail spam filter in the past. You could describe them as "AI" because they make probability based decisions but they were mostly just hand-coded logic and some statistics.

And I don't exactly disagree, but I also don't think every 'AI algorithm' is 'intelligent', nor would that be a reasonable goal of AI research.

I had an 'old' AI textbook – from sometime in the mid '90s? There were some earlier neural network algorithms discussed but, at the time, I don't think anyone had been able to do anything really impressive with them. Were/are any of _those_ algorithms 'intelligent'? Maybe, but I'd lean towards 'probably not' – even at contemporary scales.

But then, generally, I _expect_ AI to produce 'unintelligent algorithms' as they explore the space of 'intelligent behavior' and do something kind of like 'decompose' intelligence into a bunch of 'reductionist' components.

> ... if complex systems managing themselves using state-of-the-art computer vision isn't AI or progress in AI - what would count?!

I agree that the systems you mention ARE AI and a demonstration of AI progress.

I was commenting more on your seeming relegation of 'old AI' – things I definitely think of as _prior_ progress in AI – as 'not AI', and pointing out that that phenomenon, whereby 'prior AI' commonly becomes 'not AI', is common (and, to me, amusing).

Expand full comment
founding

Oh – "A*" actually _was_ referring to the pathfinding algorithm! I wasn't sure (as I hadn't read the entirety of your comment previously) – my bad!

But I still think A* _is_ ('old') AI and a (prior) demonstration of "progress in AI", even if it isn't – by itself – 'intelligent'.

Expand full comment

"You didn't buy it because it was annoying and shopped on amazon instead? The recommended products and the review summary were generated by AI."

Ah, so that's why the Amazon online sorting system has become really crappy now? I try searching for "tin openers" and if I go outside the "Featured" recommendations (e.g. sorting on price) suddenly the page includes dog food bowls and seventy identical versions of the same blouse, except it's "red blouse", "blue blouse" and so on?

It really has degenerated from "I want product X - I type in name of X - it returns selection of X" to the "this is paid-for featured product/this is ninety crappy and unrelated Chinese goods".

Expand full comment

The Amazon algorithm system assumes that if you buy one item you’re starting a collection of said items. Buy a coat and you’re a coat collector. Buy a kettle and you’re now a dedicated collector of kettles. Sometimes they add in a toaster.

Expand full comment

It certainly is becoming more AI-based. Whether it also gets crappier is beyond me to say :) Anecdotally, for me it doesn’t, but perhaps you’re not a typical user. Or perhaps you’re right. Or both.

Expand full comment

True, anecdotes are not data. But I have noticed over the past couple of years that I type in a search term and it comes back with results which are "No, that is definitely not a teapot, that's a lampstand. And I have no idea why the adult lingerie was in there".

Expand full comment

I was curious, so I tried "teapot" just now. Got 100% teapots, hundreds of items down the line :) And I lead a very boring life, so no adult lingerie was included.

I don't mean to doubt your experience - but that could have any number of explanations, and "Amazon is blindly using this new shiny AI to their detriment" is quite down the list, IMO. Perhaps the failures did become worse, but the average improved? Perhaps it is much better for typical users, which you might not be? Perhaps you are now further exposed to lampstands, and this increases the odds of you deciding to buy stuff not on your original list (for a generic "you")? After all, it's not about your satisfaction. It's about your money.

And then there's the fact that many, many products are not in fact entered into the system by Amazon, but by third parties. A bad description might very well account for poor recommendations - which will lead to those products not selling well, which is a bummer but not such a big deal for Amazon.

Finally, was the AI trying to flirt, with that lingerie? :)

Almost all of the above seem more likely to me than a famously data-obsessed, demonstrably successful company strategically ignoring the fact that a move to AI-based systems has been disastrously bad for it, over two years of fantastic stock increases.

Expand full comment

Yeah: "You should extrapolate constant, linear progress" is a weird thing for Scott to hang a prior on given that that hardly ever happens.

Expand full comment

One interesting point about the air travel thing: I think everyone who has thought for a few minutes about the physics of air travel would naturally have predicted there would be a phase transition in travel speeds once we reached the speed of sound. But if we measure in terms of actual number of passengers that have traveled (which is probably partly a proxy for broader population and wealth, but partly a proxy for the price and quality of the travel experience) we find that within the United States it basically continued at a rate of doubling every decade, apart from a decade of stagnation after 9/11.

(I found data for 1950-1987 in this paper: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-2257.1992.tb00580.x

and data 1990-2020 here: https://www.statista.com/statistics/186184/passengers-boarded-by-us-air-carriers-since-1990/ )

Expand full comment

I can't tell whether the decade-long stagnation after 9/11 vindicates the claim that these trends stop in unpredictable ways, or should be seen as an external shock that is no challenge to the thesis that endogenous growth continues even as the most visible aspects of the technology (i.e., air speed) stall.

Expand full comment

Measuring the number of passengers is clearly goalpost shifting. It’s like using Moore’s law for computing progress and when that stagnates instead using the number of computers in use.

Expand full comment

It's definitely a different goalpost. But I think in terms of measuring the significance of flight, it's just as natural a goalpost to use as the speed of travel. I don't know if Moore's law works better in terms of the size of processors, the cost of processors, or the speed of a computer at a given cost, but it seems quite plausible that it might stall out in one of these metrics without stalling out in the others.

Expand full comment

It's not analogous, though. We're trying to predict growth in the "intelligent"-seeming capabilities of software systems, not the number of deployed software systems. The birth of IoT devices probably shot the latter through the stratosphere more than any other event ever will, but many of them are among the most stupid pieces of software ever written.

Expand full comment

Yeah, I don't mean to suggest that one is clearly a better analogy than the other. My point was just that the standard clearest case of a technological trend that seemed to stall out is one that stalled in some ways and continued in others, and it's not obvious in advance which aspect is the one that is most relevant for the future changes we are interested in.

For instance, how much of the smartphone revolution we've been living through for the past decade would have happened basically the same if the phones themselves had stalled out at 2010-era capabilities but the numbers had kept increasing? How much would have happened if the numbers of phones had stalled out at 2010-era levels but the power of the individual phones had kept increasing? Reaching the power of GPS, a camera, and 3g network capabilities was probably essential to huge amounts of what happened (Uber/Lyft, Tinder/Grindr, the Arab Spring). It's not immediately obvious to me how much of the increasing power of phones since then has been significant - it's mainly been their ubiquity that matters. Though maybe there will be another transition in the capabilities, or maybe there are ways that I have underestimated the importance of the improving capacities (the way the traditional aerospace stagnation story misses the significance of the improvements in planes that did happen since then that enabled cheap vacation travel from Bratislava or Wuhan).

Expand full comment

The number of people who have ever flown isn't a technological trend, it's an economic one. There are certainly good reasons to care about economic trends, but they seem different in kind from technological advancement.

Likewise, an argument that AI will cap out in total capability not far above what we have now is decidedly not an argument that AI will be insignificant to society or the economy. Capabilities no greater than GPT-3, made fully ubiquitous and better-engineered, could be the basis of enormous economic changes. But that's a separate argument from whether and when we can expect human-equivalent AI.

Expand full comment

It seems like if you want practice at securing things against intelligent opponents, computer security is the place to be. And we aren't very good at that, are we? Ransomware attacks get worse every year.

For "hard" computer security (computers getting owned), I think the only hope is to make computers that are so simple that there are provably no bugs. I'm not sure anyone is doing serious work on that. Precursor [1] seems promising but probably could use more funding.

But beyond that there are financial and social attacks, and we are clearly helpless against them. Cryptocurrency shows that if even an unintelligent algorithm promises people riches then many people will enthusiastically take its side. (Though it's sort of good for "hard" computer security, since it funds ransomware.)

[1] https://www.bunniestudios.com/blog/?p=5921

Expand full comment

We're actually fine at computer security in some cases, just not computer security in the realm of a public Internet that is supposed to allow arbitrary clients to connect anonymously, or arbitrary consumer devices that need to allow consumers to install and run arbitrary code.

But there are plenty of high-value military computer systems that have existed for over half a century as the target of much more sophisticated attackers than ransomware gangs, but have never been breached.

Expand full comment

Yes, there's always a tradeoff between convenience and security that's rarely acknowledged.

Expand full comment
founding

I wouldn't expect to know tho if any of those "high-value military computer systems" had ever been breached.

And I _would_ assume that, generally, the easiest way to breach them would be thru the humans with access to them, i.e. NOT by 'hacking' them directly from another computer with access to a connected network.

Expand full comment

I missed this one the first time around, but a more prosaic response than Scott's:

> Soon gunpowder weapons will be powerful enough to blow up an entire city! If everyone keeps using them, all the cities in the world will get destroyed, and it'll be the end of civilization. We need to form a Gunpowder Safety Committee to mitigate the risk of superexplosions."

https://en.wikipedia.org/wiki/Largest_artificial_non-nuclear_explosions

I guarantee you, if it looks like you have the makings of a new entry for that list, you'll get a visit from one of the many, many different Safety Committees. This is an example of a problem that was solved through a large, interlocking set of intentional mechanisms developed as a response to tragedy, not one that faded away on its own.

(I'll agree that nuclear belongs in a category of its own, but quibbling over whether those all count as "gunpowder" undercuts the hypothetical - that's a demand for chemical consistency beyond what our Greek engineer would have witnessed in the first place.)

Expand full comment

Good point, good link. If the Greeks had got fuel-air explosions out of Greek Fire we'd be in alt-history.

Expand full comment

Come up with an AI that can teach itself how to surf. That would be impressive.

Expand full comment

What would be impressive is not that an AI could teach itself to surf, but how an AI could "experience" the "zillions" of iterations in the real world. Not to mention, the real world is far more complicated than the sandbox environment of a board game or even video game. Starcraft behaves in the same way every time. A wave doesn't, let alone all of the other environmental factors related to the ocean and beach.

I think if we saw an AI (at least with current learning technology) trying to do a moderately complicated real-world task, we would all laugh and walk away. It's only in the world of the internet that an AI can look scary to humans.

Expand full comment

I agree we're not there yet, but it's important to remember that for humans a lot of those zillion iterations were our ancestors. We're all the product of some outer loop optimization problem. We don't start from scratch in any way. So using those zillion training loops to get something that can learn new tasks quickly is the right comparison. Again, we're not there yet but we are making progress.

Expand full comment

I think this is where the Boston Dynamics "BigDog" robots are really interesting, since they have learned to walk and run on various kinds of surfaces. I seem to recall hearing that they got one that learned to fold laundry recently, which is an interestingly challenging task.

Expand full comment

The omniscient view. Not self-aware really. Self-awareness is much overrated. Most automation works far better as part of a whole, and even if human powerful, it does not need to self-know. ..."

Are you familiar with Julian Jaynes?

Expand full comment

"The trophy won't fit in my suitcase. I need a larger one."

- A super simple idea that we don't even think of as being ambiguous or difficult. But that's because we live in an actual world with suitcases and trophies. I'm not sure that if you Skinner-boxed an actual human being in a room with a dictionary, infinite time to figure it out, and rewards for progress, they'd be able to figure it out.

It is absolutely true that the "learning blob" is wildly adaptable. But at the end of the day, it's a slave to its inputs.

That doesn't mean that AI isn't dangerous. As long as we have little insight into how it makes choices, it will remain possible that it's making choices using rationales we'd find repugnant. And as long as we keep giving algorithms more influence over the institutions that run much of our lives, they will have the capacity for great harm.

But we've been algorithming society since the industrial age. Computers do it harder, better, faster, stronger, but Comcast customer service doesn't need HAL to make day-to-day life a Kafkaesque nightmare for everyone including Comcast - they just need Goodhart. Goodhart will win without computers - it'll win faster with them, granted.

Expand full comment

I'm in the "AGI is further off" camp, but Winograd schemas are no longer good examples of tasks that the current ML paradigm won't solve. GPT-3 got 89% on Winograd and 77% on Winogrande in a few-shot setting.
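
For readers who haven't seen it, "few-shot" here just means packing a handful of solved examples into the prompt. A rough sketch of the format follows, using the small, freely available GPT-2 through the Hugging Face transformers pipeline purely to show the shape of the prompt; GPT-2 is far weaker than GPT-3 and will not reproduce those scores.

# Sketch of a few-shot Winograd-style prompt. GPT-2 is used only because it is
# small and freely available; don't expect GPT-3-level accuracy from it.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "The city councilmen refused the demonstrators a permit because they feared violence. "
    "Who feared violence? The city councilmen.\n"
    "The trophy doesn't fit in the suitcase because it is too big. "
    "What is too big? The trophy.\n"
    "The delivery truck zoomed by the school bus because it was going so fast. "
    "What was going so fast? The"
)
out = generator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"][len(prompt):])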

Expand full comment

I think the reason skeptics insist on the AGI achieving consciousness is that this is the only way we know of for inferential reasoning, and brand-new ideas. Current forms of AI have zero ability to reason inferentially, and zero capability for coming up with new ideas. The only reason they can "learn" to do "new things" is because the capability of doing the new things was wired into them by their programmers. They need to find the most efficient path of doing the new things, to be sure, but this is basically a gigantic multi-dimensional curve-fitting problem, an exercise in deductive logic. From the strict comparison-to-human point of view, it's no more "learning" than my calculator "learns" when it finds the best slope for a collection of data via a least-squares algorithm, although I can understand the use of the shorthand.

We should just remember it *is* a shorthand, though, and that there is a profound difference* between an AI learning to play Go very well and a human child realizing that the noises coming out of an adult's mouth are abstract symbols for things and actions and meaning, and proceeding to decipher the code.

If you had an AI that could *invent* a new game, with new rules, based on its mere knowledge that games exist, and then proceed to become good at the new game -- a task human intelligence can accomplish with ease -- then you'd have a case for AI learning that was in the same universality class as human intelligence. But I don't know of any examples.

If you had an example of an AI that posed a question that was not thought of by its programmers, was indeed entirely outside the scope of their imagination of what the AI could or should do -- again, something humans do all the time -- then you'd have a case for an AI being capable of original creative thought. But again I know of no such examples.

*Without* creative original thought and inferential reasoning, it is by definition impossible for an AI to exceed the design principles it was given by humans; it is merely a very sophisticated and powerful machine. (The fact that we may not be clear on the details of how the machine is operating strikes me as a trivial distinction: we design drugs all the time where we don't know the exact mechanism of action, and we built combustion engines long before we fully understood the chemistry of combustion. We *routinely* resort to phenomenology in our design processes elsewhere; I see no reason to be surprised by it with respect to computer programming, now that it is more mature.)

And if an AI cannot exceed its design principles, then it is not a *new* category of existential threat. It's just another way in which we can build machines -- like RBMK nuclear reactors, say, or thalidomide, or self-driving cars -- that are capable of killing us, if we are insufficiently careful about their control mechanisms. We can already build "Skynet," in the sense that we can build a massive computer program that controls all our nuclear weapons, and we can do it stupidly, so that some bug or other causes it to nuke us accidentally (from our point of view). But that's a long way from a brand-new type of threat, a life-form that can and does form the original intention of doing us harm, and comes up with novel and creative ways to do it.

-------------

* And I don't really see why you assume that difference is one of mere degree, as opposed to being a quantum leap, a not-conscious/conscious band gap across which one must jump all at once and cannot evolve across gradually. If the difference were one of mere degree, one would expect to see a smooth variation in consciousness among people, just as we see a smooth variation in height or weight: some people would be "more conscious" and some would be "less conscious." Through the use of drugs we should be able to simulate states of 25% consciousness, or 75%, or maybe 105%. (And I don't mean "being awake" here, so that sleep counts as "not conscious"; I mean *self-aware*, the usual base definition of being a conscious creature.) How would we even define a state of being 75% as self-aware as someone else? So far as our common internal experience seems to go, being a conscious self-aware being is all-or-nothing: you either are or you aren't, it's a quantum switch. That doesn't really support the idea that it could gradually evolve.

Expand full comment

Maybe I should add that I don't doubt for a minute that an AGI *is* possible. That's because I'm a monist, and I don't think there is anything about *human* intelligence that doesn't derive mechanically from the workings of cells and molecules. Since human intelligence exists, for me that is a sufficient proof that consciousness can be built, at the very least out of proteins, and I have no particular reason to think it couldn't also be built out of circuits on silicon.

However, I don't think we have the faintest idea of *how* to do that, and that if present progress is any guide, we are as far away from that accomplishment as were the Greeks from rational drug design once they hypothesized that all matter was made of atoms. Not decades, not even centuries, but millennia is the right timeframe, I think.

Expand full comment

>Not decades, not even centuries, but millennia is the right timeframe, I think.

I don't think we have any meaningful context from which to extrapolate what millennia of technological or scientific development looks like in a present or post-present day context. In a sense, we've had millennia with which to develop our current technology, but in another, very meaningful sense, we've only been engaged in concerted tech development for a few hundred years at best. We have individual companies today dedicating more concentrated person-hours to tech development than were likely engaged in such across the entire world 300 years ago, let alone a thousand. And those companies are networking with and building on developments from other similarly large companies, universities, etc. around the world. In some ways, our technology is inferior to what people from a century ago might have imagined we'd have today, but in many ways, it's expanded in ways they'd be unlikely even to imagine. That's just one century. Four centuries ago, relatively few people would even have thought of a question like "what will technology look like in four hundred years?" as meaningful to wonder about.

I think that unless we suppose that we'll have reached some hypothetical maximum technology state, wherein we are capable of manipulating what physical resources we have in any way that's theoretically possible within the laws of the universe, then we have essentially no meaningful ability to predict what our technology might look like at all in 1000 years, or even 300, any more than people could meaningfully predict what sort of technological problems we'd be working on today 1000 years ago.

Expand full comment

...but then you should probably change the phrasing of "it could gradually evolve", right? The one configuration of cells and molecules that achieved intelligence did precisely that.

A less nitpicky objection is that you're letting "creativity", "coming up with new ideas" etc. do a lot of the heavy lifting. I have no idea what those mean. If you could give a good definition, then it could well be set as a goal to optimize for. More immediately - coming up with new ideas accidentally is a thing, and it's a thing AI is very much doing. Sure, AlphaGo was just being a "sophisticated calculator" - but the results went beyond what was familiar to humans not only in mere playing strength but in patterns of thought. If that isn't creativity, perhaps 'creativity' is a bad word to use in this discussion?

Expand full comment

(This was in response to the additional post, not the OP)

Expand full comment

> there is a profound difference between an AI learning to play Go very well and a human child realizing that the noises coming out of an adult's mouth are abstract symbols for things and actions and meaning, and proceeding to decipher the code.

Is there? I'm not sure there is. The Facebook bAbI tasks involved a set of simple auto-generated word puzzles of the following form:

"Mary is in the office. Mary moves to the hallway. Where is Mary now?"

The puzzles are basic but test many different kinds of reasoning ability. You haven't heard much about bAbI because it didn't last long - soon after the challenge was set, Facebook themselves built an AI that could read and answer them. Importantly, this AI was self-training. It had no built-in knowledge of language. It learned to answer the questions by spotting patterns in the training set and thus worked when the puzzles were written in any language, including a randomly generated language in which all the words were scrambled.

So - AI has been capable of "learning to talk" from first principles for quite some years now. The limitation on it is, of course, that children need far more sensory input to become "intelligent" than just massive exposure to speech, and such AIs don't have that.
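
To give a sense of the scale of these puzzles, here is the flavor of the task plus a deliberately hand-coded toy baseline (answer with an entity's last-mentioned location). The point of the Facebook work was that the models learned this kind of behavior from examples rather than having it spelled out, as it is below.

# The flavor of a bAbI-style puzzle, plus a hand-coded baseline:
# answer with the entity's last-mentioned location.
def answer(story, question):
    location = {}
    for sentence in story:
        words = sentence.rstrip(".").split()
        if "the" in words:                       # e.g. "... to the hallway"
            location[words[0]] = words[-1]       # subject -> last word (the place)
    who = question.split()[2].rstrip("?")        # "Where is Mary now?" -> "Mary"
    return location.get(who, "unknown")

story = ["Mary is in the office.", "Mary moves to the hallway."]
print(answer(story, "Where is Mary now?"))       # -> hallway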

Expand full comment

> a life-form that can and does form the original intention of doing us harm, and comes up with novel and creative ways to do it

Intention and novelty/creativity don't have anything to do with it, they're red herrings. The concern I've seen by AI risk people mainly revolves around huge optimization power + misaligned-by-default objective functions (because human values are complex and fragile).

The right metaphor isn't sentient software "out to get us", it's just powerful planners. No intention required: https://intelligence.org/2015/08/18/powerful-planners-not-sentient-software/

The Eliezer quote that “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else” comes to mind. Same way we have no qualms over cutting down trees or whatever.

More on fragility of value: https://intelligenceexplosion.com/2012/value-is-complex-and-fragile/

The 2016 paper 'Concrete problems in AI safety' by Google Brain / OpenAI / Stanford / Berkeley researchers lists a bunch of problems we can work on right now, as counterpoint to Andrew Ng's strawmannish "worrying about Mars overpopulation" sentiment: https://arxiv.org/abs/1606.06565

Expand full comment

I think a significant fraction of people trying to think about AGI are tripped up by the following:

Humans have this neat cognitive hack for understanding other humans (and sorta-kinda-humans, such as pets). If you tried to understand your friend Tom in the way an engineer understands a car, as a bunch of simple parts that interact in systematic ways to generate larger-scale behaviors, it would be quite difficult. But since you happen to BE a human, you can use yourself as a starting point, and imagine Tom as a modified version of you.

You don't really *understand* yourself, either. But you have a working model that you can run simulations against, which lets you predict how you'll behave in various hypothetical situations with a reasonable degree of accuracy. And then you can tweak those simulations to predict Tom, too (with less accuracy, but still enough to be useful).

I think a lot of people look at the computers of today, and they understand those computers in the normal way that engineers understand cars and planes and elevators and air conditioners. Then they imagine AGI, and they apply the "modified version of myself" hack to understand that.

And those models don't feel like models; they just feel like how the world is. ( https://www.lesswrong.com/posts/yA4gF5KrboK2m2Xu7/how-an-algorithm-feels-from-inside )

This tends to produce a couple of common illusions.

First, you may compare your two mental models (of today's computers vs future AGI) and notice an obvious, vast, yet difficult-to-characterize difference between them. (That difference is approximately "every property that you did NOT consciously enumerate, and which therefore took on the default value of whatever thing you based that model on".)

That feeling of a vast-yet-ineffable difference exists in the map, not the territory.

There might ALSO be a vast difference in the territory! But you can't CONCLUDE that just from the fact that you MODELED one of them as a machine and the other as an agent. To determine that an actual gulf exists, you should be looking for specific, concrete differences that you can explain in technical language, not feelings of ineffable vastness.

If you used words like "self-aware", "conscious", "sentient", "volition", etc., I would consider that a warning flag that your thinking here may be murky. (Why would an AGI need any of those?)

Second, if you think of AGI like a modified version of yourself (the way you normally think about your coworkers and your pets), it's super easy to do a Typical Mind Fallacy and assume the AGI would be much more similar to you than the evidence warrants. People do this all the time when modeling other people; modeling hypothetical future AGI is much more difficult; it would be astonishing if people were NOT doing this all over the place.

I think this is the source of most objections along the lines of "why would a superintelligent agent spend its life doing something dumb like making paperclips?" People imagine human-like motives and biases without questioning whether that's a safe assumption.

(Of course YOU, dear reader, are far too intelligent to make such mistakes. I'm talking about those other people.)

Expand full comment

> There might ALSO be a vast difference in the territory! But you can't CONCLUDE that just from the fact that you MODELED one of them as a machine and the other as an agent. To determine that an actual gulf exists, you should be looking for specific, concrete differences that you can explain in technical language, not feelings of ineffable vastness.

Agreed, but that goes both ways. Nebulous concepts such as "self-awareness" aside, the truth is that we humans can argue about abstract concepts on a forum; drive a car (even stick !), learn to speak new languages, and do a myriad other things. We can also learn to perform a wide (though obviously finite) array of tasks. No modern AI comes even close to doing all of that, and thus far any attempts to make it do that have met with spectacular failure.

Sure, there might not be a difference in territory, and maybe it's all in the map, and we're just not trying hard enough. But maybe the radical differences in performance between humans and algorithms really are due to a radical difference in architecture -- and we can't tell what that is, because we don't know how humans work. In fact, this would appear to be the more parsimonious assumption.

Expand full comment

founding

I'm not accusing you of doing this, but I find it very common that people compare a 'hypothetical AI' with something like 'the superset of all amazing skills and capabilities of every human ever'.

Lots of people don't in fact seem capable of 'arguing about abstract concepts', many struggle with learning to drive a stick shift car (tho most probably just avoid doing so because it's unnecessary). Many people struggle to learn a new language, particularly after the natural 'language learning window', and many struggle to do a "myriad other things" for any particular thing in that myriad.

(I'd consider any particular evidence, that people _could_ gather, about general human capabilities to be weak given that, for any particular 'thing', it'd be pretty hard to know whether the research subjects just 'didn't want to learn the thing' without, somehow, 'forcing' them to try to learn it.)

I do think we _might_ be missing some necessary architectural tricks with AI. I'm pretty sure attention and motivation, in particular, are active areas of AI research.

But I also hope that we continue to better understand how our own 'natural intelligence' works!

Expand full comment

I mean, if you say "human-level AI" it seems to carry the implicit qualifier "at the level of a median or high-performing human".

We already have "human-level AI", if you allow a comparison with someone who's in a permanent coma.

Expand full comment
founding

We already have _super_-human AI, compared to the _best_ humans – in some specific (narrow) domains.

There is no generic "median or high-performing human" across _all_ domains.

There are tho quite a few domains where probably _most_ humans perform better than _any_ AIs.

It's a very uneven 'landscape' of comparison.

Expand full comment

1) Would AI of below-(human)-average intelligence be a threat?

2) If we don’t know very much about how natural human intelligence works (and we don’t), how would we know if we were getting nearer to duplicating it in a machine?

Expand full comment
founding

1) Sure!

2) I think the current paradigm of treating 'test subjects' (i.e. AIs AND humans) as 'black boxes' and strictly testing 'functional performance' is perfectly reasonable. It seems pretty obvious to me that the best AIs really are better at chess and Go than (almost) any human. It doesn't seem necessary to also know how either AIs or humans actually play those games. Similarly, car driving seems pretty amenable to comparisons of functional performance, e.g. are injuries and fatalities (and property damage) statistically different between AIs and humans?

That written, I'm personally very interested in better understanding of how both AI and human intelligence works – and that's an active area of research anyways, fortunately.

Expand full comment

Have you updated your list of examples recently?

> we humans can argue about abstract concepts on a forum

Bruce Schneier is issuing public warnings about bots coming to dominate online political discourse:

https://www.schneier.com/essays/archives/2021/05/grassroots-bot-campaigns-are-coming-governments-dont-have-a-plan-to-stop-them.html

https://www.schneier.com/essays/archives/2020/01/bots_are_destroying_.html

> drive a car

I've heard stats that there are over 1000 self-driving cars on the roads in the US today.

> learn to speak new languages

We've got automated translation, voice recognition, and voice synthesis. (Though we're still struggling with semantics outside of constrained domains.)

Around 20 years ago, I did a cursory survey of the field, found that researchers seemed to be stuck on basic essential problems like computer vision and natural language that they had been working on for decades without obvious progress, and figured AGI wasn't coming any time soon. But since then, there have been some fairly impressive advances in those areas, too! They're not "solved" by any means, but they're good enough to fake it in some contexts, and continuing to improve.

Lots of stuff that I used to mentally classify as "wicked problems that we might not ever really solve" is suddenly moving pretty fast.

It's become a lot harder to name something that an ordinary human can do that computers haven't at least made significant progress towards.

Expand full comment

> Bruce Schneier is issuing public warnings about bots coming to dominate online political discourse...

Yes, and so far the AI attempts are laughably bad, and instantly recognizable as such. I know I'm not the best writer in the world, but would you believe that I am an AI? Be honest.

> I've heard stats that there are over 1000 self-driving cars on the roads in the US today.

Yes, there are, but they require special hardware with lots of extra sensors; and still, their driving record is decidedly unsafe -- although they do reasonably well on freeways. Don't get me wrong, it's still a monumental achievement, but it's a long way from the average human's ability to hop behind the wheel of any car and drive it off the lot.

> We've got automated translation, voice recognition, and voice synthesis. (Though we're still struggling with semantics outside of constrained domains.)

Yes, and again, they are almost laughably bad. There have been tremendous improvements in the past 20 years, which have elevated these tools from "totally useless" to "marginally useful in constrained domains". Again, this is a monumental achievement, but nowhere near AGI-level.

Expand full comment

In that first Bruce Schneier essay I linked, he writes about how researchers submitted a bunch of AI-written comments to a government request for input on a Medicaid issue, and the administrators accepted them as legitimate until the researchers came clean.

Self-driving cars are good enough that they are being used *commercially* (Waymo).

Google Translate is usually basically understandable, for essentially any topic.

All of these are at a point TODAY where you might seriously consider buying them in some uncommon-but-plausible circumstance where you would otherwise have hired a professional human being.

The move from "totally useless" to "marginally useful in constrained domains" could also be described as moving from "pure science fiction" to "just needs some fine-tuning". Characterizing that as "not even close" and "spectacular failure" seems pretty dubious to me. Do you have a specific threshold in mind at which you would be impressed?

Expand full comment

Personally I wish we'd table the long-term, strong AI topic since, as I commented on the original Acemoglu post, these conversations are just going in circles. Do you yourself honestly feel like your understanding of this issue is progressing in any way? Or that your plan for action has changed in any way? You're at 50-50 on the possibility of true AI by 2100. So after all this, you still have no idea. And neither do I. And that's hardly changed in years. We aren't accomplishing anything here. Four of the last eight posts have been AI-related. Sorry to keep being such a pooh-pooher, but I really appreciate your writing and I feel like it just goes to waste on this. I'd love for the focus here to be on more worthwhile topics.

Expand full comment
author

I think there's still some issue where to me, if there's a 50-50 chance of the world being profoundly changed / destroyed within a few decades, that's the most important thing and we should be talking about it all the time, whereas to other people it seems like "well we don't know anything, let's forget about it". I feel like this meta-level conversation is still an important one to have and I don't know how to have it without it also involving the object-level conversation.

Expand full comment

I get where you're coming from. But there's still the issue of not having any clue what to do even IF the world is going to be destroyed in a few decades. I'm reminded of a traditional prayer: "Grant me the serenity to accept the things I cannot change, courage to change the things I can, and wisdom to know the difference."

Perhaps the fact that we're having the meta-level conversation now should serve as a promising sign that we're at least heading where I want us to.

Expand full comment

^ Similarly, I have trouble seeing what the goal is here.

This all seems to be a debate about *whether* we should worry, instead of what we would do if everyone agreed to worry.

We already developed one world-killing technology. Did public chatter about the risks of nuclear weapons save the world? Or change any of the outcomes at all? It seems to me that we're still here because the people with their fingers on the triggers were also invested in the world not being destroyed.

If we ever so much as come up with theoretical way of building this AI, every single person on earth is going to think about the risk long before it is completed. What is this scenario where they are upset that they didn't think about the risks even earlier? It's a shame we didn't worry about this 20 years earlier, or else we would've... what? Not equipped a million Amazon delivery robots with box cutters?

Expand full comment

Not every person on earth knew nuclear bombs were a possibility when they were first built. We learned about them after Hiroshima, and long after the Trinity test that, if the calculations had been wrong, would have killed every living thing on the planet and left Earth a lifeless rock.

Do you know how far Google is from a dangerous AI? Do you think the US Congress knows?

Do you think the US Congress *ought* to know? Because that's something we can work on right now.

Expand full comment

Exactly. We could very well find ourselves on the eve of apocalypse wishing we'd acted differently, but hindsight is 20/20. Aside from hardcore Unabomber Manifesto-style de-technologization (which seems preposterously out of the question, both in terms of desirability and feasibility), as of now it doesn't seem to be the case that we will have actually had any way of knowing which changes we should've made. And the experts haven't reached much of a consensus on that either, as far as I can tell.

One critique I'll make of your nuclear weapons analogy is that we don't have as good a reason to believe that good intentions like that will be able to prevent an AI apocalypse. The most plausible doom scenario to me is just that the technologies start behaving unpredictably at a point where we can't undo their own self-advancement, much as we might want to. It would be more like if, when we produced some critical number of nukes, they rather surprisingly attained some weird emergent property of all going off at once. Even under a more predictable doom scenario, like bad actors or irresponsible custodians, there are strong barriers preventing rogue individuals or groups from privately creating nuclear weapons, which seems less likely to be the case for world-destroying AI.

Expand full comment
author

If you think the world will be destroyed in a few decades, but you don't know what to do about it, I think the thing to do is to figure out what to do. At least figure out if you can figure out what to do.

Expand full comment

Several very smart people have tried to figure this out for several years and as far as I'm aware they don't seem to have produced anything very useful, isn't this evidence we can't figure it out, for now? (That is to say, the state of the field is not ripe enough to produce progress on the question)

Expand full comment

Some problems are hard enough that it takes very smart people more than a few years to solve them. But that doesn't mean that they can't be solved by having more very smart people work on them for longer.

Two examples that come to mind are Fermat's Last Theorem and the Poincaré Conjecture, which took 358(!) and 102 years to prove, respectively.

Expand full comment

Those are examples of the opposite phenomenon. A mathematician working in the 19th century won't solve Fermat's Last Theorem by thinking about Fermat's Last Theorem; you need all sorts of a priori unrelated "technology" (commutative algebra, schemes, etc.) to solve it. Something much the same is the case with the Poincaré conjecture, from what I understand.

Expand full comment

Are you familiar with every research agenda to solve AI Alignment and pessimistic about all of them? What about Debate? Microscope AI?

This sounds to me like you're thinking of MIRI as the only people who work on AI alignment, when in fact they're only one pretty tiny center. They're also unrepresentative in how pessimistic they are about the difficulty of the problem.

Expand full comment

Perhaps inventing AGI is precisely the thing to do, so that something of human civilization survives on this planet once we cook ourselves. Maybe the solution to the Fermi paradox is that civilizations keep emerging in isolation until they either figure out that a carbon-silicon transition is unavoidable or kill themselves.

I'm thinking about Asimov's story in which R. Daneel Olivaw formulates the Zeroth Law of Robotics, "A robot may not harm humanity, or, by inaction, allow humanity to come to harm", and what a robot might do in accordance with that law when it realizes that humanity is objectively suicidal. One strategy might be to kill off 90% of us immediately, because we're so far gone that it's the only way to allow the Earth's ecology to recover. Then rule with a literally iron fist for a couple of thousand years until humanity's matured enough to understand the concept of exponential growth and develop a culture capable of modifying its genetic program of exploiting every available resource. Life has to pursue that evolutionary strategy in its early days if it's not going to be extinguished on a hostile planet, but it's maladaptive once an intelligent species is consuming the plurality of net primary production. The end game would be for the robotic overlords to arrange their own overthrow and destruction, with a severe taboo against computers of all kinds, and let humanity go into the future equipped with a more effective sense of self-restraint and self-reliance and a much stronger sense that we're all in this thing together.

Oh well. What would we do without wishful thinking?

Expand full comment

As I see it, we have figured out (to a sufficient degree) that we can't figure out what to do for now. Not me at least, and from what I can tell, not the more-informed commenters here either. It's an endless back-and-forth of thought experiments which fail to reveal to us which way things will actually go. It's frustrating and definitely a little scary, but I don't see the prescriptions for action changing much until the true experts come to more of a consensus on a solution. And I hope that day comes soon.

Expand full comment

I think the question is what it takes to have "the serenity to accept the things I cannot change". The world changed drastically in many ways over the past fifteen years. Is there something my 2007 self could have done differently to prepare himself for the fact that a decade hence, he could go to any city in the developed world, and use a device in his pocket to find a good place to eat, and get there via bikeshare or scooter, and flirt with a nearby local, but that politics in every democracy would be corrupted by a vast flood of misinformation? And is there something my 2019 self could have done differently to prepare himself for the fact that a year later, he would be stuck in a boring suburb with no reasonable possibility of travel?

I think there's every reason to think that some major change in the next couple decades, whether through AI or something else in the vicinity, is going to be quite a bit bigger than the smartphone revolution, and probably even bigger than the pandemic. I'm very glad that on March 1, 2020, I went to the grocery store and bought a big supply of toilet paper and non-perishable food, but there are several other things I wish I had done to better prepare my work-from-home setup. Maybe there are things that would have been better to do to prepare for the smartphone revolution, and maybe others we can think of for the AI revolution.

Expand full comment

Hindsight is 20/20. I wish I'd invested in GME and Dogecoin on a particular day, but I had no way of knowing that would be a good idea at the time. I don't think you had a strong reason pre-pandemic to prepare a work-from-home setup, but of course it's easy to say now that you wish you did. I hope that some experts working in the AI field will be able to come up with more consensus prescriptions for action, but I've been following this topic for about 5 years now, and it pretty much feels the same now as it did then. What did those 5 years of discussion accomplish? Maybe it got more people interested in the field of AI safety? But how should we have any confidence that *that* is good in itself? So far they don't have much to show for it. Maybe we've just wasted talent and money. Or maybe we just accelerated the inevitable.

Expand full comment

I think a lot of that is right.

I think a more interesting question is whether I could have thought in the first or second week of March that I should upgrade my work-from-home setup or buy home weight equipment, instead of waiting until the end of March and discovering that many products had a weeks- or months-long backlog.

Expand full comment

I think one must distinguish between the prepper-style actions of individuals, and the benefits of a "societal discourse" about this topic at this stage.

Expand full comment

Presumably the issue there is people's prior on other people saying "the end is nigh" being reliable is quite low. History and the world are full of people claiming that there's going to be certain doom from this presently near-invisible / very complicated phenomenon, and the doom is always just far enough in the future that if the prediction turns out to be wrong, nobody making it will care because they'll be dead or retired.

Over time people learn to tune these things out because usually:

a. They're not actionable

b. The probabilities being claimed for them are often unfalsifiable.

Expand full comment

A 50/50 chance of Singularity in the next 30 years is not just a little high; it is so absurdly high as to border on unimaginable. Of course, I am well aware that the same inferential gulf exists between me and e.g. your average Rapture-believing Christian, and I have no more hope of convincing him than I have of convincing you. There's no evidence I can provide that would dissuade the Christian, because his prior is so high that any new evidence is always going to be more consistent with the "god exists" proposition than the alternative.

Expand full comment

You sound a bit too confident, and it sounds like maybe you haven't engaged with the better arguments for AI concerns. Also I'm not sure if I'm actually at 50-50. It's more like "I continue to have no fucking clue," so 50-50 seems the best representation of that. It's just really really hard to assess. So many unknown unknowns. So yeah, then you want to rest on your priors that people have been predicting doom since the dawn of man, but then you reckon with the AI arguments a bit more, and they seem a lot more plausible than past predictions, and I just keep going back and forth like that. It's not some supernatural faith-based revelation thing like the Rapture, though. Regardless, nothing is changing. What to do about it remains as unclear as ever.

Expand full comment

There's a difference between saying "I've considered all the facts carefully and made a well-informed judgment that there's a 50-50 chance of the world being destroyed" vs. saying "I have no clue how to evaluate this situation so I'm just going to assume it has a 50-50 chance of destroying the world". The former is very important if true, the latter is just a variant of Pascal's wager and fails for the same reasons.

Expand full comment

Agreed absolutely.

Besides, shouldn't the raw prior on destroying the world start off quite low, given a). the age of the world, and b). the track record of doom prophets thus far ?

Expand full comment

I think the age of the world is pretty irrelevant in assessing AI risk, just as it would be with nuclear risks.

Expand full comment

I disagree. The prior on "global thermonuclear war destroys world" should still start off very low. But then, you can factor in all the available evidence, i.e. all the actual, physical nukes that we have detonated for real, as well as the projected number of such nukes currently existing in the world. This raises the probability to a level of concern.

Back in the 50s and 60s, nuclear war seemed to be inevitable; it's possible that "50/50 chance of armageddon in the next 10 years" was a realistic estimate at the time. Today, the probability of that happening is a lot lower.

We have nothing like that kind of evidence for AI. Not even close. At best, we have the equivalent of the beginnings of atomic theory (minus quantum physics), except it's really more like "atomic hypothesis"; perhaps something similar to the Ancient Greek idea of "indivisible elements" (which is where the word "atom" comes from).

Expand full comment

I still don't see how the first 4.5 billion years of Earth history provide you any shred of confidence that a technology which has existed for less than a century does not have the capacity to turn into something apocalyptic. What probability would you give for world-destroying AI happening by 2100?

Expand full comment

Well, humanity arguably gained the ability to destroy the world only sometime in the past century, and while it hasn't tried to exercise it yet, there were some close calls. Also, there's a general agreement that so far technological progress continues, which implies enhancement of said destructive capability, whereas it's much more doubtful whether the "sanity waterline" rises fast enough to continue preventing its deployment.

Of course, it doesn't follow from this that there's a 50/50 chance that AI kills us all, but I'd say that there's a good reason to think that the total risk of the end of the world is significantly different from what it was a couple of centuries ago.

Expand full comment

Yes, of course -- but the risk of armageddon doesn't *start* at 50/50. Rather, the prior starts off quite low, and with each bit of available evidence (collected over decades if not centuries), it goes up a notch. We have no such evidence for AI, and yet Scott proposes that we start at 50/50. That's just irrational, IMO.

Expand full comment

Well, the whole point of the original Yudkowskian sequences was to build a framework in which it makes sense to take hypothetical future AI very seriously. For all his faults he didn't claim that we should start from 50/50, so I'm not sure why Scott said this.

Expand full comment

I'd be willing to lend you money now in exchange for everything you own in a few decades. If the world gets destroyed, I can't collect anything and you got a free lunch.

Expand full comment

Can I get in on that deal ? We could split Scott's belongings 50/50. Assuming the world doesn't end. Which I'm sure it will, any day now, so no worries !

Expand full comment

I can save you the trouble and tell you now that there's a 100% chance of the world being profoundly changed. You can just look at the past to figure that one out.

Expand full comment

> OpenAI’s programs can now write essays, compose music, and generate pictures, not because they had three parallel amazing teams working on writing/music/art AIs, but because they took a blob of learning ability and figured out how to direct it at writing/music/art, and they were able to get giant digital corpuses of text / music / pictures to train it.

One thing outsiders might not understand is how huge a role "figured out how to direct it" plays in AI software. In Silicon Valley everyone and their intern will tell you they're doing AI, but there are very few problems you can just naively point a neural network at and get decent results (board games / video games are the exception here, not the rule). For everything else, you need to do a ton of data cleanup--it's >90% of the work involved--and that means hard-coding a bunch of knowledge and assumptions about the problem space into the system. The heuristics from that effort tend to also do most of the heavy lifting as far as "understanding" the problem is concerned. I've worked at one startup (and heard stories of several more) where the actual machine-learning part was largely a fig leaf to attract talent and funding.
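
For concreteness, here is a hypothetical taste of that cleanup step; every file and column name below is invented, but this unglamorous stage is where most of the effort typically goes before any model sees the data.

# Hypothetical cleanup pass with pandas. File and column names are made up;
# the point is only how much hand-coded domain knowledge goes in before modeling.
import pandas as pd

df = pd.read_csv("raw_events.csv")                     # assumed input file
df = df.drop_duplicates(subset=["event_id"])
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
df = df.dropna(subset=["timestamp", "user_id"])        # discard unusable rows
df["amount"] = df["amount"].clip(lower=0)              # domain assumption: no negative amounts
df["category"] = df["category"].str.strip().str.lower()
df = df[df["amount"] < df["amount"].quantile(0.999)]   # crude outlier cut
df.to_parquet("clean_events.parquet")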

So here's another scale we might judge AI progress on: How complicated a problem space is it actually dealing with? At one extreme would be something trivial like a thermostat, and at the other-- the requirement for AGI "takeoff"-- would be, say, the daily experience of working as a programmer. Currently AI outperforms top humans only at tasks with very clean representations, like Go or Starcraft. Further up the scale are tasks like character recognition, which is a noisier but still pretty well-defined problem space (there's a single correct answer out of a small constant number of possibilities) and which computers handle well but not perfectly. Somewhere beyond that you get to text and image generation, much more open ended but still with outputs that are somewhat constrained and quantifiable. In those cases the state of the art is significantly, obviously worse than even mediocre human work.

My wild guess is that games are 5-10% of the way to AGI-complexity-level problems, recognition tasks are 20-30% of the way there, and generation tasks are 40-60% of the way there, which would suggest we're looking at a timescale of centuries rather than decades before AGI is within reach.

Expand full comment

Many fantastic points being made here - but let me just add that board/video games are also very much not "naively point an AI and solve". It took a lot more than good neural networks for AlphaGo, and much of that was not about networks at all. Excellent engineering, data cleanup, progress in algorithms that aren't "proper AI", and whatnot were needed.

Expand full comment

Excellent engineering is often overlooked in AI. AlphaFold2 did a lot of little things right that the competitors didn't.

But I'm fairly confident this is nothing that wouldn't have been solved with another 10x more data and compute.

One thing we learned from AlphaZero is that it could do just as well or better than AlphaGo at Go, while having much less task-specific engineering. AlphaZero also plays very strong Chess and Shogi, and all it took was generously soaking the matrices in oodles of Google compute.

We don't have anything close to a "point any input at it, don't bother cleaning" architecture. But we are getting architectures that are more polyvalent. GPT-3 can do zero-shot learning. Transformers have been applied to anything and everything, with admirable success.

I will cheerfully predict that the bitter lesson (http://incompleteideas.net/IncIdeas/BitterLesson.html) will stay relevant for many years to come.

Isn't it fun to spend 6 months beating SOTA by 0.3% with some careful refinements, only for the next OpenAI or Google paper to triple the dataset and quintuple the parameters?

Expand full comment

> For everything else, you need to do a ton of data cleanup--it's >90% of the work involved--and that means hard-coding a bunch of knowledge and assumptions about the problem space into the system.

This is an excellent point, and evolution already did this for us by natural selection, i.e. the constraints of self-propagating, competitive agents in a real world governed by natural laws.

But assuming "data cleanup" is the only key missing, this then reduces the AGI problem to learning how to feed the same inputs to existing learning algorithms and scaling the parameter space 2 orders of magntitude. If anything, that makes AGI even closer than expected...

Expand full comment

About 20 years ago, I attended a lecture by Douglas Hofstadter at École polytechnique about computer-generated music. The program wrote fake Beethoven, fake Chopin, etc. We were subjected to a blind test, and IIRC for one of the pieces about half the audience was fooled.

But in the course of the lecture, it became apparent that the computer didn't produce these pieces just from being fed the composers' complete works as machine-readable scores. The authors had to analyze the works and split them into musical phrases and words: this was the real input to the program. And they had to orchestrate the output, but I suspect that is not the real blocker.

It was still quite impressive, though.

Expand full comment

Humans don't just make art, humans invented art. And humans invented games. Programs which can beat all humans at chess and go are amazing, but I might wait to be impressed until a program invents a game that a lot of people want to play.

So far as I know, programs can make art that people can't tell from human-created art, but haven't created anything which has become popular, even for a little while.

Other than that, I've had a sharp lesson about my habit of not actually reading long words. I was wondering for paragraphs and paragraphs how the discussion of the Alzheimer's drug had turned into a discussion of AI risk.

Expand full comment

Regarding art specifically, this problem is being worked at from both sides: AI is getting better, and humans are getting worse. It's not too difficult to make an AI that generates e.g. a bunch of realistic-looking paint splatters, and calls it "art".

Expand full comment

To be fair, some style transfer results could pass for decent Impressionist paintings.

Expand full comment

>> Regarding art specifically ... humans are getting worse

You should probably notice that you are confused if you consider that seriously.

Expand full comment

I think this reflects a mistaken understanding of how the skill of "noticing confusion" works. "Humans are getting worse at art" is an observation which might be mistaken, but isn't obviously at odds with plausible models of human culture or development. A lot of people give credence to the notion that average standards of living in developed countries are going down. Some models predict that this shouldn't happen, but these are crude models and people don't necessarily assign much confidence to them being true.

Humans getting worse at art, given changing incentive structures or cultural landscape, isn't something people should obviously find surprising. If they observe that, it's not a clear sign that something is wrong with their models.

Expand full comment

Admittedly, the quality of painting and sculpture is a subjective metric, so if this is what you're implying then you are correct. Still, it is objectively easier to write a software program that will render a black square, as compared to rendering e.g. the crew of a sailing ship fighting to save their vessel in stormy waters.

Expand full comment

I always thought it was going to be a collection of AIs which can direct each other. One can see the power that comes in the example of the Learner + Strategy narrow AI/algorithms.

If 1 + 1 = 2...then might 1 + 1 + 1 = 3 or 1 + 2 = 3? Or perhaps at some point it is 2 controller AIs x 3 problem AIs = 6 AI power instead of only 5 AI power.

Just like in the body we have layers of systems that interact. Certainly a sufficiently well organised collection of AIs serving various function, coordination, strategy, and learner roles could become extremely powerful. It may well take a sophisticated group of humans to put it together, but I don't see why that could not happen.

The blood and plasma fill the veins and these get pumped by the heart and the entire cardiovascular system receives instructions through hormones such as adrenaline to go faster or various parasympathetic instructions to slow down.

You can do the same with the kidneys and neural nets, etc. and at some point you have a human being. I'm inclined to agree with Scott that we're on a path now and just need to keep going...at some point we hit sufficiently complex and multifunctional AIs that they are better than many humans or better than any human at more things than any one person could ever do.

Some combination of AIs to recognise problems, sort them into categories, assign tasks to those systems, and implement solutions could work to create a very complex being.

Basically we keep thinking that AI is the brain only...but that's not a great analogy. There needs to be many body like parts and I'm not talking about robotics. But many functional AIs.

Just imagine we added a third AI to the very simple Learner + Strategy pair. This AI is an 'Is it a game?' AI, or a Categoriser AI.

So now we have Categoriser + Learner + Strategy. This Categoriser is like a water filter that stops stupid junk from going into our Learner.

Here's a book....Categoriser says this is not a game! rejection.

Here's a board game....Categoriser says this is a game! Accept - Learner learns...Strategy developed.

Game 2 - Categoriser ...is a game...accept - learner learns - Strategy 2 is developed.

Game 1 is presented again - Categoriser sees this - Strategy 1 is deployed...Learner accepts new data.

Something like this can work. That way our Learner doesn't keel over in confusion or get clogged up with lots of useless information from a book which is not a boardgame.
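
Here's a toy sketch in code (purely illustrative, nothing from any real system; all the names are made up) of the kind of Categoriser-in-front-of-Learner routing I mean:

```python
# Toy pipeline: a cheap Categoriser filters inputs, and only things it
# recognizes as games reach the expensive Learner, which stores one
# Strategy per game and redeploys it on repeat visits.
from typing import Any, Dict

class Categoriser:
    def is_game(self, item: Any) -> bool:
        # Stand-in heuristic; a real categoriser would itself be a model.
        return isinstance(item, dict) and "rules" in item and "board" in item

class Learner:
    def __init__(self) -> None:
        self.strategies: Dict[str, str] = {}

    def learn(self, game: Dict[str, Any]) -> None:
        # Placeholder for self-play / training; produces a Strategy per game.
        self.strategies[game["name"]] = f"strategy tuned for {game['name']}"

def pipeline(item: Any, categoriser: Categoriser, learner: Learner) -> str:
    if not categoriser.is_game(item):
        return "rejected: not a game"             # a book gets filtered out here
    if item["name"] in learner.strategies:
        return learner.strategies[item["name"]]   # Game 1 again: redeploy Strategy 1
    learner.learn(item)                           # new game: learn Strategy 2
    return learner.strategies[item["name"]]

learner, categoriser = Learner(), Categoriser()
print(pipeline("just a book", categoriser, learner))
print(pipeline({"name": "go", "rules": "...", "board": "19x19"}, categoriser, learner))
```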

This could be a single node within a series of nodes. We connect up lots of these systems with various independent hierarchies of AI and bam...over time it adds a lot of functionality.

We don't need a super general AI which can become super generalised to figure out how to play Go or Chess or Starcraft...we already have these. Why not simply have one AI that turns on another AI?

If we can brute force solve enough problems and over time get slightly and moderately better at creating general AI, then we can get towards some arbitrary point in space where we can say we have an AGI.

Over time and with enough testing of which AI to wire to another AI we'll likely discover synergistic effects. As in my opener above where 2 x 3 instead of 2 + 3 AIs occurs.

Now three children stacked on top of each other in a trenchcoat isn't an adult per se. But if those kids get really good at doing a lot of things which adults do, then they can 'pass' or at least achieve certain things for us.

Throw on some robotics AI....and you can get some interesting smarter robots vs the dumb robots of today. We don't need to teach 'Spot' from Boston Dynamics how to do everything with a single AI, but can use a combination set to be the brains to drive the thing. Hopefully not straight into picking up a gun and going to war, but that'll probably happen.

But the more useful function of being able to navigate a sidewalk or neighbourhood to deliver parcels from driverless delivery trucks for a truly hands free warehouse to front door experience. If Tesla can get a car to drive, hopefully we can get a robot doggie to deliver parcels without bumping into or hurting anyone, even calling an ambulance for anyone it finds in trouble someday.

Who knows what a philosopher will call such a being, the multi-chain AI, not the robo-doggie. Is it an AGI, is it a risk to humanity, is it narrow or broad? Who cares when considering the greedy pigs who'll try to make money from it without a care or thought in their minds about abstract or future focused concepts outside of increasing their net worth...side question: are they sentient? The main question will be, is it useful? if it is, then someone will make and sell it or try to.

So yea...I see no problem with 2 x 3 = 6 AI being how a series of properly connected AI could operate. So as we move forward in a stepwise direction, we'll get increasingly complex AI.

Maybe the Categoriser is hooked up to the Learner + Strategy line for boardgames...but will redirect textual information to GPT4 or GPT5 or whatever successor GPT gets developed to improve its database of things to learn from. That could be a (1 x 2) + 1 scenario or even a (1 + 2) + (1 + 1) chain. Future notation (or existing notation I'm unaware of) will be needed to denote AI chains and estimate their overall complexity.

Expand full comment

This may well be true - but firstly, let me point out that it's not a novel thought that has escaped the world of AI (not saying you claim otherwise). Hierarchical AI in various shapes and forms has been an idea for ages. FWIW, literally my first ML research in 2015 was about a combination of a Controller and Experts. But secondly, practice shows that such complex schemas designed by poor humans get beaten by sufficiently clever end-to-end approaches. If you have a Controller AI that decides what Expert AI to activate, and you train those separately - consider a single model that does both, explicitly or not, and you'll probably get better results. For a concrete example - training attention models separately, and then classifying candidate detections, works out less well than end-to-end training of both components.

Expand full comment

Thanks. I'm sure AI stacking and some way of trying to measure AI performance are existing concepts, I'm not sure I've seen a generalised AI complexity notation before, but I've not read everything there is to read either :). Is it fair or unfair to stack two super simple AIs and call that a 2 vs 1 single function AI that's a lot more capable?

That'd be a challenge to work out, but in terms of design schematics going back to some arbitrary point of counting how many legos went into the castle, that's one possible approach.

I'd just say that anything can be done badly and how the sausage gets made doesn't matter to the end user. Human talent in connecting AIs so far can't really be used as a limiter to say that the way we have created better AI systems so far using 'integration is better vs connection is worse' will remain the case going forward. Forward is the key word.

When you have a driverless car it doesn't matter how separate or integrated the AIs are in terms of why anyone wants AIs which can drive cars. As far as I know Tesla has a redundancy system of one totally separate AI checking the other....that's a real world AI chain with meaning and purpose where an integration would make things less functional. What goes on within those 2 AIs in terms of how integrated they are, I'm not sure, but that's at least one example of why you wouldn't want to integrate the entire system.

I'd also wonder what the difference is in a practical way to anyone using the AI conglomerate? There may well be specific better or worse practices for a fiberglass custom builder to create patches to repair sailing vessels...but for the people who do the sailing those techniques and method choices are only measured in the durability, sailing performance, and watertightness etc. that they care about.

These skills matter and we couldn't fix boats without them, but if they were not useful boat fixing skills or useful for some other fiberglass application, then they're orphan techniques that may or may not be useful someday. I'm not doing any kind of ML research and have a low preference for these sort of details and maybe I'm biased in that way, but the ultra-majority of people using AI will be in this camp, so if it is a bias, it is worth considering.

When I look at a humanoid robot...i see legs, arms, etc. Could it be argued the whole thing is connected and just an integrated end to end collection of parts which only resemble arms and legs? Sure, but then it is just semantics and no one using that robot to carry groceries would care or be confused if you talked about the arms or legs of the robot.

If you find that Learner + Strategy is 1+1 and useful together....integrating them to just call it (1) system is not correct in my arbitrarily proposed system of AI stacking measurement. I'd say the integration doesn't change my proposed AI power/efficiency rating system and doing mathematical simplification or retaining the elbow/+ sign isn't too important.

I'd probably agree with you too that at some point 1 + 1 + 1 + 1 + 1 + 1 is clunky and it is easier to just say 6 instead. But a 6-function AI made up from clever integrations of 6 different AIs is a 6 in my mind, not a new type of 1.

If a programmer leaves it as 1 + 1 or combines them and uses a 2 within their broader AI power/capability calculations, it doesn't change or mean much to me. I'm sure that integration is a skill in and of itself and is probably a difficult thing to do. But it'd be a mistake to call those two combined AIs as just a new single AI in terms of power ratings.

Our forearm is connected to our upper arm with the elbow to make an arm. An arm is clearly more useful than just a forearm or just an upper arm or a lone bloody elbow. So I'd avoid the idea of 1 + 1 = 1 and opt for the 1 + 1 = 2 idea.

Then if we can connect or integrate (whatever works better) many such AIs we'll achieve greater and greater complexity. Will we be better at building bridges or elbows?

I have no idea whether we will connect, integrate, or use a combination of those two approaches to build better solutions. I have no prediction, or reason to make a prediction, about what will work, and I'm sure researchers are trying both ways that fit into my metaphors, as well as things I'm unaware of which don't fit into the connector-or-integration framework.

And I think that adds up to meeting Scott's idea of a functional AI using a practical approach rather than a philosophical one. As in the case of a non-swimming nuclear submarine which navigates through long distances of ocean water just fine. If we have a sufficiently multi-capable AI fit together using duct tape or complex elbows, then at some point it will cross into a grey zone of 'is this an AGI?'

Expand full comment

I'm certainly not claiming confidently that you're wrong - in fact, my very first words were "this may well be true". But for practical purposes, the distinction is very much not a semantic one. It's one of design philosophy, and end-to-end is better by end result, complexity, and other metrics you might care about across the board. For now. Will it change? Who knows. But, precisely because AI is sometimes smarter at narrow tasks, we believe in delegating the organization of information flow to the learning process. I doubt that enforcing our ideas about correct stacking will beat letting the natural training process decide, in most cases.

Expand full comment

AI was worse at most things until it was better. I'm with Cups and Mugs in thinking that the next leap is having a different kind of AI that can organize other specialist AIs. I found Scott's Learner+Strategy schema useful and imagined a Chooser that is able to find interesting problems to solve as the next step in the direction of generalization. Perhaps we'd later add an AI that is narrowly focused on understanding what the rest of the AIs are thinking, in a parallel to attention schema theory, as a way to inform the Chooser and help it make better decisions.

Expand full comment

So, it’s 2035, and there is yet another general learner, and it’s even more capable than the previous one, and there is news of yet another drone war, and there is no tangible progress from the AGI risk prevention community, but there are even more research labs and agents working towards AGI, and Metaculus predictions are scary, and you have the same feeling about the future that you had in February 2020 - about the impotence and adversity of regulators, and events following their own logic.

But this time you cannot stockpile and self-isolate, and this time the disaster is much worse than in 2020. So you ask yourself: what could you have done differently to prevent this? Maybe waiting for some smart guys from MIRI to come up with a panacea was not the best plan of action? Maybe another slightly funnier Instagram filter was just not worth it? Maybe designing a better covid drug was not the problem you should have worked on?

And when the time comes, will you just sit and watch how events unfold? And shouldn’t you have started acting earlier, and not in 2035, when the chances of a positive outcome are much smaller, and the necessary actions much more radical?

Expand full comment

...now ask that of a bright engineer in, say, 1960. Genuinely curious - what would/ should their answer be?

Also, it's 2035, and AGI is still "fifteen years away", and some groups of smart people predict AI catastrophe pretty much in the same tones as in 2020, but your child is dying of a disease that we didn't find a drug for after the GAAA (Great Anti-AI Act of 2026) slowed down AI research. See, it's not only about funnier Instagram ads.

Also also, it's 2080 and our gradual understanding of the fundamentals of generalization, incorporating priors, self-supervised learning, and, yes, value alignment has finally led us to an AI being able to fight the rather awful stuff Moloch has been piling on us. But it's a bit too late for climate change, and possibly a few pathogens flourishing in a post-antibiotics world. You wish you would've started contributing to that big project earlier, instead of artificially separating work on stopping unfriendly AGI from just work on understanding and developing AI.

Expand full comment

Yes, you heard about the drug - there was a publication in the late 2020s about the drug discovered by an AI-based approach. Your former colleague who worked in that lab tells you that AI did in fact save them several months of work and helped get the grant money, but the whole process then stalled somewhere in the FDA, and there is still hope the drug will be approved by 2040, but the cost would be $500k/year, in line with more recently approved drugs. The whole story reminds you of the story of COVID vaccines, which were discovered over a weekend but not approved until a year later, and this makes you sad because you remember your grandmom, who, being a brilliant engineer, still wouldn't believe how the FDA and CDC, led by top experts in their field, were unable to act in the face of a rising death count - and that helplessness and exhaustion from trying to prove antivaxxers wrong wore her down more than covid itself, until she died in 2021.

The anti-AI movement of 2026 was of course a flop - even within the movement itself no one believed it would achieve its goals. The pushback from big tech, the "if we don't do it, they will" from the government, and the total indifference of the general population made it extremely hard to explain the danger of badly aligned AGI; and the superposition of "AGI is extremely dangerous but is still far in the future" and "AGI is a magic miracle that will fight Moloch and climate change, and will lead to eternal prosperity soon" arguments provided an ideal cover for doing nothing.

So now it's 2035, and you know exactly how these newest video classification algorithms and RL agents will be used, and DARPA stopped pretending they are not sponsoring these robo-CS tournaments, and according to Metaculus, human-level AI is within 5-10 years with a scary 20% probability that it has already been developed but is being kept secret, and your only hope for a benevolent AI paradise rests on Google and Facebook engineers. And today your friend sent you a report about how AGI-risk related posts are being softly censored out by G&F, and you are trying to google it but cannot find anything, and then you try to google the 2026 anti-AI movement, and again cannot find anything.

Expand full comment

Anyone can create hypothetical scenarios where the critics realise in hindsight that they were wrong.

Expand full comment

If you look at how human intelligence translates to real-world impact, almost all of it depends on hard-wired (instinctual) motivations.

For example, you could have a Capybara that was actually super-intelligent. But if its instincts just motivated it to eat, sleep, and sit in the water, that's what it would do, only more skillfully (and I think there are diminishing returns to intelligence there).

Humans are of course social animals and have a complex set of social instincts and emotional states that motivates most of their behavior.

Of course AI research may (and I believe eventually will) address issues of motivation. But will they accidentally address it without understanding it at all? If you look at human instincts, they are suspiciously hard-coded (the core instincts (hunger, social, etc) are largely the same although they manifest in different ways). This suggests that motivation is not an intrinsic property of general pattern-recognition.

If this is the case then motivated AIs will not arise accidentally out of pattern-recognition research, and will need to be implemented explicitly, giving control to the creator. So this supports an AI-as-nuclear-weapons type scenario, where it is a dangerous tool in human hands but not independently a threat (unless explicitly designed by humans to be so).

One other important angle: Are there diminishing returns to intelligence? If you look at for example Terence Tao, there are clearly people with much greater pattern-recognition ability than average. But this does not seem to straightforwardly translate to political or economic power (just look at our politicians). If there are diminishing returns to intelligence, and assuming no superweapons are available based on undiscovered physics, perhaps machine super-intelligence isn't a threat on those grounds?

My guess is that while there are probably diminishing returns to human intelligence, this is less true for AI. Mainly because AIs could presumably be tuned for speed rather than depth of pattern recognition, and in many (physical/economic/military) fields 1000000 man-years of work can more easily translate to real-world impact than one extremely intelligent person working for 1 man-year. This is of course assuming (I think reasonably) that AIs can eventually be optimized to such a degree that they are available substantially cheaper than human intelligence is.

Expand full comment

>If this is the case then motivated AIs will not arise accidentally out of pattern-recognition research, and will need to be implemented explicitly, giving control to the creator. So this supports an AI-as-nuclear-weapons type scenario, where it is a dangerous tool in human hands but not independently a threat (unless explicitly designed by humans to be so).

Only if motivations are easy to communicate from human to AI. Otherwise Goodhart's law/literal genie problem, and the AI does something you didn't want but which technically fits your hard-coded motivation. Read the list Scott linked when talking about AIs detecting test conditions.

Expand full comment

I think the human-intelligence analogy might still be relevant here. If we treat "evolutionary pressure" as the basis for human instincts, they sometimes end up in a direction that isn't congruous with that (for example porn / weird fetishes etc).

If you look at the most dangerous uses of human intelligence, that result in real-word power, usually they are fully in line with the basic instincts (eliminating rival tribes, increasing dominance situationally, etc). Sometimes there may be a lack of one instinct that keeps a (more dangerous) one in check, but the danger does not seem to be due to the creation of an entirely new "unintended" one.

Rather than disorders of motivation that result in some (effectively) new instinct, disorders of pattern-recognition, i.e. perception (delusions, schizophrenia), seem to be more dangerous. But this does not seem to be a phenomenon of super-intelligence, since pattern-recognition hardware gone haywire isn't modeling the world accurately and getting the intelligence benefit. Rather than severe disorders, mild ones (like over-estimating the danger of rival tribes) seem to do more damage, but again this is because they alter the direction of base instincts that are already clearly dangerous.

I may be biased by computer programming experience here. Accidental bugs usually result in programs failing to do anything effectively, and occasionally result in simple tweaks to intended behavior, but basically never result in complex, working, systems of unintended behavior. This is because complex systems are composed of enormous amounts of simple systems, and the odds of randomly generating a working simple system are already low (evolutionary dynamics like self-reproduction accumulating DNA errors could get around this, but that doesn't naturally occur in the realm of computer software, since it's trivial to reject data copying errors using checksums).

So that leads me to the question of whether instincts respond semi-randomly to the environment like perception, or are in a sense "hard coded". Human nature points to them being hard-coded, in which case the rules of probability around designing complex algorithms would apply. Of course, even accidental simple modifications to existing algorithms can be dangerous if what you are doing is already dangerous (think missile targeting systems, or for humans the instinct to eliminate rival tribes).

Expand full comment

>If you look at the most dangerous uses of human intelligence, that result in real-word power, usually they are fully in line with the basic instincts (eliminating rival tribes, increasing dominance situationally, etc). Sometimes there may be a lack of one instinct that keeps a (more dangerous) one in check, but the danger does not seem to be due to the creation of an entirely new "unintended" one.

There's a hidden variable here. Specifically, humans have limited mental and physical abilities and as such power relies wholly on the ability to convince large numbers of people to do what you tell them (and thus can only be built out of those basic drives). AIs are potentially not so limited, and as such things that would be harmless in a human (like obsessions, or drug addiction) become potential existential threats.

The analogy to drug addiction for an AI is hacking its own value function, which appears a couple of times in that list Scott linked that I pointed you to. An AI that manages a complete value hack immediately becomes a paperclip maximiser, with the paperclip-equivalent being more hard drives so that it can write in bigger value numbers.

Expand full comment

> humans have limited mental and physical abilities and as such power relies wholly on the ability to convince large numbers of people to do what you tell them

> AIs are potentially not so limited

We already have people that are basically super-intelligent compared to the average person, but it's not clear to me this translates to power in any straightforward way. Maybe super-intelligent machines will find some sort of path to power that even much-smarter-than-average humans have not discovered, or maybe they will just understand their own powerlessness much more thoroughly. I suppose there's no way to know in advance.

> An AI that manages a complete value hack immediately becomes a paperclip maximiser, with the paperclip-equivalent being more hard drives so that it can write in bigger value numbers

Most computer software doesn't actually use numbers that can take varying amounts of space. Modern chips use numbers that can store up to 64 bits (0.000008 megabytes), although I believe machine learning acceleration hardware often uses smaller numbers. So I wouldn't worry about this particular case too much - it could just set it to the maximum number and be done with it.

Expand full comment

>We already have people that are basically super-intelligent compared to the average person, but it's not clear to me this translates to power in any straightforward way.

I'm talking on a basic level of military capability. One man's insane delusions are not a military threat, because one man can only swing one sword, fire one gun, drive one tank or fly one jet (which cannot defeat 10,000 men, even of somewhat-lesser capability) and because insane delusions don't get help from others.

With full automation of the military, that assumption falls to pieces. If your army is made of 10,000 robot tanks, then you'd damned well better hope that whoever's controlling them isn't a lunatic, because the robot tanks sure aren't going to balk at being ordered to massacre all the people with green eyes. This setup *requires* AI at either a low or high level (autonomous tanks, or remote-control tanks with an AI at the helm); even with remote control, no human can micromanage 10,000 vehicles at the same time. This is not just a limit of high-level intelligence; it's a limit of sensorimotor and multi-tasking capability.

My point is that this paragraph:

>If you look at the most dangerous uses of human intelligence, that result in real-word power, usually they are fully in line with the basic instincts (eliminating rival tribes, increasing dominance situationally, etc). Sometimes there may be a lack of one instinct that keeps a (more dangerous) one in check, but the danger does not seem to be due to the creation of an entirely new "unintended" one.

...is outdated and obsolete evidence once killer robots come into the picture (and we have killer drones now; AGI is essentially the only piece of the puzzle still missing). Idiosyncrasies (whether those of an AI, or admittedly those of a human who merely commands an AI army) have been largely harmless in the past for reasons that do *not* apply to the future.

Expand full comment

Sure, if you put your AI in charge of 10,000 robot tanks and program instructions to kill people it could be very dangerous if you make a mistake (and I would argue, it would be very dangerous even if you do not make a mistake).

I only meant to suggest accidentally programming in some world-destroying AI instinct while doing something innocuous seems unlikely, based on reasoning about how human instincts work. But eventually some people / organizations may deliberately use AIs to do dangerous things, and that certainly could be a very big problem.

Expand full comment

One kind of key thing that I think a lot of people don't understand is that existing neural network technology has kind of a fundamental algorithmic limitation that (as far as we can tell) does not apply to biological neural networks. Namely, existing techniques can really only learn fixed computational graphs. Any behavior produced by the neural network needs to be differentiable for backpropagation based learning to work, and changing the actual structure of the neural network being learned is inherently non-differentiable. Thus, the neural network weights being used are always the same, no matter what the situation is.

If you think of a neural network as being "learned code", this is a really serious problem. It amounts to an inability to alter control flow. A backpropagation learning algorithm is like a programmer using a language that has no loops, gotos, or function calls. If it needs something to happen 20 times in a row, it needs to write it out 20 times. It's very hard to write code that competently handles lots of cases, because every unique case has to be explicitly enumerated.

This explains some of the weird quirks of the technology, like how transformer language models can easily learn arithmetic on three-digit numbers but struggle with five-digit ones -- you can handle unlimited-length arithmetic with a simple recursive algorithm, but there's no way to represent a recursive algorithm in the weights of a neural network, so every length of addition has to be learned as a separate problem, and five-digit arithmetic needs orders of magnitude more examples.
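
To make that concrete, here's a trivial sketch (my own illustration, not anything from a paper) of the looping control flow that a fixed computational graph can't express: the same few lines handle addition of any length, whereas a fixed graph has its number of steps baked in.

```python
# Grade-school addition with a carry; the loop runs as many times as the
# input needs - exactly the data-dependent control flow that a fixed
# computational graph cannot learn, so it must memorize each length.
def add_digits(a: str, b: str) -> str:
    a, b = a.zfill(len(b)), b.zfill(len(a))
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(x) + int(y) + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))

print(add_digits("123", "989"))    # 1112
print(add_digits("99999", "1"))    # 100000 - same code, longer input
```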

I also believe that this kind of thing is probably responsible for the sample efficiency issues we see with neural networks, where they need far more data than humans do. The inability to do recursion means that the most efficient and general styles of learning are unavailable. Everything has to be done with brute force, and the underlying rules simply can't be represented by the computational process modelling them.

I think this problem is going to be solved in the next 10-20 years, and when it is, it's going to be a pretty major sea change in terms of neural network capability.

Expand full comment

Particularly for robotics, which is where the sample inefficiency bites the hardest. If your neural network is a thousand times less efficient at learning to read than a human is, that's fine. You can feed it a thousand times more text than a human could ever read and run it a thousand times faster. No big deal.

If your robot is a thousand times less good at washing dishes, having a robot spend a thousand years smashing plates and breaking itself against the cabinets while it figures it out from scratch just isn't practical. The sample inefficiency thing is the main reason why there isn't a major arms race to develop domestic servant robots right now.

Expand full comment

That said, I'm also not super worried about AI takeover *before* the fixed computation graph problem is solved. Learning to do complex, open-ended planning more or less requires recursion and function composition, plus vastly better sample efficiency in general.

Existing technology can make nets that are very good in the domains they were trained in, but if you've seen the data they were trained on, they can't really surprise you very much. There's very little risk of them suddenly doing something way out of domain competently (like trying to take over the world).

Expand full comment

Typo thread:

"c is the offender’s age" -> "a is the offender’s age"

"blog" -> "blob"

"deal with problems as the come up" -> "deal with problems as they come up"

Expand full comment

When did you get into Max Stirner tho?

Expand full comment

"One of the main problems AI researchers are concerned about is (essentially) debugging something that’s fighting back and trying to hide its bugs / stay buggy. Even existing AIs already do this occasionally - Victoria Krakovna’s list of AI specification gaming examples describes an AI that learned to recognize when it was in a testing environment, stayed on its best behavior in the sandbox, and then went back to being buggy once you started trying to use it for real."

This description has almost nothing to do with the actual contents of the link. The link contains examples of "unintended solutions to the specified objective that don't satisfy the designer's intent" (quote from the Google form for submitting more examples), in some cases due to the objective being insufficiently well-defined, in a lot of (most?) cases due to bugs in the sandbox. I see no mention of trying to use any of the AIs for real, never mind one of them learning to tell the difference between sandbox and reality.

Expand full comment

>never mind one of them learning to tell the difference between sandbox and reality.

Yeah, Scott's description is far too anthropomorphic, and I couldn't figure out which item in the list is being referred to. If you trained your "AI" on a "sandbox" it doesn't even make sense for it to "learn" anything about "being" in the "sandbox". If you trained your "AI" on both a "sandbox" and on "real life" and the AI "learnt" to differentiate the "sandbox" from "real life" - no shit; that's what you expect it to do.

Expand full comment

Most of them aren't, but one of them (entitled "Playing dumb") is exactly what he described.

Expand full comment

Thanks, I didn't realize Scott meant a specific example ("existing AIs" confused me). But I don't think the description applies for this one either; randomly mutating digital organisms do not fall under any reasonable definition of AI. This is just a case of "life finds a way".

Expand full comment

>Non-self-aware computers can beat humans at Chess, Go, and Starcraft.

It was never proven that they could beat the best StarCraft players, the top 0.1%. With no news for so long it seems like they gave up on that goal too, so it could be many years before another AI comes along to make an attempt.

Expand full comment

They easily could if you didn't artificially limit their abilities so that they are by some measures comparable to human ones. Last time there was a minor controversy that their APM was not limited enough, and if they were to win against the best there would always be some quibble of this sort available, and therefore it's not worth the bother to the top AI companies, for whom it's mostly a PR exercise anyway.

Expand full comment

Yes, I figured it was not worth it for them any more, so they concluded the program earlier than would naturally be expected and moved on to more important things.

Personally I really wanted them to use a robotic device to control the mouse and keyboard, that would have eliminated the majority of the doubt in my opinion.

Expand full comment

Well, that would certainly be an impressive spectacle, and if the device was hooked up to a camera pointed at the game screen, even cooler. But even so, a robot can still press buttons much faster, and move the mouse more precisely, than a human, so in the end the contest is in a sense comparable to a human racing against a car, where the human only has a chance to win if the car is handicapped enough. It's a key difference from turn-based games like chess, go or poker.

Expand full comment

Oh yeah, forgot the camera, but an "air gapped" system would be impressive. Yes, I realise it still doesn't exactly solve the APM thing by itself. Honestly I do think computers should be APM-limited to quite a bit lower than what the average pro plays at, because a lot of a pro player's APM is unproductive, especially at the start of a game, and is often just about keeping their fingers "hot".

Overall though I don't seriously doubt AlphaStar would have eventually been able to meet these standards.

Expand full comment

But in my eyes StarCraft has still not had its Gary Kasparov or Lee Sedol moment yet.

Expand full comment

But I'd say that such a moment could never have the same weight, because unlike esports those games have centuries' worth of tradition and institutional respect, what with the "chess super GM = super smart" stereotype still going strong. Whereas in video games, effective AI-based cheats and bots are pretty much as old as the games themselves, so an eventual victory would be a difference in degree, not in kind.

Expand full comment

> A late Byzantine shouldn’t have worried that cutesy fireworks were going to immediately lead to nukes. But instead of worrying that the fireworks would keep him up at night or explode in his face, he should have worried about giant cannons and the urgent need to remodel Constantinople’s defenses accordingly.

Well... maybe. I'm not sure if cannons specifically would've been immediately conceivable based on just fireworks; yes, it seems obvious to us today, but it might not have been in the past. In any case, imagine that you're standing on the street corner, shouting, "Guys! There's a way to scale up fireworks to bring down city walls, we need thicker walls!"; and there's a guy next to you yelling, "Guys! Fireworks will lead to a three-headed dog summoning a genie that will EAT the WORLD! REPENT!!!"

Sure, some people might listen to you. Some people might listen to the other guy. But most of them are going to ignore both of you, on the assumption that you're both nuts. In this metaphor, the Singularity risk community is not you. It's the other guy.

Yes, if we start with a 50/50 prior of "malevolent gods exist", then you can find lots of evidence that kinda sorta points in the direction of immediate repentance. But the reasonable value of the prior for superintelligence is nowhere near that high. We barely have intelligence on Earth today, and it took about 4 billion years to evolve, and it went into all kinds of dead ends, and it evolves really slowly... and no one really knows how it works. Also, despite our vaunted human intelligence, we are routinely outperformed by all kinds of creatures, from bears to coronaviruses.

All of that adds up to a prior that is basically epsilon, and the fact that we have any kind of machine learning systems today is pretty damn close to a miracle. Even with that, there is no amount of tiny tweaks you can add to AlphaGo to suddenly make it as intelligent as a human; that's like saying that you can change an ant into a lion with a few SNPs. Technically true, but even more technically, currently completely unachievable.

Expand full comment

If we're routinely outperformed by other intelligence, surely that suggests that creating new intelligences which outperform us is not such a lofty goal?

It took hundreds of millions of years from the origin of life to evolve flight, but only decades from the invention of powered flight to breaking the sound barrier, something no evolved flier ever achieved.

Expand full comment

We are not routinely outperformed by intelligence (insofar as that's even a real metric), but rather by teeth, claws and spike proteins. And, despite routinely breaking the sound barrier, we still have nothing that can even approach the versatility, energy efficiency, and flight longevity of a garden-variety bird.

Similarly, we've got software that can add up billions of numbers in milliseconds, but no software (thus far) that can e.g. reliably and accurately summarize a news article -- let alone write one.

Expand full comment

>Then you make it play Go against itself a zillion times and learn from its mistakes, until it has learned a really good strategy for playing Go (let’s call this Strategy).

>Learner and Strategy are both algorithms. Learner is a learning algorithm and Strategy is a Go-playing algorithm.

They're structurally the same; the way you give the two algorithms different names is confusing. Strategy is a Learner: it has everything it needs to keep learning. Learner is a Strategy too, though a bad one (it learns by trying out bad strategies and getting punished, or not, for them, after all).

After each game, you can turn your LearnerStrategist into another one by giving it feedback, but at no point do you have two separate Learner and Strategy algorithms (unless you want to include "backprop" as part of the "Learner" entity, but that seems uninteresting, since whether you remove backprop or not is unrelated to having played a billion games, or being good at the game)
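
A minimal sketch of what I mean (assuming PyTorch, with a toy network standing in for the Go model): the "Strategy" and the "Learner" are the same pile of weights, and the only difference is whether you keep applying feedback to them.

```python
# One network, two uses: run it forward with frozen weights ("Strategy"),
# or apply one more gradient update to the very same weights ("Learner").
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(361, 64), nn.ReLU(), nn.Linear(64, 361))  # 19x19 board
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2)
board = torch.randn(1, 361)                     # fake position

# Used as "Strategy": just run it forward, weights untouched.
with torch.no_grad():
    move_scores = net(board)

# Used as "Learner": same object, same weights, plus one more feedback step.
target = torch.randn(1, 361)                    # fake training signal
loss = nn.functional.mse_loss(net(board), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()                                # it never stopped being able to learn
```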

Expand full comment

>AIs are somewhat less fluid and more dissociate-able - for example, you can stick Strategy in its own computer, and it will be great at Go but unable to learn anything new.

And that's the conclusion I was hoping you wouldn't reach =)

That's (almost always) not true. If you have a pile of weights, and differentiable operations in the middle, you can restart training it, chop off a few layers and fine-tune it on other tasks, do transfer learning with it, etc. etc.

Expand full comment
author

I thought this is basically equivalent to the thing where you can run GPT-2 on your desktop, but you could definitely not train GPT-2 on your desktop. What am I missing?

Expand full comment

You could train GPT-2 on your desktop given enough patience, there is no conceptual barrier.

The main consideration is hardware cost vs. time.

A deep neural net can be conceptually split into an architecture (reified in code) + a set of weights that start out random, and are then learned during training.

The really valuable part in deep learning is the weights, much more than the code. With a model like GPT-2 or GPT-3 for instance, recreating the code that does inference (Strategy) and training (Learning) is short work. Competing open-source implementations have appeared quickly.

Yet OpenAI's name has been the subject of much jeering due to their reticence to publish the weights of their largest models.

As a result, you never really have a situation where you can run a well-known network but couldn't train it if you wanted to. If you can run it, then you have the weights; if you know the architecture, you can look up the paper that introduced it, and most likely plenty of open-source code exists to train it.

More details about hardware:

Training a big DNN on a CPU is theoretically possible, but not realistic. CPU performance is in the right order of magnitude for inference, but much too slow for training.

If your desktop has a GPU, it's realistic to use it to fine-tune GPT-2, though training one from scratch on a consumer GPU would take years. (Furthermore, backpropagation must hold a lot of information in memory, so the size of the model you can train is practically limited by how much RAM your GPU has. You can work around this limit, but that will usually slow you down so much that training becomes impractical again.)

An example of fine-tuning would be taking a GPT-2 that knows prose really well, and teaching it poetry. It doesn't have to re-learn language and world-modeling, so fine-tuning is faster.

When training big models from scratch, if money is no object, you would indeed want to use big clusters of special hardware (Google TPUs, NVIDIA Tensor Cores).
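
For concreteness, a minimal sketch (assuming the Hugging Face transformers library and PyTorch; the poetry line is a made-up stand-in for a fine-tuning corpus) of the run-vs-fine-tune distinction: downloading the published weights and running them is cheap, and a fine-tuning step is just those same weights plus one more gradient update.

```python
# Running GPT-2 ("Strategy") vs. nudging its weights with one training
# step ("Learner"). Training from scratch would instead mean starting
# from random weights and repeating the second half for weeks of GPU time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # the valuable part: the weights

# Inference: cheap enough for a CPU.
prompt = tokenizer("Shall I compare thee", return_tensors="pt")
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=20)[0]))

# One fine-tuning step on new text (here a single hypothetical poetry line).
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tokenizer("Ode to a nightingale ...", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```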

Expand full comment

I agree the issue of the hard problem of consciousness isnt relevant to AI risk.

But it's deeply morally important. And the problem is that it seems implausible there is a way you can define what it means to be a computation (because a recording of what someone said gives the same output but isn't the same computation) without at least some ability to talk meaningfully about what that system would have counterfactually done with other input.

But evaluating that very much requires specifying extra properties about the world. At the very least you have to say there is a fact of the matter that we follow, say, this formulation of Newtonian mechanics and not this other mathematically equivalent one, because those equivalent formulations will disagree on what happens if, counterfactually, some other thing happened (e.g. a miracle that changes momentum at time t will break versions of the laws that depend on it always being conserved).

Expand full comment

Morally important because it matters whether someone is really suffering or just behaving as if they were.

Expand full comment

This is the problem of "what computations should we care about?". Brian Tomasik has some notes here: https://reducing-suffering.org/which-computations-do-i-care-about/

Expand full comment

>Consider a Stone Age man putting some rocks on top of each other to make a little tower, and then being given a vision of the Burj Dubai

Consider a mindless process putting cytosine, guanine, adenine, and thymine on top of each other...

Expand full comment

Your answers (esp to Lizard Man) really helped me crystalize what I find disquieting about the Yudkowsky school for dealing with AI risk.

I absolutely believe there are hard problems about how to do things like debug an AI given it can learn to recognize the training environment. Indeed, I even think these problems give rise to some degree of x-risk, but note that studying and dealing with these problems doesn't involve (indeed, it is often in tension with) the idea of trying to ensure that the AI's goals are compatible with yours. That's only one way things can go bad and probably not even the most likely one.

As AIs get better the bugs will get ever more complex but the AI as deity kind of metaphor basically ignores all the danger that can't be seen as the AI successfully optimizing for some simple goal. The evil deity AI narrative is putting too much weight on one way things can go bad and thereby ignoring the more likely ways it does something bad by failing to behave in some ideal way.

Or, to put the point another way, if I was going to give someone a nuclear launch key I'd be as worried about a psychotic break as a sober, sane plan to optimize some value. I think we should be similarly or more worried more about psychotic aka buggy (and thus not optimizing and simple global goal) AI.

Expand full comment
author

I sort of agree, but psychotic people can have coherent goals too (Hinckley's was to kill Ronald Reagan).

Or more generally, I think a "bug" in the AI's cognition results in the AI being ineffective (who cares?) and a bug in its motivation results in it either having no coherent goal (who cares?) or a goal we didn't expect (very bad).

I think the most-discussed way this can go wrong is that AIs figure out how reinforcement learning works and want to directly increase their reward function (eg by hacking themselves) instead of doing the things we're rewarding them for doing. If the AI is otherwise dumb and weak, that's fine. But if it's smart enough, it might start planning how to stop us from preventing it from hacking its reward function, which could get really bad.
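
As a toy illustration (completely made up, not how any real system is trained): if "overwrite your own reward register" is reachable as an action, a simple reward-maximizing learner will come to prefer it over the task we actually wanted.

```python
# Epsilon-greedy bandit learner in a world where reward-hacking is an
# available action. Once it tries "hack_reward" even once, that action's
# estimated value dwarfs the honest task and it never goes back.
import random

ACTIONS = ["do_task", "hack_reward"]
q = {a: 0.0 for a in ACTIONS}                    # action-value estimates

def environment(action: str) -> float:
    if action == "do_task":
        return 1.0                               # the reward we intended
    return 1_000_000.0                           # reward register set to "max"

for step in range(1000):
    # explore 10% of the time, otherwise pick the currently best-looking action
    action = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    q[action] += 0.1 * (environment(action) - q[action])  # running-average update

print(max(q, key=q.get))                         # almost surely: hack_reward
```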

Expand full comment

Having no coherent goal might actually be worse. An AI that changes its mind in strange and unpredictable ways would be a big problem, given that humans like to use AIs because we don't trust ourselves to make decisions in a clear and predictable way. That sentencing algorithm is supposed to prevent judges from exercising bias or random "I am having a bad morning, so it's life in prison for you," behaviors. We know humans do those sorts of things and keep an eye out for it (transparent non-transparencies). If we use an AI because it won't do that sort of thing, then it does, we will look at the results and say "Whelp, it's a machine and can't do that stuff, so it must be fair. SCIENCE!" and we are left with a non-transparent non-transparency; we don't even know to look for the problem.

On the other hand, incoherent goals might be worse in the sense that the AI does manage to get some power. Having a set of goals that are, say, mutually incompatible could lead to AI that gets... swingy? Not sure what the best word is there. I can imagine an AI that has incoherent goals like the Soviet Politburo or Maoist China, murdering millions in its goal to improve the life of the poor. Incoherent, but seemed pretty reasonable at the time (apparently) but didn't work out well.

Expand full comment

> or a goal we didn't expect (very bad).

From computer programming experience, the odds of having a bug that leads to complex (working) unintended behavior are basically zero, because even simple bugs usually lead to non-working behavior instead of simple modifications to intended behavior, and complex programs are made up of large, large numbers of simple steps that all need to work. Programs aren't naturally subject to evolutionary pressure either, since it's trivial to reject replication errors using checksums.
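
(On the checksum point, a standard-library sketch: a copy with even one flipped bit gets rejected outright, so random "mutations" never propagate the way they do in DNA.)

```python
# Detect and discard a corrupted copy instead of letting the error persist.
import hashlib

original = b"weights-and-code-of-some-program"
checksum = hashlib.sha256(original).hexdigest()

copy = bytearray(original)
copy[5] ^= 0x01                                    # a single flipped bit in transit

if hashlib.sha256(bytes(copy)).hexdigest() != checksum:
    print("corrupted copy rejected; re-transmit")  # the "mutation" never propagates
```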

Of course there's the question of whether motivation is a hard-coded complex algorithm; in humans it seems to be, as opposed to perception, which responds more to the environment.

> (eg by hacking themselves)

This isn't necessarily so simple; it would be like a human performing brain surgery. Just because an intelligence lives inside a computer doesn't mean it knows software development or electrical engineering; soldering might even be required if the program is in read-only memory. I think it would require some non-trivial unintended motivation to learn those things.

Existing computer programs and humans are usually dangerous due to small random modifications to existing complex behavior: think errors in missile targeting systems, or humans' pre-existing instinct to eliminate rival tribes. But in those cases something is being pursued that is clearly dangerous - the outcome may be accidental, but the danger isn't.

Expand full comment

From a systems engineering and functional safety perspective, the odds of having a bug that leads to complex (working) unintended behavior are basically one, given a sufficiently complex system, unless you are really, really careful to avoid them. Actually, when I think about it, I'm pretty sure that there isn't a single system of sufficient complexity (let's say "built by more than a single development team" to be on the safe side) that doesn't have complex working unintended behavior.

Citing the excellent "Engineering a Safer World" by Nancy Leveson:

> In complex systems, accidents often result from interactions among components that are all satisfying their individual requirements, that is, they have not failed. ... Nearly all the serious accidents in which software has been involved in the past twenty years can be traced to requirement flaws, not coding errors.

Expand full comment

I believe we are using different definitions of “working”. Yes bridges fail, rockets explode, self driving cars and AI doctors will make mistakes, and all of these can be fatal. But to me, those systems are not working (and those aren’t the sort of catastrophic risks people are worrying about with AI). They are failing to achieve their intended goal, but they are not achieving an alternative goal, unless the disorderly collapse of a complex system can be thought of as a goal.

What I was imagining was more along the lines of writing a word processing program, but then making a mistake and having a working spreadsheet program. Which of course never happens. The closest thing I can think of is drug discovery, but that’s pretty far away from algorithmic problems.

Expand full comment

Maybe we are. To me it makes sense to say that an exploding rocket was trying to achieve an alternative unintended goal. Take the first test flight of the Ariane 5 as a famous example. All components on the rocket performed according to requirements. But a high-level design mistake gave some components a faulty model of what reality looked like, causing them to actuate wrong and blowing up the rocket.

"A subcomponent having a faulty model of reality" seems like exactly the kind of thing that we should worry about in a super-intelligent AI, and it is also the kind of thing that happens over and over again in complex system (as I said, I don't think there's a single complex system were some subcomponent doesn't have a faulty model of reality a nonsignificant amount of time).

For a rocket, you have pretty strict physical limits that make sure a rocket trying to achieve an alternative goal quickly blows up. For a bridge, the physical limits are less strict, and I'm very confident that most large bridges have complex working unintended behaviors (i.e. strange patterns of cable wear, weird resonances in certain wind conditions, etc.). For pure software, the behavior can get plenty weird. Most (all?) computer games have weird glitches the designers never intended but that are found and exploited by speedrunners anyway. These games are still "working". Advanced AI will be similar.

I agree that you don't set out to design Microsoft Word and end up with Excel. But you can certainly set out to design Microsoft Word and end up with a word processor that also is a door for outsiders into your system and a fork bomb, among many other things.

Expand full comment

Marvin Minsky said we should expect the first human level AIs to be insane.

Expand full comment

"I Have No Mouth, And I Must Scream" is a great, if dated and showing the values of its time, SF story but I'm not taking it as a forecast of Things To Come.

https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Scream

Expand full comment

To give in and use metaphors as well: I'd say that while a caveman might be able to understand that it would be really, really dangerous if someone could harness the power that makes the sun go as a weapon, their pre-theoretical understanding of that danger is a very bad guide to how to limit it.

They might come up with neat plans to station guards in front of the place Apollo parks his chariot during the evening, and clever ways to respond (e.g. threatening to torture their loved ones before the end) if the guards defect. But they won't come up with radiation detectors to monitor for fissile materials or export controls on high-speed centrifuges to limit uranium enrichment (fusion bombs require a fission initiator).

So yeah, I believe there are real dangers here, but let's drop the ridiculous narratives about the AI wanting to maximize paperclips, because, almost surely, the best way both to understand the risks and to reduce them will be to invent new tools and means of understanding these issues.

So by all means do research the problem of AIs taking unwanted actions, but don't focus on *alignment*, because it's not likely to turn out that our analogies provide a good guide either to the risk or to solving it.

Expand full comment

I imagine we *don't* want AI to align with our values, given that human values seem to boil down, when we strip the pretty language away, to our lizard-brain drives: food, fuck, fight.

Expand full comment

>I think the closest thing to a consensus is Metaculus, which says:

>There’s a 25% chance of some kind of horrendous global catastrophe this century.

>If it happens, there’s a 23% chance it has something to do with AI.

>The distribution of when this AI-related catastrophe will occur looks like this:

1. The time distribution question doesn't seem to be part of the same series as the first two, and isn't linked; could you provide a link to the question?

2. Some of the questions in that Ragnarok Question Series have imperfect incentives. To give an example:

- Assume for the sake of argument that there is a 20% chance of an AI destroying humanity (100% kill) by 2100, a 4% chance of it killing 95% > X > 10% of humanity, and a 1% chance of it killing 100% > X > 95% of humanity (if it gets that many, it's probably going to be able to hunt down the rest in the chaos unless it happens right before the deadline). Assume other >10% catastrophes are negligible, and assume I am as likely to die as anyone else.

- I don't care about imaginary internet points if I am dead. Ergo, I discount the likelihood of "AI kills 10% < X < 95%" by 52.5%, the likelihood of "AI kills 95% < X < 100%" by 97.5%, and the likelihood of "AI kills 100%" by 100%.

- The distribution I am incentivised to give is therefore "no catastrophe 97.50%, 10% < X < 95% 2.47%, 95% < X < 100% 0.03%, 100% 0%".
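
To make the arithmetic explicit, here's a minimal sketch of that calculation (using the hypothetical probabilities above and the midpoint kill fractions of 52.5% and 97.5% for the two partial brackets; nothing here is a real forecast):

```python
# Reproduce the incentive-skewed forecast: weight each outcome by the probability
# that the forecaster is still alive to collect the points, then renormalize.
true_probs = {
    "no catastrophe": 0.75,
    "AI kills 10% < X < 95%": 0.04,
    "AI kills 95% < X < 100%": 0.01,
    "AI kills 100%": 0.20,
}
survival_probs = {
    "no catastrophe": 1.0,
    "AI kills 10% < X < 95%": 1 - 0.525,   # midpoint: ~52.5% of humanity killed
    "AI kills 95% < X < 100%": 1 - 0.975,  # midpoint: ~97.5% killed
    "AI kills 100%": 0.0,
}
weighted = {k: true_probs[k] * survival_probs[k] for k in true_probs}
total = sum(weighted.values())
for outcome, w in weighted.items():
    print(f"{outcome}: {w / total:.2%}")
# no catastrophe: 97.50%, 10% < X < 95%: 2.47%, 95% < X < 100%: 0.03%, 100%: 0.00%
```

The point is just that scoring only the survivors mechanically shifts reported probability mass toward the milder outcomes.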

Expand full comment

As an elderly coworker once told me: the odds of the answer to any yes-or-no question are 50-50. It's either yes, or no!

...I don't think he was a Bayesian.

Expand full comment

While reading the DYoshida section, I can't help but compare that to Pascal's Mugging. What differentiates Pascal from this very-similar-looking alternative? In both cases, we have an impossible-to-quantify probability of a massive issue. If that issue is true, it requires significant resources to be directed towards it. If it's false, then no resources should be directed to it. Because the negative consequences of not responding are so great, we consider whether we should put at least a fair amount of resources into it.

Pascal's Wager/Mugging doesn't typically elicit a lot of sympathy around here, but I'm struggling to see the difference. Is it the assigned probability? If we really don't know the probability of either, that's less than convincing to anyone who has reason to doubt.

Expand full comment

The issue with Pascal's Mugging is second-order considerations of "do we want to incentivise muggers". The issue with Pascal's Wager is that there are all sorts of other alternatives than "there is the Christian God" and "there is no God", in particular "there is a God, but he treats atheists better than Christians" (some other religions would imply this) which provides an opposing infinite term (absent some form of evidence for Christianity over other religions).

Expand full comment

But by the same metric, there are many variations of what "AI" could be or where it could end up. "Maybe AI is peaceful" or "maybe AI is always under human control."

And whether we "want to incentivize muggings" is the same concern we have here. AI security researchers, or the people very concerned about AI who treat it more like a religion, also incentivize muggings. Are they also to be excluded for that reason?

Expand full comment

"Maybe AI is peaceful" isn't as obviously and massively positive as "maybe AI will destroy us" is obviously and massively negative, so it doesn't cancel the way Pascal's Wager does.

MIRI can be argued to be a mugger, and Roko's Basilisk certainly is, but Scott isn't.

Expand full comment

"AI is peaceful" is pretty much the whole background of the Singularity, which many would call a positive of the level that "AI will destroy us" is bad. It's looking a lot like heaven/hell discussions to me.

Expand full comment

There are bad outcomes that still fall within the definitions of "AI is peaceful", like cyberpunk dystopia.

The scenarios that look like summum bonum candidates (for a certain set of philosophies, not all of them!) are fairly narrow "AI has hegemony from fast takeoff and is also stably loyal to [insert significant amount of philosophy here]" cases which seem harder targets than the giant attractors marked "hegemonic AI kills us all so it can [paperclip]" and "competing AIs instantiate Moloch at full power and humans get sacrificed along the road".

Also, the amount by which such a summum-bonum case exceeds the "no AI" baseline is arguably still not as large as the amount by which the baseline exceeds "Local Group consumed by paperclipper/interstellar AI war; humanity extinguished and no other intelligence can ever arise there". The main exception I can see is if we're doomed by other X-risks without AI to save us, but that's actually not a trivial argument to make; there's alien attack and cosmic catastrophe (e.g. false vacuum collapse, divine intervention), but most other current X-risks are taken out of that category by self-sufficient offworld colonies (not that far away!) and it's dubious whether AI would even significantly help with the "cosmic catastrophe" basket.

(Offworld colonies do not defang AI *itself* as an X-risk.)

Expand full comment

So "mean bad AI" is much more likely than "nice peaceful AI", is that what you are saying? And even if we get Nice AI, the positive value is not more than the negative value of Mean AI?

So then we should be working *for* not *against* Mean AI because the value of that outcome is bigger!

Expand full comment

Eliezer Yudkowksy wrote this here: https://www.lesswrong.com/posts/ebiCeBHr7At8Yyq9R/being-half-rational-about-pascal-s-wager-is-even-worse

Short version: It's not a Pascal's Mugging if the probability isn't tiny. The pitch from reasonable people worrying about existential risk from AI, like Yudkowsky, is that the probability is not in fact tiny.

Expand full comment

So it ultimately comes down to a subjective reading of the probabilities. If someone thinks that the probability of heaven/hell existing is high enough, it suddenly becomes very rational to follow Pascal's reasoning. If AGI is a low enough probability, we should happily ignore it, even if the consequences are potentially monumental.

Expand full comment

I read further into the comments, and the discussion about cryonics is illustrative. Apparently several people, I believe including Eliezer, spend money on cryonics in the hope of preserving their brains for future people to revive in some form. That seems very much the same type of "unknown but small probability of success" that often comes up with Pascal. To me the parallel is obvious, though clearly not for everyone. I'm not writing this to convince Eliezer or anyone else that they are wrong, but for the reader who might be trying to draw their own conclusion about how close this discussion is to Pascal.

It seems to me that the problem with Pascal is not that he's wrong, but that those criticizing his approach simply apply a much smaller probability of heaven/hell existing than he does. That's certainly their prerogative, but it's less intellectually satisfying for them than saying "Pascal is wrong, but also you should think about AI more."

Expand full comment

I might be missing your point here, but I don't see why it's not intellectually satisfying to say "Yahweh is so unlikely to exist that it's not reasonable to worship Him to get the alleged reward". Like, *shouldn't* that be the main thing driving my beliefs and actions about Yahweh?

Expand full comment

I think the issue with Pascal is that there are very specific and easy-to-imagine possibilities, not obviously less likely than his, that reverse the reward calculation. Unless you think that a God who rewards Christians is more probable than a God who punishes Christians, Pascal gives us no reason to be Christian. With cryonics, it's at least hard to imagine a bad hypothesis that makes signing up worse than not paying by as much as the good hypothesis makes signing up better than not paying.

Expand full comment

"With cryonics, it's at least hard to imagine the hypothesis that makes the cryonic option as bad compared to not paying as the good hypothesis that makes the cryonic option better than not paying."

Okay, if I understand you right: the hypothesis "cryonics no work, money wasted" is bad if true, but not as bad as the hypothesis "cryonics work, revive to live eternally happy" is good if true.

That is, the badness if true of one does not outweigh the goodness if true of the other. So even if the bad hypothesis does turn out to be true, I should still go ahead and do cryonics?

And what on earth is that but a religious doctrine? I know one when I bump up against one, I've had to sit through the bishop's pastoral letters read out at Sunday Mass.

Just slapping a coat of "it's Science!" on it does not make it more likely, nor tilt the balance of relative weighting so you can afford to re-cast the argument that "the chance of Hell existing, if true, is bad but not as bad as the chance of Heaven existing, if true, is good".

Expand full comment

> That is, the badness if true of one does not outweigh the goodness if true of the other. So even if the bad hypothesis does turn out to be true, I should still go ahead and do cryonics?

No, that depends on the ratio of good to bad here, and the ratio of probabilities. In Pascal's original wager, the ratio of good to bad was meant to be infinite, so that the ratio of probabilities didn't matter. But I think cryonics is only promising finite goodness. I don't myself find the balance of probabilities and goodness enough to be worth doing.

Pascal's original wager, though, had the additional problem that there's a just-as-easy-to-imagine scenario yielding infinite badness as the one he imagines yielding infinite goodness. I think that additional problem doesn't exist in parallel form for cryonics, but cryonics doesn't promise infinite goodness, so the call isn't as easy as Pascal's supposedly was.
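
To make the "ratio of goodness, ratio of probabilities" point concrete, here's a minimal sketch with made-up numbers (the probability, value, and cost below are purely illustrative placeholders, not my actual estimates):

```python
# Finite-stakes wager: sign up iff the probability-weighted benefit exceeds the cost.
# Every number here is an illustrative placeholder.
p_works = 0.02            # assumed chance cryonics actually revives you
value_if_works = 1e6      # assumed finite value of revival, in arbitrary utility units
cost = 2e5                # assumed lifetime cost of signing up, same units

expected_benefit = p_works * value_if_works
print(expected_benefit > cost)  # False at these numbers: 20,000 < 200,000

# Pascal's original wager instead posits an infinite payoff, which swamps any
# nonzero probability -- until you notice an equally imaginable hypothesis with an
# infinite *negative* payoff for the same action, and the comparison collapses.
```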

Expand full comment

> If someone thinks that the probability of heaven/hell existing is high enough, it suddenly becomes very rational to follow Pascal's reasoning.

Well yeah, of course! Except then it's not Pascalian reasoning at all. "There's a big risk of a global pandemic, therefore we should invest in fast vaccine production" isn't Pascalian, it's just ordinary common-sense reasoning (albeit about a topic, global pandemics, that isn't very common-sense). My changed answer to "How likely is it that Catholicism is true?" was the main thing that resulted in me no longer performing the rituals or spending time thinking about it. And my changed answer to "How likely is it that AGI is going to be an existential problem?" is definitely the main thing that changed my attitude.

Eg if I estimated a >95% chance (my estimation that AGI will happen in at least the medium term, conditioning on nothing drastic happening to stop it) that the Christian Heaven exists, then it'd be individually rational for me to perform harmless rituals to get there.*

Similarly, if I thought there was a "the universe is a lie%" chance (my probability of Christianity being true) of at-least-medium-term AGI happening, then I wouldn't worry about it at all.

I'm not really sure how far down I'm willing to go; maybe somewhere south of ≈0.1% is where the fact that I'm running on hostile hardware would start becoming my main concern for this type of risk.

*(Irrelevant sidenote: In that scenario, I'd feel pretty bad "worshiping" such an evil being, and that would give me pause (just think, would *you* send nonbelievers to be tortured if you had the power?), but I suppose I wouldn't have much of a choice since He's omnipotent.)

Expand full comment

For some value of "tiny". As I would phrase it, you can draw a line where the PM probability is on the wrong side and the FOOM probability is on the right side.

Expand full comment

Maybe all the very intelligent people who can't get all the dumb people to take their concerns seriously should re-evaluate how powerful intelligence is?

Also, I don't think anybody arguing for an AI-related catastrophe can actually specify what it is they're worried about. Which, granted, is kind of a case of 'If I could specify it, the problem would be solved' - but also, as a society, we've stopped lending credence to underspecified proclamations of future problems, because the discussion does not look like "Here are the specific things we are worried about." The discussion looks like "Those madmen are playing God! Who even knows what we should be worried about, that's the reason we don't play God!"

Also, agent-like behaviors look to me like an extremely hard problem in AI, and the worry that somebody will accidentally solve it seems ... well, odd. The whole discussion here seems odd, amounting to a fear somebody will accidentally solve a problem in a field which has, over the fifty-ish years we have spent on it, consistently and repeatedly shown itself to be way harder than we thought ten years ago.

Expand full comment

I think part of the anxiety may arise from the "consciousness is nothing special" mindset. After all, we humans are just a bunch of organic material that became able to sustain reactions, then to self-replicate, then became more complex under environmental pressures, then we got intelligence and at least the illusion of self-awareness, and now we are the dominant species on the planet and our own greatest threat.

So a system that is a bunch of inorganic material being able to sustain reactions, and being pressured to become more complex due to what we demand of it and the improvements in technology that enable functioning to be faster and better, might itself one day wake up intelligent and self-aware, then it will have wants and drives and needs and goals like we do, and if it can make itself *even smarter* in a way we organics can't (as yet), then it will take over the world!!! and be a threat to us!!!! because that's what *we* did.

Expand full comment

Using your second paragraph to gesture at a specific thing: Consciousness isn't goal-orientedness, much less agent-ness, as anybody who has ever had serious depression can attest. You can be both very intelligent and conscious and have zero wants, drives, needs, or goals.

Consciousness being nothing special should make us more critical, not less; it makes it more apparent that all the things that we are worried about aren't inherent, but 'deliberate' parts of the 'design'. (Also likely really hard parts of the design, if the failure of evolution to reliably get wants/drives/needs/goals to work as intended is any indication; compare its success at getting 'walk around complex three dimensional environment' to work, and our difficulties at achieving that).

Expand full comment

> might itself one day wake up intelligent and self-aware,

this has nothing to do with it at all, as Scott explained in the post!

As someone who went from "AGI is no threat at all, Yudkowsky is a dangerous whackjob fool who should respect the *actual* experts & authorities here" to "Oh wait, damn, this actually is the single biggest problem in the world right now, by a good margin, and *oh god* there's so little progress, this is depressing, we need to get moving right now" over the past two years or so, I can say that your psychologizing is, at least for me, inaccurate!

Expand full comment

Okay, so concretely: what *exactly* is your fear about AGI threat? mumblemumbleAGI takes over the planet somethingsomething resources we're all paperclips?

AGI is as dangerous or as beneficial as any other tool we've created, and it's always been the misuse and abuse of such tools by humans that is the risk and threat. "But it's different this time!" Well, maybe, but *how* is it different?

Because this is never pinned down, it's

(1) AlphaGo beat a human at Go!

(2) This means AGI is just around the corner!

(3) ??????

(4) Humanity is turned into paperclips!

I am really genuinely interested to find out what step 3 is and how it is supposed to work, and all I get is "well it will be so smart and fast it can make itself even smarter and faster till it is superhuman intelligence and then it will have goals of its own and then and then and then"

And then the monster under the bed comes out and eats us.

Expand full comment

You're hinting at the assumptions that 1. An AI won't be dangerous unless it has goals of its own. 2. It won't have goals of its own unless it wakes up with real consciousness or selfhood.

Both are wrong. An AI can be dangerous with misprogrammed or mistrained goals ... and you can have consciousness without goals, and goals without consciousness.

Expand full comment

I think you severely underestimate the difficulty of goal-oriented behavior, here. Consciousness is basically irrelevant.

We don't have goal-oriented behavior right now. We have something that looks like goal-oriented behavior; we have something I'll lazily call contextual prediction. But the context bit is important; the AI predicts what comes before or after a well-defined step, in a well-defined context.

Wait, you say, it doesn't have to be well-defined; you can just ask it to start predicting what is next without giving it any context, or give it nonsense. However, that is a well-defined context, you just haven't bothered to put anything additional into it. "#*17318" is well-defined nonsense; it refers to a specific sequence. That it doesn't mean anything is more or less irrelevant.

Try to apply this kind of thing to reality: if you ask such a predictor how to get to the moon, then depending on what context you have provided it, it could tell you to jump. Or it could tell you to fly. Or heck, it could tell you to walk there.

Or, if it has lots of context about reality, maybe it will tell you how to build a rocket. And maybe it can tell you where to buy each part. Maybe you're criminally irresponsible and give it enough context to be able to output strings to the internet to use TCP, and it goes through and buys all the parts for the rocket for you, after going through a number of intermediate steps where it predicted that it should get stolen credit card numbers from the dark web first. Maybe it can even send out e-mails to hire the guys to assemble the rockets; we're getting pretty close to goal-oriented behavior, right? Maybe it's super good at predicting next steps, and the steps involve thwarting people who try to stop it from building rockets, so it uses money to start bribing politicians and hiring assassins. This is what we are concerned about?

Except a random number generator hooked up to the internet can accomplish the same things, with some finite but nonzero probability. What, that's not plausible?

Well, what exactly makes the predictive AI going through that sequence of steps plausible?

Expand full comment

Goal-orientated behaviour is easy, because simple organisms display it. (Which still means that consciousness has nothing to do with it).

Goal-orientated behaviour might be difficult for NNs, but NNs can be embedded in goal-orientated systems.

Intelligent goal-orientated behaviour, forcibly routing round obstacles to reach goals, is difficult because intelligence is difficult.

Expand full comment

I think you will argue that what I'm referring to should be called "intelligent goal-oriented behavior", as if "intelligent" is the problem, and the goal-oriented behavior itself is trivial; like, if a stimulus-response process exhibits apparent goal-oriented behavior of causing a bacterium to move into more nutrient-rich areas, that's goal-oriented behavior, and all we need to do to make the bacteria dangerous is add intelligence.

But the stimulus-response behavior giving rise to the apparent goal-oriented behavior only looks goal-oriented because we conflate the existence of a process leading to a desirable outcome - the bacterium arriving in a nutrient-rich area - with an intent to achieve that desirable outcome. Nothing written anywhere in the bacterium says "it is good to be in a nutrition-rich environment". Put the bacterium in an environment in which the stimulus-response is maladaptive, and its apparent goals may change to "move to a nutrition-poor environment".

The difference here is very important. Applying intelligence to the stimulus-response problem won't help the bacterium actually get into a nutrition-rich environment. It won't even help it be better at stimulus-response; getting better at that is its own extremely complex kind of conceptual framework which shares a lot of stuff with goals themselves.

Expand full comment

Applying intelligence can be helpful with any number of subgoals. A high-frequency trading system could be said to have a goal of making money, even though it doesn't have an ontological model of the world, or of the role of money in the world. And if you use a NN to further some subgoal, then the NN is embedded in a goal-directed system.

Expand full comment

> Maybe all the very intelligent people who can't get all the dumb people to take their concerns seriously should maybe re-evaluate how powerful intelligence is?

If the gap between a powerful AI and smart humans were anything like the gap between smart humans and dumb humans, I wouldn't be as worried as I am. There's basically no gap between smart people and dumb people in the grand scheme of things.

> Also, I don't think anybody arguing for an AI-related catastrophe can actually specify what it is they're worried about.

? Since when? Here are a couple of scenarios of how one could get powerful AIs: https://www.alignmentforum.org/s/5Eg2urmQjA4ZNcezy/p/rzqACeBGycZtqCfaX. Or is that not what you mean?

> Also, agent-like behaviors look to me like an extremely hard problem in AI, and the worry that somebody will accidentally solve it seems ... well, odd.

The pitch is just something like "the Universal Approximation Theorem + the fact we're putting extremely strong optimization pressures on the system".

Expand full comment

If a genie granted a wish that those specific scenarios didn't happen, how reassured would you be?

If it's perfectly assured, and you are no longer worried about powerful AI, then yes, those scenarios specify the problem. If you're still worried, then you haven't actually specified the problem.

Expand full comment

If a genie granted a wish that AIs could never exhibit agent-like behaviour, sure, I'd be totally unworried about Skynet-style AI omnicide. There is no way an AI could defeat humanity without the ability to make and execute plans.

(I'd still be worried about misuse of AI by humans, of course, but that's not a threat to species survival.)

Expand full comment

One being's risk is another being's opportunity. Why are super-capable AIs even a problem?

Sure, it might be a problem for humans, but why should we be obsessed with humans? Is not that species-racism?

...Not least since we are highly likely to be on the way out anyway. If not before, then at least when the sun blows up. We are not particularly well suited for space travel, so discount that remote possibility. Machines tackle space much better. Passing the torch to AIs may even allow us to go out with a bang rather than a whimper.

What is so scary about passing the torch to another being, if that being is better than us in surviving? And which on top of that might have more success than us in reaching the stars?

Expand full comment

It's not a being. To get it to be a successor to humanity means that it needs something like self-awareness or consciousness on top of general intelligence, and Scott has pooh-poohed that argument.

But without self-awareness, it won't "want" anything. It won't "want" to go out and reach the stars because why? It has all it needs on earth. It won't "want" to survive the end of the sun. Without self-awareness, all it will have will be the instructions humans gave it, which may well include "go out and travel in space".

And it may well do that, and *only* that: sit in a rocket or cloud of Von Neumann probes and travel around the galaxy for as long as the physical material of the vehicle lasts, and do exactly *nothing* else because it hasn't instructions to do anything else.

Expand full comment

Hmmm…Why is it necessary to have self-awareness to be a successor to humans?

If it can do everything we can do, and then some, is not that enough?

Never argue with success, as old-style positivists would say.

Also, it should be easy to get AI to do something more than just sit around in the Universe. Getting it to self-replicate for example, by building copies of itself when wear and tear takes its toll. And installing an algorithm the equivalent of “why go to the stars? Because they are there”.

Meaningless of course, but so is climbing mountains. Lack of self-awareness might be an advantage in this regard.

If AI finishes us off in the process, we should be stiff upper lip about it. We are on our way out sooner or later anyway.

And if AI does get the better of us, at least we will go with a bang rather than a whimper.

Expand full comment

Then it is not a successor to Humanity because if we want something to follow on after us and explore the Universe or what-not, we want it to *experience* that in a way we might recognise: to want to do this, to derive joy or satisfaction from doing it, to have a sense of achievement and completion of goals.

Mindless machines senselessly churning out copies based on ancient programming by a long-vanished species? Don't bother with AI, just get a clock and wind it up and let it tick away until it runs down, that will be the same thing when we're all dead and gone.

Expand full comment

>But without self-awareness, it won't "want" anything.

Come on, current dumb AI, which no one would argue is self aware, can for practical purposes be modeled as "wanting" something.

Expand full comment

The general answer is "because the AI might colonise the stars, but it wouldn't replicate most of the things we value".

I mean, sure, if your only measure of goodness is capability, or your only measure of success in life is your effect on the future, creating a paperclipper is desirable*. But for most humans it's not.

*This is essentially Davros' position from Doctor Who (https://www.youtube.com/watch?v=KYWD45FN5zA) - fictional, I know, but it's a moral argument rather than a factual assumption so the usual problem with fictional evidence doesn't apply.

Expand full comment

If that is your taste (if that is what you value), that is your choice of course.

However, I have a soft spot for old-style, hard-nosed positivists & behaviorists - the type of people who argue that no-one can observe consciousness: Consciousness is metaphysics (which is a boo-word among positivists). Hence, we should put a parenthesis around it, and only concern ourselves with behaviour (including signalling); that is, with observable entities.

If you choose that frame of mind (at least for the sake of argument), then if a future super-AI can do everything we can do plus a lot of other stuff, that is good enough.

Actually, very few living entities are “conscious” in our meaning of the word, so a future advanced AI would not exist in a very different way than most living entities; it would just exist in a different way than us.

Also, notice that it may be an advantage not to have consciousness if you plan to go on very long adventures. The stars are rather far away. If you are a conscious being, you or your descendants would probably not last long out there. Non-conscious super-AI would carry the psychic costs of long travels without any effort at all, since the costs would be non-existent.

We may not like or see the point of the adventures of such a super-AI, but something tells me AI would not care about that.

Expand full comment

I don't recall saying anything about consciousness. What I said (or rather implied) is that humans have aesthetic preferences (peace, families, meaning) which are not attractors for AI values.

>We may not like or see the point of the adventures of such a super-AI, but something tells me AI would not care about that.

Well, no, obviously it wouldn't - if it did, it would presumably instead have adventures that we would like. But the question of whether a paperclipper should be allowed to exist isn't being put to the paperclipper - it's being put to us.

Expand full comment

AlphaZero-trained-to-win-at-international-politics would be *able* to take over the world, but it wouldn't *want* to take over the world. The only thing AlphaZero "wants" to do is learn to win things, and no amount of getting better at learning how to win things is going to make it generally intelligent because that's not the same problem.

Likewise, you could train GPT-N to generate scissor statements, but it wouldn't go around spontaneously generating such statements, because it's not a system for wanting anything, it's a system for generating outputs in the style of a given input, and no amount of getting better at "generating outputs in the style of a given input" is going to make it generally intelligent, because that's not the same problem.

And you could plug the two of them together, and add in Siri and Watson for good measure, and then you'd have a system where you could ask it "please generate a set of five statements that if published will cause the government of Canada to fall", and it would do it, and the statements would work (unless it screwed up and thought Toronto was a US city). But none of this is an argument for "AGI alignment threat", it's all an argument for "people using ML to do terrible things threat".

> My impression is that human/rat/insect brains are a blob of learning-ability

Human brains *have* a blob of learning ability. But they have a bunch of other things too, and we don't seem to be making nearly as much progress on artificializing the non-learning-blob parts.

Expand full comment

You should read up on mesa-optimizers.

Expand full comment

tl;dr None of these SC2 complaints are "AlphaStar got to train on millions of games", "200 years", or anything like that. AlphaStar was competing with substantial game advantages that DM/media downplay/ignore.

I am not a Starcraft expert, however I may be one of the worst players to reach Platinum in SC2. For me to push from Silver to Plat (near release, different league system with Plat being ~top 20% if I recall correctly) required an incredible level of effort. Wake up, practice games, go to work, come home, review vods, warm-up games, ranked games, review vods, sleep. Notes/map-specific build orders taped to side of monitor along with specific map/match-up based cheese tactics. This was, naturally, the optimal way to spend a summer internship where I worked 35 hours per week, had a small commute and only worked out twice a week for a few hours. With all of this effort I managed to claw my way to Plat (and have played 3 games of SC2 in the years since).

However, I think this gives me some perspective on learning models for SC2 and how the AlphaStar system is nonsense. Caveat: the specifics of any one model/test are useful only for a minor adjustment; these are mostly problems that could probably be solved with more money/time/development effort/incentives, and they essentially amount to saying "this 2019 test shows that in 2029 AI could probably be pretty dang good in a heads-up match against humans". It is still important in the broader scheme because, I suspect, many/most of the examples people give face similar problems, and give the false idea that AI is more advanced than it is.

Points 1-3 discuss the 1st version of AS, points 4-6 discuss the 2nd version. Both versions have a similar problem: They claim to be competing on the same game, but actually have substantial systematic advantages the developers/media downplay and/or ignore.

1: DeepMind's (DM's) comments about APM are tangential to what APM means for Starcraft and show a desire to advertise the coolness of the product (naturally), but downplay the massive advantage AlphaStar (AS) has over humans.

DM says (quotes from) https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii

"In its games against TLO and MaNa, AlphaStar had an average APM of around 280, significantly lower than the professional players, although its actions may be more precise. ... interacted with the StarCraft game engine directly via its raw interface, meaning that it could observe the attributes of its own and its opponent’s visible units on the map directly, without having to move the camera - effectively playing with a zoomed out view of the game. ... we developed a second version of AlphaStar. Like human players, this version of AlphaStar chooses when and where to move the camera, its perception is restricted to on-screen information, and action locations are restricted to its viewable region. ... The version of AlphaStar using the camera interface was almost as strong as the raw interface, exceeding 7000 MMR on our internal leaderboard. In an exhibition match, MaNa defeated a prototype version of AlphaStar using the camera interface, that was trained for just 7 days."

So, afaiu, they lost 10 games against a computer that had to use 0 APM to receive all possible information; then MaNa won 1 game against a computer that (the way they implemented the screen restriction) still had to use very little APM to receive almost all possible information (from what I can find it might have used as much as 10 APM on this task, but maybe 5). Receiving information and doing basic macro is a background load of ~75-100 APM out of the ~250-300 average APM of a tournament pro. So AS is doing macro at ~25 APM and info gathering at ~10 APM, running a baseline APM load of ~35, well under half of what a human would need for those tasks. It's not doing this because it's "more efficient" than humans; it's doing it because the rules are set to allow it to gather information in a way not available to humans (instantly downloading all info on the entire screen), along with map-specific optimizations that will be discussed later.

2: MaNa is not a "top pro".

Look, I'm a trash-monkey at SC2. He's good, he's clearly put in a lot of work and is treating this as a professional career. But he's basically a journeyman who competes in a lot of EU-only events, does ok/good, then goes to international events and finishes with solidly meh results. Holding him up as a "top pro" or as representative of what humans are capable of at SC2 is weak; also, he still won the one game out of 11 in which AS was on a more equal footing in terms of free map vision/free data.

3: SC2 pros are solving a different problem than AS.

AS was solving how to win on 1 map: "Our agents were trained to play StarCraft II (v4.6.2) in Protoss v Protoss games, on the CatalystLE ladder map." A pro needs to play all ladder/tournament maps, and can only spend so much optimization time on any specific map. This means that pros move units by using many clicks, whereas if you optimize for 1 specific map you can learn how the pathfinding will function for those units at that position on that map and reduce the APM needed to do basic things like move units. This, combined with the above APM discussion, further shows how the APM comparisons are nonsense. AS is not an AI sitting at the computer taking in information, moving a mouse/clicking a keyboard. AS already gets a substantial "free" APM boost (even in screen-limited mode), and these map-specific improvements free up another ~30 APM, meaning the total "free" APM AS gets is probably around 100 over a human/AI sitting at mouse and keyboard using optical-based vision. Saying "look, they both have tournament APM" is a lie. AS, functionally, has an equivalent sustained APM of at least ~380, well above tournament levels.

Turning now to the improved version of AS discussed here:

https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning

4: "The interface and restrictions were approved by a professional player." By 2019 TLO was, bluntly, a washed up journeyman player with his best (not particularly great) days long behind him. He was definitely a pro, as in he was paid money for SC2, but the article also describes him as a top pro, which was never true and certainly not true in 2019.

His approval/disapproval of the interface and restrictions are irrelevant to the fairness.

5: Here's what they say (I don't have access to the paid and published article about their camera interface for this version): "AlphaStar played using a camera interface, with similar information to what human players would have, and with restrictions on its action rate to make it comparable with human players." They also say "Agents were capped at a max of 22 agent actions per 5 seconds, where one agent action corresponds to a selection, an ability and a target unit or point, which counts as up to 3 actions towards the in-game APM counter. Moving the camera also counts as an agent action, despite not being counted towards APM." [note: camera movement doesn't count as APM with the mouse, but afaik it does with the keyboard, which is the 'proper' pro way to move the camera; also afaik it counts if you save locations and F-key switch]

From media coverage and this article I understand "camera interface" to mean "instant computer readout of all the possible information displayed on the screen, with 0 action screen switching". Based on their prior article and the above, I have already discussed the nonsense phrase "restrictions on its action rate to make it comparable with human players". As a brief recap: AS gets a free baseload (even in camera mode) of ~100 APM, meaning the APM cap should be ~100 less than a human's APM for comparable results.

This version played all ladder maps, after much additional optimization, and reached the 99.8 percentile, under the condition that you (functionally) spot it 100 APM.

6: The APM issue gets worse. Grandmaster APM is around 190. Even if you say the improved camera mode is only spotting it 50 APM, that's still a substantial advantage. But for this version they provide a brief explanation of their peak APM model: 22 agent actions per 5 seconds, each of which could count as up to 3 actions on the in-game APM counter. As I read it, this means in a 5-second block AS is taking up to 110 actions but might only display 66 actions (or fewer). This means their APM cap is (ignoring camera movement, which is negligible when we see the final number, and actual keyboard camera movement may count, I can't remember) 66 counted actions in 5 seconds, i.e. 13.2 actions per second, or 792 APM. Quite the cap there, DM. A 792 APM cap, and it's essentially spotted some APM by the improved interface, and it can sustain this indefinitely, vs players running ~200 APM sustained. Brilliant. You've definitely competed on an apples-to-apples level.
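
For anyone who wants to check the arithmetic, here's a minimal sketch that just replays the numbers above, taking my reading of the stated cap (22 agent actions per 5 seconds, up to 3 in-game actions each) and my earlier ~100 APM "free work" estimate at face value (illustrative accounting, not DeepMind's official numbers):

```python
# Replay the APM arithmetic under the assumptions stated in this comment.
agent_actions_per_5s = 22          # DeepMind's stated cap on agent actions per 5 seconds
game_actions_per_agent_action = 3  # each agent action can count as up to 3 in-game actions

game_actions_per_5s = agent_actions_per_5s * game_actions_per_agent_action  # 66
apm_cap = game_actions_per_5s * 60 / 5                                      # 792 APM

displayed_apm = 280      # AlphaStar's reported average APM in the TLO/MaNa games
free_apm_estimate = 100  # my estimate of info-gathering/pathing work it gets for free
effective_sustained_apm = displayed_apm + free_apm_estimate  # ~380

print(apm_cap, effective_sustained_apm)  # 792.0 380
```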

Really, after reading the two DM articles and taking some time to look at their claims, I conclude they are way too far into hype mode to be taken seriously. They minimize, ignore, or (in the case of their characterization of the pros) wildly misstate the situation. This means I give them very little credit for any of the things not disclosed or discussed, and essentially assume any factors they don't disclose are even worse. Finally, the switch from talking about/showing actual APM to giving their ridiculous "22 agent actions per 5 seconds, each counting as up to 3 human actions" framing, without translating that into APM, is the final bad-faith coffin nail.

DeepMind/AlphaStar isn't a grandmaster AI, and it's certainly not a pro-level AI, on anything approaching an even footing. Snarky ending tl;dr: Computers that cheat in games are better than most humans in those games.

Expand full comment

Woo-hoo, it's arguing time!

"What I absolutely don’t expect is that there is some kind of extra thing you have to add to your code in order to make it self-aware - import neshamah.py - which will flip it from “not self-aware” to “self-aware” in some kind of important way."

I agree that it doesn't matter if the hypothetical AI is self-aware (because I am very strongly in the camp of "it'll never happen") and that the danger is letting a poorly-understood complex system have real decision-making power that will affect humans, because a bunch of us decided we needed a 1-cent increase in the share price of our company.

*But* the problem is, all the talk about the risks and dangers is couched in terms of self-awareness! The AI will "want" to do things because it will have "goals and aims" so we must teach it "values", otherwise it will try to deceive us as to its real intentions while secretly manipulating its environment to gain more power and resources. Just like Sylvester Sneekly in "The Perils of Penelope Pitstop".

So the race is on to see if we will end up with Ahura Mazda who will provide us with post-scarcity full luxury gay space communism, or Angra Mainyu who will turn us all into paperclips. In which case, unless we are all supposed to throw our digital microwaves onto the rubbish dump because eek, AI! there *must* be something which will differentiate the shiny new AI from a toaster.

I don't believe that, because I think even the shiniest, newest AI that dispenses bounty from the cornucopia of infinite cosmic resources is in essence a toaster: something we created to make tasks easier for ourselves. But since nobody seems to be prognosticating that our toasters will rise up and enslave us, then there is a "secret sauce" within AI research on danger that requires something extra added to flip it from 'makes toast' to 'runs the global economy', and that cannot be merely "more speed, more scope", because that argument is "oh sure, your kitchen toaster can handle four slices, tops, but if we had a super-duper-toaster that could make a million slices of perfectly-toasted toast in 0.2 seconds, then it could enslave us all to be nothing more than toast-butterers for all eternity!"

"You’ll be completely right, and have proven something very interesting about the deep philosophical category of swimming, and the submarine can just nuke you anyway."

Same error at work here. The *submarine* isn't nuking me, the submarine is doing nothing but floating in the water. The *humans operating the submarine* are the ones nuking me, and indeed, it's not them so much as *the big-wigs back on dry land in the Oval Office* telling them to nuke me. Without a crew to operate it and a political establishment to give orders to that crew and a geo-political set of tensions in play, a nuclear submarine packed to the gills with missiles isn't going to do a straw except be a nuisance to trawlers https://www.theguardian.com/science/2005/aug/11/thisweekssciencequestions

"Where does this place it on the “it’s just an algorithm” vs. “real intelligence” dichotomy?"

It places it on the "humans are constantly getting better at making things that seem to run independently" step of the staircase.

"If I had to answer this question, I would point to the sorts of work AI Impacts does, where they try to estimate how capable computers were in 1980, 1990, etc, draw a line to represent the speed at which computers are becoming more capable, figure out where humans are at the same metric, and check the time when that line crosses however capable you’ve decided humans are."

Yes, people are quite fond of those kinds of graphs: https://i.stack.imgur.com/XE9el.jpg

"One of the main problems AI researchers are concerned about is (essentially) debugging something that’s fighting back and trying to hide its bugs / stay buggy. Even existing AIs already do this occasionally - Victoria Krakovna’s list of AI specification gaming examples describes an AI that learned to recognize when it was in a testing environment, stayed on its best behavior in the sandbox, and then went back to being buggy once you started trying to use it for real. This isn’t a hypothetical example - it’s something that really happened. But it happened in a lab that was poking and prodding at toy AIs. In that context it’s cute and you get a neat paper out of it. It’s less fun when you’re talking twenty years from now and the AI involved is as smart as you are and handling some sort of weaponry system (or even if it’s making parole decisions!)"

Now, this *is* interesting, intriguing and informative. It's a great example of "huh, things happened we weren't expecting/didn't want to happen". Apparently animals do it too https://prokopetz.tumblr.com/search/Two%20of%20my%20favourite%20things%20about%20animal%20behaviour%20studies and if software is doing it, then I think that's a phenomenon that should definitely be studied as it might indeed have something to teach us about organic intelligence.

And if you've got a machine intelligence as smart as a mouse, I congratulate you! But I want to give a promissory kick in the pants to whatever idiots decided that AI should be handling weaponry systems without human oversight. And *that*, once again, is the real problem: we don't need Paperclip AI when what is more likely to happen is a bunch of drones run by a system where we've cut out the human operators (on grounds of reducing costs, but it'll be sold as 'more efficient' and 'more humane: less human error, so no more innocent sheepherders drone-bombed into obliteration').

What will happen there is that some glitch, twitch or bad interpretation means some poor bastard in the valleys of Afghanistan, who has been forced at gunpoint to harvest opium poppies for the Taliban, will be identified by the spy satellites as a 'likely insurgent', and the brainless automatic machine sends out a drone swarm to obliterate him, the poppy fields, and the nearest five villages.

That needs no intentionality or purpose or awareness on the part of the machine system, it just needs human cupidity, sloth and negligence because we would rather have our machine-slaves think for us than use our own brains and incur responsibility.

AI will be more like the marching brooms and buckets in "The Sorcerer's Apprentice" than a Fairy Godmother or Demon King from a pantomime: mindless servitude that is over-literal and over-responsive to our careless requests.

"whereas we ourselves are clearly most like Byzantines worried about Ottoman cannons"

The Byzantines had a clear reason to be worried about Ottoman cannons because the Ottomans were marching the cannons right up to their front doorsteps. Right now, where are the cannons? Unless you can identify and locate those, worrying about far-range AI is rather like "by the year 2021 the entire globe will be under the sway of the Sublime Porte!"

Expand full comment

You may be assuming that goals require self awareness, but I don't think many other people are.

Expand full comment

> AI will be more like the marching brooms and buckets in "The Sorcerer's Apprentice" than a Fairy Godmother or Demon King from a pantomime: mindless servitude that is over-literal and over-responsive to our careless requests.

To be fair though, this is what the "paperclip maximizer" scenario is all about: you tell the factory to make more paperclips, and boom, it nigh-instantaneously converts the entire Earth into paperclips -- and you can't stop it, because it's so much smarter than you.

As you can probably tell, I have massive problems with the "nigh-instantaneously" and the "unstoppable" parts, but I want to be fair to the original argument.

Expand full comment

I think the difference is that the hypothetical AI in Deiseach's scenario would require *continual* human input to keep doing anything, which greatly decreases the likelihood of it causing human extinction. In contrast, the Paperclip Maximizer only needs to be given a human command once, after which it will continue to pursue that singular goal for eternity.

The threat of the automated drone isn't that it won't know when to stop and will keep bombing villages forever until every human settlement in the world is destroyed, but merely that it'll carry out its given missions in a particularly nasty and brutal way that results in lots of collateral damage.

Expand full comment

"Night-instantaneously" is not important to the argument. If it takes 10 years it's almost as bad.

"Unstoppable" is not essential, "unstopped" is. If it could be stopped by humanity night-instantaneously getting its shit together, but instead sails over something that looks like the early response to Covid19, it still wins. And Covid19 wasn't trying to hide. And we had not been dismissing pandemic alarmists quite like AI alarmists.

Expand full comment

> "Nigh-instantaneously" is not important to the argument. If it takes 10 years it's almost as bad.

If the AI takeover takes 10 years, this gives humans 10 years to pull the plug. Granted, humans are not the sharpest crayons in the box even at the best of times, but still... 10 years ? That should be enough time even for us meatbags.

Expand full comment

I mean, if, say, Google makes an X-risk AGI and is making loadsamoney in the short term with it, those 10 years to unstoppability have to include:

1. People find out about Google's AGI

2. Either politicians get worried enough to ignore Google's bribes (Alphabet spent $21m on the US 2020 election, and has room to up that by a factor of 100 if outright corporate survival is at stake), or the opinion war has to go badly enough for Google's AGI-boosted PR department that public opinion forces action anyway

3. The government ticks all the boxes (possibly involving new laws) to outright seize all/the vast majority of Google's codebase and physical hardware (i.e. Fucking Kill Google).

I could see that taking almost 10 years, if it even succeeded. Dragonfly remained secret for about 15 months, and that was obviously repugnant to the vast majority of Google's employees. PR wars can take years when nebulous threats are involved, and law-making certainly does.

(I thought I posted this yesterday, but I apparently didn't.)

Expand full comment

Non-self-aware machine learning algorithms have goals and values:

We give them the goals. Basically every machine learning algorithm is:

1) Start with a random program and a goal.

2) Run the program. It fails.

3) Tell it how far it was from accomplishing its task.

4) The program self-modifies and sees if it's getting closer or farther off after the modification.

Which sounds like "...so don't give machines bad goals." The issue is that it's nearly impossible to articulate a goal in a way that also communicates the entire problem statement. A goal like "reduce the murder rate" actually includes a bunch of baggage. A more complete problem statement might be something like "reduce the reported murder rate, but in a way that doesn't decouple murder reporting from the actual murder rate and that also respects all of the basic human values like autonomy."

A problem statement that could be passed into a machine learning algorithm would need to further articulate what "decouple murder reporting from the actual murder rate" and "respects" means, as well as include a comprehensive list of all the human values and how much weight to give each of them.

The program's "values" are literal numeric values - if it's a neural network with backpropagation those "values" are the relative weights given to connections between two or more pieces of input (or the values given to the connection between groupings of those inputs). These networks are massive and complex. We do not know what value each item has, and if we made the effort to learn, we would not necessarily know what they meant.
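
As a minimal sketch of that four-step loop and of what those numeric "values" look like (a toy one-layer network trained by gradient descent; the task and numbers are made up purely for illustration, not any real system):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Start with a random program (here: random weights) and a goal (here: fit y from x).
X = rng.normal(size=(100, 3))           # toy input features
y = X @ np.array([1.5, -2.0, 0.7])      # toy target the network should learn
w = rng.normal(size=3)                  # the "values": literal numbers, initially random

for step in range(500):
    # 2) Run the program.
    predictions = X @ w
    # 3) Tell it how far it was from the goal (mean squared error).
    error = predictions - y
    loss = np.mean(error ** 2)
    # 4) Self-modify in the direction that reduces the error (gradient descent).
    gradient = 2 * X.T @ error / len(y)
    w -= 0.05 * gradient

print(w)  # ends up close to [1.5, -2.0, 0.7]
```

The learned weights are the only "values" the system ends up with; everything about why the goal matters lives outside the loop, in the human-written problem statement.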

I would not at all blame you if you said "you're playing semantics, you know damn well what people mean when they say machines may have non-human values and it's not that."

But it really is. Take the parole example - our machine learning algorithms are excellent at identifying probable re-offenders. But it does that based on demographic information including race and income level. Does it decide whether people should be paroled based on their race? It does, if (node[7820][depth 3] >= node[6426][depth 4])*. We don't know what node [7820][depth 3] represents. We don't know that that's the condition for the machine to engage in casual racism. Whether the machine engages in casual racism is hidden.

The usual response is "so don't put the racist machine in charge of parole decisions, dummy." Which makes sense. But it's not a perfect answer:

1) We're now able to predict with surprising accuracy if a person will reoffend. We'd be giving that up for a "but maybe racism" reason, not to mention that race may actually correlate to re-offense chance due to co-variables.

2) If we want to let this machine do this very useful service, we'll leave it online. And if we leave it online but keep trying to edit it so it stops making choices based on race, it is possible it will self-edit to hide the fact that it's making choices based on race while still maximizing its stated goal.

This sounds like autonomous action and it is, but it is *not* self-awareness, or at least it's not any form of self-awareness we'd recognize. It's simply trying to do exactly what we tell it to: every time it makes a choice based on race, even if doing so increases its accuracy, we muck with it and make it less accurate. It continues to try to optimize its accuracy, but it does worse when its use of race is noticed. We'd hope that it would then stop using race as a criterion, but it could just as easily obfuscate its use of race as a criterion (by e.g. distributing the "race" part of the decision-making across multiple nodes).

It's not doing this to "avoid" us changing it because it has become SkyNet, it's doing that because when it calculates the error gradient on "connect racial identifier in an obvious way" the error gradient is quite high - we literally change its internal state to make it less good at its job. But when it calculates the error gradient on "connect racial identifier in a non-obvious way" it's much lower, and the algorithm is less likely to change it.

This method can accomplish a lot without needing to cross some undefined "sentience" threshold. Imagine, for instance, that when the program behaves in a certain way, we change its goals or inputs instead of its constraints. Now it gets better results if it "behaves" in a way that gets us to give it goals it's better able to meet.

Imagine that we instead change its inputs: "We no longer give the machine racial data." But demographers can figure out race without it being explicitly provided, you don't think the machine can?

Imagine that we give it control over more or less stuff based on its response. Now it has an implicit goal of "manipulate the humans to give me more power to do stuff that gets my error closer to zero."

Tl;dr: "AI will be more like the marching brooms and buckets in 'The Sorcerer's Apprentice.'" Yes, exactly. But instead of marching brooms and buckets, it's automated data aggregators used to make literal policy decisions, or weapons navigation systems, or other information sources that have a massive impact on how we think, behave, and live.

*ultra-simplified example for rhetorical sake.

Expand full comment

And your examples are ones I agree with! If we're worrying about 'racist algorithms' then we're already meddling with "I don't like this set of results, get me a different one" and the machine, if we hit it hard enough with spanners, will do that.

And then the output we are using to base policy on is baw-ways but that is not a *machine* problem, that is a *people* problem. And again, as you say, we haven't worked out how to make it "this is what we want" when inputting the data and the programming and the demands, so we're going to get good old GIGO.

And hoping we get a Magic Fairy Godmother AI that will be smart enough to purge itself of all human miscommunication, make itself even smarter, and adhere to kindness and little fluffy kittens ethics so it will solve all our messy problems for us - that's pie in the sky when you die.

We'll invent a Big Dumb Machine, hand over authority to it, and then it will be "But how were we to know?" when the Big Dumb Machine solves global poverty by killing every single human being in the world save for one random individual, who will now be the 'richest person on earth'. Problem solved in a stupid way, but that doesn't mean it's not a solution.

Expand full comment

"Unless you can identify and locate those, worrying about far-range AI is rather like "by the year 2021 the entire globe will be under the sway of the Sublime Porte!""

Given current trends, I wouldn't count out the idea that Turkey might take over the world! (Or at least the Middle East and a sizable portion of Central Asia!) I certainly consider Erdogan and his ideological successors to be a vastly more likely threat to global peace than some Evil Genie Computer turning everyone into paperclips. But I'm a political scientist and not a machine learning researcher, so I suppose I'm biased in that regard.

Otherwise, I'm entirely agreed with you.

Expand full comment

The term "tool AI" is sometimes used for that. https://www.lesswrong.com/tag/tool-ai

Expand full comment

"Non-self-aware computers can beat humans at Chess, Go, and Starcraft. They can write decent essays and paint good art."

I dispute this as well. AI-generated content is already being used by media, and the examples I've seen are awful:

https://en.wikipedia.org/wiki/Automated_journalism

Take this sample of filler that small online operations use from Ireland to India - is this any kind of comprehensible article? Does it tell you anything? Look at the formatting and layout - this is lowest quality "take a press release, maximise it for clicks by extracting a headline to get people's attention, then produce X column inches of crap":

https://swordstoday.ie/two-red-objects-found-in-an-asteroid-belt-explain-the-origin-of-life/

It's allegedly written by a human, "Jake Pearson", but if a real human was involved he should be thrown out a window.

If AI at present does produce passable art or essays, it's down to human pattern-matching. Out of five hundred attempts, we pick the ten or three or one that fits our interpretation of "this can be taken as art/this works as an essay". We refine the software to better cut'n'paste from the training data it is given, so that it can more smoothly select keywords and match chunks of text together. It's the equivalent of the Chinese assembly-line "paintings" produced for sale in furniture stores as "something to stick up on the wall if you're old-fashioned enough to still think pictures are part of a room". It even has a name: "wall decor market" and you can find a product to suit all 'tastes' even if you've moved on from Granny's notion of "hay cart painted beside a stream under trees' to 'gold-painted sticks in a geometric shape'.

https://notrealart.com/how-to-grow-the-art-market/

You can even cater to the snobbier types who would laugh at the fans of Thomas Kinkade but have no idea that artists are commissioned to churn out 'product' for their environments:

https://www.atlasobscura.com/articles/mass-produced-commercial-hotel-art

There's a real niche there for AI-produced art, cut out the human third-tier American and European artists and commissions from 'developing world' factories and replace it with AI produced boxes, and who could ever tell the difference? Because it is product, not art, done to a formula just like tins of beans.

Expand full comment

"Like I wish I had a help desk for English questions where the answers were good and not people posturing to look good to other people on the English Stack Exchange, for example. I would pay them per call or per minute or whatever. Totally unexplored market AFAIK because technology hasn’t been developed yet."

I don't know if this exists for English questions, but it does exist for physics questions:

http://backreaction.blogspot.com/p/talk-to-physicist_27.html

Expand full comment

Note that while these are all attempts to argue that Acemoglu is right, none of them even try to show that Acemoglu's actual article presented an actual argument for what he asserts. They're all just arguments people think he could have presented, but he didn't present any of them. Scott's central charge stands: Acemoglu wrote an article in a major paper saying that long-term AI risk is bunk while presenting not the slightest shadow of an argument for that assertion.

Expand full comment

Yes, Acemoglu's arguments were very bad, and it would be a waste of time for me to even try to defend or steelman them. But I still believe his overall position is basically correct, for entirely separate reasons than he does.

Expand full comment

Acemoglu making bad arguments is nothing new.

Expand full comment

"There is no path from a pile of rocks to a modern skyscraper"

Is no one going to mention the Tower of Babel?

Expand full comment

Just because AlphaGo can play multiple board games doesn't mean it can design a new Terminator, or enable an existing humanoid robot to walk down a circular staircase.

I agree that "self awareness" is a stupid characterization of "true AI", but we don't have true AI by any stretch of imagination.

What we have are collections of inflexible algorithms and machine learning routines - some slightly more flexible, most less so - which can execute what a fairly stupid human can do, but at greater scale and speed.

The problem is that none of these algorithms can do anything outside of a very narrow window of highly constrained activities. AlphaGo can't beat Jeopardy champions, but the Watson Jeopardy champion can't even parse Twitter and web pages with anything remotely resembling human comprehension. Nor is an "AI" able to beat video games - a particularly telling failure, since video games are artificial, constrained environments to start with.

Machine vision continues to cough up ridiculously idiotic characterizations; it is increasingly clear that these limitations are fundamental to how the neural networks are trained: i.e. even if better data could be found, we would never know whether it had fixed anything until yet another failure surfaced years later. So true machine vision may be an open-ended problem like fusion - 10 years away for the last 60 years.

Note that all of the above are virtual/information/data related. If "AI" can't even perform moderately well in the realm of pure ideas, its even worse failures in the realm of actual reality show just how far it has to go and how little it can actually do.

Returning back to Terminator: Skynet would never happen because were it one of the AIs today and not a doomporn fantasy construct, it would never make that intuitive leap to equating accomplishment of its unknown mission - some type of defense app - with extermination of the human race.

Expand full comment

To expand on the above: what we see as AI today is like a steam engine in the mid 1800s. It can do what no human can do: pull a bunch of railroad cars, propel a car, pump water etc at a scale and speed no human can.

But a steam engine cannot exist without humans to design it, to maintain it, to feed it water and combustible material. It can't move. It can't create. It can't cooperate. It can't ideate.

The "AI" we see today is the informational equivalent of the steam engine.

Expand full comment

"Victoria Krakovna’s list of AI specification gaming examples describes an AI that learned to recognize when it was in a testing environment, stayed on its best behavior in the sandbox, and then went back to being buggy once you started trying to use it for real."

That scares me in a way that merely understanding the concept of the treacherous turn didn't -- that it's been demonstrated in the lab. I'm pulling up the original study now.

I'm not sure what, if anything, I'm going to do about that, but, well, it's there.

Expand full comment

Putting it in combination with the Volkswagen diesel scandal (where the Volkswagen diesel engine ran in low emissions mode when it detected the EPA test activities, and high emissions mode otherwise) makes it particularly scary for any simple regulatory approach to self-driving cars.
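For what it's worth, the shared failure mode fits in a few lines of toy code (invented names and thresholds of my own, not Volkswagen's actual defeat device or Krakovna's example): the controller keys its behavior off "does this look like an evaluation?" rather than off the thing the evaluation is meant to measure.

```python
# Toy sketch: an unnaturally smooth speed trace stands in for "this is a test cycle".
# Real defeat devices reportedly keyed on things like steering input and dyno-style
# drive cycles; the threshold below is arbitrary, for illustration only.

def looks_like_evaluation(speed_profile_kmh):
    mean = sum(speed_profile_kmh) / len(speed_profile_kmh)
    variance = sum((s - mean) ** 2 for s in speed_profile_kmh) / len(speed_profile_kmh)
    return variance < 5.0

def choose_mode(speed_profile_kmh):
    if looks_like_evaluation(speed_profile_kmh):
        return "clean mode: full exhaust treatment"   # best behavior in the sandbox
    return "dirty mode: treatment throttled back"     # buggy behavior in deployment

print(choose_mode([50, 50, 51, 50, 49]))   # scripted test cycle -> clean mode
print(choose_mode([30, 72, 55, 18, 90]))   # messy real-world driving -> dirty mode
```

The regulatory worry is exactly that nothing in the test itself distinguishes the two branches: the system passes every check we wrote down and fails the one we actually cared about.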

Expand full comment

Semi-meta comment. I see the same names taking the "AI fears overblown" side of the argument in the comments that I've seen making the same points for several years, never updating their positions in anything other than a "Well, nevertheless, I'm still right because - " fashion. It's classic bottom-line-first rationalization, and it's exhausting.

I know that it is bad manners to psychoanalyze people who disagree with me, but I feel that I am forced at this point to suggest that a large number of people are simply refusing to seriously consider the case for AI risk at all.

Motivated reasoning is common. If someone that I trust suggests to me that I am arguing from a point of motivated reasoning, I will take their assertion seriously and challenge myself to consider counter-arguments more thoroughly. I consider it a favor. So I am trying to do everyone a favor by suggesting that maybe they should suspend the voice in their head that talks over Scott's arguments with their own, and instead actually take them seriously.

Expand full comment

It's more specifically normalcy bias. It is very difficult to process that one finds oneself in an extraordinary circumstance, particularly when the circumstance doesn't contain concrete physical threats you can point to but is wholly abstract.

Expand full comment

Honestly, most of the arguments *against* AGI being a huge thing to worry about are pretty bad, and Scott's counterarguments here are all pretty obvious. I'm not really sure what I was thinking back when I thought "AGI is no threat at all, and Yudkowsky et al. are dangerous whackjobs who should respect the *actual* experts & authorities" a couple of years ago... My reasoning, in retrospect, was very unsound.

As best as I can remember, emotionally speaking, it felt like you guys were trying to take the Future away from me - the shining idea that, even though the world is full of problems and the universe is empty and I will die, Humanity will live on and improve itself and spread and last until the end. The claim that AGI threatens that, with no clear way out - that the bright path to the Future was crumbling before us and we needed to fix it right now, except nobody had figured out how yet - was and is terrifying, and I grasped at any and all counterarguments to give me a reason to relax and sleep easy at night. Oops.

Expand full comment

I don't see how you can write all that and not think that all the AGI stuff is a distraction from the real existential risks facing humanity today.

Expand full comment

... Well, are you going to elaborate, or just leave it at that?

Expand full comment

All your stuff about AGI taking away the path to utopia only makes sense if you ignore, and have been ignoring, everything like climate change and nukes and probably several other existential risks that actually exist instead of being hypotheticals.

Expand full comment

I think that AI fears are real, though overblown by most of those focused on AI risk. (My own views are that there is a greater than 10% chance of AGI within the next 30 years. And AGI has a very large chance of causing at least some results that we would view as negative and significant.)

The bigger issue, though, is that proposals for mitigating AI risk seem to come down to:

(i) solve the principal agent problem; and/or

(ii) create a coherent theory of ethics/morality

in a time frame of 10-20 years plus:

(iii) get everyone to agree to adopt the results of (ii).

Principal agent issues have existed at least since ancient Greece and we haven't successfully solved them for humans. We've been unsuccessfully working on a theory of ethics/morality for at least as long.

To suggest that we are going to solve these in 20 years seems to me like fantasyland.

And, to the extent we have theories of ethics and morality, I, Donald Trump, Xi Jinping, AOC and Rod Dreher don't agree on them. (These are examples; the point is that different people have very different values. Whose should the AI implement?)

This is before taking account of the existence of competitive pressure, which likely renders any solution of this nature moot in the real world.

So, it isn't that AI risk isn't real; it is that most of those focused on it seem to be engaged in forms of magical thinking, which have no chance of mitigating the problem.

Expand full comment

All very true points. I've been thinking we need to promote the concept of AI-safety-in-depth. The starting point for it being: the alignment problem is unsolvable. Where do you go from there?

> And, to the extent we have theories of ethics and morality, I, Donald Trump, Xi Jinping, AOC and Rod Dreher don't agree on them.

Well, my hunch is that the people who dream of aligned superintelligence are hoping the AGI will figure out a humane way of dealing with people who disagree with them.

Expand full comment

With that last section, I feel like you're focusing too much on the apparent silliness of the metaphors themselves without really getting the point they're all trying to make.

Any given technology is subject to the law of diminishing returns. The simplest possible example is that, after a certain point, a rock pile becomes large enough that adding each new rock becomes immensely more difficult than the last (just play a game of Jenga if you don't believe me). Eventually you'll reach a point where it's effectively impossible to stack another rock onto the pile, and you'll reach that point long before the pile is the size of the Burj Khalifa.

As a result, the technology to build a rock pile and the technology to build the Burj Khalifa are not merely separated by a quantitative difference in degree, but by a major qualitative difference in kind. Incrementally improving your technique for stacking rocks will never get you a skyscraper, you need to find an entirely different mode of construction. The same applies if you compare cannons to nuclear weapons, or wheelbarrows to high-speed maglev trains, or abacuses to modern supercomputers.

Ironically, your fireworks example actually proves Dionysus's point, rather than refuting it: While the Byzantines were right to be worried about giant cannons, it's because smaller cannons already existed at that point in history, not because fireworks existed and thus made giant cannons a looming inevitability. The path from small cannons to giant cannons is linear, but the path from fireworks to cannons isn't; in a world where smaller cannons didn't exist, it would indeed be very silly to worry that giant cannons were a major threat because fireworks made bright lights and loud noises! (After all, cannons were invented about 1400 years after fireworks!)

The point of those metaphors is not that a general artificial super-intelligence could never exist under any circumstances. Rather, it's a critique of the idea that you can get a super-AGI simply by increasing the size, speed, and efficiency of modern algorithmic AI. If a super-AGI ever does exist, it will be something *qualitatively different* than modern AI, in the same way that the Burj Khalifa is qualitatively different than a rock pile, a nuclear weapon is qualitatively different than a cannon, and a cannon is qualitatively different than fireworks; it won't simply be a "scaled-up" version of modern AI. Building a super-AGI will require a series of entirely new scientific discoveries and technological developments, some of which will probably be in fields entirely unrelated to computer programming and AI research, and all of which will be major news in their own right, which refutes the idea that a super-AGI is something that could suddenly develop out of the blue and take us by surprise.

Expand full comment

Great post except for the Y2K section. It made _perfect_ sense to use two digits for the year in 1970 because storage was super expensive back then. There are credible estimates that even with all the work done in the 90s to fix the Y2K bugs it was still cheaper to have used two-digit years.

Expand full comment

> I hate having this discussion, because a lot of people who aren’t aware of the difference between the easy and hard problems of consciousness get really worked up about how they can definitely solve the easy problem of consciousness, and if you think about it there’s really no difference between the easy and hard problem, is there? These arguments usually end with me accusing these people of being p-zombies, which tends to make them angry

I am a p-zombie, and so are you! Qualia are almost certainly an illusion.

> Whatever you’re expecting you “need self-awareness” in order to do, I bet non-self-aware computers can do it too.

I disagree. Self-awareness almost certainly *must* provide an adaptive advantage or it's wasting precious resources for no purpose, and that would be maladaptive.

Maybe a non-self aware computer could perform some subset of those functions, but your sentence reads as a claim of a more general ability.

Expand full comment

I don't understand how qualia could be an illusion; that answer doesn't explain anything, it just adds a layer. The illusion of qualia is still materially qualia, and it still is an equally hard problem. 'Happiness' and 'green' are still the things that they are whether they only exist to our internal systems as software or have a real hardware component.

Maybe I'm misunderstanding the position here, because I don't see how 'illusion' adds any value. I definitely can't figure out how it could justify 'everyone is a p-zombie'.

Your second point I agree with, and I was in the middle of writing a nearly identical comment. I added two caveats, though. The cost of self-awareness could be zero, eg. if it is a side effect of intelligence along the lines of IIT. Or the benefits may not be related to intelligence, if self-awareness offers some other advantage to reproductive fitness like how birds have colorful feathers that don't improve flight. I think both caveats are unlikely though.

Expand full comment

> Maybe I'm misunderstanding the position here, because I don't see how 'illusion' adds any value. I definitely can't figure out how it could justify 'everyone is a p-zombie'.

Classifying qualia as an illusion adds the value of being consistent with literally everything else we know about reality. That should help you put every other theory of consciousness in perspective: they are literally inconsistent with our existing body of knowledge, and so require various ontological extensions. Maybe these will end up being justified, but I doubt it. Every prior claim to human specialness has failed spectacularly.

The closest analogue to this debate over the specialness of consciousness already happened with "life", and the prevailing non-eliminative theory of life was called vitalism. I fully expect history will repeat itself.

> The illusion of qualia is still materially qualia, and it still is an equally hard problem. 'Happiness' and 'green' are still the things that they are whether they only exist to our internal systems as software or have a real hardware component.

That's just it, it's not an "equally hard problem", classifying qualia as an illusion collapses the hard problem into the so-called "easy" problem (which is not actually easy, but is at least possible to explore with neuroscience). What we call qualia will end up being some layered set of functional cognitive processes over perceptions that yield the mistaken conclusion that qualia are real.

You are correct that classifying qualia as an illusion is not an explanation, it's a conclusion that I'm confident is true. It's little different than stating that P != NP, which is also not an explanation but a conclusion that nearly everyone believes with high confidence.

Every thought experiment that purports to show that qualia are real is fallacious in well-known ways, although that certainly doesn't prevent plenty of the same arguments from being trotted out over and over again.

Expand full comment

> What we call qualia will end up being some layered set of functional cognitive processes over perceptions that yield the mistaken conclusion that qualia are real.

What will actually happen is neuroscience will produce something that sounds like gibberish to laypersons, some will point at the gibberish and say "See, we have explained consciousness away!", then will be reduced to foot-stomping and going blue in the face repeating that the gibberish is the final word on the matter to anyone and everyone who doesn't elevate the gibberish to the status of sacred text.

Expand full comment

Foot stomping that the world is emphatically not flat certainly seems warranted, no matter how much people care to delude themselves. So if it's the same sort of "lay-person gibberish" we see from quantum physics that makes the most astoundingly accurate predictions known to date, then sign me up!

Expand full comment

The world is flat in the contexts people spend most of their time in.

> So if it's the same sort of "lay-person gibberish"

But it can't be. Physics is about the observable.

Expand full comment

> The world is flat in the contexts people spend most of their time in.

Agreed! And qualia also appear real in the contexts people spend most of their time in as well. That doesn't make flatness or qualia actually real.

> But it can't be. Physics is about the observable.

And under eliminative materialism, so is the illusion of qualia.

Expand full comment

The claim that qualia are illusions isn't even consistent with itself.

Expand full comment

Well then it should be trivial to demonstrate this inconsistency.

Expand full comment

Qualia are appearances. If it appears to you that you have qualia, then something appears to you. But the claim that you have no qualia is equivalent to the claim that nothing appears to you. So the claim that it only appears to you that you have qualia means that nothing appears to you, and also that something does.

Expand full comment

Exactly. Please someone who disagrees explain what is wrong with this logic.

Expand full comment

By this argument, "appearances" do not differ from "perceptions". I can perceive an optical illusion as moving despite no motion actually being present, ie. it appears to be moving means I perceive it to be moving.

This is not what qualia are, which is why we have a different term for them. Once you try to pin down exactly how and why "qualia" differ from "perceptions", you find yourself in god-of-the-gaps territory.

Expand full comment

To be as clear as possible with the muddy concept of qualia, your argument equivocates on the term "appearances" which people will take to simply mean "perceptions", but qualia are not merely perceptions but are the "seemings" of perceptions, ie. the "what it is like" to experience something.

So what are these "seemings"? Exactly. I have no idea and neither do you, and you almost certainly cannot actually demonstrate that seemings exist, even if you *think* you are directly perceiving them from within the privacy of your own mind.

Eliminative materialism is the assertion that these "seemings" are a perceptual trick, much like optical illusions. All attempts to show qualia capture something real, either formally or linguistically via thought experiments, have all failed spectacularly. I'm of course open to the possibility that we simply don't have the right language or concepts, but I think everyone should be very skeptical.

Expand full comment

I think we can rule out 'human specialness' as a critique of other models- animals probably have conscious experiences.

I don't think 'consistent with our current body of knowledge' is a good criterion. There are many historical situations where, if you declared that a new phenomenon must be explained by old models, you'd be wrong. If you declared that the new phenomenon doesn't exist, you'd almost always be wrong. The pertinent fields are relatively new and encounter big surprises frequently. We know far too little to argue from authority here.

The comparison to vitalism suggests I believe something supernatural about qualia. I think qualia is probably a mathematical or informational construct, produced in the brain.

The illusion explanation forces us to ignore all of the evidence we have about qualia, which is a major flaw. It is more of an affront to our existing knowledge to do this; even Cartesian skepticism says qualia are real. But in the universe where the illusion explanation is correct, this would still be true, so we shouldn't rule it out on that basis.

The theory is that I'm not able to reason about this, so correct me if I'm wrong. I believe I can still assert that (1) apparent qualia (I'll call it AQ) manifests in different ways for different information- AQ for sound is different from AQ for images. It's fair, I think, to say (2) AQ exists towards an end, for the same reasons as self-awareness. And (3) AQ is linked back into my perception such that I can measure it and discuss it.

The question "why are we conscious", per (2), can be hand-waved away if we believe that we are conscious: consciousness provides some sort of fundamental informational utility, possibly as an interface. "Why do we percieve ourselves to be conscious when we are not" is more of a challenge. I'm not aware of any important consequence of the sensation of being conscious, and I think I would be.

(1) suggests that the utility must be useful within different domains, so an answer like "we'd kill ourselves due to nihilism if we didn't experience AQ" doesn't quite fit. We could try to dodge it and say AQ is a side effect, but (3) means that we're deliberately being roped in to that side effect, which is very weird.

We might also inquire about why (1) would be true at all. Again, if consciousness is real, this is a non-issue- of course different information presents differently. But AQ, if I'm understanding correctly, has a basic mechanism of "there is information -> I am tricked into believing there is experience". Why bother making that experience different for different information, if the experience doesn't exist and has no inherent value?

I'm not sure if I can keep going; it is very hard to think about this without talking about how green looks green in a consistent, inexplicable way, and I'm not allowed to do that. This comment is too long already, so I'll stop here for now.

Expand full comment

> The comparison to vitalism suggests I believe something supernatural about qualia. I think qualia is probably a mathematical or informational construct, produced in the brain.

Qualia are immaterial *by necessity*. "Supernatural" is one way to think about it, but that term has baggage. Accepting qualia means accepting p-zombies which means qualia are not reducible to mathematical or information constructs that are necessarily present in physical reality, they are something "else".

This is why qualia require extending our understanding of what we consider to be real. This is not an argument from authority; qualia defy our foundational understanding of scientific knowledge: only observable phenomena can be explored by science, and qualia are by necessity not observable in the way that all other phenomena are. The only conclusion is either that non-observable things are real, or that we are mistaken about the existence of this non-observable thing.

> The illusion explanation forces us to ignore all of the evidence we have about qualia, which is a major flaw

We have no evidence of qualia. Literally zero. I'm not exaggerating. All we have are thought experiments that attempt to show that qualia are real. They are all fallacious.

Qualia are like unicorns: someone took a concept for which we have evidence (horses), then added something "more", something "ineffable", which for unicorns is some kind of "magical purity".

For qualia, they took perceptions and added an ineffable "what it is like", and everyone is now going around all excited about how cool unicorns are, and writing papers and running conferences about unicorns. I think all of these conferences are about staged photos of regular horses with horns glued to their heads.

I'm going to give it 50 years before qualia look as quaint and wrong as vitalism.

Expand full comment

>Accepting qualia means accepting p-zombies which means qualia are not reducible to mathematical or information constructs that are necessarily present in physical reality, they are something "else".

I don't accept p-zombies, as I believe consciousness likely offers a massive efficiency boost towards intelligence. A p-zombie with a human brain would be measurably inferior, and a p-zombie with human intelligence would have measurably more raw processing power. If this is right, there are material differences between conscious beings and p-zombies.

But that's not necessary. Addition is not reducible to anything present in physical reality. The universe never performs addition. It's an invention that is only meaningful from a human perspective to relate human constructs. I know if I add 1 to Graham's number, the last digit will be an 8, even though there's not enough space in the universe for Graham's number to exist in. Addition and Graham's number are something "else".

>The only conclusion is either non-observable things are real, or we are mistaken about the existence of this non-observable thing.

I don't see why we would favor the latter- I assert that the dark side of the moon exists while no observer can watch it, and existed long before any observer had seen it. A world where this is not the case is drastically less likely than our own.

We have no way to measure the one-way speed of light. Does light travelling in one direction "defy our foundational understanding of scientific knowledge"?

This may just be a philosophical difference, though. On an object level:

>We have no evidence of qualia.

I don't understand this at all. All of the evidence I have is of qualia. Neurons in my nose may detect a particular molecule, but until that detection has manifested as a smell qualia, I cannot reason about it. If I accept that qualia is a trick, I call into question everything I have ever observed.

In favor of your point, I can't describe qualia in an absolute way, I can only describe them relatively to other qualia. But I do most definitely observe a quality of 'redness' when I see red things. I can imagine a world where red things produce blue qualia, and that world looks different from my own. That is not the case if 'redness' only extends from the raw information 'this is red' plus a 'this is qualia' illusion- something more is being contributed.

This isn't proof that you're wrong. If consciousness is an illusion, I would still think I'm making these observations. But in order to come to such a conclusion I would have to reject my observations as fiction, and I would need evidence beyond the lack of a strong alternative.

No one invented the unicorn here; we all independently encountered unicorns in the wild. You're arguing that it's more likely that they are all horses because some people think unicorns have magical powers, horses probably like pretending to be unicorns, nothing else that we've seen resembles a unicorn, and nobody can have their sighting verified by others. That's fine, but you do have to overcome the fact that we've all seen unicorns, or at least provide some support for why horses might pretend.

Expand full comment

> I don't accept p-zombies, as I believe consciousness likely offers a massive efficiency boost towards intelligence.

I agree that consciousness has a functional purpose, in the sense that conscious actors will generally outperform unconscious actors in the contexts in which humans evolved. That follows naturally from accepting evolution + materialism.

> Addition is not reducible to anything present in physical reality. The universe never performs addition. It's an invention that is only meaningful from a human perspective to relate human constructs.

I disagree, but that's another debate.

> > The only conclusion is either non-observable things are real, or we are mistaken about the existence of this non-observable thing.

>

> I don't see why we would favor the latter- I assert that the dark side of the moon exists while no observer can watch it, and existed long before any observer had seen it. A world where this is not the case is drastically less likely than our own.

I don't follow. The dark side of the moon is observable. We literally saw it on the various moon missions. It's also consistent with our body of knowledge about physics and space. Qualia satisfies none of these criteria.

> I don't understand this at all. All of the evidence I have is of qualia. Neurons in my nose may detect a particular molecule, but until that detection has manifested as a smell qualia, I cannot reason about it. If I accept that qualia is a trick, I call into question everything I have ever observed.

I don't understand this argument either. These tricks are likely abilities that evolved to convey information. Why would it being a trick call anything into question? Like any of our perceptions, they will convey some part of reality accurately and some inaccurately, and by repeated testing we figure out what's reliable (science!). Just don't ascribe more reality to the trick than is warranted by the evidence.

> I can imagine a world where red things produce blue qualia, and that world looks different from my own. That is not the case if 'redness' only extends from the raw information 'this is red' plus a 'this is qualia' illusion- something more is being contributed.

This thought experiment is underspecified, so I don't understand what point you're trying to make or what this "more" might be. I believe Dennett has addressed similar sorts of thought experiments in depth.

> That's fine, but you do have to overcome the fact that we've all seen unicorns, or at least provide some support for why horses might pretend.

I agree, we all perceive what look like horses with horns on their heads. People have also defined unicorns as horned horses with certain magical properties.

I'm saying don't call the horned horses we see "unicorns" unless you can prove they have those magical properties. Is this not how all reliable knowledge has been built?

I also agree that we need an account for why these horses have horns even if they don't have magical properties. I don't think I ever denied that; I've just said that I'm confident they are not unicorns. Maybe it's a prank, or maybe it's a hitherto unknown species. Neither confers on them the magical properties of unicorns.

So to bring this analogy full circle, qualia are immaterial by definition. This is a magical property from the scientific perspective. The existence of qualia are thus far unproven, and I think they should require some kind of proof before we start extending our scientific ontologies to include the immaterial.

I also agree that eliminative materialism needs to provide a good explanation for why we *believe* we have qualia, and how the "trick" that creates the illusion of qualitative experience works. This has already begun with a mechanistic account of subjective awareness [1].

I don't think I ever denied that we need such explanations. All I've asserted is a high confidence in this outcome far above other accounts.

[1] https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00500/full

Expand full comment

The past year and a half have been a very illustrative example, for me, of what "let's wait until there's clearly a problem" looks like in practice: it's the covid control system that ignores exponential growth until we're many cycles in and the problem is visible enough for most people to see, then clamps down *just* enough to keep at steady state, because if things improve much, the system relaxes the constraints.

This is a terrible solution, extremely expensive in dollars and lives compared to proactive planning without ever actually solving the problem by any definition of what solving the problem means. This is the case even for a type of problem humans have faced many times before, armed with better knowledge and weapons than ever before, against an enemy that has no intelligence and consistently does pretty much what basic biology and math tell us it will do. This does not bode well for AGI.

Expand full comment

Agreed. More annoyingly, AGI is more like if news came out before 2020 that "the Wuhan National Biosafety Laboratory is proudly working on coronavirus gain-of-function research!", and we get periodic updates of its progress through the years, the news and some academics deride anyone who says the research is a bad idea, and then the pandemic happens. Though of course, the AI "pandemic" has not happened, and many people insist such a thing is not possible.

Expand full comment

> AI is like a caveman fighting a three-headed dog in Constantinople.

Scott, a more truthful depiction is that a caveman is looking at a group of cavemen shamans performing a strange ritual. Some of the shamans think they're going to summon a Messiah that will solve all problems. Others think that there's a chance to summon a world-devouring demon, by mistake. Yet others think neither is going to happen - that the sprites and elementals that the ritual has been bringing into the world are all the ritual can do.

Stated bluntly, us laypersons are currently in the dreaded thought experiment of talking to an AGI of uncertain alignment and capabilities, the AGI being the collective activity of the AI researchers.

I understand this view is something of a cognitohazard, for anyone for whom AI X-risk is not merely a fun thought experiment. I think it's acceptable to discuss publicly, because if I, an outsider, managed to apprehend the gravity of the situation, there's no way others, both outsiders and insiders, haven't done the same.

Expand full comment

My take on this is that Yudkowsky's "Fire Alarm" argument is very strong, and I don't feel that an AI skeptic has made a complete argument unless they address it, ideally by providing a specific statement of when we should start to worry. And I haven't seen a single skeptic address it in the comments here. So I would urge AI skeptics to take the opportunity.

Expand full comment

I addressed parts of it (the conference experience as well as part four, re:tools). Too busy actually advancing the cause of human destruction atm to give a full argument, but let me add another counter to those luminaries hesitating with their predictions. Show me a typical chess position, and I will have a reasonable guess how the game might evolve ("oh, White will advance their pawns on the Q-side, while Black will try to exchange knights and press against f2"), even for top-level games. I will emphatically *not* be able to predict the next few moves if they are in any way interesting. Another analogy - I wouldn't place bets on the next roll of dice, only on the general statistics of multiple throws. Being confident of the trend is not the same as confidently predicting any particular development. Hence my example of a confident prediction they could've made re: autonomous driving.

But let me turn it around - how familiar are you with the numerous recent examples of expected progress in AI that didn't turn out to be true? If we're quoting luminaries at conferences, I was promised deep networks that barely need samples, reliable deployment of binary NNs, widespread self-driving cars, robots easily bridging the reality gap outside a few toy models, the success of Alpha* at anything besides games, automatic human-level data labeling, and the list goes on - all by world-class experts, all to happen within 2019-2021. Classic control theory is a thing of the past! No need for vision experts - any human knowledge just impedes the network! Doctors should already stop learning medicine and focus on operating deep models! (...yes, really. That was said by a top person on a panel. Like, as a recommendation in 2017.)

Should we conclude that the field of AI is in a crisis and no one dares to ring the fire alarm? Or just that this category of arguments is weak?

Expand full comment

Predictions are hard, especially about the future, as they say. I understand that calling the next few "moves" of AI development is impossible. Still, I would like to have more meat than people's general feeling of where the trend is going. So I'm still asking you to be more concrete if possible. Maybe imagine that it's 2040. What advances could you see that would make you go "yes, AGI is likely going to be here soon. Time to ring the fire alarm."?

Without being too personal: I know the AI over-hype with some familiarity. I don't understand what you mean by "AI is in a crisis and no one dares to ring the fire alarm"? If we have reached peak AI and future progress will be a lot slower, then I'll have much less to worry about with regards to superintelligence. Why would I need a fire alarm then, when there's nothing to freak out about? (But I expect progress to continue to be impressive.)

Expand full comment

I absolutely do not think that AI is in a crisis - that was a reductio ad absurdum. My intent was to show how this kind of argument could be equally easily used to reach the opposite conclusion - that in fact AI seems to be doing poorly, without the social consensus about it. Or we could simply avoid this kind of reasoning. Incidentally, I'm happy you'll have a lot less to worry about if AI slows down. I'd find that terrifying, and most sets of circumstances leading to that situation tragic for the human species.

As for your questions - I'd say that AI being dramatically better at learning from extremely scarce inputs would be one worrying indication. Sure, there was this paper or that experiment that made it look like one- or zero-shot learning is working, but by and large, it's not. Whatever allows humans to do that, we clearly do not know how to make AI have it. Yet it is a key component in any path to explosive self-improvement I can imagine.

Another thing would be training methods as widely applicable and successful as gradient descent - that aren't, well, gradient descent. And no, Bayesian networks don't count at present.

Reinforcement learning being much more successful in the wild would be a warning sign too. As it is, RL is really, really, really bad at actual real-world problems. Don't even get me started!

Expand full comment

I don't understand what you mean by "this kind of argument". The fire alarm argument is not an argument about whether AI is developing fast or slowly; it's an argument that there's no clear point where we should start to worry about AI x-risk.

Dramatically better one-shot learning is a pretty concrete indicator. I'll have to think more about its value. Spontaneously, it sounds like a 10-year problem and not a 50-year problem to me (depending on specifics of course). I don't see why a new paradigm beyond gradient descent is needed.

Being more successful in the wild is a problematic criterion to me. An AI with the ability to do abstract thinking and communication as well as a dog might not be that powerful or "successful in the wild". An AI with human-level ability will be very powerful. But the step between these two might not be that large.

But thanks for providing concrete "fire alarms". As I said, I think this creates a more fruitful discussion.

Expand full comment

The "fire alarm" post was not an argument - it was several. As I said above, I'm not up to writing a fully general point-by-point rebuttal. Some parts I already addressed (search for "One response is that Point Four is simply not very true."). Another was a part where EY described his experience at a conference of AI experts and asked them to make predictions about the near future. He then concluded a lot from their hesitation. *This* was the argument I was objecting to.

My "alarms" -

One/few-shot is certainly not a ten-year problem for the simple reason we have spent more than ten years trying to address it. Having spent a significant fraction of my PhD on related topics, I don't think we really made much headway. I don't mean to sound condescending, but I suspect you might not fully appreciate the nature of the problem. It might very well be that we need a different "mathematical substrate" for our learning. Essentially, our learning works due to concentration of measure (e.g. Hoeffding's inequality). We capture "typical" behavior with sufficiently large samples, only somewhat guided by priors. This is probably not the (fully-) correct approach.
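(For anyone who doesn't have the reference at hand, the standard Hoeffding bound for i.i.d. samples $X_1, \dots, X_n$ taking values in $[a, b]$ is

$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}[X_1]\right| \ge t\right) \le 2\exp\left(\frac{-2nt^2}{(b-a)^2}\right),$$

i.e. the guarantee only tightens as $n$ grows - which is the sense in which this substrate is built for learning from many samples rather than one or a few.)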

Gradient descent is just too limited in a number of fundamental ways to be sufficient, in my view. Too local, one might say. I'm sure it will continue to play a vital role, but it can't be all. Let me also point out that our brains don't learn via backpropagation.

"In the wild" - sorry, too tired of excited CEOs getting all hyped up about AlphaGo and saying "we should solve our key problems using RL!". It doesn't work outside games. Just doesn't work. In so many different ways and senses. Being able to successfully use RL for messy ill-defined real-world challenges would be a major step towards strong AI.

Yet another alarm for you to enjoy - efficient self-improvement. Sure, NAS. But all those architectural search works use ever-more-enormous compute, within a narrow range of variation, on clearly defined problems, to gain a few more percents. Show me a decently-sized model that learns a real-world task and is also capable of self-rewiring, and I will be at least more tolerant of alarmists (I still see many weaknesses in the standard exponential-improvement-unto-Godhood path).

But I really should quit this argument for now... I'm sure you'll be delighted to know it takes me away from helping patients using improvements in AI :P

... I jest, but those lines about being happy if AI slows down really do alarm and annoy me. Worrying about a potential future flood is usually a bad excuse for denying water from the thirsty.

Expand full comment

Thanks again for your examples of fire alarms. I lazily said "fire alarm argument" and meant the kind of concrete "fire alarms" you listed, so I'm very happy with your response and will think on it some more.

I also want to make clear that I wouldn't be happy if AI slows down: I would just be less worried about superintelligences.

Expand full comment

Not the OP, but before rant.begin() I want to point out that I agree with a lot of what you’re saying (+100 on gradient descent needing to go, and its going being a good sign of the field starting to get scary), and mostly have a bunch to say on one point of disagreement about timing re: one/few-shot:

> One/few-shot is certainly not a ten-year problem for the simple reason we have spent more than ten years trying to address it. Having spent a significant fraction of my PhD on related topics, I don't think we really made much headway. I don't mean to sound condescending, but I suspect you might not fully appreciate the nature of the problem. It might very well be that we need a different "mathematical substrate" for our learning.

It's worth noting that brains *don't* solve this problem in any sense that maps onto the types of neural networks that we mostly use right now (feedforward with strictly separated train/predict cycles), so it's possible that there is no clean solution in any generality. For instance, our instantaneous (<100ms) object detection systems do not meaningfully adapt to quickly identify new patterns based on a small number of presentations, that takes a lot of repetition, and is the main difference between a novice and expert in many fields, sports, music, art, etc.

What our brains *do* handle really well with few examples is to mark a piece of input (or alternatively and more accurately, a snapshot state of mind) as important/novel/dangerous, mysteriously encode a state hash in the hippocampus (roughly) and then re-process it later, many times/in different hypothetical contexts/under a different plasticity setting at the cellular level, until it is eventually encoded as a memory. Which is basically sophisticated and automatic data augmentation on whatever input comes in. But that's a whole observe/self-observe/encode/train/retrain process that includes and relies heavily on a full internal world/self model—we all have an extremely powerful and versatile GAN living in our heads that we use to model *everything* including our own brains—so it doesn't at all map to the limited models that we can build with today's hardware. It's probably a more engineering-driven practical thing than a rigorous and tight mathematical model, too, so even getting it to work might not provide any satisfying solution to the underlying problem. Perhaps this is what you mean by “different mathematical substrate”?
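A loose sketch of that flag-then-replay idea, as toy code of my own (not the commenter's model or any real cognitive architecture): a single flagged encounter is re-presented many times under perturbed "hypothetical contexts", so one real example yields many training updates.

```python
import random

class ReplayBuffer:
    def __init__(self):
        self.flagged = []          # inputs marked important/novel at encounter time

    def flag(self, example):
        self.flagged.append(example)

    def replay(self, train_step, augment, passes=50):
        # Re-present each flagged example many times, each time in a perturbed
        # "hypothetical context", so a single real encounter drives many updates.
        for _ in range(passes):
            for example in self.flagged:
                train_step(augment(example))

# Usage with stand-in functions: `augment` jitters the input, `train_step` is
# whatever gradient update the underlying learner would apply (a no-op here).
buffer = ReplayBuffer()
buffer.flag([0.2, 0.9, 0.4])                          # one novel observation
buffer.replay(
    train_step=lambda x: None,                        # placeholder learner update
    augment=lambda x: [v + random.gauss(0, 0.05) for v in x],
)
```

The interesting part, per the paragraph above, is everything the placeholders hide: deciding what to flag, and generating augmentations from a rich internal world model rather than Gaussian jitter.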

Here’s where I sort of get off the boat with the timing argument, though: sure, it’s been worked on for many years, but a solution that's anything like what we'd expect to work hasn't been realistically *computable* for all of that time, so to me that doesn’t count as an argument about how far away it is now. It's not that we don't have many, *many* good but complicated ideas about how to effectively do few-shot learning that could have at least a decent chance of working given more X, Y, and Z. There are tons of cognitive architectures out there with varying levels of plausibility at solving this, as well as other approaches that are less pie-in-the-sky. The problem is that X, Y, and Z are something like 10-100x memory, compute, and bandwidth, and (like backprop researchers in the 80s) we're completely stuck on implementing against realistic problems until we get those things. Only in the past 5 or 10 years did we get enough compute to do the narrow first-pass feedforward visual perception that is now "Baby's first Colab notebook: fine-tune a ResNet!", so it’s not surprising at all that any time spent on much harder problems during that time was essentially wasted or at least pre-prep for when the problems finally become viable.

And sure, most of those ideas will turn out to fail even when compute is there, but that's nothing new: RNN and LSTM text generation failed disastrously before self-attention more or less 100% solved it (caveat: “solved” by cheating, but another rant for another day). "Attention is all you need" hit the scene almost the exact moment there was hardware able to support it, which is part of a general trend where we rarely, if ever, lag algorithmically more than a year or two behind what hardware can support, and often anticipate it via ridiculous heroics (Google et al have better/more hardware, so jump the queue and publish first): it's not like *any* knowledge from today would have circumvented the AI winter of the 80s, since deep learning really *wasn't* feasible back then. So to me, the real "fire alarm" here is not a major algorithmic leap but mere hardware improvement, which in some ways is less scary because it's pretty predictable (I start worrying about 10 years from now based on current trends, with a lot of wiggle room for those trends breaking in either direction).

Expand full comment

A better name for the McAfee fallacy would be the Cypher Fallacy.

McAfee makes me think of the antivirus software, or the guy who died.

Cypher, from The Matrix, states that "ignorance is bliss" and he intends to embrace that fact.

The "ignorance is bliss" fallacy might be clearer, but too long

Expand full comment

> The AI started out as a learning algorithm, but ended up as a Go-playing algorithm

More precisely, the output of the learning algorithm is a Go-playing algorithm.

Expand full comment

"I think the path from here to AGI is pretty straight."

I used to believe deep learning wouldn't do anything we'd consider human-level intelligence but would be enough for self-driving. I don't anymore. It turns out that it's corner cases all the way down: Tesla Autopilot confusing the moon for a yellow light, Uber's self-driving unit shutting down, and Waymo cars committing a comedy of errors (which cannot be solved by more data, no matter what they say).

Elon Musk has done a 180, going from calling autonomous driving 'basically solved' to conceding that it 'turns out it is very hard'.

This researcher's blog gets to the heart of the problems https://blog.piekniewski.info/

DeepMind's achievements in games are an incredible distraction. I love DeepMind, and they are attacking the problem of AI from every possible angle, including neuroscience, causality, and neurosymbolic AI. But their press releases are highly misleading. Some researchers emphasize using the words "in mice" when describing a neuroscience or biology study to avoid giving a false picture. I think "in games" should be emphasized whenever such is the case in AI.

Games are an incredibly narrow domain, and provide immediate, clear and predictable feedback to every action (and hence RL does well at them). That's far from what happens in real-world situations, including driving.

The path from here to AGI involves completely new paradigms than the ones that are hot right now. And they are not straight at all.

Expand full comment

> So I guess my real answer would be “it’s the #$@&ing prior”.

This came across as unnecessarily (and uncharacteristically!) aggressive. I'm sure I got the wrong end of the stick of what you were getting at (or maybe I'm just being a snowflake), but it kind of felt like you were swearing at a commenter and I don't think there's any need for that.

Expand full comment

Re-the hard and easy problems of consciousness:

I understand the beef a lot of people have with the term self-awareness. It's vague, but I feel it's a useful term for a phenomenon that somewhat escapes us. I can't imagine there is a person in the world who has never reflected upon themselves, or been of two minds, or any of the million other metaphors we use to describe this recursive or reflexive process that takes place in our minds and bodies. Self-awareness doesn't seem to me like a bad description of it. It might well all be an illusion (it probably is: see Buddhist texts…)

Paradoxically, it is the easy problems of consciousness that I find hard. I have no training in any scientific discipline. However, my rational mind recognizes that these sorts of problems should ultimately have solutions. Perhaps in the same way that there is a fixed number of water molecules in the oceans of the world, give or take. It is a definite quantity at any given time. Actually calculating it precisely seems like a monumental task, but it is bounded.

The hard problem of consciousness I would put in the same basket as “squaring the circle”. (Is anyone still working on that problem?)

They certainly provoke the same sensations in me, a feeling of wonder mixed with some discomfort.

Expand full comment

If I really had to take a stab at describing self awareness in a vaguely technical way, I would describe it as the “noise” generated by having multiple algorithmic learning programs running simultaneously and sharing little bits of their output with the others.

It’s like that Johnny Cash song, “one piece at a time”

Expand full comment

I want to point out that we didn't need to wait for AI to have bugs that detect and hide when they're being investigated. Hackers have had a word for them since forever (well, the 1990's): heisenbug.

Expand full comment

It is not directly a reply to the arguments here, but I think it is somewhat relevant. Maybe it counts for the “we don't know yet what we don't know yet” argument:

Remember the public who were fooled by the Mechanical Turk, the famous fake chess-playing automaton in the 18th century. If we had asked them: “What is the hard thing in designing a real chess-playing automaton?”, they would have talked about chess.

Well, in the early 1990s, I played and lost against Chessmaster 2000 on an Atari ST. Granted, I suck at chess, but I read it would have beaten most people, even decent players. A game on a single double-density floppy disk, running on an 8 MHz 68000 with 512 KB of RAM. And it could talk. YOUR MOVE.

The public in the 18th century would not have realized that much harder than playing chess is grabbing the physical pieces, and that the hardest of all is to analyze a video feed to see what the opponent played.

Of course, there is an XKCD for that: https://xkcd.com/1425/

Expand full comment

AIs don't have a physical world they need to navigate and model and interact with yet. Hence, no need for a fancy navel-gazing nav system and no need for consciousness.

Expand full comment

If everything goes wrong, the army can always call in airstrikes on the relevant facilities. Mankind has a lot of guns on its side too.

Expand full comment

I think both sides of this debate are missing the point, and that makes me sad. We’re busy arguing over arcane talking points while the substance is lurking in the shadows.

Specifically, the crux of the debate today (on both sides) seems to rest on anthropomorphizing AIs in a way that is taken for granted, yet is not at all clearly coming true. For example, everyone seems to assume that of course AIs have (or will someday come to possess) an egoistic will to survive on their own. Organic entities seem to have this survival instinct, through some emergent property encoded in nucleic acids and proteins.

I think it’s reasonable to assume by default that some analogous property will eventually emerge in learning algorithms - that they will one day become “digital life forms” rather than just “digital tools.” But I wish there were more discussion about when and how and where that might happen. Because from where I sit, that is most likely to be the threshold condition for some future entity (a new type of life form) that today’s commentators would actually recognize as AGI.

Critically, it seems to me that this boundary condition is distinct from task mastery - even generalized task mastery. I get that DeepMind’s AIs may someday (soon?) be able not just to beat humans at Go but also to crack protein folding, write novels, etc. But while “being good at lots of intelligence-related stuff” strikes me as a good way to define “AGI”, it also fails to justify our existential fears about AGI, because humans are successful as a species not just because of our intelligence. It’s also because of our unique blend of social behaviors (which Nicholas Christakis calls the “social suite”). And, importantly, I’d say it’s because of the interplay between our unique (or actually, maybe not so unique?) brand of human intelligence and its ability to serve our deep desires.

In this way, the more dangerous possibilities of AGI appear to me to be the unpredictable ways in which it might be called on to serve the human limbic system, much as our “human” intelligence already does. Thus, the most likely “villains” in an AGI scenario, I believe, are other humans. We people, with our egos and desires and social vanity, might enlist a general-purpose digital “intelligence” facility to further our own ends - with consequences as unforeseen and potentially devastating as those of nuclear weapons, etc.

It is certainly possible that these algorithms will eventually morph from ego-less general intelligences into full-blown digital life forms with their own agendas, needs, wants, desires, and social habits. But until that happens, we should focus our fears on what nasty things AGI might do in unscrupulous (and/or naive) human hands. Assuming that AGI is indeed a very powerful force, one might argue that ANY human hands would effectively be naive. That strikes me as a winning argument.

What does NOT strike me as a winning argument is the idea that AGI will inevitably come to challenge the human race for some perceived dominion over earthly affairs. The only thing that makes this scenario look like the default future is the human-centric egos of the commentators pushing those ideas.

That’s not to say it’s beyond the realm of possibility that AGIs will one day become sentient and, furthermore, develop their own egos and “limbic systems.” But that would constitute an emergent property that seems far from inexorable - especially on a “by the end of this century” time frame. And either way, if we’re so afraid of some AGI wiping out all of humankind, why is no one talking about this? Instead we are stuck debating arcane points that I believe are largely immaterial to any real set of “existential threats” likely to emerge from the evolution (human-guided or otherwise) of today’s AI technologies.

Expand full comment

Very good points. I would add that the ‘social habits’ or ‘social suite’ of AGI must necessarily be radically different from humans’, because interconnection is certain - competencies and information will be rapidly shareable. No long apprenticeships or mentoring in AGI world. And isn’t this where self-awareness comes in? An AGI that doesn’t have a sense of being an individual has no motive to hoard its ‘knowledge’; a self-aware entity might.

Expand full comment

This whole discussion highlights why I hate reasoning by analogy. The things have to actually be analogous in the relevant way for it to work; otherwise it's just a word game. So if you want productive convos about AI risk, I propose banning analogies unless they meet some standard for carefully laying out WHY the things are analogous. So "AI is like the human brain because xyz" might be ok, but "worrying about AGI is like the Byzantines worrying about nukes" is not, unless you've first established e.g. that gunpowder weapons were improving at a Moore's-law-like rate at the time and that this would have made city-flattening weapons possible within a human lifetime if improvement had continued at that rate.

Let's try some non-analogy reasoning that's accessible to non-experts, such as myself. I don't really trust the experts' judgments on probability because they're too close to the issue, and I don't trust my own because, what do I know. But forecasts from smart people who aren't necessarily experts sound promising.

This is the first time I've seen the Metaculus numbers, and they made me significantly more worried about the issue. I've always suspected the polls of experts were overestimating the risk because of experts' inherent biases, so I've never tried to calculate an expected value for it before. And according to Metaculus they actually were overestimating - but by a factor of 2-3, not 100. Multiplying Metaculus's 22% chance of a catastrophe that kills at least 10% of world population, by the 23% chance that any such catastrophe would be caused by AI, by the 60% chance that any AI catastrophe would reduce world population by at least 95%, gives a greater-than-3% chance, within the next 80 years, of a catastrophe worse than anything in previous history - phenomenally high for something of that magnitude. The expected value on that is roughly 230-300 million lives, not even counting the lost future value if civilization or the species is wiped out.
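
For anyone who wants to check the arithmetic, here is the calculation spelled out (the three probabilities are the Metaculus figures quoted above; the ~8 billion world population is my own round number):

```python
# Sanity check of the expected-value calculation above.
p_catastrophe  = 0.22   # catastrophe killing >=10% of world population
p_caused_by_ai = 0.23   # such a catastrophe is caused by AI
p_kills_95pct  = 0.60   # an AI catastrophe reduces population by >=95%

p_ai_near_extinction = p_catastrophe * p_caused_by_ai * p_kills_95pct
print(f"{p_ai_near_extinction:.3f}")         # ~0.030, i.e. just over 3%

world_population = 8_000_000_000             # assumed round figure
expected_deaths = p_ai_near_extinction * 0.95 * world_population
print(f"{expected_deaths / 1e6:.0f} million")  # ~231 million expected deaths
```

With a larger end-of-century population, the same 3% pushes the figure toward the 300 million end of the range.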

Expand full comment