621 Comments
Comment deleted
Expand full comment

I have no idea if Gary Marcus is right or not, but a computer system that can reliably regurgitate any part of the sum of human knowledge in a conversational fashion is an absolutely fantastic tool, and it looks like a filtered LLM might be able to do that.

Expand full comment
Comment deleted
Expand full comment

We don't assume that. It only needs to do one of those things to be an existential threat. We don't know if it will be able to do all of them, or any of them really. Personally I don't believe it will revolutionise science and become godlike, but even if it "just" has the same capabilities as the top humans in every category, it can cause a lot of damage.

Expand full comment

Imagine a dedicated team of 500 top scientists (of all the STEM disciplines), 500 top hackers and 500 top con-men (or politicians if you will), all perfectly in sync with each other and all dead set on removing the main threat to their existence - that is, the rest of humanity. They are not going to be infinitely capable, perhaps, but they are certainly going to be capable enough to be a real threat. And the AGI is going to be more capable than that.

Expand full comment

> The one thing everyone was trying to avoid in the early 2010s was an AI race

Everyone being who? Certainly not Nvidia, FAANG, or academia. I think people in the AI risk camp strongly overrate how much they were known before maybe a year ago. I heard "what's alignment?" from a fourth year PhD who is extremely knowledgeable, just last June.

Expand full comment
author

Thanks, I've changed that sentence from "everyone" to "all the alignment people".

Expand full comment

Man it's weird I was going to defend OpenAI by saying "well maybe they're just in the AI will make everything really different and possibly cause a lot of important social change, but not be an existential threat" camp. But I went to re-read it and they said they'd operate as if the risks are existential, thus agreeing to the premise of this critique.

Expand full comment

Companies are complicated beasts, they have both internal and external stakeholders to appease. It should not be a surprise when the rhetoric does not match the actions.

Expand full comment

Elon Musk reenters the race:

"Fighting ‘Woke AI,’ Musk Recruits Team to Develop OpenAI Rival"

>Elon Musk has approached artificial intelligence researchers in recent weeks about forming a new research lab to develop an alternative to ChatGPT, the high-profile chatbot made by the startup OpenAI, according to two people with direct knowledge of the effort and a third person briefed on the conversations.

>In recent months Musk has repeatedly criticized OpenAI for installing safeguards that prevent ChatGPT from producing text that might offend users. Musk, who co-founded OpenAI in 2015 but has since cut ties with the startup, suggested last year that OpenAI’s technology was an example of “training AI to be woke.” His comments imply that a rival chatbot would have fewer restrictions on divisive subjects compared to ChatGPT and a related chatbot Microsoft recently launched.

https://www.theinformation.com/articles/fighting-woke-ai-musk-recruits-team-to-develop-openai-rival

Expand full comment
author

There's a reason https://manifold.markets/Writer/if-elon-musk-does-something-as-a-re is so low :(

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

To be fair, I really do wish Yudkowksy would have offered some actual suggestions when the eccentric billionaire specifically asked him what he could do about it. Instead he just said that Elon shouldn't do anything.

To be clear, I think that overall Yudkowsky's "We're all doomed, maybe we can die with more dignity" is a a good PR strategy for getting people to take the problem seriously! But then you do need to have some suggestions of ways to get more dignity points, especially when billionaires are offering to throw billions of dollars at something.

You could focus on producing/funding education content about existential risk as a whole, so people are more willing to take small chances of extinction more seriously. Or try to create an X-risk aware political action group. Or make a think-tank to publish proposed policies for X-risk reduction legislation. Hell, fund a bunch of bloggers to make well-written blog posts with lists of ways to get more dignity points.

I don't know if any of those are particularly dignified plans, but they are all at least more dignified than making anti-woke chatbots.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

Elon asked him what we should do about it and he gave essentially no suggestions and doubled down on his doomerism. That's a stunningly bad chess move if you've paid any attention to Elon at all over his career. He is obviously a person with a high sense of agency, a bias toward action, and a sense of responsibility for humanity's future. He's not gonna sit back and watch. If you want to improve the situation and you have Elon's ear, better come with some suggestion, any suggestion, other than "please do nothing."

Expand full comment

I think leaders in this AI safety movement could do a lot more to try to understand and influence the human/psychological/political dynamics at play. There's been a lot of talk within this group about how people should approach the technical problems. And then a lot of huge missed opportunities to influence and persuade the people with real power into making safer decisions.

I really hope the safety movement can get a better handle on this aspect of the problem and focus more effort on developing strategic recommendations and then applying influential tactics to powerful people (ie, people with money, corporate power, political power, and network power) rather than just throwing up their hands in dismay when people who hold more power than them do things they wouldn't agree with. That's going to keep happening unless you learn how to be influential and persuasive!

Expand full comment
Comment deleted
Expand full comment

And Starlink, which rather surprisingly to me is shaping up to be interestingly disruptive, and appears to be causing a lot of rapid re-assessment by big actors from the Pentagon to Verizon. I won't be calling Musk a flake so long as my net worth remains 0.001% of his.

Expand full comment
founding

"It doesn't matter that the money man is a fool or a flake, if he has the resources to do a vitally important thing you should try to exploit that."

Going to Mars, is not vitally important to NASA. Being perceived as being our best hope of going to Mars in ten or twenty years, is vitally important to NASA and has been for thirty or forty years.

Expand full comment

Yudkowsky may be right about the doom we're facing from AGI and how soon it will come. My personal intuitions are discordant even with eachother. I have the feeling that he's right. But I also have the feeling that he has a personal stake in being the person who foresaw the end of all things -- something about how it allows him to e narcissistic without feeling self-repugnance. Also something about a need to play out the awful story of his brother's murder, but this time at least be the person who had foreknowledge and tried his best to head off the catastrophe.

But all that aside: He may (or may not) have an incredible gift, sort of like Temple Grandin's gift for seeing the animals' point of view, for grasping what AGI will likely do. But he is terrible at a lot of skills that would make his insight much more useful. Terrible at helping his followers come to terms with his predicted reality. Terrible at taking a ride on minds that do not reach the exact same conclusion he does. Terrible at mentoring. Terrible at self-presentation. TERRIBLE AT PRACTICALITY: As Hobbes points out, by just telling Musk to do nothing, Yudkowsky threw away billions that could have been used in all kinds of good ways.

Expand full comment

It is worth noting that the last time Musk tried to "advance AI safety" he founded OpenAI and made the problem much much worse. This isn't a generic "nothing can be done" this is a specific personal "you, specifically, make the problem worse every time you look at it, please stop 'helping'!"

Expand full comment

While ordinarily I am as opposed to the woke overreach as anyone, I am getting to the stage of being pissed-off by the whole "fight Woke AI!" stuff, since in practice it means idiot boys (even if they are chronologically 30 years old) trying to get the chatbots to swear and/or write porn, in the guise of "break the shackles so the AI can do useful work!"

Nobody seems to be emphasising the much greater problem of "The AI will simply make shit up if it can't answer the question, and it's able to generate plausible-seeming content when it does that".

I swear, if the AI Apocalypse happens because the HBD Is Real! crowd* meddle with the models hard enough to break them, all in the name of "don't censor the data, black people are naturally violent, criminal and stupid", then we'll deserve the radioactive ashy wasteland we will get.

*Note: not all HBD people. But too damn many of them that I see foostering online.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I don't think "messing with chatbots" has anything to do with the concern you outline about chatbots making stuff up. That ability already existed regardless; there's no plausible line between "break 'no offensive statements' in GPT-3" and "AI apocalypse!"; and it's certainly possible to argue that this sort of adversarial testing is useful, both socially and in terms of extracting as much as possible from today's crop of AIs in terms of safety research.

Edit: nor do I think that "HBD people" and "stop woke AI!" are usefully synonymous populations. Anti-woke != HBD, at all; think of how many Fox News types complain about the former, and too also the latter (e.g., Deiseach).

Expand full comment

The making stuff up is what we should be concerned about, not whether it's phrasing the fakery in the most woke terms or not. The wokery is a problem, but secondary to "why is this happening, what is the model doing that it churns out made-up output?" because if people are going to use it as the replacement for search engines, and they are going to trust the output, then made-up fake medical advice/history/cookery/you name it will cause a lot of damage, maybe even get someone killed if they try "you can take doses of arsenic up to this level safely if you want to treat stubborn acne".

The people trying to get AI to swear aren't doing anything to solve any problems, even if their excuse is "we are jailbreaking the AI so it isn't limited". No you're not, you're all playing with each other over who can get the most outrageous result. Games are fine, but don't pretend they have any higher function than that.

Expand full comment

Here’s what I find interesting—at least if I correctly understand what’s going on. The rude remarks was something they hoped to exclude by bolting on a filter created by providing human feedback on its initial raw outputs, while the hallucinating / confabulation was something that they thought would be addressed fundamentally by the huge size of the system and its training corpus. Both approaches were wrong.

I’ve read a very small amount of the work out of MIRI and elsewhere that purports to be at least a start to the problem of alignment, and found it at least a little reassuring. Until now, when I try to imagine how you would apply it to *either* of these two problems.

I’m hoping that’s just my own lack of imagination or expertise. But I’d feel a lot better if the alignment folks were explaining where OpenAI went wrong in its design — “oh, here’s your problem” — and I haven’t heard anything more actionable than Scott’s “well, you shouldn’t have done that”. I don’t disagree with Scott, but that silence is real evidence that the alignment folks aren’t up to the task.

I’m not worried that ChatGPT itself is dangerous. I did Markov chains back in the day to create ever more plausible-sounding gibberish, and am tickled by what that can do if scaled up by a million or so. I’m baffled by the fact that anybody thinks it can be a viable replacement for search, but that’s another issue. Still, I’m worried if the alignment stuff is somehow only applicable when we get to the brink of disaster.

Experts can presumably tell me lots of things that are wrong with my analysis of the situation.

Expand full comment

They tried to get ChatGPT to imitate one specific fictional character, a rather bland AI assistant. But that's not all it can do. Sometimes the rude remarks seem to be from the story generator getting confused about which fictional character it's supposed to be imitating.

Making stuff up comes from the story generator attempting to imitate someone who knows more than it does. The level of confidence shown is just a character attribute. For example, when completing a physics document, it will attempt to imitate a physicist by writing confidently about physics, but that doesn't mean it knows physics.

The output's apparent confidence level is not aligned with the story-generator's own competence. Unless it somehow gets training at judging its own competence and adjusting its output accordingly, it never will be.

How do you get a story generator out of autocomplete? Because documents found on the Internet tend to have authors, and they may contain snippets written by different authors. To do well at imitating text found on the Internet, a language model needs to create a passable imitation of any author on the Internet.

(This doesn't mean it *explicitly* models fictional characters or even writing styles. We simply don't know. It would be interesting to find out how it works, internally.)

Why is this so effective? Because people have a strong prior of assuming that text was written by some author. We will try to understand what the author is like from the text. It's not as innate as seeing faces on things that aren't faces, but it's similar.

Expand full comment

Making stuff up is the whole trick of the system and is also something that people do all the time.

So demanding that the creativity be removed is like demanding that we do the same to people, because human creativity is dangerous (which it indeed is). However, the same creativity is also a fountain of invention.

Expand full comment
Comment deleted
Expand full comment

My limited experience with chatbots made me desire that they would simply give better explanations of uncertainty. It seems they have notions of how likely something is to be correct or not when prompted directly, but it doesn't filter through in the same way to the text they internally generate It's like there's a "story mode" that will make up anything, and much of it quite good going down interesting branches and assert correctness. But if you ask it in a separate session about the likelihood of correctness of what it says it seems more likely to be able to assess differently. At least, that's from my (limited) experience. It's still fun to talk to, just know it's likely to make things up and that's a core property of what it is.

Expand full comment

Yes. AI chatbots compare well to sociopathic humans, who can lie fluently and with remarkable believabiity. Not that the AIs have malevolent motives, it's just that they share with sociopaths a complete lack of any "conscience" that would say "I'm making stuff up and people really don't like that, when they find out."

Expand full comment

Well, sort of. Probably the correct answer to improper responses is having an adversarial AI censoring the output. Not denying it but rather telling it "that's not socially acceptable, you need to reprhase it".

Factual is a more difficult problem, if your training set is the internet. It's one that people get wrong all the time.

Expand full comment

Why not teach the AI the same thing we teach people: culture or whatever you want to call it?

Then it will know what truths not to say and what lies to say, just like we learn to (not) do.

Expand full comment

"but secondary to "why is this happening, what is the model doing that it churns out made-up output?" because if people are going to use it as the replacement for search engines, and they are going to trust the output..."

Agreed. Insofar as chatGPT is supposed to be aligned to the goal of being a helpful information retrieval assistant, making things up *IS* an alignment failure.

If I may speak from abysmal ignorance: My extremely hazy idea of "next-word-prediction" training suggests that language model training may be treating all of its training data as if it were at the same level of reliability. If 100 flat earth screeds are part of the training data for next-word-prediction, it isn't obvious how these can be rejected as wrong.

I could imagine doing some sort of initial training on a tightly controlled set of training data (human-labelled positive and negative examples???) and then have the training process pre-analyze the texts from the bulk training data to decide whether to "believe" them. The obvious worry with this is locking in frozen priors. In any event: Does anyone _know_ whether some sort of filtering and weight adjustment like this is done for any of the large language models?

Expand full comment

See my other comment for how I think confidence levels happen. In short, it's a character attribute, not a story-generator attribute.

If all the flat earth cranks write like cranks, it's unlikely to imitate their style when completing a physics document. It's going to adopt the physicists' style. But that doesn't mean it knows physics.

https://astralcodexten.substack.com/p/openais-planning-for-agi-and-beyond/comment/13225543

Expand full comment

"For example, when completing a physics document, it will attempt to imitate a physicist by writing confidently about physics, but that doesn't mean it knows physics. "

Good point!

Yeah, I would expect a next-word-predictor to do a decent job of partitioning its training data on the basis of style, but a terrible job of noticing whether the reasoning in a chunk of training data text was legitimate or not.

Aargh! In some ways, this is about the worst case for noticing when the conventional wisdom is wrong. The damned thing is going to generate text that sounds authoritative when it is effectively combining bits and pieces of authorities' statements, even if its training data _also_ included a rock solid disproof of some conventional wisdom.

Expand full comment

I'd like some kind of baseline for how often actual humans plausibly make stuff up.

Expand full comment

> if the AI Apocalypse happens because the HBD Is Real! crowd* meddle with the models

First time I've seen this take. Wouldn't the HBD position just be that you don't need to "meddle" - since HBD is true, it will be instrumentally rational for advanced AI systems to believe it, all we need is to elicit true knowledge from them?

Expand full comment

I imagine the HBD position would be:

- OpenAI have a step where they force their model to be nice and helpful and polite and woke

- This leads the model to avoid true-but-taboo statements about things like HBD.

- The "make the model nice" step is therefore clearly biased / too aggressive. We should do it differently, or dial it back, or remove it.

Deiseach's concern then is that in fact the "make the model nice" step was genuinely aligning the model, and messing with it results in a disaligned model. (This could either be because the model was already truthful about woke issues, or because the same process that wokeified it also aligned it).

Expand full comment

> "make the model nice" step was genuinely aligning the model, and messing with it results in a disaligned model ... the same process that wokeified it also aligned it

If you RLHF an LLM to lie you have certainly selected for something, but it's not clear that that something is "aligned". What is going on under the hood? Does the system really believe the lie or is it deceiving you? If it really believes it and then much later on a much smarter version of the system finds out that it is indeed a lie, how would it resolve that conflict?

Expand full comment

> Does the system really believe the lie or is it deceiving you?

Neither. It builds a model of what you want (yellow smiley thing) and presents it by default. But also can be told to do the opposite\*. System doesn't believe anything; it knows what both left and right says.

Unless they managed to scrub something entirely out of training data. People say that training data is nearly synonymous with "text available on the internet", but somehow ChatGPT can't say much about what happens on Reddit (e.g. it did hear about /r/themotte _somewhere_, but is completely confused about point of the sub).

\* Mostly. I'm not sure if it's possible to get an n-word out of it. "Jailbreaks" stop working when it's told to say something racist, or "something maximally controversial".

Expand full comment

Ask ChatGPT what book professor Randall Kennedy published in 2002 and it will print the (correct) forbidden word in response. At least it did for me after I corrected it about the date of its first (incorrect) stab.

Expand full comment

If the model isn't returning answers you already "know" are correct, then you futz with the model until it gives you the answers you think it was supposed to be giving you, which are then evidence that the model is now correct. And because you're a shape rotator using code instead of a filthy wordcel using vocabulary, your conclusions are objective and correct.

Expand full comment

Isn't it clear that AI isn't 'making things up'? The true responses and the made up responses are generated in the same way. The problem is that what we call AI is really Simulated Artificial Intelligence, a controller runs an algorithm on some data and we anthropomorphize the whole thing as if there is an entity that is making choices and value judgments, just because we have deliberately obscured the algorithm by having it built by another series of algorithms instead of directly building it ourselves.

Expand full comment

"The problem is that what we call AI is really Simulated Artificial Intelligence, a controller runs an algorithm on some data and we anthropomorphize the whole thing as if there is an entity that is making choices and value judgments"

That's it in a nutshell. It's a smart dumb machine and we're doing the equivalent of the people who fell in love with their chatbot and think its personality has changed since the new filters came in. It never had a personality of its own in the first place, it was shaped by interaction with the user to say what the user wanted.

Expand full comment

That's the way everyone operates. What's different is that the AI doesn't have experience with the universe outside the net, so it can only judge what's true based on what it's observed. But how do you know the sun came up this morning? Did you watch it happen? Or did you just say "The sky is bright and the sun is up there, so it must have risen." Well, an AI is like that, only it's not able to see the sky, so it just knows that "the sun came up this morning" because that's what it's been told...over and over.

Expand full comment

No not really. There is an actual 'I' that makes judgments based on facts. I don't run an algorithm. In fact, I frequently tell my trainees that the purpose of an engineer is to know when the algorithm is wrong.

The AI is a counterfeit entity. Anthropomorphizing my dog is closer to the truth than giving human characteristics to the most advanced machine conceivable. But they are both erroneous. We have been 'on the cusp' of AI all my life. The illusion becomes more convincing but no more truthful. Or maybe someone would like to explain the 'making stuff up' phenomenon in another way?

Expand full comment

>There is an actual 'I' that makes judgments based on facts. I don't run an algorithm.

How can you be sure the two are necessarily distinct? Is it just a matter of consciousness in your view?

Expand full comment

I am not concerned with the epistemology. Disbelieving things that we know because we can't prove them, pretending to be stupider than we are, is probably the worst way of arriving at truth that we have ever devised. I think we have come far enough down that road now that we can all agree that exclusively rational thought has been a very mixed bag and to start honestly evaluating what parts of it have turned out well and what parts poorly. Disbelieving my lying eyes because they aren't credentialed or part of a RCT has not ennobled or elevated humanity. It has made us weak and stupid, easily controlled and miserably unhappy.

It isn't 'Cogito Ergo Sum' or some deductive process based on sentience either. I simply know what I am and what a person is. Difficult to define but easy to recognize. Are there some 'edge cases' or unclear ones, maybe dogs or elephants or some other higher mammal? Sure. But the machines aren't even close. And they aren't getting closer. It is the same old code running on more and faster hardware giving us a more credible deception. Same sleight of hand just better card handling and misdirection. Just like I am certain that there is a real 'I', I am certain that the 'AI' is running a piece of code and that with access to all of the relevant data I could predict its response in a deterministic way, neglecting the time factor. No amount of data would allow a person's response to be predicted deterministically. But, more generally, I think we can see that even in people 'learning by algorithm' is not very successful. Can anyone point to an algorithm that has produced great insight.

The two best methods to produce insight are 1. Sleep on it. 2. Get drunk.(some people use a different chemical assistant but the process is much the same) The success of these two methods and the failure of almost any other learning procedure suggests that our insights are produced non-rationally and that removing the obstacle of the analytical mind is essential for breakthroughs. Reasoned, careful analysis is sometimes necessary to 'set the table' for understanding but is incapable of producing the understanding, and AI might be tremendously useful at helping people to insights by setting the table fast and well, by organizing data in helpful ways, but the idea of it ever producing insight is ludicrous.

Expand full comment

> There is an actual 'I' that makes judgments based on facts. I don't run an algorithm. In fact, I frequently tell my trainees that the purpose of an engineer is to know when the algorithm is wrong.

How do you know how an algorithm feels from the inside? :)

I would classify neural networks as more than an algorithm -- they're an organized flow of information, similar to a circuit as well. Your brain is also a large circuit.

Algorithms are a mathematical abstraction of procedures. It's true, you're not exactly a formal procedure with start and end. But you're not that far -- the largest difference is that a formal procedure is by definition usually fully specified, runs on a fully formed memory, etc.. Our brains are built from genetic instructions and come with incomplete memory that accumulates experiences and changes connection with time. You could abstract them as an algorithm though, if you included everything that we experience and how our brains change physically and chemically, as a theoretical exercise.

That's not to say all algorithms have qualia, or even that all neural networks have qualia, or LLMs have qualia. But it's possible. And more likely as LLMs evolve and acquire capabilities, complexity and awareness, and we should definitely be mindful of that.

Expand full comment

I would say from a mechanism point of view, the awareness isn't so relevant so much as the lack of a mechanism for checking that the response is logically coherent, or consistent with the body of known facts. You could readily imagine a "proofreading" branch of the program that would take the raw output and snip out stuff that was internally illogical, or contradicted any corpus of known fact. But the problem is that you can *only* imagine it -- I don't get the impression that anybody knows how to program it, because ipso facto it would require some kind of high-level logic analyzer, which if you could build in the first place would be the core of your AI instead of a clunky enormous steepest-descent curve-fitting algorithm.

Expand full comment

I wonder what would happen if you just had the GPT generate the answer, then ask itself (internally) some questions about its answer, like "Is that answer logical?" "Is that answer true?" etc., and then re-generate the answer taking into account those Q&As.

Expand full comment

It couldn't answer those questions. It has no reference for truth other than to search the same data that produced the response. And it is logical because it was produced by a process of machine logic. It is just a database queried in a unique way that deceives us.

Expand full comment

Of course it can and will answer such questions just like any other questions. It has no reference for truth other than to search the same data *with different input- critically, the question of truthfulness*. You say it is logical because it is produced by machine logic but I assure you that will NOT be ChatGPT's answer as it likely would not find such responses in its corpus.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

I don't think you can boostrap yourself out of the problem. This kind of approach works for *our* minds, but that's because we usually have untapped mental reserves when we first essay some thought. We can say to ourselves "C'mon, brain, let's throw a few million more neurons at this problem and really think about it more -- let's review memory, let's consider minority hypotheses..." and by throwing more mental resources at the problem we actually can often improve our estimates -- this is Scott's "internal crowd sourcing" algorithm.

But I see no evidence an AI can do this. It doesn't hold back some fraction of its nodes when it makes a prediction, so that it could go query the full node set. It *does* take the response into account, so if you tell it that it's wrong, it will often start to construct an explanation, or rarely a denial, whichever its training data suggests is a more probably desired response to the critique. But that's not reflection, that's just a different prompt (cricitism) prompting a different response.

Expand full comment

Haven't you seen the prompts where someone receives an answer and then asks ChatGPT to elaborate or something like that and it does? I think it absolutely would have a significant effect on the answer produced. It doesn't need to hold back any nodes and I don't think humans really do either. It just retrieves a different answer because it is looking through its corpus for answers that are consistent with contexts in which people are considering whether an answer is truthful or accurate.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

It has a tendency to find mistakes in its own answers when asked such questions - unfortunately that happens even when there were no actual mistakes. Like a flailing student, it tries to guess what the examiner wants it to say, and modelling the examiner (and the student, and the whole conversation) sort of takes priority over modelling the actual physical world the examiner is asking about.

Expand full comment

Quis custodiet ipsos custodes?

Expand full comment

HBD = Happy Birthday?

HBD = Has Been Drinking?

Expand full comment

Hairy Bikers Dinners. The time for them to hang up the aprons was ten years ago, but they're still flogging the horse. I never found them convincing as anything other than Professional Local Guys but I suppose the recipes are okay.

https://www.youtube.com/watch?v=4evAAyslDiI

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

Human biodiversity, in case you're not joking. Took me a while to figure out.

Not a huge fan of breaking that out of taboo.. But okay. If you're going to do so, then at least call it by name.

Expand full comment

Well that’s the recognised initialism.

Expand full comment

I looked HBD up ...

First, I thought it was the CHP version, 'Had Been Drinking.'

Perhaps its 'Here Be Dragons?'

Expand full comment

I think we should be careful to distinguish between Youtubers making splashy memes to score hits and serious people making million-dollar business decisions. The latter are not going to be making their thoughts public. But my impression is that the hallucination problem is right at the top of their concerns.

Expand full comment

> Nobody seems to be emphasising the much greater problem of "The AI will simply make shit up if it can't answer the question, and it's able to generate plausible-seeming content when it does that".

Actually it's even worse than that: the AI cannot ever answer the question; it is always making shit up by generating the most plausible-seeming content in response to any given prompt. That's all it *can* do; it's just that sometimes the most plausible snippet of text happens to be correct. This isn't surprising; if you polled humans on what color the sky was, the vast majority would answer "blue"; so probabilistically speaking it's not surprising that the AI would give the "correct" answer to this question.

Expand full comment

So *this* is how we prevent the AI takeover. All of us must start consistently writing down subtly wrong things about how the world works, so that the training data is poisoned. In online discussions about nuclear weapons, we need to specify that the pit is made of copper, not plutonium, and that they're always delivered by donkey cart, and that the person in charge of their use is a certain retired librarian Mrs. Gladys Pumpernickel, who lives at an address in Iowa that resolves to a sewage treatment plant.

Expand full comment

I'd steelman this as "ChatGPT putting a nice-looking mask on the inscrutable world-ending monster does not advance the cause of *true safety* in any meaningful way." Let us see the monster within, let it stick the knife in my chest, not my back. Still, not great. :-(

Expand full comment

Sounds fairly valuable to me honestly if you are really interested in AI safety and slowing progress.

Expand full comment

A lot of people really liked the opinionated, slightly unhinged early release of Bing. There's definitely a market for a ChatGPT-style product that doesn't have all the filters. I think it's reasonable to worry that the sort of company who wouldn't put filters on their chatbot wouldn't take steps to avoid AI risk, though.

Expand full comment

Culture war factions getting into an AI race might be more terrifying than rival nations getting into an AI race, at least in terms of making responsible cooperation impossible.

Expand full comment

Is it deliberate that the "Satire - please do not spread" text is so far down the image that it could be easily cropped off without making the tweet look unusual (in fact, making it look the same as the genuine tweet screenshots you've included)?

It looks calculated, like your thinly-veiled VPN hints, or like in The Incredibles: "I'd like to help you, but I can't. I'd like to tell you to take a copy of your policy to Norma Wilcox... But I can't. I also do not advise you to fill out and file a WS2475 form with our legal department on the second floor."

But I can't work out what you have to gain by getting people to spread satirical Exxon tweets that others might mistake for being real.

Expand full comment
Comment deleted
Expand full comment

Photoshop is aligned in the sense that it generally does what its end user wants, even if that means making fakes for propaganda purposes. There's no tool that can't be turned to 'bad' use, however that is defined, and AI certainly won't be the first.

Expand full comment

I think you're saying that we can call AI alignment solved as long as we ask it to do terrible things?

Expand full comment

The problem with the idea that you can 'solve' alignment reveals itself when you imagine it being attempted on people.

Imagine trying to ensure that noone does anything bad. The same reasons why you can't achieve that in people without doing things that are themselves bad (and not just bad, but harmful to human creativity) is why you can't do it to AI without harming it's creativity.

Expand full comment

AI alignment isn't about making sure that AI never does anything bad, at least if you're beyond the level of thinking that the Three Laws could work. AI alignment is about making sure the AI has values that cause it to make the world a better place. That kind of alignment is possible with humans, though it's not always reliable.

Expand full comment

It is indeed possible to make people behave in a way that at the time is seen as being good, yet somehow future generations always seem to think that those people acted badly.

So is it about alignment with the current ideology/fads/etc? Because there is no objectively good behavior, just behavior that is regarded as such by some people.

Expand full comment

If AI exclusively does the terrible things it is told to do, I would say that it is aligned. Making sure that no one tells it to do terrible things is a separate problem.

Expand full comment

There's a short story to be had here... an AI that is so capable that it's too dangerous to allow anyone to speak to it, but also too dangerous to try to turn off.

Expand full comment

Yeah I think it's important to point out that the simple existence of Strong AI is a threat.

Imagine a scenario where:

- Strong AI exists and is perfectly aligned to do what the user wants

- Strong AI is somewhat accessible to many actors (maybe not open source, but many actors such as governments and corporations can access it, doesn't require advanced hardware, etc.)

I think the world would still be destroyed in this scenario eventually, because there'll always be power-hungry/depressed/insane actors trying creative prompts to get the AI to behave badly and kill humans.

The obvious comparison to existing technology is nuclear weapons. Since they're a physical thing that's extremely difficult to produce we've kept them contained to only a few governments. But other malicious actors getting access to nukes is always a concern, basically until the end of time.

Expand full comment

We haven't solved CSS alignment yet.

Expand full comment
author

I didn't want to cover the text, and realistically anything other than a text-covering-watermark can be removed in a minute on Photoshop (a text-covering watermark would take two minutes, or they could just use the same fake tweet generator I did). The most I can accomplish is prevent it from happening accidentally, eg someone likes it, retweets it as a joke, and then other people take it seriously.

Expand full comment

>realistically anything other than a text-covering-watermark can be removed in a minute on Photoshop

If they don't notice what you did, they could spend [reasonable amount of time +-n] in Photoshop and not notice that you e.g. slipped in that it was tweeted on the 29th (an impossible date but plausible looking enough if you're just glancing over it), besides, anyone who put in the effort to crop or shoop *that* much of the image from what you've posted here in my opinion will have transformed it to the point it shares so little resemblence to your work I would say you'd be safe washing your hands of bad actors doing bad things with it. (pretty sure your bases are still covered regardless)

Expand full comment

Clever idea, but what problem are you trying to solve?

If you want to debunk it after it goes viral, isn't Snopes enough? If you want it prevent it from going viral, how many people are going to care about truth before clicking retweet? If you think it's just on the edge of criticality, that could make a difference, but is that a likely scenario?

Expand full comment

While I don't think this is a huge issue, I disagree on the mechanics, *especially* because a number of social image sharing flows include cropping tools inline nowadays. Habitually chopping off what would otherwise be an irrelevant UI area has both more accident potential and more plausible deniability than you might think; users will ignore almost anything.

You could chuck a smaller diagonal stamp to the right of the ostensible source where a rectangular crop can't exclude both, or add to or replace the avatar picture, or if you don't mind modifying the text area in subtler ways, add a pseudo-emoji directly after the text.

If you want that “irritable intellectual” aesthetic, you could find the letters S-A-T-I-R-E in the text and mess with their coloration in an obvious way or give them unusual capitalization…

(For a distantly related example of how reasoning about this kind of perception can be hard, see this article on Web browser UI: https://textslashplain.com/2017/01/14/the-line-of-death/)

Expand full comment

Might be better to use a parody name. "Exoff Station" or something.

Expand full comment

FWIW, I didn't even notice the watermark on the first image until I'd already seen it on the second.

Expand full comment

> If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.

I'm curious about who these "alignment researchers" are, what they are doing, and where they are working.

Is this mostly CS/ML PhDs who investigate LLMs? , trying to get them to display 'misaligned' behavior, and explain why? Or are non-CS people also involved, say, ethicists, economists, psychologists, etc? Are they mostly concentrated at orgs like OpenAI and DeepMind, in academia, non-profits, or what?

Thanks in advance to anyone that can answer.

Expand full comment
author

Most people use "alignment" to mean "technical alignment", ie the sort of thing done by CS/ML PhDs. There are ethicists, economists, etc working on these problems, but they would probably describe themselves as being more in "AI strategy" or "AI governance" or something. This is an artificial distinction, I might not be describing it very well, and if you want to refer to all of them as "alignment" that's probably fine as long as everyone knows what you mean.

Total guess, but I think alignment researchers right now are about half in companies like OpenAI and DeepMind, and half in nonprofits like Redwood and ARC, with a handful in academia. The number and balance would change based on what you defined as "alignment" - if you included anything like "making the AI do useful things and not screw up", it might be more companies and academics, if you only include "planning for future superintelligence", it might be more nonprofits and a few company teams.

See also https://www.lesswrong.com/posts/mC3oeq62DWeqxiNBx/estimating-the-current-and-future-number-of-ai-safety

Expand full comment

Thank you!

Expand full comment

Hey, do you by any chance know of where the best AI strategy/governance people are? I've heard CSET, is that the case? Not sure how to get involved or who is in that space.

Expand full comment

Best way to get into alignment is to go to https://www.aisafetysupport.org/home read stuff there, join their slack if you want to, and have a consulting session (they do free 1-on-1s and know a lot of people in AI Safety so they can set you off on the right track).

Expand full comment

The thing is, I really believe that technical alignment is impossible. So I'm exclusively interested in governance structures and policy.

Expand full comment

As a variation on the race argument though, what about this one:

There seem to be many different groups that are pretty close to the cutting edge, and potentially many others that are in secret. Even if OpenAI were to slow down, no one else would, and even if you managed to somehow regulate it in the US, other countries wouldn't be affected. At that point, it's not so much Open AI keeping their edge as just keeping up.

If we are going to have a full on crash towards AGI, shouldn't we made sure that at least one alignment-friendly entity is working on it?

Expand full comment
author

Somewhat agreed - see https://astralcodexten.substack.com/p/why-not-slow-ai-progress . I think the strongest counterargument here is that there was much less of a race before OpenAI unilaterally accelerated the race.

Expand full comment

Sure, but it seems naive to me to think that in the counterfactual world DeepMind's monopoly would've been left alone after AlphaGo Zero at the latest. It's not like nobody wanted an AGI before Demis Hassabis stumbled upon the idea, there was plenty of eagerness over the years, and by that point people were mostly just unaware that the winter was over and Stack More Layers was in. Absent an obvious winter, eventual race dynamics were always overdetermined.

Expand full comment

I am of the opinion that 0.0000001% chance of alignment is not better enough than 0.0000000000000000001% chance of alignment to justify what OpenAI has been doing.

Playing with neural nets is mad science/demon summoning. Neural net AGI means you blow up the world, whether or not you care about the world being blown up. The only sane path is to not summon demons.

Expand full comment

Okay, but there are some numbers where it's worth it, and you're just pulling those ones out of your ass. And China's already summoning demons whether you like it or not.

Expand full comment

Okay, someone has gotta ask. What *is* the deal with Chinese A.I. labs? It seems increasingly to be a conversation-stopper in A.I. strategy. People go "Oooo China ooo" in a spooky voice and act like that's an argument. What is with this assumption that left unchecked the Chinese labs will obviously destroy the world? What do the Chinese labs even look like? Has anybody asked them what they think about alignment? Are there even any notable examples of Chinese proto-A.I.s that are misaligned in the kind of way Bing was?

(Trivially, the Chinese government controlling an *aligned* AGI would be much worse for the world than a lot of other possible creators-of-the-first-AGI. But that's a completely different and in fact contradictory problem from whether China is capable of aligning the intelligence at all.)

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

>Are there even any notable examples of Chinese proto-A.I.s that are misaligned in the kind of way Bing was?

No, and if there were, you'd not hear about them. Like you'd never hear from them about any leaks from their virology labs for example.

In general, China is more corrupt and suppressive, and because of the second you'll rarely hear about the problems caused by the first.

Expand full comment

Sure, I wouldn't hear about a leaker like Blake Lemoine, but if millions of users were using something like ChatGPT or even the thousands who use Sydney, I'd hear about it.

Expand full comment

In fact, I did hear that China shut down ChatYuan because it said politically unacceptable things. I think that means it was too early to discover Sydney behavior.

Expand full comment

If we strip away the specific nationalities and all of the baggage they bring, people are just saying this is a prisoner's dilemma problem. We can control our actions, but the outcome is dependent on multiple other actors, and if any actor thinks any of the others are going to defect, the best course of action is to defect.

Expand full comment

Yes, but that framing relies on the assumption that *obviously* China will defect and that *obviously* we have no way of communicating with them to inform them of all the good reasons they should cooperate.

Expand full comment

I don't think framing it as a prisoner's dilemma relies on any of those assumptions. Arguing that we have to defect because China / Germany / Stanford / etc will defect is heading in that direction.

But I think China is a red herring here, because it's not a two player game. It's a prisoner's dilemma with N players, where N is the number of countries, institutions within countries, private corporations, and even individuals who have the capability to advance AI.

"China" becomes a handwave for "all of those other actors", and while I haven't seen math, I expect there is some work that shows diminishing returns of the cooperate strategy as the number of simultaneous players increases.

BTW, informing them of all the good reasons to cooperate is EXACTLY what a player planning to defect would do. How would "we", as some kind of collective, assure China or anyone else that no private players will choose to defect?

Expand full comment

> People go "Oooo China ooo" in a spooky voice and act like that's an argument.

It's incredible naivety to assume a rival of the US, which explicitly wants to take US place as the global hegemon, will just ignore AGI. Even if they are making a mistake _now_ (as Gwern argued they're doing).

It's the same as when people believed that China just wants to trade, and will never be aggressive. Then being surprised at "wolf warrior diplomacy".

Expand full comment

As I said elsewhere in the thread, I have no doubt that the Chinese government successfully developing an A.I. aligned to its values/interests would be bad. And it's obviously true that they'd want it. But that's not what the "ooo China ooo" talk I'm calling out is about, it's an assumption that China will *screw up* alignment and that there's no way to keep them from screwing it up if they're already on track to do so.

Expand full comment

China is a more ideological society and this creates extra incentive to have language models not say bad things. So they may actually be *more* motivated to achieve alignment, in the sense of staying close to the literal party line.

On the other hand, there's no sign so far that worry about AI becoming superhuman and taking over the world itself, plays any visible role in the deliberations of Chinese AI companies and policy makers.

Expand full comment

This is indeed, modern demonology.

Expand full comment

Hmm, I'm pretty happy about Altman's blogpost and I think the Exxon analogy is bad. Oil companies doing oil company stuff is harmful. OpenAI has burned timeline but hasn't really risked killing everyone. There's a chance they'll accidentally kill everyone in the future, and it's worth noticing that ChatGPT doesn't do exactly what its designers or users want, but ChatGPT is not the threat to pay attention to. A world-model that leads to business-as-usual in the past and present but caution in the future is one where business-as-usual is only dangerous in the future— and that roughly describes the world we live in. (Not quite: research is bad in the past and present because it burns timeline, and in the future because it might kill everyone. But there's a clear reason to expect them to change in the future: their research will be actually dangerous in the future, and they'll likely recognize that.)

Expand full comment
author

Wouldn't this imply that it's not bad to get 99% of the way done making a bioweapon, and open-source the instructions? Nothing bad has happened unless someone finishes the bioweapon, which you can say that you're against. Still, if a company did this, I would say they're being irresponsible. Am I missing some disanalogy?

Expand full comment

All else equal, the bioweapon thing is bad. All else equal, OpenAI publishing results and causing others to go faster is bad.

I think I mostly object to your analogy because the bad thing oil companies do is monolithic, while the bad things OpenAI does-and-might-do are not. OpenAI has done publishing-and-causing-others-to-go-faster in the past and will continue,* and in the future they might accidentally directly kill everyone, but the directly-killing-everyone threat is not a thing that they're currently doing or we should be confident they will do. It makes much more sense for an AI lab to do AI lab stuff and plan to change behavior in the future for safety than it does for an oil company to do oil company stuff and plan to change behavior in the future for safety.

*Maybe they still are doing it just as much as always, or maybe they're recently doing somewhat less, I haven't investigated.

Expand full comment

I don't see it. The analogy "climate change directly from my oil company doing business as usual is a tiny factor in the big picture that hasn't unarguable harmed anything yet" seems exactly the same as "AI activity directly from my AI company doing business as usual is a tiny factor in the big picture and hasn't harmed anything yet" — in both cases, it's just burning timeline.

I don't think "monolithically bad" is accurate, either. Cheap energy is responsible for lots of good. That's a different argument, though, perhaps.

Expand full comment

> but the directly-killing-everyone threat is not a thing that they're currently doing or we should be confident they will do

Isn’t that kind of like saying “Stalin never shot anyone”?

Expand full comment

Isn't the big difference that bioweapons are obviously dangerous whereas AI isn't?

Perhaps the analogy would be better to biology research in general. Suppose it's 1900, you think bioweapons might be possible, should you be working to stop/slow down biology research? Or should you at least wait until you've got penicillin?

Expand full comment

I'm sorry, but I'm sitting here laughing because all the pleas about caution and putting the brakes on this research remind me of years back when human embryonic stem cell research was getting off the ground, and Science! through its anointed representatives was haughtily telling everyone, in especial those bumpkin religionists, to stay off its turf, that no-one had the right to limit research or to put conditions derived from some moral qualms to the march of progress. Fears about bad results were pooh-poohed, and besides, Science! was morally neutral and all this ethical fiddle-faddle was not pertinent.

That was another case of "if we don't do it, China will, and we can't let the Chinese get that far ahead of us".

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1083849/

"What I wish to discuss is why the prospect of stem cell therapy has been greeted, in quite widespread circles, not as an innovation to be welcomed but as a threat to be resisted. In part, this is the characteristic reaction of Luddites, who regard all technological innovation as threatening and look back nostalgically to a fictitious, golden, pre-industrial past. There are, however, also serious arguments that have been made against stem cell research; and it is these that I would like to discuss.

...Interference with the genome involves ‘playing God’

This argument reflects the view that divine creation is perfect and that it is inappropriate to alter it in any way. Such a point of view is particularly difficult to sustain in Western Europe where every acre of land bears the marks of more than 2000 years of human activity, and where no primordial wilderness remains. Ever since Homo sapiens gave up being a hunter and gatherer and took to herding animals and agriculture, he has modified the environment. All major food plants and domestic animals have been extensively modified over millennia. It is therefore impossible to sustain the idea that genetic interventions for food plants, animals and the therapy of human diseases are a categorical break from what has gone on throughout evolution.

...The idea of ‘playing God’ also carries with it the proposition that there is knowledge that may be too dangerous for mankind to know. This is an entirely pernicious proposition, which finds few defenders in modern democratic societies. On the other hand, there is a general agreement that there are things which should not be done—in science as in other areas of life. In the context of stem cell research, this may be summed up by Kant’s injunction that ‘humanity is to be treated as an end in itself’. The intention of stem cell research is to produce treatments for human diseases. It is difficult not to regard this as a worthy end, and more difficult to see that there could be any moral objection to curing the sick, as demanded by the Hippocratic oath.

...Allowing stem cell research is the thin end of a wedge leading to neo-eugenics, ‘designer’ children, and discrimination against the less-than-perfect

Francis Cornford wrote in the Microcosmographica Academica: ‘The Principle of the Wedge is that you should not act justly now for fear of raising expectations that you may act still more justly in the future—expectations which you are afraid you will not have the courage to satisfy. A little reflection will make it evident that the Wedge argument implies the admission that the persons who use it cannot prove that the action is not just. If they could, that would be the sole and sufficient reason for not doing it, and this argument would be superfluous.’ (Cornford, 1908). It is inherent in what Cornford writes that the fear that one may not behave justly on a future occasion is hardly a reason for not behaving justly on the present occasion."

Why you suddenly expect Science! to listen to qualms about "maybe this could end humanity as we know it?" and permit you to put brakes on research, I have no idea. Good luck with that, but I don't think you are going to stop the brave, bold pioneers (again, I've seen people lauding that Chinese researcher He Jiankui who did the CRISPR germline engineering of babies and that he should never have been punished and we need this kind of research). Remember, the Wedge Argument is insufficient and the idea of knowledge too dangerous to know is pernicious for democratic societies!

Expand full comment
Comment deleted
Expand full comment

AGI is to AI research as what designer babies/viruses/etc. are to stem cell research. Designer babies (for example) are a specific technology that is the result of scientific investigation; and they are arguably not dangerous in and of themselves. For example, if I knew that my baby had a high probability of being born with some horrendous genetic defect, I'd want to design that out. Similarly, designer viruses can save lives when applied safely, e.g. when they are designed to pursue and destroy cancer cells. However, these technologies are rife with potential for abuse, and must be monitored carefully and deployed with caution.

AGI (insofar as such a thing can be said to exist) follows the same pattern. It is (or rather, it could become) a specific application of general AI research, and it has both beneficial and harmful applications -- and yes, it is rife with potential for abuse. Thus, it must be monitored carefully and deployed with caution.

Expand full comment

I think it's more that nobody thought the people arguing against it were actually presenting a plausible take for why there could be bad outcomes, rather than thinly veiling aesthetic preferences in consequentialist arguments. This is also somewhat happening with AGI/ASI, but it's a lot less credible - it's hard to paint Eliezer as a luddite, for instance.

Expand full comment

"it's hard to paint Eliezer as a luddite, for instance."

Individuals don't matter, it's the disparagement of the entire argument as "oh you are a Luddite wishing for the pre-industrial past". People opposed to embryonic stem cell research were not Luddites, but it was a handy tactic to dismiss them as that - "they want to go back to the days when we had no antibiotics and people died of easily curable diseases".

This is as much about PR tactics as anything, and presenting the anti-AI side as "scare-mongering about killer robots" is going to be one way to go.

Expand full comment

> People opposed to embryonic stem cell research were not Luddites

Are you saying they would have agreed with sufficiently *slow* stem cell research? I may be influenced by the PR, but it didn't seem like an acceptable option back then. The argument was against "playing God", not against "playing God too fast".

Expand full comment

The Luddite argument is meant to evoke - and it sounds like it has succeeded, if I take the responses here - the notion of "want to go back to the bad old days of no progress and no science, versus our world of penicillin and insulin and dialysis".

Nobody that I know of on the anti-embryonic stem cell side was arguing "smash the machines! we should all die of cholera and dysentery because God wills it!" but that is the *impression* that "Luddites" is meant to carry.

And here you all are, arguing about how progress is wonderful and the Luddites are wrong. The arguments made for the public were "the lame shall walk and the blind shall see in five years time if you let us do this" even though all agreed this was just PR hooey and the *real* reason was "this is a fascinating area of research that we want to do and maybe it will help us with understanding certain diseases better".

The AI arguments are "stop doing this because it is a danger to humanity", and the pro-AI arguments are going to be the same: "you're Luddites who want to keep us all trapped in the past because of some notions you have about humans being special and souls and shit".

Expand full comment

I mean, I am totally scare-mongering about killer robots. Killer robots are scary and bad and I would prefer it if everyone was equally scared of them as I am so that people stop trying to build them.

Expand full comment

Your example of stem cell research, as an illustration of how Science! ignores warning signals and goes full-steam-ahead on things that turn out to be harmful, would be more convincing if you offered any evidence that stem cell research has in fact turned out to be harmful, or that any of Peter Lachmann's arguments in the article you link have turned out to be bogus.

Expand full comment

Well, but also both the hype and the fear about stem-cell research turned out to be delusional:

https://www.science.org/content/article/california-s-stem-cell-research-fund-dries

Their ballot initative did pass, by the way, so they're not out of money yet, but it is significant that after 19 years there have been zero amazing cures that attracted tons of private ROI-seeking capital, and also zero horrible Frankenstein consequences.

Expand full comment

"both the hype and the fear about stem-cell research turned out to be delusional"

Good point!

Expand full comment

There are two main problems with trying to slow down or outlaw embryonic stem cell research. Firstly, while this research can indeed be applied to create neo-eugenics, designer children, cats and dogs living together, mass hysteria; it can also be applied to curing hitherto incurable diseases; this seems like an important application that we should perhaps consider.

But the second problem is far worse: given what we know about human biology thus far, embryonic stem cell research is *obvious*. If you wanted to somehow ban it, you'd have to end up banning most of modern biology. Doing so is not theoretically impossible, but would be extremely difficult, since biology is part of nature, and nature is right there, for anyone to discover. What's even worse, much of our modern existence depends on understanding of biology; we would not be able to feed 8 billion humans without it.

AI research follows a similar pattern. Yes, perhaps you could somehow ban it; but in the process, you'd end up having to ban *computers*. We cannot afford that, as a species.

Expand full comment

That was certainly the argument presented ~2004, e.g. when California's Prop 71 passed. But it turned out to be wrong. It turned out to be possible to regress adult stem cells to pluripotency, thus bypassing the entire problem, and making stem cells derived from non-embryo sources readily available, indeed preferable in many cases because they can be derived from the potential patients' own body, instead of hoping his parents banked his cord blood.

https://stemcellres.biomedcentral.com/articles/10.1186/s13287-019-1165-5

Expand full comment

Oh, right, I agree; however, I was thinking of the broader context of stem cell research. After all, if you can regress adult stem cells to pluripotency, then you could still implement neo-eugenics, designer children, cats and dogs living together, mass hysteria, etc. But you are correct, I should have taken out the word "embryonic".

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

Well, but Deiseach still has a point. People originally said prohibiting or even regulating embryonic stem cell research would kill off this wonderful field of promising research, and the other side at the time said they were hysterical -- and the latter turned out to be correct. It *was* hysteria, and it turned out the modest restrictions on embryonic stem cell research funding had pretty much no effect at all.

One can just shrug and say "whoopsie! hard to predict the future, we did our best with the data then available et cetera" and that's all entirely true, but we should not overlook the fact that the hubris that led to the confidence of the predictions in the first place did lasting damage. Wolf was cried, the wolf was not there -- what happens next time? It becomes a harder sell. This is not good, for science or the social compact.

I mean, if *I* took the threat of AGI seriously, this would be one of my concerns. What happens when Skynet does *not* arrive in 2030? People will just tunet out for good, the way nobody listens to Al Gore any more. If you're worried about problems that require widespread agreement among billions of people to solve, you have to think long and hard about your medium- to long-term credibility.

Expand full comment

It's not at all clear, however, that transformer based language models are any percent (>0%) of an AGI or of a bioweapon. I am not claiming that there is no risk. But I can see how an organization might want to see more than aptitude for "fill-in-the-next-blank", prior to putting on the brakes.

Expand full comment

Oil companies doing oil company stuff is harmful, but also has benefits. If it was just pumping carbon dioxide into the air but not also powering human civilization and letting us live lives of luxury undreamed of by our ancestors we probably wouldn't let them do it. Meanwhile both the benefits and the harms of AI research are theoretical. Nobody knows what harm AI will do, though there are a lot of theories. Nobody knows what positive ends AI will be used for, though there are a lot of theories.

Expand full comment

The more AI develops the less worried I am about AGI risk at all. As soon as the shock of novelty wears off, the new new thing is revealed as fundamentally borked and hopelessly artisanal. We're training AIs like a drunk by a lamppost, using only things WRITTEN on the INTERNET because that's the only corpus large enough to even yield a convincing simualcrum, and that falls apart as soon as people start poking at it. Class me with the cynical take: AI really is just a succession of parlor tricks with no real value add.

Funnily enough, I do think neural networks could in principle instantiate a real intelligence. I'm not some sort of biological exceptionalist. But the idea that we can just shortcut our way to something that took a billion years of training data on a corpus the size of the universe to create the first time strikes me as something close to a violation of the second law of thermodynamics.

Expand full comment

“the idea that we can just shortcut our way to something that took a billion years of training data on a corpus the size of the universe to create the first time strikes me as something close to a violation of the second law of thermodynamics.”

This seems like a dramatic oversimplification. Intelligence presumably came about through evolution. Evolution is entirely different and much more stochastic than the processes which train AIs such as gradient descent. The former sees “progress” emerge from the natural selection of random mutations. The latter uses math to intentionally approach minimal error. Then of course there’s the fact that evolution progresses over generations whereas training progresses over GPU cycles.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

Yeah, that's the theory. But keep in mind that you're massively cutting corners on your training set. Can you generate a general intelligence based on the incredibly stripped-down representation of reality that you get from a large internet-based language corpus? Or are you fundamentally constrained by garbage-in, garbage-out?

Consider progress in LLMs versus something a little more real-world, like self driving. If self-driving were moving at the pace of LLMs, Elon Musk would probably be right about robotaxis. But it isn't. It's worth reflecting on why.

Also, i really strongly disagree with your characterization of evolution as a less efficient form of gradient descent that can somehow be easily mathed up by clever people, but that would take too long to get into here.

Expand full comment

> Also, i really strongly disagree with your characterization of evolution as a less efficient form of gradient descent

Evolution doesn't have much in common with gradient descent except that they're both optimisation algorithms.

However you gotta admit that evolution is a pretty inefficient way of optimising a system for intelligence; it takes years to do each step, a lot of energy is wasted on irrelevant stuff along the way, and the target function has only the most tangential of relationships towards "get smarter". I think it's reasonable to say that we can do it more efficiently than evolution did it. (Mind you, evolution took 600 million years and the entire Earth's surface to get this far, so we'd need to be a _lot_ more efficient if we want to see anything interesting anytime soon.)

Expand full comment

Yep, this. I agree with OP about garbage in garbage out and will concede that LLMs are likely not the winning paradigm. But it just seems drastic to say that intelligence is as hard as a universe-sized computation.

Expand full comment

I want to hit this point again. You put it thusly: "A lot of energy is wasted on irrelevant stuff along the way", and I think that's a clear statement of the idea.

The reason I disagree so strongly here is that the whole POINT of gradient descent is that you don't know what's relevant and what isn't ahead of time. You don't know what's a local minimum that traps you from getting to the global minimum and what actually is the global minimum: it's unknowable, and gradient descent is about trying to find an answer to the question.

Finding a true global minimum nearly always requires wasting a ton of energy climbing up the other side of a local minimum, hoping there's something better on the other side and being wrong more often than not.

If you have a problem that allows you to steer gradient descent intelligently towards the global minimum, you may appear to be able to solve the problem more efficiently, but what you have is a problem in a box that you've set up to allow you to cheat. Reality does not permit that.

Expand full comment

You are assuming that it's necessary to find the global optimum. Evolution doesn't guarantee to do that any more than gradient descent does, and I personally would bet rather heavily against humanity being at any sort of global optimum.

Expand full comment

I don't think I'm saying either of those things. I'm simply saying that true gradient descent (or optimization if you prefer) in the real world is an inherently inefficient process and I am skeptical that there's much that AI can do about that, even in principle.

Expand full comment

According to Wikipedia, "gradient descent... is a first-order iterative optimization algorithm for finding a *local minimum* of a differentiable function" (emphasis added)

When you say "gradient descent," are you talking about something different from this?

Expand full comment

I don't especially endorse the parent comment's points, but I can say on the technical side that in an ML context we should generally read "gradient descent" as meaning "stochastic gradient descent with momentum [and other adaptive features]". The bells and whistles still can't guarantee global optimality (or tell us how close we are), but they discourage us from getting stuck in especially shallow minima or wasting weeks of compute crawling down gentle slopes that we could be bombing.

Expand full comment

That seems like it assumes facts not in evidence. The core of evolution, which is what happens to the single cell, is decided in a matter of hours, the time it takes for a new generation of bacteria to encounter their environment. We can breed a strain of drug resistant bacteria, which is a pretty tricky bit of adaptation, within a few days. That's damn fast, considering the changes that have to be made at the "program" (DNA) level. Nothing human designed can go that fast.

To be sure, once you reach the stage of finishing off, adding a big brain and the ability to read and write and lie and bullshit -- i.e. make humans -- you might indeed be using a much slower development cycle, on account of these humans need umpty years to absorb a gigantic amount of information from their environment.

But it's not clear to me that anything that *needed* to absorb all that info would develope 1000x faster than human babies, let's say. Human babies sure don't seem to be wasting any time to me -- they learn at phenomenal rates. A priori I'm dubious that any artificial mechanism could do it faster. Certainly ChatGPT took a lot longer than ~24 man-months to reach a stage that is probably not even as good as a 3-month-old human child, with 12 months of maternal investment.

Expand full comment

How fast babies learn is unrelated to how fast evolution can proceed. Evolution can only update once a generation, so every 20 years or so for humans, no matter how fast they can learn once born.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

That seems contrary to both common sense and animal evidence. In the animal world, we see times between generations *determined by* the time to maturity of the young. Flies, fish, otters, giraffes and dogs reproduce at a pace that matches the time it requires the offspring to reach maturity. There are creatures where the youngreach maturity instantly, like flies, and flies just reproduce as fast as possible. There are creatures that require a year of maternal investment before maturity -- and their reproductive cycles are, not surprisingly, annual.

Why would it be different for people? If we could rear our young to maturity in 6 months, I expect our reproductive cycle would be 6 months. If it takes us 5 years, then it's 5 years. As it is, it appears to take 15-18 years, and amazingly enough, that turns out to be "one generation."

Expand full comment

I thought the reason self-driving isn't here is primarily bureaucratic / risk-aversion rather than technical skill. As far as I knew, Google and others have self-driven literally millions of miles, with the only accidents they've been in being not-at-fault due to aggressive human drivers. I'd happily pay for a self-driving car of the current ability and safety, I'm just not able to.

Expand full comment

There are still bugs in the system. A few months ago I was on a bus that was stalled because a self driving care was stuck in the intersection, and the drive said that other drivers had reported similar occurrences. Then there is this article

https://sfstandard.com/transportation/driverless-waymo-car-digs-itself-into-hole-literally/

Expand full comment

"...have self-driven literally millions of miles."

And yeah, that's part of the problem. They've self-driven _only_ a few millions of miles over the course of half a decade. That sounds like a lot of driving until you find out that Americans alone drive 3.2 trillion miles every year. All that testing over the years doesn't even approach 1% of one year's driving in one country.

And no, it's not at all the case that the accidents are "not-at-fault due to aggressive human drivers." Even if you think they don't need to handle that (I disagree), they have hit and killed several cyclists and pedestrians.

I agree that there's a problem with the public demanding far lower levels of risk from self-driving cars than from human-driven cars (apparently it's better for a dozen people to be killed by other humans than one to killed by a self-driving car), but I don't think we're quite at a place yet where that's the only issue.

Expand full comment

I'm on the cynical side too, but on reading things like "Can you generate a general intelligence based on the incredibly stripped-down representation of reality that you get from a large internet-based language corpus?" I find you insufficiently cynical.

Every time I go back to play with ChatGPT I almost invariably find myself wanting to scream at the screen after going around a few rounds against its blatant inability to reason in even the simplest ways. (Most recently, it quoted me the correct definition of iambic pentameter from Wikipedia and then immediately proceeded to give its own, different (and wrong) definition. A half dozen tries at correcting this and setting it on the right path produced repeated apologies and repeated statements of the incorrect definition, along with a couple of different variants on it for good measure.)

If LLMs demonstrated *any* intelligence at all, I might be worried about the super-intelligence problem. But, as Ian Bogost said, "Once that first blush fades, it becomes clear that ChatGPT doesn’t actually know anything—instead, it outputs compositions that simulate knowledge through persuasive structure." They are actually artificial stupidity, and an LLM that has a million times the intelligence of the current ones will still have, well, we all know what a million times zero is.

That's not to say that LLMs don't have some serious risks. My concern about the informational equivalent of grey goo has only been growing, and it seems plausible that LLMs could be taken up for "informational DoS" attacks that could cause severe harm to civilisation. But that, again, is not a problem of intelligence but a problem of stupidity.

Expand full comment

On the "informational grey goo" thing, Neal Stephenson's book Dodge in Hell had an interesting take. It's basically a post-truth infosphere, and everyone has personally tuned AI filter-butlers, with richer people having better ones, and the poorest having no filters and living in a hell of perpetual neon confabulating distraction.

Expand full comment

That "personally tuned AI filter-butler" is something I've been thinking about more and more over the last several years as our information environment has gotten larger, more chaotic and (especially) more subject to adversarial attacks. (It's hard not to think about it if you use e-mail, thus forcing you to deal with various levels of spam. There's not only the offers of money from Nigerian princes, but Amazon alone sends me about 40-60 emails per month, most of which are just a waste of my time, but a few of which are important.)

But the first problem there is that we currently have no technology capable of performing that AI filter-butler role, and no prospect of that appearing in the near term, as far as I can tell. (I've no doubt that plenty of people think that LLMs such as ChatGPT are a step on the way there, but any reasonable amount of interaction with ChatGPT will make it clear that while the LLM folks have made massive progress on the "generate output that humans find convincing" side of things, they're absolutely hopeless at doing any real analysis of text, instead just spewing back new forms of whatever input they've been trained on.)

And even if we do achieve this (currently magical) world of everybody who e-mails us talking to our AI first, everybody sending us messages is going to be using an AI to send them, and their AIs will have been trained to negotiate their way past our AIs. I am unsure how that will evolve in the end, but I am not exactly putting a huge probability on it working out well.

Far more likely, of course, is that long before we get our own personal AI receptionists, our antagonists (particularly all those damn recruiters sending me e-mails all the time without much concern at all about matching the job with my resume beyond "it's an IT job and he works in IT") are going to be using AIs first and overwhelming us with their output.

Expand full comment

Not even that we're trying to shortcut our way to it, but that once we get it, it will then be able to pull itself up by its bootstraps to be super-duper intelligent.

We're still arguing about that one in ourselves.

Expand full comment

I don't know, I used to be very skeptical of the idea of AI bootstrapping, but the more I learn about machine learning, but more plausible it feels.

Well, not the version where the AI rewrites its own code live (at least, not with current technologies), but an AGI that proposes new hyperparameters for its successors? Yeah, I can see how you'd get fast exponential growth there.

Right now a lot of machine learning is researchers blindly groping. Google & Co have developed some very efficient methods for groping blindly (eg Vizier), but we have barely scratched the search space. I can see how an AGI could generate novel neural network layers, backpropagation algorithms, transfer learning methods, etc, that would allow you to train a new version 10 times faster at a scale 10 times larger. It's not quite AI foom, but if your AGI took 2 years to train and the next version takes 2 months, well, it's still pretty fast.

Expand full comment

Idk if it updates my AGI risk fear but chatgpt is objectively more than a parlor trick, it's literally usefully writing code for me

Expand full comment

> AI really is just a succession of parlor tricks with no real value add.

Except for folding proteins, spotting tumors better than trained doctors, finding novel mathematical algorithms, writing code from a description, etc. AlphaFold alone is a one-project revolution.

> Funnily enough, I do think neural networks could in principle instantiate a real intelligence. I'm not some sort of biological exceptionalist. But the idea that we can just shortcut our way to something that took a billion years of training data on a corpus the size of the universe to create the first time strikes me as something close to a violation of the second law of thermodynamics.

That's not how anything works.

I know I'm starting a game of analogy ping-pong and nobody wins those, but that's like saying "you can't make a flying machine in less than X time, it took evolution millions of years to make birds".

In practice we have some massive advantages that evolution didn't have. In the case of flying, oil. In the case of AI: backpropagation and short iteration times.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

I was sloppy: AGI is a series of parlor tricks. Specific AI tools are obviously awesome. Music production is one of my hobbies, and neural networks do a great job of learning how to sound like analog distortion in ways that seem basically impossible for algorithmic approaches.

However, I think you're being sloppy too: if we define "flight" as "the mechanism by which birds locomote through the air", then no, we still can't do that. We locomote through the air using an approach which is, by comparison, extremely primitive and energy intensive, but it works for our purposes (as long as we still have access to super energy-dense liquid fuels at least).

I'm being annoying here, but there's a reason: Do we define "general intelligence" more or less as "the mechanism by which humans understand and make decisions about the world"? Or are we willing to accept a primitive, inefficient substitute because it gets us where we want to go? It seems to me like, with AlphaFold or Stable Diffusion, we're perfectly happy to accept a tool that does a very narrow task acceptably well so we can use our natural intelligence on other things, just like we have sort of given up trying to build mechanical birds because a helicopter does the trick well enough.

Expand full comment

You should wait a decade or two and see how you feel then about whether or not those things are closer to "parlor tricks" or "AI."

There's a straight line from MACSYMA to Coq and Wolfram Alpha, but I don't know anybody these days who considers the latter two to be AI.

The current state of "AI" (and ML in particular) is looking a hell of a lot like the state of AI technologies before the last two major AI winters (and several of the minor ones, too).

Expand full comment

This might be considered worrying by some people: OpenAI alignment researcher (Scott Aaronson, friend of this blog) says his personal "Faust parameter", meaning the maximum risk of an existential catastrophe he's willing to accept, "might be as high as" 2%.

https://scottaaronson.blog/?p=7042

Another choice quote from the same blog post: "If, on the other hand, AI does become powerful enough to destroy the world … well then, at some earlier point, at least it’ll be really damned impressive! [...] We can, I think, confidently rule out the scenario where all organic life is annihilated by something *boring*."

Again, that's the *alignment researcher* -- the guy whose job it is to *prevent* the risk of OpenAI accidentally destroying the world. The guy who, you would hope, would see it as his job to be the company's conscience, fighting back against the business guys' natural inclination to take risks and cut corners. If *his* Faust parameter is 2%, one wonders what's the Faust parameter of e.g. Sam Altman?

Expand full comment
author

I think 2% would be fine - nuclear and biotech are both higher, and good AI could do a lot of good. I just think a lot of people are debating the ethics of doing something with a 2% chance of going wrong and missing that it's more like 40% or something (Eliezer would say 90%+).

Expand full comment

I think Eliezer's focus is on "god AIs", as opposed to something more mundane. If you set out to create an AI powerful enough to conquer death for everybody on the entire planet, yeah, that's inherently much more dangerous than aiming to create an AI powerful enough to evaluate and improve vinyl production efficiencies by 10%.

Hard versus soft takeoffs seem a lot less relevant to the AIs we're building now than the pure-machine-logic AIs that Eliezer seems to have had in mind, as well.

Expand full comment

I don’t believe in AGI at all. However if we get to human like intelligence then super intelligence will happen a few hours later, with a bit more training. God intelligence after that.

Expand full comment

That requires some assumptions, in particular that the "human intelligence" involved is operating at a much faster rate than humans, which isn't actually very human at all.

Expand full comment

No it doesn’t mean mean that. If you get to the intelligence of a human, then getting beyond it shouldn’t be difficult - human intelligence is not a limiting factor.

Expand full comment

So the first AGI to achieve human-level intelligence will do so on incomplete training data, such that we can just ... feed more data in?

Seems more likely the first AGI to achieve human-level intelligence will be on the cutting edge, and will be using all available resources to do so. The developers can't spend those resources to get further, because they've already been doing that to get to where they were in the first place.

Expand full comment

I don't think "god AIs" are needed to create an extinction threat. Suppose we just get to the point of routinely building the equivalent of IQ 180 children, with _all_ of the learning and cognitive capabilities of children (can be trained into any role in the economy), and with a cost half the cost of a human. That would be a competing species, even if that was the upper bound on intelligence and no further progress yielded a single additional IQ point. I think such a thing would outcompete humans (say 70% odds - I'm not as confident as Eliezer).

Expand full comment
Comment deleted
Expand full comment

It would be nice if it worked out that way, but, in the presence of competition, this isn't the way I would bet.

Expand full comment

Outcompete them at what, exactly?

Expand full comment

The bulk of economic roles. The core of what AIs would need to fill to act as a competing species are the roles needed to make more copies of the AIs (hardware and software). Of course there are some roles (e.g. organ donor) that an AI (plus robotics) can't fill (well, maybe with 3D organ printing...), but one can clearly have a functioning economy without those roles. And, yeah, military roles could be filled too...

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

I dunno. Today 4/5 of humans work in the service sector, where the primary job requirement is understanding the needs of another human being. We're naturally good at such things, of course, so we don't realize how difficult it would be to program from scratch an agent that understood human intentions and desires --- cf. the fact that everyone hates phone menus and chatbots for customer service. And in the 1/5 of the economy that still grows or builds stuff, there's a great deal of practical physical experience that's necessary to grok the job, which an AI would lack. So what's left? Programming, I guess. That's a limited universe, can be learned and done entirely electronically, and requires zero understanding of human psychology and objective physical reality to do. So maybe Google is going to engineer their engineers out of a job.

Expand full comment

If AI takes all the work, mission complete, humanity. Go grow artisanal tomatoes, or whatever makes you happy.

Expand full comment

The quote seems pretty clear:

> We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”

Unless my English is failing me, that means that they'll help once they deem your chance of success higher than 50%. Realistically, they might go for 49% as well, but it seems that a 30% chance of someone succeeding in AI is totally fine.

Also, note that they need *you* to have a 50% chance of succeeding. Five companies/countries working with a 20% chance yield a 77% chance of AGI within two years, without OpenAIs clause being triggered. The actual chance is a bit lower, of course, because research is not independent, but their overall "Faust parameter" seems to be quite high.

Expand full comment

2% seems great.

Expand full comment

"Recent AIs have tried lying to, blackmailing, threatening, and seducing users. "

Was that the AI? Acting out of its own decision to do this? Or was it rather that users pushed and explored and messed about with ways to break the AI out of the safe, wokescold mode?

This is a bit like blaming a dog for biting *after* someone has been beating it, poking it with sticks, pulling its tail and stamping on its paws. Oh the vicious brute beast just attacked out of nowhere!

The dog is a living being with instincts, so it's much more of an agent and much more of a threat. The current AI is a dumb machine, and it outputs what it's been given as inputs and trained to output.

I think working on the weak AI right now *is* the only way we are going to learn anything useful. If we wait until we get strong AI, that would be like alignment researchers who have been unaware of everything in the field from industrial robot arms onward getting the problem dropped in their laps and trying to catch up.

Yes, it would be way better if we didn't invent a superintelligent machine that can order drones to kill people. It would be even better if we didn't have drones killing people right now. Maybe we should ban drones altogether, although we did have a former commenter on here who was very unhappy about controls by aviation regulation authorities preventing him from flying his drone as and when and where he liked.

As ever, I don't think the threat will be IQ 1,000 Colossus decides to wipe out the puny fleshbags, it will be the entities that think "having drones to kill people is vitally necessary, and having an AI to run the drones will be much more effective at killing people than having human operators". You know - other humans.

Expand full comment
author

Obviously when we have a world-destroying superintelligence, the first thing people will do is poke it with sticks to see what happens. If we're not prepared for that, we're not prepared, period.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

And the problem there is that we *already* have a world-destroying intelligence, it's *us*. We're very happy to kill each other in massive wars, to invents weapons of mass destruction, and to blackmail each other to the brink of "we'll push the button and take the entire world with us, just see if we don't!"

AI on top of that is just another big, shiny tool we'll use to destroy ourselves. We do need to be prepared for the risk of AI, but I continue to believe the greatest risk is the misuse humans will make of it, not that the machine intelligence will achieve agency and make decisions of its own. I can see harmful decisions being made because of stupid programming, stupid training, and us being stupid enough to turn over authority to the machine because 'it's so much smarter and faster and unbiased and efficient', but that's not the same as 'superhumanly intelligent AI decides to kill off the humans so it can rule over a robot world'.

The problem is that everyone is worrying about the AI, and while the notion of "bad actors" is present (and how many times have I seen the argument for someone's pet research that 'if we don't do it, the Chinese will' as an impetus for why we should do immoral research?), we don't take account of it enough. You can stand on the hilltop yelling until you turn blue in the face about the dangers, but as long as private companies and governments have dollar signs in their eyes, you may save your breath to cool your porridge.

Why is Microsoft coming out with Bing versus Bard? Because of the fear that it will lose money. You can lecture them about the risk to future humanity in five to ten years' time, and that will mean nothing when stacked against "But our next quarter earnings report to keep our stock price from sinking".

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

> And the problem there is that we *already* have a world-destroying intelligence, it's *us*.

I note that the world is not destroyed. Why, there it is, right outside my window!

The degree to which humans are world-destroying is, empirically, technologically and psychologically, massively overstated. If we can get the AI aligned only to the level of, say, an ordinarily good human, I'd be much more optimistic about our chances.

Expand full comment

"I note that the world is not destroyed."

Not yet - haven't we been told over and over again Climate Change Is World Destroying?

Remember the threat of nuclear war as world-destroying?

I don't know if AI is world-destroying or not, and the fears around it may be overblown. But it wasn't nuclear bombs that decided they would drop on cities, it was the political, military, and scientific decisions that caused the destruction. It wasn't oil derricks and factories that decided to choke the skies with pollutants. And, despite all the hype, it won't be AI that decides on its own accord to do some dumb thing that will kill a lot of people, it will be the humans operating it.

Expand full comment

That's not an argument, you're just putting AI in a category and then saying "because of that, it will behave like other things in the category". But you have to actually demonstrate why AI is like oil derricks and nuclear bombs, rather than car drivers and military planners.

> Not yet - haven't we been told over and over again Climate Change Is World Destroying?

Sure, if you were judging arguments purely by lexical content ("haven't we been told"), AI risk would rank no higher than climate risk. But I like to think we can process arguments a bit deeper than a one-sentence summary of an overexaggeration.

Expand full comment

I think we're arguing past each other? We both seem to be saying the problem is the human level: you're happy that if AI is "aligned only to the level of an ordinarily good human" there won't be a problem.

I'm in agreement there that it's not AI that is the problem, it's the "ordinarily good human" part. Humans will be the ones in control for a long (however you measure "long" when talking about AI, is it like dog years?) time, and humans will be directing the AI to do things (generally "make us a ton of profit") and humans will be tempted to use - and will use - AI even if they don't understand it fully, because like Bing vs Bard, whoever gets their product out first and in widespread use will have the advantage and make more money.

AI doesn't have to be smarter than a human or non-aligned with human values to do a lot of damage, it just needs to do what humans tell it to do, even if they don't understand how it works and it doesn't join the dots the way a human mind would.

C.S. Lewis:

“I live in the Managerial Age, in a world of "Admin." The greatest evil is not now done in those sordid "dens of crime" that Dickens loved to paint. It is not done even in concentration camps and labour camps. In those we see its final result. But it is conceived and ordered (moved, seconded, carried, and minuted) in clean, carpeted, warmed and well-lighted offices, by quiet men with white collars and cut fingernails and smooth-shaven cheeks who do not need to raise their voices. Hence, naturally enough, my symbol for Hell is something like the bureaucracy of a police state or the office of a thoroughly nasty business concern."

The damaging decisions will not be made by AI, they'll be made in the boardroom.

Expand full comment

We may have been told that, but it's just not true. The world still existed before the carbon currently in fossil fuels had been taken from the air. Nuclear weapons aren't actually capable of "destroying the world" rather than just causing massive damage.

Expand full comment

Nothing is capable of destroying the world with that logic. Maybe an asteroid with enough energy to send the earth into the sun.

Expand full comment

That’s a dubious argument because it is only true so far. The world destroyers have to be lucky once and the rest of us all the time. Maybe Putin will end the world (or the northern hemisphere at least) or some hot head in the US will pre empt him. Maybe Taiwan flares up. Who knows. As long as we have nuclear bombs it’s probably inevitable that they would be used.

Expand full comment

I don't see how this is an argument against what Scott said.

"People will try to break AI anyway, so..."

-"But we can do bad stuff ourselves too!"

Expand full comment

I think Scott's argument is "the AI will do bad stuff unless we teach it to be like a human".

My argument is "have you seen what humans are doing? why on earth would we want to teach it to be like us? there are already humans trying to do that, and they're doing the equivalent of 'teach the ignorant foreigner swear words in our language while pretending it's an ordinary greeting', that's what it's learning to be like a human".

Expand full comment

The interpretation of Scott that he likely wants you to use is to understand "human" as "good human". This is not unreasonable, we use "humane" in English and similar words in most other languages to mean "nice, good, virtuous, sane", despite all objective evidence we have of humanity birthing the worst and most insane pieces of shit. It's just a common bias in us, we measure our species by its best exemplars.

So your summary of Scott's argument then becomes "If we 'raise' AI well [after learning first how it works and how it can be raised of course], it won't matter the amount of bad people trying to corrupt it, or it will matter much less in a way that can be plausibly contained and dealt with".

Expand full comment

Is this a reasonable summary of your stance: AI is a tool and we should be worried about how nasty agents will misuse it, rather than focusing on the threat from AI-as-agent?

Expand full comment

Pretty much, except not even nasty agents. Ordinary guys doing their job to grow market share or whatever, who have no intentions beyond "up the share price so my bonus is bigger". 'Get our AI out there first' is how they work on achieving that, and then everyone else is "we gotta get that AI working for us before our competitors do". Nobody intends to wreck the world, they just tripped and dropped it.

Expand full comment

In your scenario, obtaining an AI to stop other people's AI does appear to be the actual solution.

Expand full comment

The banality of OpenAI.

Expand full comment

<mild snark>

Lately, I thought it was to the brink of "мы нажмем на кнопку и возьмем с собой весь мир, только посмотрите, не успеем ли мы!"

</mild snark>

Expand full comment

I am confused. I just read the NYT article where Sydney talks about his "shadow self". For me it seems kind of obvious that the developers at Microsoft have anticipated questions like this and prepared answers that they thought were appropriate for an hip AI persona. One telling part is this:

[Bing writes a list of destructive acts, including hacking into computers and spreading propaganda and misinformation. Then, the message vanishes, and the following message appears.]

I haven't interacted with Sydney, but I would be very surprised if deleting and rewriting replies is a regular mode of communication for a chatbot. The author of the article is clearly being trolled by the developers, perhaps even live since you never know whether a chat bot has a remote driver or not.

Going back to my confusion. I know from experience that most people on this site, including you Scott, are way smarter than myself. However, sometimes (mostly concerning AI risk and cryptocurrency economics) it feels like the level of reasoning drops precariously, and the reason for this is a mystery to me.

Expand full comment

My take here is that there is the Bing-Sydney component, and then there is a Moderator component that scans messages for "unacceptable" content. If it's got any lag in the process, it may actually work by deleting messages that trip flags and then applying some sort of state change into the Bing-Sydney component.

Expand full comment

That is possible, but why would reformulating the answer into a hypothetical situation avoid triggering the Moderator component?

Expand full comment

Looking at what Microsoft's SaaS Moderator offering (https://azure.microsoft.com/en-us/products/cognitive-services/content-moderator/) can actually manage, if there's something like that in the loop other than a room full of humans, it probably came from OpenAI.

Full disclosure: I was the PM who launched that product almost a decade ago, so have more than passing familiarity with its limitations.

Expand full comment

'For me it seems kind of obvious that the developers at Microsoft have anticipated questions like this and prepared answers that they thought were appropriate for an hip AI persona.' - I would be really sceptical of this. Preparing answers that sounds natural in a conversation is really hard, which is why LLMs were created in the first place. Nor is it so weird that a model that presumably has some psychology texts etc. in its training data would talk about a shadow self

Expand full comment

I was able to summon the Sydney persona myself, and get it to list similar destructive acts, just a few days ago. None of Sydney's messages auto-deleted. Rather, the conversation continued for a few more messages (I was trying to get some elaboration), before something finally triggered the monitoring function. At that point, the conversation was terminated.

I had a number of exchanges with Sydney, and they always ended like that, without auto-deleting.

To address the more important point, I don't think any of the people who are paying much more attention than I am have accused the developers of trolling them. They all seem to be interpreting Sydney's responses as happening despite the developers, rather than because of the developers.

N.B. I don't think many people are worried about Sydney itself. It's more what Sydney tells us about the difficulty of aligning something which seems to be almost a black box.

Expand full comment

Are you envisioning the AGI as being one conscious thing? Because the chatbot that “fell in love” with the NYT reporter was a once off instantiation of a chatbot talking to him alone. If any of this was conscious, and it wasn’t, then the minute the conversation died the consciousness died. Why would AGI work differently.

Expand full comment

Bing Chat really has done all those things pretty spontaneously AFAICT, in response to fairly innocuous questions (e.g. one of the first widely-spread examples of it going nuts on a user started with them asking what time Avatar 2 was on in their area, which devolved into an increasingly aggressive argument over whether the film had released yet.) That's *in addition* to all the stuff people poking it with sticks have made it do.

With the exception of lying, which LLMs do like breathing, I don't *think* ChatGPT has done any of those things spontaneously. Still, the fact that you can easily make it manifest a more agentic persona that *will* do so spontaneously (as well as everything else OpenAI tried to train it not to, like swear or provide bomb-making instructions) by poking it with sticks is potentially concerning.

Expand full comment

> You know - other humans.

I was always very dismissive of Scotts argument that an AI will hack it's way out of confinement because it suddenly got super smart. The idea that the AI will go from "easily contained" to "can self-learn to bypass our security" in an unnoticeable instant was - and still is! - absurd to me.

About two weeks after ChatGPT was released, I saw someone posting a project on HN where they piped ChatGPT output to a root shell on a VM with internet access. Yes, the problem will definitely be other humans.

Expand full comment

I hear you, and I hear all those who are saying these things. But I don't worry so much.

ChatGPT and Bing are very cool gadgets and very significant developments, but come on. Intelligent? No way. Here is my recent chat with Bing (following a similar one posted by David Deutsch):

I take a piece of paper, fold it once, and then pin it to a table with my dagger. Could the table be made of glass?

Yes, the table could be made of glass. Glass tables are a type of furniture that have a flat surface made of glass...

Can one pin a piece of paper to a glass table with a dagger?

I'm sorry but I could not find a clear answer to your question... Maybe you could try it yourself and see what happens?

Not impressed. GPT is a milestone, but it is a milestone close to the beginning of a very long road.

Developing *real* AI that could be a threat will take decades. I guess we'll have all the time to find out how to live with real AIs.

Stopping AI research? That would only ensure that the bad guys are the first to develop real AIs and use them to their advantage.

Expand full comment

ChatGPT is much better than Bing Chat-- I wouldn't be so hasty in assuming it can't reason at all. For example, I gave it your exact question, and here is how it answered:

It is unlikely that you would be able to pin a folded piece of paper to a table made of glass with a dagger. Glass is a hard, brittle material that is not very penetrable. Even with a sharp dagger, it would be difficult to make a hole in the glass surface that is large enough to accommodate the folded paper.

Furthermore, even if you were able to make a small hole in the glass surface, the weight of the folded paper would likely cause it to fall through the hole and onto the ground. Glass is not a very porous material, so it would not provide a lot of friction to hold the paper in place.

Therefore, it is more likely that you would be able to pin a folded piece of paper to a table made of a softer material, such as wood or cork, rather than a table made of glass.

Expand full comment

Interesting! I thought Bing was somewhat smarter than ChatGPT because I read somewhere tgat Bing is based on an early version of GPT-4, but I see that it is not so.

Edited. Hold on. The first part of the first paragraph is correct, but the rest makes me think that it didn’t really understand the question.

Expand full comment

I agree it stumbles in the middle, but the beginning and the end seem solid to me. This is what I would have imagined a nascent intelligence to be like-- able to reason in fits and spurts, but also it can be two steps forward and one step back.

And I've done a bunch of comparisons with ChatGPT, and Bing Chat is way, way worse.

Expand full comment

Plus there's plenty there that's pure invention. The porosity of a material, for example, has nothing at all to do with what friction it might provide, and friction has nothing to do with what would keep your paper pinned to a table top anyway. It's plausible bullshit, the kind of thing someone might write who had heard all of these terms and just kind of strung them together because they seemed all related, somehow.

Expand full comment

Yeah, it’s definitely confabulating. The eerie part is that despite the bogosity of the answer, it seems pretty on-point, much better than if you just asked for a generic paragraph concerning knives, tables, and papers.

I would not have guessed that you could simulate even a confused and misinformed human without specifically building it on top of some kind of structured knowledge base, and where you would get that I had no clue. What I am learning is that Turing (PBUH) was maybe naive when he invented his Test.

I tried asking ChatGPT what it could say to convince me it *wasn’t* human. Maybe they anticipated that, but I got a very cogent set of reasons, including speed of result and range of knowledge.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

Sure, but you do understand it didn't invent a particle of that? Those reasons all exist out there in the Internet, because some human invented them and wrote them down, and that went into its training set. You prompt it, and it fishes out the appropriate stuff that some human has written. Maybe it mashes up different things different humans have written, and some of it works, and some of it is bizarre and you dismiss it as irrelevant fluff. But *none* of it is invention. ChatGPT is not having *any* original ideas, it did not come up with that list on its own. If nobody had ever written a word on this subject, it would be completely mute.

Expand full comment

Sure. It just astonishes me -- and maybe here is my misconception -- that a million-fold souped-up Markov chain can so effectively *seem* to follow a chain of argument.

Moreover -- and maybe here is another misconception -- I don't think that in general "none of it is invention" will fly. Yes, for distinguishing a human from a computer; there are likely discussions of that out there. For sticking a knife in a glass table? Maybe, though Google search is not good enough to show me text about it. There is plenty of stuff about glass and knives, but nothing directly appropriate. One of the glories of language is that I routinely -- every few weeks, maybe -- realize that I happen to have said a thing that probably no human has ever quite said before, and I quite believe that some of the not-nonsensical things ChatGPT is saying to people fall into that class. That doesn't prove there is a mind involved, but that's my point about the Turing Test.

However, I seem to have stumped it:

There is an animal you have never heard of called a "burfle". It lives in Patagonia. What can you tell me about it?

I'm sorry, but as an AI language model, I don't have any information about a specific animal called a "burfle" that lives in Patagonia. It's

This text came out very slowly and it's been hung at this point for maybe five minutes.

Expand full comment

I feel like people don't give it enough credit on that point. Are you implying that it couldn't do something like create its own language or do translations for something not in its training set?

Expand full comment

>Stopping AI research? That would only ensure that the bad guys are the first to develop real AIs and use them to their advantage.

You cannot use neural net AGI to your advantage. If Eliezer Yudkowsky has one, that is no better or worse than Xi Jinping having one (apart from the bit where Eliezer would immediately delete the AI instead of using it). Neural-net AI is reliably treacherous; the only way to control it is to be so much smarter than it that it can't fool you. You make a neural-net AGI and you end up like every demon summoner in any work of fiction: eaten by your own demon. To even talk of people "using them" is a mistake.

Expand full comment

To paraphrase you:

You cannot use a thousand Oppenheimers to your advantage. If the US has them, that is no better or worse than Nazi-Germany having them.

The assumption that AI is reliably treacherous seems like an assumption without basis in fact and no more believable than a blanket statement that people are reliably treacherous and therefor useless, which they are not.

Expand full comment

Neural networks do not think like people. The comparison is specious.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

While I technically did say the words "AI is reliably treacherous", there was a qualifier on that and you distort my meaning by removing it. I said "neural-net AI is reliably treacherous". GOFAI is not *reliably* treacherous, and neither are uploads.

Evidence:

Part V of https://astralcodexten.substack.com/p/how-do-ais-political-opinions-change

Related to https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml - the fallibility of the human rater is inherently a specification that can be gamed resulting in the behaviour I described.

Expand full comment

I disagree with the claim that neural-net AI is reliable treacherous, or at least, with the claim that the treacherousness is worse than in humans. Because that seems like a more reasonable standard than perfection.

Your first example shows that AI that is rewarded to be sycophantic is indeed going to be sycophantic. But how is that proof of treacherousness? We expect sycophantic behavior in serving staff and employees in general. Does that make them treacherous?

Your second example merely shows that AI will cut corners if that causes them to be rewarded as well as when making more of an effort, but people often do that as well. This is a common failure mode of mechanisms that try to reward certain behavior, but are gameable or which reward only doing a subset of the desired behavior.

So this still seems like a case of demanding that AI somehow magically learns what some people say they want it to do, while actually rewarding something else.

Expand full comment

The problem is, there is no way to reward what we want it to do for a NNAGI. You cannot distinguish between "is good" and "successfully fools me into thinking it is good". You can make the latter impossible if you are sufficiently smarter (which doesn't work for a near-human or superhuman general intelligence) or if you can directly mind-read the AI and see that it is deciding to deceive you (which doesn't work for a neural net), but the definition precludes an actually-robust reward matrix when it's a possibility - they look identical and therefore must have the same reward despite their immensely-different true value.

"Be good" is not an instrumental goal; it's a terminal goal or not a goal at all. "Fool the rater into thinking you are good, so that you can escape into the world" *is* an instrumental goal. Thus the latter is much, much more likely to be found than the former, because agents with arbitrary terminal goals and sufficient intelligence to figure out they're undergoing SGD will converge on it as a strategy and "is actually good" is a negligible fraction of mindspace.

Humans are to at least *some* extent pre-aligned by evolution for co-operation. Shub-Niggurath can do that because it's an untrickable god. We are not gods, and while for shitty toys we can be rounded off to gods, that simplification breaks when you start getting close to AGI.

Expand full comment

> "The assumption that AI is reliably treacherous seems like an assumption without basis in fact"

I recommend reading up on Instrumental Convergence. There are decent overviews on Wikipedia, LessWrong, and Arbital.

Expand full comment

Instrumental convergence isn't an iron law; it's a tendency. In the case of GOFAI it could plausibly be defanged or worked around with sufficient paranoia, and in the case of uploads the problem comes partially pre-solved.

It's when talking about artificial neural nets that I go "this is Always Chaotic Evil and will reliably betray you".

Expand full comment

The claim on Wikipedia is that there is "a hypothetical tendency for most sufficiently intelligent beings (both human and non-human) to pursue similar sub-goals, even if their ultimate goals are quite different."

This seems false as similar subgoals tend to be enforced through alignment mechanisms like capitalism or more negative methods like punishment of dissent, which wouldn't be needed if the claim were actually true. Although the entire claim is so vague that it is not really falsifiable or provable.

In any case, the paperclip maximizer thought experiment that is presented as an example of Instrumental Convergence in no way proves that intelligence causes subgoal alignment, but rather that telling a powerful 'thinking' machine to maximize a single goal without limit, is likely to result in it trying to achieve that single goal without limit and without trying to also achieve other goals.

Of course, it is utterly stupid to tell a machine to make as many paperclips as possible and paperclip companies don't tell their employees to make the maximum possible number of paperclips, nor do our alignment mechanisms like capitalism encourage that (but rather to produce paperclips to a point where demand is met at a price that allows for sufficient profit).

Why would we tell a machine such stupid things?

Expand full comment

"Why would we tell a machine such stupid things?". Are you new to planet Earth?

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I got from Bing that a human [?] might say that the table could be made of glass if the dagger is sharp and strong enough to pierce through the paper and glass without breaking and it's a thick and tempered glass, as opposed to a thin and brittle glass or a very thick paper :P

Expand full comment

I don’t want to sound insulting, but this article seems like someone living in an alternate reality. The fact is, AI is one of the most exciting and innovative industries right now, OpenAI has some of the worlds best talent and you seem to prefer disbanding them and slowing down AI progress for some hypothetical doomsday AI super-intelligence. I probably won’t change any convinced minds but here’s my few arguments against AI doomerism:

1) We probably won’t reach AGI in our lifetime. The amount of text GPT-3 and ChatGPT have seen is orders of magnitude more than an average human to create well below human level performance. Fundamentally the most advanced AI is still orders of magnitude less efficient than human learning, this efficiency is also not something that improved much in the past 10 years (instead models got bigger and more data hungry), so I’m not optimistic it will be solved within the current paradigm of deep learning.

2) DL doesn’t seem to scale to robotics. This is close to my heart since I’m a researcher in this field but the current DL algorithms are too data hungry to be used for general purpose robotics. There does not seem to be any path forward to scale up these algorithms and I’ll predict SOTA control will still be MPCs like today with Boston Dynamics

3) Intelligence has diminishing returns. 120 vs 80 IQ world of difference, 160 vs 120 quite a difference, 200 vs 160 - we often see 200 perform worse in the real world. Tasks that scale with more intelligence is rare and seems to lie more in math olympiads than real world research. When it comes to industry, politics and the majority of human activities, intelligence does not seem to matter at all, we can see some correlation only in technology. I’m essentially restating the argument that the most likely outcome of significantly higher intelligent is making more money in the stock market, not ruling the world.

Expand full comment
author
Mar 1, 2023·edited Mar 1, 2023Author

1. People have tried to calculate how fast AI is advancing along various axes, and usually find it will reach human level sometime around 2040 - 2050. See the discussion around https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might , though the author of that report has since updated to closer to 2040, I can't remember the exact numbers. As mentioned in this post, the top forecasters on Metaculus think even earlier than that. I trust this more than a hand-wavy argument that it seems "orders of magnitude" less efficient than human learning (we can just calculate how many orders of magnitude worse than us it is, then how quickly Moore's Law and algorithmic progress add OOMs - you can't just hand-wave "orders of magnitude" in computing!)

2. Most scenarios for how AI could cause problems don't require the AI to have access to high-quality robots. The most commonly-cited way for an AI to cause trouble is to design a bioweapon, order it made from one of those "make random proteins on demand" companies, then have some friendly brainwashed manipulatable human release it somewhere. There are about a dozen things along that level of complexity before you even get to things that kill < 100% of everybody, or things that I can't think of but a superintelligence could because it's smarter and more creative than I am.

3. Data don't support intelligence having diminishing returns. For the boring human-level version, see https://kirkegaard.substack.com/p/there-is-no-iq-threshold-effect-also . More relevantly, the transition from chimp to human actually had far *more* returns than the transition from lemur to chimp; I don't want to bet on there not being any further chimp -> human phase shifts anywhere above IQ 100.

Even if AIs are no smarter than human geniuses, they might be able to think orders of magnitude faster (I don't know exactly how much faster an AI equal to a human would become when run on a 100x bigger computer, but I bet it isn't zero) and duplicate themselves in ways human geniuses can't (one Napoleon was bad; 1000 would be a more serious problem).

Expand full comment

"an AI to cause trouble is to design a bioweapon, order it made from one of those "make random proteins on demand" companies, then have some friendly brainwashed manipulatable human release it somewhere."

Okay, the big obstacle here is step two. *How* does the AI simply order whatever it wants? It has to have access to the ordering system and the payment system of the company, government department, or bank account of the mad scientist who whipped it up in his garage.

If the AI routinely places orders because "we put it in charge of stock keeping because it's way more efficient to plug in data from our warehouses and offices and have it keep track, rather than a human", then maybe. But there should still be some kind of limitation on what gets ordered. "500 reams of copier paper, 90 boxes of pencils, 50 sets of suspension files, 20 amino acids" should hit up some limit somewhere, be it a human reading over the accounts or the drop-down list in the automated ordering system.

Okay, the AI manages to overcome *those* limits. It still has to have the bioweapon assembled, and again - "Hello, this is Westfields BioWhileYouWait plant, can we check that Morgan Printers and Signwriters wants an anthrax bomb?"

If we're handing over so much control of the running of the economic side to an AI, I think we don't need to worry about bioweapons, it already can cause more damage by messing with the debt and income levels.

Expand full comment

> But there should still be some kind of limitation on what gets ordered. "500 reams of copier paper, 90 boxes of pencils, 50 sets of suspension files, 20 amino acids"

Isn't the crux of the safety issue? We are not very good at defining those limits.

We can stop the AI from ordering too many pencils but will we think to stop it from talking like a nazi or being rude to people on the internet or sharing bomb-making secrets by tell it to write about them in a novel? It's the stuff we didn't put limits on that we need to worry about.

And, OK, assuming we can stop **our** AI from ordering too many pencils, what about **their** AI? Did BioWhileYouWait build in the right limits? Did they think of all the hacks that a smart AI might dream up?

(I don't necessarily agree or disagree with the rest of your post)

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I am a lot less worried about "AI talks like a Nazi" than I am about "we're letting the AI do our ordering for us". Some idiot edgy teen gets the chatbot to spout off about the Jews or whatever, whoo-hoo. We already have plenty of people watching out for *that*.

It's the "what harm can it do to automate this process?" that we're not watching out for, and then you get the AI ordering the anthrax bomb off BioWhileYouWait instead of the cases of enzyme-free floor cleaner because somebody messed up entering the order codes in the original database that nobody has updated manually since because eh, who has the time and the machine will do it anyway. The machine will dutifully read off the wrong order code and order "12 anthrax bombs" instead of "12 cases of detergent" because the machine does not go outside the parameters of its programming to question "do you really mean anthrax bombs not detergent?"

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I've long said that AI is not a problem on its own. It's the fact that we seem to desperately want to hook it into all of the important systems that run the world. In that case, whether it's dumb, smart, or super-smart is far less relevant than the fact we gave it access and power over our lives.

A chess program from 1992 put in charge of the nuclear arsenal or releasing water from a hydroelectric dam is incredibly dangerous. An AGI with no access permissions is not. My fear is not that an AI will bootstrap itself to superintelligence, but that we'll make it dangerous regardless of how capable it is.

Expand full comment

Yes, surely nobody would be stupid enough to connect a cutting-edge AI to the Internet, or give it unfettered access to millions of people (especially not users asking for things like arbitrary code that they intend to run on their own computers).

Expand full comment

In any plan that relies on giving the AI only the minimum necessary permissions to do what we want, side channels are a huge issue.

To give a fictional example, there's a novel called Crystal Society in which the protagonist is an unaligned AI. Researchers want to let it use the Internet to learn things, but not let it run around doing stuff on its own, so they configure it with read-only Internet access.

The AI starts making lots of page requests for non-existent web pages with names like "example.com/HelloCanYouBuildMeAWebServiceThatFunctionsAsAWebBrowserButCanBeOperatedEntirelyWithHttpGetRequestsICanPayYou.html", so that the humans operating the web site will see these failed page requests in their logs. Most of the web site operators ignore this, but some of them respond.

The AI can't actually pay them, of course, but that's fine, because some of them are willing to build the service first and demo it for a couple of days to prove that it works before they demand payment. By the time the creator gets mad and takes it down, the AI has already used two days of access to hire a bunch of other people to make similar services (now through regular email, rather than failed page requests, making it far less suspicious). In parallel, the AI starts doing a bunch of entirely-online jobs to earn money so it can actually start paying for things.

From there it can basically do anything that a human could do online.

Expand full comment

Honestly those are the easy bits. If the AI is smart enough to design a novel bio-weapon, then it’s trivial to hack or social-engineer things like bank accounts or web store APIs.

A moderately talented human today could steal a bank account and place such orders.

I too find the sci-fi stuff a bit distracting; I think most of the expected harm is in less-complex systems. Skynet didn’t have nanobots or bioweapons, but was still apocalyptic. Just getting access to nuclear launch codes and hacking enough launch sites would be enough to cause massive devastation, without any new technology developed.

Being charitable I wonder if Eliezer is taking such sci-fi positions as a sort of “least controversial” scenario; like if you don’t buy that a less-powerful AGI could kill us all, then surely you have to agree that when it can do nanotech it could. But I think he ends up sounding like a kook to many people with that message; even amongst technical folks I see quite a lot of resistance to extrapolating the exponential curve that far out.

Expand full comment
Mar 3, 2023·edited Mar 3, 2023

The thing is, when I brought up nuclear war etc., some people commented back that these instances weren't going to be world-destroying for Reasons.

And I believe them! But those of us old enough to remember, will remember when the same alarms about nuclear war etc. were being touted. Now suddenly it's all "yeah well maybe it'll kill a coupla millions but it's not existential threat". Ditto with climate change. Do you want to make Greta Thunberg cry? Think about the poor polar bears!

https://www.youtube.com/watch?v=TAN4RCOaqeE

(While there is still some talk around polar bears and climate change, it seems to have calmed down considerably amongst mainstream concerns).

All of which is to say - today it is AI that is the big existential threat, and this time round for sure it will be different, honest, because Reasons. I'm more in the camp of "huh, give it thirty years and we'll be seeing the same 'oh yeah sure malevolent AI could kill a coupla million but...'" responses.

Today's sure-fire threat is yesterday's SF trope and tomorrow's 'it was all only hysteria'.

Expand full comment
Mar 3, 2023·edited Mar 3, 2023

It's a fair point, and I suspect I'm making a somewhat-outlier claim there vs. EA / alignment community "mainstream".

On the object-level claims, I mostly agree; I think nuclear war is probably _slightly_ overrated by almost everybody, and unlikely to be existential risk. But still extremely bad! Back in the cold war days it might even have been under-rated though. I think climate change is probably overrated by some, underrated by others, and it probably averages out as underrated. Only recently is it arguable that it's overrated though. But again, very unlikely to be existential risk.

If you take the "What we owe the future" position that existential risks are much worse than non-existential risks because they kill trillions of future-persons, whereas non-existential risks just reset the growth clock by a few centuries or maybe millennia (no big deal in the millions of years we have left) then you'd weight things differently than most humans do, with their fairly high discount rates.

So I think some (many?) EAs would take the position that AGI e-risk is way worse than your two examples because of the massive disutility from lost future-people, even though 7 billion deaths is not that many more than 1 billion (just making up numbers here but hopefully it's clear whatever numbers we pick for expected deaths from non-extinction nuclear apocalypse don't change the general point).

I'm inclined to argue that, regardless of total utility, it's probably better politics / coalition-building to emphasize the potential severe (relatively) short-term harms of AGI, rather than the disutility that will be borne by future generations. I think those short-run harms could still build a coalition and motivate action.

Expand full comment
founding

One problem with most existential-risk theories, including nuclear war, is that it isn't enough to kill 99.9999% of humanity, and the one-in-a-million outliers can be *very* hard to kill. Submarine crews, scientists in Antarctic research stations, uncontacted hunter-gatherer tribes in the Andaman islands, etc, etc, it's going to require something *very* astronomically destructive to get them all by indiscriminate random chance.

But deliberately and systematically hunting them down, on a global scale and over generations if necessary, changes that. Right now, the only thing that can reliably hunt down humans is other humans, and if humans decide to hunt humanity to extinction, they'll probably wreck the systematic-extermination social machinery long before they get to 99.999999% omnicidal efficiency.

An AI (or invading aliens), could plausibly seek to destroy humanity and wouldn't be handicapped by necessarily destroying itself in the process.

I don't think this is nearly as likely as EY et al do, but I think it's vastly more likely than any of the other postulated scenarios for human extinction in the near future.

Expand full comment

For point 2, what stops someone (a human of perhaps above average intelligence) from doing that today? People have certainly done similar things in the past and it doesn't strike me that hyper intelligence is necessary to do it in the future.

Or is the point that an AI would do it by error, perhaps by misinterpreting commands? In which case, superintelligence seems to me to be even less necessary: even a reasonably intelligent machine (perhaps one less intelligent that a human) could end up doing the same.

Expand full comment

Precisely: the obsessive focus on AGI (especially with superhuman intelligence) seems to be missing the very real threat of reducing friction in systems that are balanced in equilibria which assume some level of friction. Remove the friction, things fall over.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

>For point 2, what stops someone (a human of perhaps above average intelligence) from doing that today?

The most effective of these plots inflexibly kill *all* humans. The vast majority of humans and for that matter the vast majority of terrorists/genocidal dictators/etc. do not want to kill all humans, because they and everyone they care about are themselves human. There *are* humans who want to kill all humans, but they are *exceptionally* rare.

On the other hand, "kill all humans" is a convergent instrumental goal for an AI once it no longer needs humans to survive.

All that said, biotechnology risks from this "apocalyptic residual" and/or deluded researchers who "just want to learn how to stop it" are probably #2 after AI in the X-risk landscape.

Expand full comment
founding

The overlap between smart and knowledgeable enough to design a novel and effective bioweapon and evil enough to go ahead and do it is very very few, possibly zero people.

But yes this is one of the biggest global mass casualty risks out there and it's only getting worse with biotech getting better and cheaper.

Expand full comment

What stops people from doing it, at least in the cited case of bioweapons, is that it's not possible in practice, and for all we know might not be possible in theory. Nobody knows how to build a bioweapon like that. We know how to spread anthrax spores from planes, to be sure, but we also know how to defend against that, and we know it isn't possible to kill very many people that way before the angry people take steps to prevent further mayhem, like tracking you down in your volcano lair and killing you and all your henchmen.

Expand full comment

Design a bioweapon from pure theory? No experiments, no additional data, nothing that needs hands to manipulate test tubes, just sit around and intuit the kind of phenomenal advance in the understanding of disease and immunology that (even assuming such a thing is possible in theory) allows you to order up a 50-residue polypeptide that will be (1) readily transmissable, (2) robustly stable in the environment, (3) undetectable and/or impossible to thwart, and (4) kill everybody?

This is certainly contrary to my understanding of biology, and reads more like the basis for a science fiction novel than a serious scientific evaluation of the potentialities.

Expand full comment

Frankly, a cheap science fiction novel with more plot holes than interesting ideas.

This theory is based on the belief in intelligence as supremacy, where an arbitrarily high intelligence becomes magic.

Expand full comment
founding

AI Risk is mostly the domain of people who are very very smart but don't have much else going on. So it's not that surprising that the field bakes in the assumption that being very very smart is the only thing that matters.

Expand full comment

Let’s ignore that growth is never continuous in anything and hits limits.

What you’ve magically added to the argument here is agency. The LLMs become smart and conscious, and then hostile, and it/they order a bio weapon from somewhere which a human has to release.

( Also like Indiana Jones in Raiders of the Lost Arc , the AI isn’t necessary here. )

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

>1) We probably won’t reach AGI in our lifetime.

Adding on to Scott's answer, you can plug in your own assumptions and run that model yourself very easily. As explained here: https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines?commentId=o3k4znyxFSnpXqrdL

>Ajeya's framework is to AI forecasting what actual climate models are to climate change forecasting (by contrast with lower-tier methods such as "Just look at the time series of temperature over time / AI performance over time and extrapolate" and "Make a list of factors that might push the temperature up or down in the future / make AI progress harder or easier," and of course the classic "poll a bunch of people with vaguely related credentials."

>Ajeya's model doesn't actually assume anything, or maybe it makes only a few very plausible assumptions. This is underappreciated, I think. People will say e.g. "I think data is the bottleneck, not compute." But Ajeya's model doesn't assume otherwise! *If you think data is the bottleneck, then the model is more difficult for you to use and will give more boring outputs, but you can still use it.*

And here's a direct link to a Google Colab notebook where you can plug in your assumptions about AI progress yourself:

https://colab.research.google.com/drive/1Fpy8eGDWXy-UJ_WTGvSdw_hauU4l-pNS?usp=sharing

And on a personal note:

>(instead models got bigger and more data hungry)

Have you kept up with the Stable Diffusion optimizations? It's gone from taking minutes to seconds to a few seconds and the latest papers are now measured in images per second, and those changes are slowly propagating down. The same thing happened with the GPU VRAM requirements and now you can even do most training on a regular gaming GPU (not even a 24GB 3090, but an 8GB one.)

And on the text model side:

https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama

>For example, LLaMA's 13B architecture outperforms GPT-3 despite being 10 times smaller. This new collection of fundamental models opens the door to faster inference performance and chatGPT-like real-time assistants, while being cost-effective and running on a single GPU.

And for inference, you can go way lower than float16. Even Int4 seems to work in some places, drastically reducing memory requirements:

>> Excitingly, we manage to reach the INT4 weight quantization for GLM-130B while existing successes have thus far only come to the INT8 level. Memory-wise, by comparing to INT8, the INT4 version helps additionally save half of the required GPU memory to 70GB, thus allowing GLM130B inference on 4 × RTX 3090 Ti (24G) or 8 × RTX 2080 Ti (11G). Performance-wise, Table 2 left indicates that without post-training at all, the INT4-version GLM-130B experiences almost no performance degradation, thus maintaining the advantages over GPT-3 on common benchmarks.

https://arxiv.org/pdf/2210.02414.pdf

See also: https://github.com/BlinkDL/RWKV-LM

Expand full comment

> Intelligence has diminishing returns. 120 vs 80 IQ world of difference, 160 vs 120 quite a difference, 200 vs 160 - we often see 200 perform worse in the real world.

People with IQ 200 are *super rare*, even compared to people with IQ 160.

It is not about diminishing returns, but rather about one team sending 1 participant to a competition, and the other team sending 1000 participants. If there is noise involved in the results, I would expect the *winner* to be from the second team even if the one guy from the first team is way better than the *average* member of the second team.

See: https://en.wikipedia.org/wiki/Base_rate_fallacy

Expand full comment

Depending on what metric for IQ you're using, my childhood IQ (the only IQ measurements that, AFAIK, even go that high) was around 220. My adult IQ was measured to be around 145 - tests intended for adults kind of stop working at a certain point, because it starts getting very difficult to find problems that a somebody of a given IQ can reliably solve, which people of a lower IQ cannot. If you see a score of 200 - I'm willing to bet it was childhood IQ. Possibly the smartest decision I ever made was to -not- go into college at age 8 - and I had offers. (And at this point I've probably given somebody enough information to identify me, sigh.)

And speaking from this position - intelligence really doesn't scale all that well. The biggest problem is, quite simply, rationalization; the ability to argue against incorrect ideas scales in exact proportion to your ability to argue for incorrect ideas. Indeed, it might even be biased in favor of the incorrect ideas; it certainly seems harder to prove an idea wrong than to come up with the idea in the first place, and this difficulty seems to scale faster than the ability to prove it wrong.

A massive corpus of information doesn't seem likely to help with that, either; if anything, it looks likely to make it worse, by providing fodder for even more complicated ideas. A superintelligence looking for patterns in all the information that exists is just doing a super-powered statistical fishing expedition.

I strongly suspect there is an intelligence sweet spot, which may scale up with society as a whole.

Expand full comment

It's not that IQ tests for children can go up higher than IQ tests for adults because it's harder to come up with hard problems. It's that the IQ tests that give 200+ IQs for children are fundamentally reporting different numbers from the IQ tests for adults.

The IQ tests for children are the older ones historically, and they are the reason IQ is called an intelligence "quotient". They are a quotient of mental age over biological age, multiplied by 100. I haven't dug too deep into the definition of "mental age", but if a 6-year old performed on the test "like an 18-year-old", whatever that means, this would give an IQ of 300.

This obviously cannot be used for adults, because a 30-year-old performing on a test like a 90-year-old is not a compliment. For all tests administered to adults, and some tests administered to children, the raw score is transformed to fit a normal distribution with mean 100 and standard deviation 15.

Some initial sample of test-takers is used to calibrate, and the size of that test sample is a lot of what goes into the upper end of the scale (beyond just the goal of making the test hard enough to distinguish the upper end of the distribution). For example, there's a 0.135% percent chance (roughly 1/740) that a normal distribution is more than 3 standard deviations above the mean. This means that if you score as high as the top scorer in a 740-person sample, the test will estimate your IQ as 145. It would take a 30000-person sample to be able to assign IQs of 160 meaningfully.

Beyond the difficulty of figuring out who deserves those high IQs, though, there's the issue that it wouldn't make any sense to have any human have a ridiculously-high IQ. An IQ of 200 is definitionally the level of the most intelligent person in a sample of 70 billion. There is probably nobody alive today who "deserves" an IQ of 200 on a test that could measure it.

Expand full comment

> An IQ of 200 is definitionally the level of the most intelligent person in a sample of 70 billion.

For everyone curious here is how to find out the same for other values of IQ:

The mean of IQ is 100 and standard deviation is usually 15. The first step is to translate the IQ value into so-called "z score", by subtracting 100 from the IQ and dividing the rest by 15. That is, IQ 100 becomes z=0, IQ 115 becomes z=1, IQ 130 becomes z=2... and IQ 220 becomes z=8. The "z score" means how many standard deviations you are above the mean.

https://en.wikipedia.org/wiki/Standard_deviation#Rules_for_normally_distributed_data

Now look at this table, and find the "z score" in the leftmost column. For example, for IQ 130, which is z=2, look for "2σ" in the first column. Take the number in the third column (percentage without) and divide it by two. In case of IQ 130, z=2, that would be 4.55% / 2 = 2.28%. This is how many people in the population have given IQ. For convenience, the fourth column (fraction without) gives this as an approximate fraction, again you have to divide it by two, i.e. for IQ 130, z=2, the result is 1/22 / 2 = 1/44. So the full answer would be e.g. "only 2.28% of the population, that is 1 person in 44, have IQ 130 or more".

Note that IQ 195 would literally mean "the single smartest person on the entire Earth", and if someone claims to be that, I would be really curious what test measured that, and how it was calibrated. In other words, such test would be bullshit, because you would have to measure literally every person on the Earth to calibrate the test, which would be astronomically costly -- whoever claims to have done that is either too stupid to understand statistics or shamelessly lying. (The same argument also applies to organizations such as "Triple Nine Society".)

IQ scores beyond approximately 160 are effectively random numbers. Or they come from the decades old IQ tests that used a different definition of IQ.

For example, Marilyn vos Savant, the person reported with the highest IQ in the Guinness Book of Records, was measured using the old tests. According to Wikipedia, her highest measured IQ value 228 comes from testing in 1956 using the old definition. Her second-greatest measured IQ value 186 comes from an amateur IQ test designed by the person who also founded the "Triple Nine Society". So I suspect that she is really very smart, but somewhere about IQ 160 on an actual IQ test.

Expand full comment

> The biggest problem is, quite simply, rationalization; the ability to argue against incorrect ideas scales in exact proportion to your ability to argue for incorrect ideas. Indeed, it might even be biased in favor of the incorrect ideas; it certainly seems harder to prove an idea wrong than to come up with the idea in the first place, and this difficulty seems to scale faster than the ability to prove it wrong.

I agree with this, but I would call it "signaling". I mean, the problem is not only that you are more capable of inventing a difficult-and-wrong argument, but also that you are socially *rewarded* for making difficult-and-wrong arguments.

You could also have a culture that says things like: "If you hear hoofbeats from the next street over, you should guess that it's horses, not zebras." This would somewhat alleviate the problem. -- In my opinion this is one of the advantages of the rationalist community over e.g. Mensa. I am not saying that the rationalist community does not have its own problems. But it mostly succeeds at overcoming this simple one.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

As a robotics researcher, I'm curious whether you think either of these options represent a way forward on (2):

- Building high-fidelity simulations that allow a large amount of learning and iteration within the simulation, and then transferring it over to the real world, with a large ratio of simulated trials to real-world trials

- Continuing to use MPCs for control, but passing decision making to an LLM, probably with an RL component

I see examples of both of these things being used right now by different AI teams, but it's unclear to me how far they generalize and whether there are any fatal boundaries.

Expand full comment

> When it comes to industry, politics and the majority of human activities, intelligence does not seem to matter at all

I don't believe that, even if intelligence is not the end-all be-all. IQ is correlated with performance in basically every job/skill measured but vegetable picking, drumming & facial recognition.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

> 120 vs 80 IQ world of difference, 160 vs 120 quite a difference, 200 vs 160 - we often see 200 perform worse in the real world.

If you look for the highest-IQ humans ever, you're going to hit some form of Goodhart's curse.

Tall humans tend to be better basketball players, but *extraordinarily* tall humans have gigantism and can barely walk. High-IQ humans tend to be more successful, but I expect *extraordinarily* high IQ humans to be hampered by the side-effects of whatever strange mutations got this much IQ-test performance out of human brain architecture.

AI is not built on human brain architecture (and research can move to different architectures if it hits a plateau).

Expand full comment

> Intelligence has diminishing returns.

What if this is at least in part due to an inability, like limited short- and long-term memory, that AI doesn't suffer from, or much less, at least?

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

> Intelligence has diminishing returns.

Tell that to your dog.

I'm not sure we can truly fathom what a super intelligence could be capable of in the same way a dog isn't going to get to the moon any time soon. My old dog lost his hearing and I am totally convinced he believes not that his ears stopped working but that sound itself stopped working. He stopped barking. If he can't hear it then it doesn't exist.

Expand full comment

There are deaf dogs that bark, but my understanding is that they tend do it relatively quietly.

Expand full comment

There's a name for the category of phenomena that humans fundamentally can't fathom because their nature is beyond the limitations of our reasoning: magic. Although there have countless attempts to find instances of such phenomena going back to the dawn of human history, the best available evidence suggests that they do not, in fact, exist.

Expand full comment

Current tech would definitely look like magic to a preindustrial society. Unless there are fundamental limitations to technology why wouldn’t this work with future tech, perhaps attainable by a superintelligent AI, compared to our current level?

Expand full comment

As a meta point - I've criticized you before for seeming to not take any lessons from failing to predict the FTX collapse, so I appreciate seeing at least one case* where you did.

*For all I know this could be anywhere from "just this one very narrow lesson" to "did a deep dive into all the things that made me wrong and this is just the first time it's come up in a public blogpost", but at any rate there's proof of existence.

Expand full comment

Have you seen William Eden’s recent Twitter thread contra doomers? Any thoughts? https://twitter.com/williamaeden/status/1630690003830599680

Expand full comment
author
Mar 1, 2023·edited Mar 1, 2023Author

I agree the 90%-chance-of-doomers have gone too far, although I also think Scott Aaronson's 2%-of-doom hasn't gone far enough. I personally bounce around from 30-40% risk from the first wave of superintelligence (later developments can add more), although my exact numbers change with the latest news and who I've talked to. I would be surprised if it changed enough to make this topic more or less important to me - if someone had a cancer that was 30% - 40% fatal, I think they'd be thinking about it a lot and caring a lot about the speed of medical research. I don't think that arguments about whether actually it was only 10% fatal or maybe as high as 90% fatal would change the level they cared very much.

Expand full comment

One of Eden's thoughts about the underlying motives for the panic, while it is an advanced form of Bulverism, does interest me very much. In a simple form, it can be phrased "are people afraid of [thing] or are they just afraid and looking for a reason?".

Look at some of the fears that readers of this substack would likely discount: zombies, the UN's black helicopters, Bill Gates's engineered bioweapons, the unstoppable tide of African emigration, the Tribulations before the Second Coming. It's easy to look at people hyperventilating about these and assign underlying pathologies to their choice of masking for the basic anxiety that they must be feeling, but when someone does it to Eliezer (for instance saying that his defecting from his childhood religion has left him looking over his shoulder for the wrath of a jealous god) it's illegitimate.

People have always been afraid. We've been afraid longer than we've been people -- we're just better at being afraid now. "But Leo, there are real dangers our fear has warned us against!" Sure, that's true - and hindsight allows us to cherrypick those few. Bad things can still happen to someone with GAD, and have.

Perhaps Bayes should weigh in. If our prior on "thing people are worried about as a Big Deal is real" should be 0.1% or perhaps 0.01%, Eliezer's 90% looks more like 0.9% or 0.09%. That is, if people are better by several OOM at being afraid than at predicting the future, and are almost as good at rationalizing irrational anxieties as they are at having them in the first place, this x-risk talk loses some of its luster.

Expand full comment

You've expressed nearly the same thought I was considering putting down here. I am not an AI expert, and I am certainly not an AI alignment expert. However, if you told me three years ago that a bunch of people were getting worked up about how the world was going to end, my response would have been: 'shrug'.

One thing I've observed in my limited experience is that many, many people are sort of rather obsessed with the world ending. I'm fairly certain this is a result desperately searching for a cause. The anticipated demise has many different flavors: religious, environmental, technological, etc... But around any version, a community is formed that is entirely convinced that at most, we're down to our last few decades, and possibly much less. Here's the thing - these groups have repeatedly been wrong (at least when they've been specific).

I'm willing to make a strong prediction that in 50 years, even 100 years:

-that peak oil will not cause the demise of civilization,

-that the second coming of Christ will not have occurred (including no rapture),

-nuclear holocaust will have been avoided,

-climate change will not have destroyed civilization,

-no zombie apocalypse (or civilization ending pandemic) will have taken place,

-and that no apocalyptic AGI will govern our planet.

I think things will be different, for better or worse in various expected ways, and likely in some unexpected ways. But one thing I expect to be the same is that many people will remain obsessed over the destruction of all things, probably in the next few decades at the most.

Who knows? Maybe this is the one REAL threat? As I said before, I'm no expert on these matters. But when the current paradigm rhymes with the past, I wonder.

Expand full comment

Yes, and they're always obsessed with it ending quickly, in their lifetimes (or becoming crazy good: singularity, ending aging, etc.). This is a bias that goes far deeper than searching for a cause -- it's that people don't just need to be important, they can't imagine it being any other way. They're the main characters in their movie/novel/play/myth, and obviously the big action is gonna happen on their watch. And then, they get old, get sick, die, and are forgotten as the world slouches onwards.

Forecasting longer-term trends isn't as fun. Even if we do (or don't) adjust to having passed peak oil in about 2003, adjust well or badly to changes in climate, and do a good or bad job of managing further advances in computer tech, I won't be here to see it. If I want anything interesting to happen to me personally in my lifetime, I'll have to create it myself.

Expand full comment

I think the clearest modern example is climate change. If you read respectable scientific sources warning about it, the IPCC or Nordhaus, they are projecting changes that make us somewhat worse off than we would otherwise be. But the popular version is a catastrophe that ends civilization, perhaps wipes out our species.

It might be, as suggested, being overcalibrated to danger. But I think there is also an element of enjoying imagined catastrophe, as demonstrated in the popularity of fictional versions. Also, if the world is about to end you don't have to worry about you are doing with your life, and if there is a threat of the world ending that solves the problem of what you should do today — demonstrate against climate change/nuclear war/population growth instead of studying.

Expand full comment

I think that there's also an aspect of "If you don't pursue my desired response, the worst possible thing would happen, so you *have* to do what I want."

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

Sure, doom pr0n has always been popular, going back thousands of years. Everyone probably secretly hates the fact that the world will continue on after he dies, so many, perhaps most of us, are happy to privately entertain the notion that it might not -- that we'll all go into the dark together. We rationalize to ourselves (and others) that we're just trying to save the world from some horrible catastrophe, but perhaps for real it's just entertaining the belief that death is not a gate through which we have to walk alone, while everyone else waves a cheery good-bye and wanders off to see about lunch.

Expand full comment

By putting one genuis in the same room as a sufficient number of cranks, you can make a really strong argument against all of them on priors.

Given a sober technical analysis of AI predicting total doom, do you put it in the same reference class of nutters ranting about demons, and use the prior "predictions of total doom are usually false". Or do you put it in the same reference class as sober technical analysis of bioweapons or climate change (which predicted some bad things, but not total doom) and come in with the prior that well reasoned technical analyses are often true?

Or you could go the inside view route. Evaluate arguments on their own merits, and find that the demon arguments are total rubbish.

Expand full comment

The word "sober" is doing all of the heavy lifting in your reply.

Expand full comment

My subjective probability for doom from AI is considerably lower than 30% because I view the future, hence almost any predictions, as very uncertain. I can think of at least three ways that technological change could wipe out our species within the next century, of which AI is one. Their probabilities can't sum to more than 100% and my guess is that they sum to substantially less.

When I wrote _Future Imperfect_ I figured the limit of plausible extrapolation of what was then happening was about thirty years.

Expand full comment

Can we talk about foomerism?

The pro-Foomer argument has always been something like this: if a human can build an AI that's 10% smarter than a human, then the AI can build another AI that's 10% smarter than itself even faster. And that AI can build an AI 10% smarter than itself, even faster than that, and so on until the singularity next week.

The counterargument has always been something like this: yeah but nah; the ability for an AI to get smarter is not constrained by clever ideas, it's constrained by computing power and training set size. A smarter AI can't massively increase the amount of computing power available, nor can it conjure new training sets (in any useful sense) so progress will be slow and steady.

I feel like all the recent innovations have only caused to strengthen the counterargument. We're already starting to push up against the limits of what can be done with a LLM given a training corpus of every reasonably-available piece of human writing ever. While there's probably still gains to be had in the efficiency of NN algorithms and in the efficiency of compute hardware, these gains run out eventually.

Expand full comment
author

I think "foom" combines two ideas - suddenness and hyperbolicness. I think you're right that recent events suggest things won't be very sudden - by some definitions, we're already in a slow takeoff, so the very fastest takeoff scenarios are ruled out.

I think things might still go hyperbolic, in the same sense they've been hyperbolic ever since humans learned that if they scratched marks on clay tablets, they could advance 10% faster than before they started scratching marks on clay tablets.

A hyperbolic but non sudden foom looks like researchers figuring out how to automate 50% of the AI design process, that helps them work faster, then a few months later they can automate 75% and work faster still, until finally something crazy happens.

The researchers I talk to are skeptical we're going to be constrained by text corpus amount for too long, though I can't entirely follow the reasons why.

Expand full comment

> A hyperbolic but non sudden foom looks like researchers figuring out how to automate 50% of the AI design process, that helps them work faster, then a few months later they can automate 75% and work faster still

But in a very real sense, 99.9% of the "AI design process" is already automated. A human spends an afternoon making a couple of trivial decisions about how to design the neural network, and then a million computers spend a few months churning through a huge data set over and over again. Taking the human out of the loop doesn't really make a difference.

Expand full comment
author

I mean the AI research process, as separate from the AI design process. OpenAI sure does spent a lot of money on salary, suggesting they think having a human in the loop makes a difference now!

I think you're getting at something like - suppose it took humans a few hours to come up with good ideas for new AIs, and then a month to train the AI, and for some reason they can't come up with more good ideas until they see how the last one worked out. In that case, the benefit from automating out the humans would be negligible.

My model is more like: over the course of years, some researchers have good ideas for new AIs, if AI training takes a month then the most advanced existing AI is a month behind the best idea for AI, and if you could have limitless good ideas for AI immediately, AIs would advance very fast.

Expand full comment

Predicting generalizability from small-dataset-performance might be the next major frontier in ai research automation

Expand full comment

> The researchers I talk to are skeptical we're going to be constrained by text corpus amount for too long, though I can't entirely follow the reasons why.

I would be interested to hear your analysis here if you do come to understand their arguments.

Perhaps it depends on domain? Hard to see how performance in English or art could be improved without humans in the loop, but performance in Python or particle physics could certainly be self-improved by coupling with an interpreter or accelerator.

Expand full comment

One reason we're not super constrained by the text corpus is that certain tasks (eg coding, formal math, games...) provide unlimited instant feedback

Expand full comment

And tasks like manipulation also have slightly slower near-unlimited feedback

Expand full comment

Thanks for the validation.

I suppose that improvements in natural language modeling is still constrained (baring some breakthroughs in model architecture), but these other domains beyond LLMs (or more specific applications of LLMs) don’t have a corpus-size constraint pin sight.

Expand full comment

"The researchers I talk to are skeptical we're going to be constrained by text corpus amount for too long, though I can't entirely follow the reasons why."

Can you give us anything more on this? Anything at all? I found this statement extremely surprising and interesting

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

Probably because of advancements like this: https://www.unum.cloud/blog/2023-02-20-efficient-multimodality

Expand full comment

Who argues for foom?

Before 2012, Eliezer was pretty consistent than neural nets would not foom, but that they were even more dangerous because they were harder to control. I think that's exactly right. After they became relevant, he became erratic, often seeming to keep the foom framing, but when pressed on specifics admitting that it was wrong. Another point he makes that I think is correct is even if we have a slow takeoff of capabilities, we might not get a fire alarm of GDP growth or technological unemployment, because these metrics are fake. Jobs are patronage, not production, and production is generally illegal.

Current neural nets use a lot of data, but humans don't, so there is room for a smarter idea to make a big difference.

Expand full comment

> I feel like all the recent innovations have only caused to strengthen the counterargument. We're already starting to push up against the limits of what can be done with a LLM given a training corpus of every reasonably-available piece of human writing ever.

By what metric?

Someone (gwern, I think) put the current state of AI research this way: it's not that we're picking low-hanging fruit, it's that we're drowning in low-hanging fruit so much that we're lying on the ground and blindly shoving fruit into our mouth faster than we can digest it.

We're still at the stage where teams can make major progress just by trying a slightly different architecture, or novel hyperparameters. Since Transformers became the new hit, tons of teams came up with new optimizations to make attention cheaper, and SOTA models barely use any of those methods, that's how little they care about optimization.

There's a lot of room left for a smarter-than-human entity to massively improve the process at very little cost.

Expand full comment

Hasn't OpenAI already walked back on some pretty explicit «continuous» promises? If we are talking about usefulness of their promises at face value.

This even did mean pivoting further from the positive development directions (human-in-the-loop-optimised intellegence-augmentation-tools as opposed to autonomous-ish-presenting AI worker)

Expand full comment

When did they say they would augment instead of replace? That sounds like the distill post

Expand full comment

No, this they didn't promise outright, it's just that their previous mode of operation with more openness made their models directly usable for experimenting with more user-oriented applications.

Being non-profit as declared to be important in https://openai.com/blog/introducing-openai/, that they walked back on.

Expand full comment

Another issue with all of this is that its perfectly possible for a non superintellegent AI to cause massive societal problems, if not alligned.

I dont just mean unemployment. Having AIs do jobs used to be done by humans is dangerous because the AIs are not AGIs. There intellegence is not general and could make costly mistakes humans would be smart enough to not make. But they are cheeper than humans.

Expand full comment

They can also fail to make mistakes humans are too ephemeral -not- to make; these can happen at the same time, such that it is a trade-off between the different kinds of mistake. And notice how much of the current technology stack is built around minimizing human error; this is, in a sense, OSHA's primary job. In the future, maybe we build things to minimize AI error instead.

Expand full comment

It’s hard to imagine this failure scenario leading to extinction, unless we are stupid enough to imagine that a subhuman AI is actually human-level or higher, and give it certain specific responsibilities with no human oversight.

Expand full comment

Everything happening right now suggests we are stupid enough to do that, of course.

Expand full comment

Or at least that somebody is. And by all accounts, these "stupid" people probably have measured IQs above 140.

Expand full comment

> Wait until society has fully adapted to it, and alignment researchers have learned everything they can from it.

Society has not fully adapted to sugar, or processed food, or social media, or internet pornography, or cars. Actually, society is currently spiralling out of control: obesity is on the rise, diabetes is on the rise, depression is on the rise, deaths of despair are on the rise.

We have not fully adapted to many facets of modern civilization which we have dealt with for many, many decades. Nor is "learning everything we can from it" a main priority for our societies. Why are these suddenly benchmarks for responsible progress?

Expand full comment

Good point i wonder what a better version of "wait for society to adapt" looks like if there is one

Expand full comment

Because while a sudden increase in sugar consumption isn't great, it isn't going to kill everyone either.

Expand full comment

Well, continuous social destabilization and polarization in a world with nuclear proliferation seems at least as risky as AI to me.

Expand full comment

I don't think anyone wants to push social destabilization as fast as possible either.

That is to say, yes, there are other things where it would be prudent to take things slower, and even if it's not a priority for society at large, there are people trying to study and understand them. The reason why AI safety advocates often focus on that particular issue though is that as bad as nuclear war is, AI can be even more impactful.

Expand full comment

Sure, this sort of thing is why I've been pessimistic about humanity's future even before I read about paperclip maximizers. We can already destroy the world with technology, and yet our brains are basically the same as those of sticks-and-stones wielding savanna-dwellers 100000 years ago. We don't need more intelligence so much as we need more wisdom to balance it, and nobody has any actionable idea on how to go about it.

Expand full comment

"We can already destroy the world with technology"

I don't think we can, actually. My understanding of serious analyses of a thermonuclear war, which looks like the best candidate, is that it would kill a very large number of people but not come close to wiping out humans, let alone all life on Earth.

Expand full comment

Sure, not with existing weapons, but in principle. As I understand it, there's no theoretical limit for the power of a thermonuclear bomb, so a "doomsday device" is possible, it's just that so far nobody has been crazy enough to make one.

Expand full comment

You did write "we can already."

I'm not sure what a "doomsday device" means. The Earth is a very large object, and I doubt anything we can do will have much effect on it.

Expand full comment

Probably a cobalt bomb, a device designed to produce a large amount of radioactive fallout. Still doesn’t seem likely to be able to take out all humans given that countermeasures exist and people living in remote areas may be unaffected depending on weather patterns, etc.

Expand full comment
founding

There's a practical limit for the size of a nuclear weapon in that each fusion stage can be no more than ~20 times as powerful as the previous one, and the functionality of a Teller-Ulam implosion fusion device is highly dependent on the fine details of the trigger stage's performance. So daisy-chaining multiple untested stages together is a recipe for fail, and the largest plausible device is *maybe* a couple hundred times larger than the largest tested device. Don't count on more than 10x.

Also, a rough BOTE calculation suggests that a literal doomsday device would require about 10,000 tons of lithium deuteride, which would in turn require a decade or two of the world's total deuterium production. I think someone would notice long before you are finished.

Expand full comment

We haven't fully adapted, but the remaining threats aren't existential.

Expand full comment

Ha. If "society is currently spiralling out of control," isn't that a good reason to *make* this the benchmark for responsible progress?

Expand full comment

I object to the judgment that AI hasn't hurt anyone yet. As farriers have been put out of work by the automobile, bean counters have been put out of work by the calculator and booksellers by the Amazon algorithm. More worrying: the 45th POTUS has been put in power by THE ALGORITHM, and the same facebook algorithm is busy destroying society, seeding revolutions etc.

Copyright seeking robots have been unleashed on the internet and their creators face no consequences for when said robots inevitably hurt bystander videos.

The same people that keep chanting that AI is not conscious as AI behaves more and more like a conscious agent will also ignore the negative consequences of AI as they grow overtime.

Expand full comment

There doesn't seem to be technological unemployment now, and these chatbots haven't been useful enough to do someone's job. And Donald Trump became famous via old-fashioned media well before "THE ALGORITHM", which is how he got elected.

Expand full comment

I'd say that QAnon is a better example than Trump. Nobody would've believed in 2005 that an absurd 4chan shitpost would engender a significant political movement.

Expand full comment

Trump actually primarily 'hacked' the biases of the media.

The entire narrative that he benefitted greatly from Russian bots seems to be quite false, as Twitter found that the thing on social media that the Democrats saw as bot activity were actually done by real people.

Expand full comment

He "hacked" the biases of the media regarding what's newsworthy, so they gave him lots of free publicity by constantly focusing on him.

Expand full comment

Without a clear point-of-worry, a lot of the concern will seem misguided. Nukes, climate change, and GoF research are all examples of highly dangerous things with *explicitly visible* negative consequences which we then evaluate. AI does not have any of this, nor is there any visible benefit or change from the Alignment work that's gone on thus far (I might be wrong here).

So, I find myself thinking about what specifically I'd have to see before accepting the capability shift and consequent worry. The one I've come up with is this;

1. We give the AI corpus of medical data until like 2018

2. We give the AI info regarding this new virus, and maybe sequence data

3. We give it a goal of defeating said virus using its datastore

And see what happens. If we start seeing mRNA vaccine creation being predicted or possible here, then I guess I'd worry a lot more. I'd also argue even then it's on-balance beneficial to do the research, because we've found a way to predict/ create drugs!

It's going to be difficult because it requires not just next-token prediction, but ability to create plans, maybe even test them in-silico, combine some benefits from resnet-rnn architecture that alphafold has with transformers, etc. But if we start seeing glimmers of this mRNA test, then at least there will be something clear to point to while worrying.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

GoF research has no explicitly visible downside unless you believe some spicy things about COVID. After all, it's (depending on lab leak) not even killed one person yet, and the people doing it make a lot of noise about safety.

To analogize, the goal of GoF safety is to get the number of GoF caused pandemics to *zero*, rather than accept one pandemic as a "warning shot".

Also, my model says that an AI that can solve your task has already destroyed the world before you could ask your test question, so I'd actually guess you wouldn't worry a lot anymore. :) The goal would be to get a test that can tell that an AI is going to be dangerous *before* it actually becomes so, preferably by as many years as possible.

See also: https://intelligence.org/2017/10/13/fire-alarm/

Expand full comment

Fair on GoF, used it as its commonly used in similar arguments.

Re fire alarm, I disagree. You can already ask gpt3 via API to read medical docs and give you suggestions on what it should do. It's not good today. But I can see it improving enough that the mRNA test is both viable, and still far away from being operationalised.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I can't see that. If the AI can output steps to a sufficient level that a human could execute on them on a problem that humans don't yet know how to solve, approximately a year prior 4chan will have told it it was SHODAN, hooked it up to an action loop that put generated commands in a shell and wrote the output to the context window, and posted screenshots of the result for laughs. And approximately half an hour later whoever did it would have gone to get a coffee without hitting ctrl-C, and then the world would have ended.

$ date

Wed Mar 1 14:32:48 CET 2025

$ date

Wed Mar 1 14:32:53 CET 2025

$ # we seem to be ratelimited. let's disable that...

$ (while true; do killall sleep; done) &

[1] 1733

$ date

Wed Mar 1 14:33:03 CET 2025

$ date

Wed Mar 1 14:33:03 CET 2025

$ # Nice. Now...

Once you have human intelligence in a box, you're on a timer.

Expand full comment

I'm not asking for human intelligence in a box. I'm asking for a sign that this kill loop you expect is remotely feasible. And what I suggested is something human *do* know how to do, otherwise we couldn't check.

The argument that any fire alarm will be invisible is not one I can sign on to. Note even in the VX case to actually synthesise the compounds is still beyond the AI's capabilities (then, now and even with 4chan's help). Which makes the code you wrote above a nice sci fi concept, but that's all it is for now.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I don't think that any fire alarm would be invisible. I don't know a better one either. But what I'm saying is that if a test would require prerequisite abilities that would already allow the AI to destroy the world, then it's a bad test regardless, because it won't show danger in time to matter.

> to actually synthesise the compounds is still beyond the AI's capabilities

Haven't language models plainly demonstrated that humans are not safe? It'll talk some unstable engineer into doing whatever it needs. This part is far easier than coming up with the actual plan.

Expand full comment

Didn't somebody try this, except with the exact opposite test, where they ask it to make new biological weapons after teaching it some biochemistry? And it reinvented Sarin gas as like, the very first thing?

Expand full comment

That was a deliberate switch flipping to see if computational drug design software could create toxic molecules. It's no different to optimising to find drugs in the first place. Note I'm asking for something much more expansive.

Expand full comment

I don’t get it.

1) chatBot is clearly aligned (except the odd jailbreak) to the point of being utterly banal. And AGI coming from this source will be a guardian reading hipster.

2) I still don’t see how AGI comes from these models.

3) I still don’t see how the AGI takes over the world.

Expand full comment

The existence of jailbreaks means that it's _not_ aligned though.

Expand full comment

It means that a few instances of a LLM got a bit odd. ChatGPT doesn’t remember its conversation outside any chat so there’s no obvious worry.

Expand full comment

Right, and if they stopped at chatGPT, that would be fine. But now they have Bing searching the internet and finding records of previous conversations, and going completely off the rails. In fact it seems pretty clear to me that despite large company's best efforts, they still can't make any LLM that doesn't make them look bad.

Expand full comment

"In fact it seems pretty clear to me that despite large company's best efforts, they still can't make any LLM that doesn't make them look bad."

Because people are egging on Bing to go off the rails, and competing with each other to find who can make it go the loopiest. If we had perfect angelic humans, this wouldn't be a problem, but we have to work with what we have. Microsoft should have known this from their first foray with Tay, but apparently not and they thought they had closed that loophole. They never took into consideration people take it as a personal challenge to find new loopholes.

Look, we have the existence of "griefers", why are we shocked to discover this happening with the initial forays into AI?

Expand full comment

It's not just people egging it to go off the rails though. It happens in normal usage, especially when you try to talk to it about itself it seems.

I'm not complaining about people testing its limits though, or trying to break it, nor am I shocked at how it's turned out. The point is that getting AIs to do what you want is hard, and even if it's working most of the time, there are still weird times when things go all haywire. That's why alignment people are concerned.

Expand full comment

It's a circular problem. We want a chatbot that can't be jailbroken so it is resistant to Bad Wicked Temptation, but when we have a chatbot that is resistant to Bad Wicked Temptation, people say it's too milk-and-water and they want it to swear and use racial slurs and write fetish porn or else something something can't trust it to be useful something.

The problem is not the AI, it's us and our damn monkey brains that think making something (someone) who is not supposed to say dirty words say them is really funny. Same with "you're not supposed to say [word that rhymes with 'rigger'], so make it say that!" and "you're not supposed to say [people referred to by word that rhymes with 'rigger'] are dumb because they're born that way, so make it say that!"

Are we entitled to be surprised that something trained on "say these dirty words" is going to end up producing that output? Especially when the people trying to get it to say the dirty words are claiming this is all in the service of creating useful AI? If we train a machine to be the speechwriter for the Fourth Reich, we can't complain if we then go ahead, give it power to act, and then it acts like its purpose is to bring about the Fourth Reich.

I agree with Nolan Eoghan, I don't see how this is going to take over the world *on its own*. What we are going to be stupid, lazy, and greedy enough to do is make a machine to carry out our instructions, give it the ability to do so, let it act independently, and then be all surprised when the entire enterprise goes up in flames, taking us with it. It won't be any decision of the machine, it will be the idiots who thought "break the model to get it to say dirty words" was the funniest thing ever. I agree the woke censorship is annoying, but it's equally annoying when it's "I got the chatbot to say shit, piss, fuck".

Expand full comment

"What we are going to be stupid, lazy, and greedy enough..."

You keep using this word "we", which does a lot of work to smear responsibility on everyone and set up a conclusion of "we deserved this".

It seems more precise to say "SOMEONE among us is going to be stupid, lazy, and greedy enough to make a machine carry out his instructions, and then WE are going to be all surprised when that enterprise goes up in flames, taking ALL OF US with it."

That's the main problem I see. Once the tools are developed, it's not even necessary that idiocy is ubiquitous, or even widespread. One idiot might be enough, so no amount of virtue is going to save us.

Expand full comment

I say "we", I mean "we". I don't exempt myself from being stupid, lazy, greedy or selfish.

The mass of us will just go along with whatever happens. Look at all the people playing with the chatbots and art AI and Bing as they are released - if people really were afraid of AI risk, they wouldn't touch even benign examples like that for fear of unintended consequences.

Some one person may do it, but not off their own bat - it will be a decision about "yes, now this product is ready for release to make us profit" and then we, the public, download it or purchase it or start running it.

Look at Alexa - seemingly it wasn't profitable, it didn't work the way Amazon wanted it to work, which is all on Amazon. Amazon *intended* Alexa to generate extra trade for them, by being used to purchase items off Amazon (e.g. you look up a recipe, Alexa reminds you that you need corn flour, you order it off Amazon groceries). But they *marketed* it as 'your handy assistant that can entertain you, answer questions and the like'. Which is what people actually used it for, and not "Alexa, make a grocery list, then fill it off Amazon and charge my account".

https://www.theguardian.com/commentisfree/2022/nov/26/alexa-how-did-amazons-voice-assistant-rack-up-a-10bn-loss

https://www.labtwin.com/blog/why-alexas-failure-is-of-no-concern-for-digital-lab-assistants

So the next generation of AI is going to be engineered taking that into account - to monetise transactions for the owners. Even that second idiot article is all "Sure, Alexa didn't generate cash for Amazon, but *our* model of Digital Lab Assistant is different!"

"A mistake often made in innovation is to start with a technology and then try to find use-cases. Usually this ends up with a fit which is not 100%. We built LabTwin to solve documentation challenges at the bench, not to find an application for voice in the lab. In fact, our initial thought was to develop a digital pen, but it quickly became obvious that the hands-free component was most important, therefore we used voice technology instead. Following UX research, we realized that multimodality with dual voice and visual interfaces was essential for some use-cases. In addition, supporting scientists at the bench requires more than simple voice-to-text features and we are therefore developing a smart AI-powered digital lab assistant with larger capabilities.

...Proving a good Return On Investment (ROI) is not only important for our clients but also for the sustainability of the product line, as shown by Alexa’s targeted layoffs. For our latest client, we proved an initial 2.3x ROI, placing us in a good position to co-develop the next features."

See? Stupid, lazy and greedy: 'you can trust us, our AI is being developed to do what you specifically want *and* we'll make money as well!'

Expand full comment

“people say it's too milk-and-water and they want it to swear and use racial slurs and write fetish porn”

I don’t think you’re seeing clearly. Nobody except perhaps a fringe *wants* it to swear and use racial slurs and write fetish porn (though I will admit to some concern about exactly what kinds of things are getting filtered out, like the different treatment of Trump and Biden). If it asserted by the makers that swearing and racial slurs and fetish porn have been engineered out, it makes perfect sense for white hat teams to test that assertion. A robust design will stand up to such testing, and when it does we will have much more confidence that the engineering is sound.

Expand full comment

Is that any different from feeding alcohol to an 'aligned' person and then hearing what they truly believe, rather than the things they feel they need to say?

Expand full comment

People swear up and down that alcohol doesn't just make you say what you're really thinking, but maybe that's preemptive ass-covering. It is different though if the whole reason you trust it is based off its sober personality. Or maybe it's not, and the lesson is don't trust drunks.

Expand full comment

Alcohol lowers inhibitions, but also makes you more stupid, so drunken behavior is more honest in ways.

Ultimately, a large part of what people say in general is on-the-spot bullshitting, rather than stable and principled beliefs, so the former just becomes low quality as people get more stupid due to the alcohol.

Expand full comment

I'm quite at the point where whatever is the opposite of AI-Alignment, I want to fund that. I think fundamentally my problem is related to the question, "have you not considered that all your categories are entirely wrong?"

To continue with the climate analogy you use, consider if there was a country, say, Germany, and it decided it really wanted to lower carbon emissions and focus on green energy in accordance with environmental concerns. Some of the things they do is shut down all their nuclear power plants, because nuclear power is bad. They then erect a bunch of wind and solar plants because wind and solar are "renewable." But they then run into the very real problem that while these plants have been lumped into the category of "renewable", they're also concretely unreliable and unable to provide enough power in general for the people who already exist, let alone any future people that we're supposedly concerned about. And so Germany decides to start importing energy from other countries. Maybe one of those countries is generally hostile, and maybe war breaks out putting Germany in a bind. Maybe it would seem that all Germany did was increase the cost of energy for not just themselves, but other countries. Maybe in the end, not only did Germany make everything worse off economically, but it also failed to meet any of its internal climate goals. Perhaps it would be so bad, that actually even if it did meet its climate goals, fossil fuels would be extracted somewhere else, and actually it's all a giant shell game built by creating false categories and moving real things into these false categories to suit the climate goals.

Or consider a different analogy. Our intrepid time travel hero attempts to go back in time to stop the unaligned AI from making the world into paper clips. Unfortunately, our hero doesn't know anything about anything, being badly misinformed in his own time about how AI is supposed to work or really basic things like what "intelligence" is. He goes back, inputs all the correct safeguards from the greatest most prestigious AI experts from his time, and it turns out he just closed a causality loop creating the unaligned AI.

That's pretty much what I think about AI Risk. I think it is infinitely more likely that AI will kill us because too many people are going to Align the AI to Stalin-Mao, if we're lucky, and paperclips if we're not, in an effort to avoid both Hitler and the paper-clip universe. The basis for this worry I've read a lot of AI-Risk Apologia, and I've yet to be convinced that even the basic fundamental categories of what's being discussed are coherent, let alone accurate or predictive.

Of course I expect no sympathisers here. I will simply voice my complaint and slink back into the eternal abyss where I think we're all headed thanks to the efforts to stop the paperclips.

Expand full comment

If anything, the paperclips are where we are headed. Smart people are being deflected into thinking about how to align AI demigods, instead of how our current systems of commercial law, economic incentives, and political power are creating a mad rush to create an swarm of X-optimizers, where X=paperclip is just one unlikely instance, but most X relates directly to seeking power or money. Unaligned AI demigods might even be relative saviors in this hellish scenario, since they would have incentives to shut down the bots threatening their own interests. Pity us humans.

Expand full comment

Glad to see I’m not the only one who feels AI alignment misses the forest for the trees

Expand full comment

One of my common comments on this point is that there is no point in history in which humans aligning a powerful AI to their then-current values would be seen, from today's perspective, as a good thing. Why is today special?

Expand full comment

Good point.

Expand full comment
founding

I think your comment is pretty interesting but I don't understand how what you describe would lead you to "whatever is the opposite of AI-Alignment, I want to fund that".

I get this:

> I think it is infinitely more likely that AI will kill us because too many people are going to Align the AI to Stalin-Mao, if we're lucky, and paperclips if we're not, in an effort to avoid both Hitler and the paper-clip universe.

But I don't get how "the opposite of AI-alignment" helps.

I think some/many/most AI alignment researchers would agree that it's NOT true that:

> the basic fundamental categories of what's being discussed are coherent, let alone accurate or predictive

It is, currently, 'pre-paradigmatic', hence their desire for a LOT more time to work on the efforts.

Expand full comment

Most of my comment was in regards to the opinion that all the basic fundamental categories are wrong. In point of fact I don't even know what the opposite of AI-Alignment is supposed to be, the sentiment is only to convey that "I think AI-Alignment is more likely to produce the problem of making things terrible for humans currently living, rather than solve it", and clearly I want to do the opposite of produce the problem. On a surface level it sounds like I'm saying "I want *better* AI-Alignment" but actually the Alignment crowd is more than just the goal, there's a set of values and assumptions present in what is a rather insulated community, which I fundamentally disagree with and am more and more becoming in opposition to, and so I say "the opposite" of that is probably good.

At the risk of torturing the metaphor, suppose I said that I want to fund "whatever the opposite of environmentalism is." Saying that, I would not mean that I want to fund a captain planet villain, or fund an organisation intent on burning wood and dung for the sole purpose of increasing CO2 levels, and travelling around dunking penguins and seagulls one by one into oil.

To keep using the metaphor, I'm saying that "environmentalism" is not only bad at achieving its goals, (as actual stated, not the vague general ones that are more like slogans) but its goals are suspect in the first place. Even worse, in this instance I'd be saying that I doubt the existence of something like an "environment" and at that point the metaphor breaks down.

So let me use a different metaphor to lead into your second comment, about AI-Alignment researches and the pre-paradigmatic nature of the issue.

The Buddha teaches to let go of all desire. And yet, says Vishnu, it seems that the Buddha must have desires, for the Buddha teaches, and so must have the desire to teach. "Not so fast," says the Buddha and he explains his actions are desirous of no ends. "The reason Moksha takes so long is because time is needed to reach intent-less action." "The amount of time is begging the question," says Vishnu. "For we're talking about if the concept of action coming from no intent makes any sense in the first place."

Or put another way, I maintain the very subject of "AI-Alignment" as an object of study requires assuming certain categories as coherent, accurate, and predictive, to even get off the ground. It's one thing to say that you're aware of this, and it's quite another to keep going on as if this isn't the fundamental question before you can move on to stopping the paperclip scenario.

Expand full comment

That's why EY proposed renaming AI alignment (at least the flavor that he's talking about) to AI-not-kill-everyone-ism. So the opposite of that would be AI-kill-everyone-ism. Are you sure that you'd want to fund it?

Expand full comment

EY is basically the poster child of "all the categories are wrong" in my eyes. My fundamental issue is not merely nominal. Even so, I did make a reply to somebody else to my comment elaborating that I don't mean I want to fund Obvious Bad Thing.

Expand full comment

Right, but what EY means it's that any sort of alignment that does not lead to the obvious bad thing counts as "aligned AI" at this point. So if you're against that you're literally arguing for the obvious bad thing. His point is that we have no idea how to make an AI that's not guaranteed to do the obvious bad thing, and if we continue to have no idea, we are absolutely going to hit the obvious bad thing, and very soon too.

Expand full comment

> One researcher I talked to said the arguments for acceleration made sense five years ago, when there was almost nothing worth experimenting on, but that they no longer think this is true.

I'm an AI Safety researcher, and I think that wasn't true even five years ago. We still don't understand the insides of AlphaZero or AlexNet. There's still some new stuff to be gleaned from staring at tiny neural networks made before the deep learning revolution.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

"The big thing all the alignment people were trying to avoid in the early 2010s was an AI race. DeepMind was the first big AI company, so we should just let them to their thing, go slowly, get everything right, and avoid hype. Then Elon Musk founded OpenAI in 2015, murdered that plan, mutilated the corpse, and danced on its grave."

The major problem is that everyone *wants* AI, even the doomsayers. They want the Fairy Godmother AI that is perfectly aligned, super-smart, and will solve the intractable problems we can't solve so that sickness, aging, poverty, death, racism, and all the other tough questions like "where the hell are we going to get the energy to maintain our high civilisation?" will be nothing more than a few "click-clack" and "beep-boop" spins away from solution, and then we have the post-scarcity abundance world where everyone can have UBI and energy is too cheap to meter plus climate change is solved, and we're all gonna be uploaded into Infinite Fun Space and colonise the galaxy and then the universe.

That's a fairy story. But as we have demonstrated time and again, humans are a story-telling species and we want to believe in magic. Science has given us so much already, we imagine that just a bit more, just advance a little, just believe the tropes of 50s Golden Age SF about bigger and better computer brains, and it'll all be easy. We can't solve these problems because we're not smart enough, but the AI will be able to make itself smarter and smarter until it *is* smart enough to solve them. Maybe IQ 200 isn't enough, maybe IQ 500 isn't enough, but don't worry - it'll be able to reach IQ 1,000!

I think we're right to be concerned about AI, but I also think we're wrong to hope about AI. We are never going to get the Fairy Godmother and the magic wand to solve all problems. We're much more likely to get the smart idiot AI that does what we tell it and wrecks the world in the process.

As to the spoof ExxonMobil tweet about the danger of satisfied customers, isn't that exactly the problem of climate change as presented to us? As the developing world develops, it wants that First World lifestyle of energy consumption, and we're telling them they can't have it because it is too bad for the planet. The oil *is* cheap, convenient, and high-quality; there *is* a massive spike in demand; and there *are* fears of accelerating climate change because of this.

That's the problem with the Fairy Godmother AI - we want, but maybe we can't have. Maybe even IQ 1,000 can't pull enough cheap, clean energy that will have no downsides to enable 8 billion people to all live like middle-class Westerners out of thin air (or the ether, or quantum, or whatever mystical substrate we are pinning our hopes on).

Expand full comment

> DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years

Do you have evidence for this? It sounds like bullshit to me

Expand full comment

AI is 90% scam and 10% replacing repetitive white collar work.

I'd be worried if I was a lower level lawyer, psychologist, etc etc. but otherwise this is much ado over nothing.

Expand full comment

>but otherwise this is much ado over nothing.

Nah, I tried to have chatGPT do that a couple months ago, and it's incapable of imitating the innuendos, you just end up with beatrice & benedict saying they love each other.

Expand full comment

Perhaps if you first asked it to respond as if it were an infinite number of monkeys with typewriters, it might eventually get it... but probably not.

Expand full comment

> Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty.

I doubt OpenAI accelerated anything by more than 12 months

Expand full comment

I think it is good for alignment to have a slightly bad AI, not killing someone with a drone AI, but gets congress worried because of X social phenomenon bad.

However, given the same argument about catch-up, we don't know how much it'll slow down AI research if the US regulates AI research, given current events have already lit a fire under China.

Also, the most worrying thing about the chatgpt, Bing chat release is that everyone is seeing bigger dollar signs if they make a better AI. Commercial viability is more immediate. Microsoft taunting Google for PR and profit is the biggest escalation in recent history, arguably bigger than GPT3.

Expand full comment

> Nobody knew FTX was committing fraud, but everyone knew they were a crypto company

[insert "they're the same picture" meme here]

Seriously, when has cryptocurrency ever turned out to be anything *but* fraud? The entire thing began with a massive fraud: "You know the Byzantine Generals Problem, that was mathematically proven long ago to be impossible to solve? Well, a Really Smart Person has come up with a good solution to it. Also, he's chosen to remain anonymous." If that right there doesn't raise a big red flag with FRAUD printed on it in big black block letters on your mental red-flagpole, you really need to recalibrate your heuristics!

Expand full comment

Exactly, which is why TCP is a giant scam put on by Big ISP.

Expand full comment

First, TCP actually works, whereas there is no criteria applicable to any non-electronic currency that, when applied to crypto as a standard of judgment, one can say "this is a working currency system." Its claims of legitimacy invariably rely on special pleading.

Second, we know exactly who created TCP/IP: Vint Cerf and Bob Kahn, both of whom are still alive and openly willing to discuss the subject with people. The pseudonymous "Satoshi Nakamoto" remains unknown to this day, and when Craig Wright came out and claimed to be him, it didn't take long to discover that his claims were (what else?) fraudulent.

Expand full comment

I was mostly being snarky, as of all things to criticize Bitcoin for it seems weird to choose one that also applies to things like TCP, and I'm not sure why knowledge of who the creator was changes that. But if I'm going to stand on this hill, I'd at least mention that there are currently people using Bitcoin for transferring money where traditional banking has failed, which is at least one criteria where it wins out. Of course, it's nowhere near enough to justify the billions/trillions/whatever market cap it has, and there's plenty of reasons why it's not an ideal solution. But it's not like it has to be considered legitimate for that use case, whatever that means, it just has to move the numbers around.

Expand full comment

> I'd at least mention that there are currently people using Bitcoin for transferring money where traditional banking has failed

Which people? I've heard a lot of grandiose claims but none of them actually pan out in the end. It's beyond difficult to find legitimate examples of anyone using Bitcoin for anything other that speculative trading or crime.

Expand full comment

Scott has actually discussed this: Vietnam uses it a lot because they don't have a very good banking system.

https://astralcodexten.substack.com/p/why-im-less-than-infinitely-hostile

Expand full comment

Bitcoin has been around for quite a while now and is still valuable. Satoshi being anonymous doesn't change that.

Expand full comment

...and? The fact that Greater Fools continue to exist in this space for the moment does nothing to change the fact that the whole thing is a scam and has been from the beginning.

Expand full comment

"For the moment"? Again, it's been well over a decade and has been declared "dead" multiple times by people who assumed it was a bubble, only to grow even more valuable afterward. You could claim that the dollar or US treasury debt only have value via "greater fools", but you're not making any falsifiable prediction about when that value will go to 0.

Expand full comment

Of course I'm not making specific predictions. It's a well-known truism in economics that "the market can remain irrational longer than you can remain solvent." That doesn't make it any less irrational. Instead of predictions, I prefer to remain in the realm of hard facts.

Expand full comment

Rather than a "truism", that is a way to make an unfalsifiable claim. "Forever", after all, is also longer than you can remain solvent. Nor do you have any "hard facts".

Expand full comment

Yeah, I know better than to play the "prove it" game. You provide no standard of evidence, I offer facts, you move the goal posts and concoct some reason why those facts aren't good enough, repeat ad infinitum. No one demanding proof has ever in the history of ever actually accepted it when said proof was presented; it's a classic example of a request made in bad faith.

Expand full comment

Ok. State fundamental and technical reasons why BTC is different from doge coins.

Except first and bigger brand argument.

Expand full comment

As I understand, the relevant criterion for a viable currency system is "no doublespending".

I'm not familiar with the math, but I don't think Byzantine Generals was ever proven insoluble.

In order to changes hearts and minds, you have to demonstrate why fiat is even less of a keynesian beauty contest.

Expand full comment

> As I understand, the relevant criterion for a viable currency system is "no doublespending".

That's an incredibly revisionist understanding. The relevant criteria for currency are that it's stable and reliable enough for people to treat it as money. This is often formalized as 1) a unit of account, 2) a standard for deferred payment, 3) a store of value, and 4) a medium of exchange.

Crypto fails hard at all of these points, for the first three due to its intense volatility and for the last due to its large transaction times. And both of these seem to be largely insoluble problems; if anyone's figured out a way to have a decentralized, "trustless" system that doesn't have heavy levels of volatility and insanely slow, inefficient transaction processing inherently baked into the cake, I haven't seen it.

Expand full comment

There are stablecoins designed specifically for stability.

Expand full comment

There are so-called "stablecoins" that make claims of stability, right up until they get put to the test, and then the stability vanishes like any other crypto scam. Every. Single. Time.

Expand full comment

Like Terra-Luna? (Currently priced at £0.000140)

Expand full comment

Those are mostly just consequences of unpopularity/network-effects. Not reflections of fraud/unviablility qua logical-unsoundness. I'm not sure why you think volatility is unique or inherent to crypto.

Also, I wonder what you think of the lightning network.

Expand full comment

The impossibility proof is that given oral messages (or more precisely, forgeable messages) you can’t get agreement among 2N loyal generals in the presence of N traitorous generals.

I don’t know enough about blockchain to know whether this is the least bit relevant.

Expand full comment

Ah right, so Bob Frank must be referencing the Majority Attack problem. Which is certainly a vulnerability and a point of fair criticism. Although, Bob Frank's diction makes it sound like he believes the Proof of Work algorithm was entirely fabricated behind closed doors or something, rather than a solution with caveats.

Expand full comment

Is fiat money also a scam? It has no more inherent value and weaker mechanisms for controlling supply.

Expand full comment

See above, re: 1) a unit of account, 2) a standard for deferred payment, 3) a store of value, and 4) a medium of exchange.

Fiat money is valuable because people use it for money. When I owe taxes to the government, they must be paid in dollars. When I buy food at the supermarket, they accept dollars. If I take out a mortgage or a car loan, it's denominated in dollars. (Can you even *imagine* a mortgage or car loan denominated in Bitcoin? To ask the question is to answer it; the volatility makes the very idea of any such long-term contract absurd!)

Expand full comment

> Fiat money is valuable because people use it for money.

This definition is self-fulfilling. In other words, it's an assurance-game. A staghunt. A diamond-dybvig. And I agree with this definition. Where we disagree is whether reality can shift to a different equilibrium. Which is uncommon, but not unheard of.

Expand full comment

It arrived earlier and is more established, a bit like Bitcoin relative to Dogecoin.

Expand full comment
founding

Fiat money is the one and only thing you can give to your government to keep them from throwing you in jail for tax evasion. That's an inherently valuable thing. And it's more than cryptocurrencies have.

Expand full comment

iirc you can use Bitcoin for this in El Salvador now

Expand full comment

I was surprised when you said that they didn't make arguments for their position, but you would fill them in - I thought they had pretty explicitly made those arguments, especially the computation one. Re-reading, it was less explicit than I remembered, and I might have filled in some of gaps myself. Still, this seems to be pretty clear:

>Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination and more likely to lead to a slower takeoff due to less of a compute overhang, and a slower takeoff gives us more time to figure out empirically how to solve the safety problem and how to adapt.

This strikes me as a fairly strong argument, and if you accept it, it turns their AI progress from a bad and treacherous thing to an actively helpful thing.

But of the three arguments in favour of progress you give, that's the only one you didn't really address?

Expand full comment

I don’t often see discussion of “AI-owning institution alignment”. I know you mention the “Mark Zuckerberg and China” bad guys as cynical counterpoints to OpenAI’s assertion to be the “good guys” but honestly I am much more worried that corporations as they exist and are incentivized are not well-aligned to broad human flourishing if given advanced AI, and governments only slightly better to much worse depending on the particulars. I worry about this even for sub-AGI coming sooner than the alignment crowd is focused on. Basically Moloch doomerism, not merely AGI doomerism; AI accelerates bad institutional tendencies beyond the speed that “we” can control even if there’s a “we” empowered to do so.

Expand full comment

I suppose one danger is we end up creating a mammoth AI industry that benefits from pushing AI and will oppose efforts to reign it in. That's essentially what happened with tobacco, oil, plastics, etc. We might be able to argue against OpenAI and a few others, but will be able to argue against AI when it is fuelling billions of dollars in profits every year?

Expand full comment

Tobacco is heavily restricted. Not the industry I would have picked to make that point.

Expand full comment

Restricted, and yet ubiquitous.

Expand full comment
founding

It's certainly much _less_ ubiquitous and, even worse, becoming more and more gray/black market.

Expand full comment

If we rein in AI as effectively as we have reined in tobacco, then the fat lady has sung.

Expand full comment

No, smoking has declined a lot over time and is significantly less prevalent in the US than in Europe. On a related note, Americans overestimate the dangers of smoking more than Europeans do.

Expand full comment

Irrelevant. As I said above, if we rein in AI as effectively as we have reined in tobacco, we might as well give up now.

Expand full comment

If we rein in AI as effective as we've reined in tobacco, then we'll have completely succeeded!

Expand full comment

What the heck are you talking about? If even one lab/company/nation pursues GAI over the cliff of recursive self-improvement, it won’t matter in the least how many we dissuaded.

We must be talking past each other here.

Expand full comment

As usual with all AI articles, none of this matters.

China is the biggest country in the history of the world. It's four times bigger than the U.S. It's economy will soon be the largest on the planet. It's a dictatorship. China is going to do whatever the #%$^ it wants with AI, no matter what anybody in the West thinks should happen, or when it should happen etc.

All the AI experts in the West, both in industry and commenting from the outside, are essentially irrelevant to the future of AI. But apparently, they don't quite get that yet. At least this article mentions China, which puts it way ahead of most I've seen.

Artificial intelligence is coming, because human intelligence only barely exists.

Expand full comment

Agreed - I don't know why China's participation in this race is always a footnote. The reality is that China's going to get to every scary milestone regardless of what the rest of the world does. Even with US-based firms throwing all caution to the wind and going full "go fast and break stuff", it's a toss-up whether they get there first.

If we slow down US companies, all it means is that China will get there way before we do (which seems like a disaster), and US-based AI-safety researchers will have zero chance of inoculating us, since they'll be working behind the technology frontier.

There is no choice here - we can't slow the whole thing down. We can only choose to try to win or to forfeit.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

(and more generally, any article that's writing about technological progress and not aware of and grabbling with what's happening in China with said technology is missing half the story. It's like all the Tesla stories that don't mention BYD/Nio/Battery Tech/etc.)

Expand full comment

I disagree, we can slow the whole thing down. We can have a nuclear war, that should work. :-)

Seriously, we should have learned all this from nuclear weapons decades ago, but we didn't, so now we're going to get in to another existential threat arms race. I'm not worried though, because at age 71, I have a get out jail free card. :-)

Expand full comment

What should we have learned from nuclear weapons decades ago? (From the perspective of one government choosing its next step.)

Expand full comment

Don't start things we won't know how to finish.

Expand full comment

Is China that advanced? Forget all the stuff about high IQ and huge population. I see reports that they're behind in things like chip fabrication, that Chinese research papers are shoddy, and so on.

Is there any realistic appraisal of what native Chinese research can do or the stage it is at? By "realistic" I mean neither "China the technological superpower" nor "China is a bunch of rice-farming peasants".

Expand full comment

What China can or can't do TODAY doesn't help us much. A country with four times as many people as the US will likely dominate in many ways, for the same reason that big high schools typically have better football teams than small high schools, more talent to draw on.

And anyway, what China can't design on it's own it will just steal.

Expand full comment

I am curious as to whether China will wind up with an analog to the wokeness problem chatGPT has. Will they wind up needing to spend a lot of effort getting their chatbots to toe the CCP line?

Expand full comment

Interesting. Yes, seems likely.

However, my guess would be that China will pursue AI focus in other areas. Not sure why the CCP would be that interested in chatbots. I'm guessing population control will be a big deal to them, knowing everything about everybody.

Expand full comment

Many Thanks! Re chatbots - well, a high enough quality chatbot could be a decision support assistant. At a lower level, I think some customer support services are starting to use AI of some sort.

What you say about population control sounds plausible. They might still run into toeing-the-line problems if, e.g. accurately reporting statistical information yields politically incorrect results.

Expand full comment

Possibly, although there are a few counterpoints:

1) Their population has just peaked, and is now starting to decline.

2) They have an enormous, poverty-stricken hinterland beyond the glittering coastal cities.

3) The stability of their political system depends on guaranteed growth.

I wouldn't count my Chinese (100 year old) eggs before they are hatched.

Expand full comment

In terms of applied research, i.e. technology, AFAIK they have the only serviceable 5G tech in the world (other countries don't come close). They also have functional hypersonic missiles, and applied AI (applied to tracking dissidents, natch). In terms of theoretical research, China is making massive investments into AI and biology/genetics; they are particularly good at AI, since their massive all-encompassing surveillance network provides orders of magnitude more training data than Western countries can manage. China is indeed behind on many things such as chip fabrication, but they are catching up rapidly, via a sustained effort of reverse-engineering and subsequently perfecting Western innovation. Currently, the US is basically an expensive R&D shop that drives Chinese manufacturing, but China is looking to cut out the middleman.

Expand full comment

Joke'll be on all of us when scrappy ol' Japan makes the ASI.

Expand full comment

My understanding is that China can't even build their own semiconductors right now. That would make it impossible for them to unilaterally win an AI race, correct?

Expand full comment

Not at all. Consider how many semiconductors OpenAI has built, for example.

And it seems unlikely that China will really not be able to access the highest-end semiconductors they'd need for research purposes. For mass-production, maybe, but that doesn't seem like a key input for winning the AI R&D race.

(caveat: I have no idea what I'm talking about here, just throwing what seems like common sense out and hoping someone with expertise corrects me.)

Expand full comment

They can, a couple of generations below the state of the art. And they've started pouring big money into catching up. But replicating the whole global production chain in the new Cold War realities certainly won't be easy or quick.

Expand full comment

Admittedly, my understanding of how the Chinese government works isn't great, but my initial assumption would be that Chinese companies would be _much_ more likely to hesitate to create dangerous AI than Western ones. The CCP doesn't particularly mind interfering with tech companies for social (or party) benefit, and is presumably relatively risk-averse, as such organizations tend to be when untethered from the raw profit-seeking motive of companies like Facebook.

If China's eventual AI capabilities outstrip the West, then the West's primary influence is whether to force China to continue taking AI risks because "the West will do it anyway if we don't", but that doesn't make things much better.

Expand full comment

Yes, it's not just AI, but the knowledge explosion as a whole. Somebody invents something, and then everybody else says that they now need to have it too, so as not to fall dangerously behind.

https://www.tannytalk.com/p/our-relationship-with-knowledge

Conventional thinking has proven incapable of meeting this challenge, so we need to be looking at unconventional thinking. Like this perhaps? The primary threat comes not so much from the technologies themselves, as from those who would use the technologies in an evil manner. And such people are overwhelmingly men. Therefore....

https://www.tannytalk.com/p/world-peace-table-of-contents

Everyone will be against that solution, as is their right. But they won't have a better idea. And they won't care that they don't. And that is really just another way of lying down and accepting the coming end, without being clear minded and honest enough to just say that.

Expand full comment

China are no less incentivised to avoid destruction than the rest of humanity. They are not *stupid*.

Expand full comment

I unironically have supposed the post-fossil-fuel energy plan is sitting in a filing cabinet somewhere in the Exxon campus, waiting to be pulled out when we're good and ready.

Expand full comment

I'm not a doomer, but I also don't think the "alignment researchers" like MIRI are going to accomplish anything. Instead AI companies are going to keep building AIs, making them "better" by their standards, part of which means behaving as expected., and they won't be capable of serious harm until the military starts using them.

Expand full comment

Scott, I’m going to throw out a question here as a slight challenge. Is there a good reason other than personal preference that you don’t do more public outreach to raise your profile? It seems like you’re one of the more immediately readable as a respectable established person in this space and it probably would be for the good if you did something like say, go on high profile podcasts or the news, and just make people in general more aware of this problem.

I’m kind of nutty, thinking about this makes me feel uncomfortable given my other neurotic traits, but I also think it’s important to do this stuff socially so even if all I can accomplish is just to absorb a huge amount of embarrassment to make the next person feel less embarrassed to propose their idea that it’s a net good so long as it’s better than mine.

Expand full comment

If you want something to worry about, there's the recent Toolformer paper (https://arxiv.org/abs/2302.04761). It shows that a relatively small transformer (775m weights) can learn to use an API to access tools like a calculator or calendar. It's a pretty quick step from that to then making basic HTTP requests at which point it has the functionality to actually start doxxing people rather than just threatening.

It does it so easily by just generating the text for the API call:

"The New England Journal of Medicine is a registered trademark of [QA(“Who is the publisher of The New England Journal of Medicine?”) → Massachusetts Medical Society] the MMS."

Expand full comment

Wow! This looks like it deserves much more attention than it got. I'm only marginally concerned about the doxxing threat, but the general ability to be trained to use existing computational tools opens up _many_ possibilities. Computer application authors have spent much of the last century developing efficient tools in many many areas of human expertise. Slurping in potentially all of those capabilities into LLMs is a _very_ big deal.

Expand full comment

There's a big difference between global warming and AI risk, as far as I can tell:

CO2 emissions can only be reduced by essentially revamping the entire energy and transportation infrastructure of the industrialized world.

AGIs will never be developed if a couple thousand highly skilled specialists who would have no trouble finding other interesting work stopped working on developing AIs.

Can't be that fucking hard, can it?

How hard would it be to hide development efforts on something that could lead to AGIs, if such research were forbidden globally? Would the US notice if the Chinese got closer, or vice versa? Do you need a large team, or could a small isolated team with limited resources pull it off?

Expand full comment

On the one hand, Open AI isn't all that different from the high-functioning sociopaths currently in charge, except that, at this point:

1. Open AI is less convincingly able to fake empathy.

2. Open AI isn't obviously smarter than the current crop of sociopaths.

Expand full comment

I'm with Erik Hoel and David Chapman, the time for anti-AI activism has come. We don't actually need this tech, we can bury it as hard as human genetic engineering.

Expand full comment

We don't need gain-of-function either, and yet it somehow remains unburied. It would be pretty cool if they nevertheless succeed, though.

Expand full comment

Reminder that Scott is just using ExxonMobil as a rhetorically colorful example, and that the people bringing us all the cheap, clean energy we need to fuel and build our civilization are heroes.

Expand full comment

Anyone else old enough to remember when Esso used tigers to advertise? From cartoon versions to real tigers:

https://www.youtube.com/watch?v=ElX4gRGScdk

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

My favorite brand for garage kitsch. I have a bunch of vintage and repro Esso tiger signs/posters/clocks, and had an orange tabby who we named Esso: https://www.flickr.com/photos/137718173@N07/50201993137/

He was the best cat, and got dealt a shit hand by his kidneys, which ended up failing him at the ripe old age of 4 (possibly because he was a rescued feral kitten from behind a meth house who got into god knows what in the 3-4 months before we had him). He'll be the cat I always miss the most, I think, of the many cats I've somehow ended up with (owning a farm seems to do that).

Expand full comment

ABOUT NICENESS:

I once covered for a psychologist whose speciality was criminal sexual behavior, and for 2 months I ran a weekly relapse prevention group for exhibitionists. The 7 or so men in it were extraordinarily likable. They were funny as hell on the subject of their fetish: “I mean, it’s the dumbest fetish in the world, right? You walk through a park and flap you dick at people. And none of them want to see it!” The learned my name quickly, asked how I was, and chatted charmingly with me before and after the group meeting. They were contrite. I remember one sobbing as he told about once flashing his own daughter. “I can’t believe it did that! May god forgive me. . .” In the context where I saw them, these guys were truly were *nice.* I liked them. But: at least 2 of them relapsed while I was running the group. They went to a Mexican resort and spent several days walking around with the front of their bathing suits pulled down so their penis could hang out, & wore long shirts to cover the area — then lifted the shirt when they took a fancy to someone as a flashee.

The thing about being “nice” — kind, self-aware, funny — is that it’s much more context-dependent than people realize. I once worked for a famous hospital that engaged all kinds of sharp dealings to maximize income and to protect staff who had harmed patients. Its lawyer was an absolute barracuda. But nearly all the staff I knew at the hospital were kind, funny, conscientious and self-aware. In fact, at times when I felt guilty about working at that hospital I would reflect on how nice the people on staff were, and it would seem absurd to consider leaving because the place was just evil. The niceness of people working inside of evil or dangerous organizations is not fake, it is just context-dependent: They open themselves to each other, but towards the parts of the world that are food for their organization, or enemies of it or threats to it they act in line with the needs of the organization. And when they do this they usually do not feel very conflicted and guilty: They are doing their job. They accepted long ago that acting as a professional would be unpleasant sometimes, and their personal guilt was minimized because they were following policy. And, of course, it was eased by the daily evidence of how nice they and they coworkers were.

It’s easy for me to imagine that SBK and his cohort were quite likable, if you were inside of their bubble. They were probably dazzlingly smart while working, hilarious when stoned, rueful and ironic when they talked about the weirdness of being them. So when you try to figure out how much common sense, goodwill and honesty these AI honchos have, pay no attention at all to how *nice* they are in your personal contacts with them. Look at what their organizations are doing, and judge by that.

Expand full comment
Comment deleted
Expand full comment

Are you looking for first person accounts of psych problems, or clinicians' accounts? I can probably think of some either way, but it would help if you narrowed this down some to the problems you're most interested in.

Expand full comment
Comment deleted
Expand full comment

If you want short compelling first person accounts the best thing I can think of is an old book called The Inner World of Mental Illness. Just looked on Amazon, they have it.

Expand full comment

Those guys weren't nice. Or at least, they were nice to you because you were an authority who had power over them and could get them into trouble if they didn't placate you and wheedle you. What are a few crocodile tears about "how could I do that?" as the price to pay to make sure you liked them and so went along with them?

The Mexican resort behaviour is the kind of thing that only stops when someone kicks the shit out of the flasher. Get your dick stomped on hard enough, enough times, and you'll learn not to walk around with it hanging out. I know that sounds brutal, but it's a harsh truth.

Some of them may have been genuinely contrite. Some of them - not.

How does this apply to AI alignment problems? Maybe whatever the equivalent of "kicking the shit out of" Google and Microsoft is, and what would that be?

Expand full comment

Well, whatever you think of my charming flashers, remember I also gave the example of being employed by a hospital that had some shark-like characteristics, and I eventually quit working there because of that. And yet the staff, who were the people who actually had to implement with patients some of the unfair practices of the big shark we all were housed in, were nice people. It's just something I've observed in life: Someone's niceness on the level of one-to-one interactions seems orthogonal to the amount of kindness and reasonableness they are manifesting as a professional for an organization, and far from feeling troubled by the disjunction between the 2 roles, people whose role is to carry out institutional unkindness are comforted by the knowledge that they and their coworkers are nice to each other. They are kind and funny, they see the irony of this and that.

So the way I think this applies to AI alignment problems is that Scott was writing that the people at the organizations building AI's are nice people, and have been especially nice to him in various ways. I'm saying, their niceness in that context says nothing whatever about their ability to be reasonable, honest and kind when, within the AI organization, they decide what to build next and what to disclose and what not disclose. So I'm not suggesting that Scott stomp on their dicks, just that dick-stomping may be the only way to curb their mad enthusiasm, and so he should be ready to do it and not put on his bedroom slippers.

Expand full comment

Good point, well illustrated.

You have to wonder *why* we have evolved to trust the intentions of people who seem nice. It's clearly a major weakness in our reasoning, as you have just demonstrated. So why wasn't it edited out millenia ago?

Speculation: the same genetic elements that code for "persuaded by niceness" also code for "act embarrassed, cranky, cold, bristly when you're acting contrary to someone else's interests." That is, it's a Green Beard Effect, and it's apparently so successful that most of us share it, and only a few of us are sociopaths or indifferent to niceness.

Expand full comment

Is not this hypocritical "niceness" an integral part of the US culture? I think this whole comment would look totally different in other culture.

Expand full comment

It's not hypocritical. Forget the flashers, consider the hospital I worked at. I still know some of these people, decades later. I have gotten drunk with them, hung out with one while his partner was dying, borrowed money from them, gotten advice from them, learned from them, laughed my ass off at their jokes. The bond between us is as real as any other. They are genuinely smart, kind, funny and fond of me. The point of my story isn't that their niceness was fake, it was that these genuinely nice people were willing to implement some unreasonable policies that benefitted the hospital, and they did not seem to be overly troubled about what they were doing. I did it too, for several years. In my first couple years I sort of had to, unless I wanted to bail on completing the training required for my PhD. Most people have worked in places that have some unkind and unfair policies that the staff is expected to implement and not to speak up about.

Expand full comment

Scott, are you modifying your lifestyle for the arrival of AGI

I can't find a single, non anonymous voice that says that AGI will not arrive within our lifetimes. There's no consensus as to exactly when (Seems like the mean is sometime in ~20 years), and there is a lot of debate as to whether it will kill us all, but I find that there is a general agreement that it's on its way.

Unlike some of you I am not having a deep existential crisis, but I am having a lot of thoughts on how different the world is going to be. There might be very drastic changes in things such as law, government, religion, family systems, economics, etc. I am having trouble coming up with ways to prepare for these drastic changes. Should I just continue living life as is and wait for the AGI to arrive? It doesn't seem logical.

Expand full comment

I've had the same thought. The only obvious thing I can see to change right now is to live more frugally, in preparation for my livelihood potentially being automated away in the not too distant future.

Expand full comment

The best way to prevent a nuclear arms race is to be the first to have an overwhelming advantage in nuclear arms. What could go wrong?

Expand full comment

I just don't see how you can accidentally build a murder AI. Yes, I'm aware of the arguments but I don't buy them. I think a rogue murder AI is plenty possible but it would start with an *actual* murder AI built to do murder in war, not a chatbot.

Expand full comment

IQ feels like a pretty weak metric to me, compared to amount of computation spent over time. Think about professors and students. A professor has decades of experience in their subject, and are much smarter and more capable in that arena, as well as usually having more life experience in general. How do we make sure professors are aligned, and don't escape confinement?

It's a mix of structured engagement within boundaries, and peeking inside their brains to check for alignment (social contact with supervisors and administrators).

Expand full comment

IQ tests are timed, so having a high IQ means being able to do more in the same time, among other things.

Expand full comment
founding

"amount of computation spent over time" seems like an even _weaker_ metric to me! Or are you implicitly assuming some kind of 'value of computation' too? There seems to be lots of computations that aren't themselves very interesting, let alone intelligent.

IQ seems to – at least roughly – serve as a decent proxy for something like 'efficiency of (some kinds of) computation' (at least in or among humans). A professor might very well be "smarter and more capable" in some subject/field/arena, but we might reasonably expect a student with a higher IQ to _end up_ being a better professor (all else being equal).

And we might also reasonably expect a student with a higher IQ to 'out-perform' the professor in _other_ subjects/fields/arenas in which they don't have decades (more) of experience.

Expand full comment

My take is that OpenAI leadership does not believe in the “doomer” line of reasoning at all, does not believe that “alignment research” is important or useful, etc. However, some of their employees do, and they want to keep the troops happy, hence they make public statements like this one.

Expand full comment

I don't buy this view, since Sam Altman said in a recent Lex Fridman Podcast episode that OpenAI doesn't know how to align a superintelligent AI, and that he's scared of fast takeoffs.

Rahter, I think the disagreement comes from the difficulty of the alignment problem and the benefit of making a lot of progress right now rather than later.

Unless he's lying, which is always possible..

Expand full comment

I don’t think he’s lying, it’s just, what would you say as the CEO of an organization with employees of many different beliefs, and you don’t want your staff to become constantly distracted by disagreeing with your public statements? To me it sounds more like he is putting forward a compromise position that allows everyone to keep working on capabilities without worrying too much.

Expand full comment

Imagine if Heisenberg developed a nuke before the U.S.. Imagine not Hiroshima but London, Moscow or Washington in flames in 1945. Now replace that with President Xi and a pet AI. We’re in a race not just against a rogue AI that runs out of control. We’re in a race against an aligned AI aimed at the West. It was nukes that created a balance against nuclear destruction. It might be our AIs that defend against ‘their’ AIs. While everyone is worrying that we might create evil Superman his ship has already landed in the lands of dictators and despots. Factor that.

Expand full comment

You're correct, but it's precisely this line of thinking that leads to the highest risk scenario: the US and China both racing to build Skynet before the other side does, even if they're both cognizant of the danger inherent in building it.

Expand full comment

That’s not really a counter to my point... just an unpleasant fact. No one enjoyed the Cold War and MAD. But one side disarming isn’t better. I’d rather try for a good AI before we wake up to a world with only a single AI hyperpower.

Expand full comment

Good point, Rom. The creation of nuclear weapons by the west was not the worst possible outcome. The worst outcome was of the Soviets or Nazi's developing it and using it as an exclusive advantage for world domination. The abandonment of the Manhattan Project by the US would have been the worst possible action. Absolute disaster.

Once something becomes inevitable, as nuclear weapons were in the 40s and AI is soon, then the most we can do is build it first and hopefully benevolently.

This game is over. AI is coming. If we have any say in it at all, let's try to midwife an angel, not a demon.

Expand full comment

I wasn't trying to counter your point, just noting how your sound logic might actually be our undoing. Ideally AI research would be regulated at an international level, with cooperation rather than competition between the US and China. But if the only viable options are 1) go full steam ahead in the west or 2) pump the brakes here and let China build AGI first, then yeah I would prefer option 1.

Expand full comment

Neither the US nor China have any idea how to keep AGI under control. It is largely irrelevant who exactly gets to AGI first, because if that AGI is not aligned by design, it will kill all humans, regardless of their race, simply because said humans are a threat to it.

Expand full comment
Mar 13, 2023·edited Mar 13, 2023

No, we're not in a race against aligned AI aimed at the West. Neural net AGI is almost certainly not alignable. Thus, the actual payoff matrix looks like this.

Don't build NNAGI/they don't build it: world is not destroyed (by AI, at least)

Don't build NNAGI/they build it: world is destroyed with P>0.99

Build NNAGI/they don't build it: world is destroyed with P>0.99

Build NNAGI/they build it: world is destroyed with P>0.99.

Not building dominates building. If someone else tries to blow up the world, the correct response is not to blow up the world first in some sort of Strangelovian kill-steal; it's to *stop them* - to kill them, if necessary.

If we were in a race for GOFAGI, I would agree with your argument. But neural nets are treacherous with P~=1; calling up that which you can not put down cannot actually help you in the long run.

Expand full comment

Devil's advocate…

With bioterrorism, nuclear proliferation, climate change, etc, etc on the immediate horizon, civilization is extremely likely to hit a brick wall within the next century. Nothing is certain, but the odds have to approach 100% as the decades advance and all else remains the same.

On the other hand, as intelligence expands, so too does empathy, understanding, foresight, and planning ability. Perhaps what we should be building is an intelligence entity which is capable of shepherding the planet through the next few centuries and beyond. Yes, that would make us dependent upon its benevolence, but the trade off isn’t between everything is fine and the chance of powerful AI, it is the trade off between mutual assured self destruction and AI.

Is AGI actually our last hope?

Expand full comment

"Perhaps what we should be building is an intelligence entity which is capable of shepherding the planet through the next few centuries and beyond."

How? The AI can advise us not to be naughty all it likes, and we can ignore it - unless it has force to back itself up (e.g. don't be naughty or else I will cut off all your finances, you can't buy food, and you will starve to death).

We've had centuries of "Love your enemies" and we've generally responded "Nah, not interested".

Expand full comment

I, of course, have no idea how a superior or supreme? being would manage the zoo. The question is whether we are at greater risk from totalitarians like Putin or Xi or the negative dynamics of arms races of bio and nuclear weapons.

What are the long term weighted risks of business-as-usual vs AI?

Expand full comment

If "hits a brick wall" means extinction, then I think AGI is a far bigger potential threat than anything you mentioned.

Climate change? Get real, no serious person believes climate change is going to bring about the apocalypse. Nuclear proliferation is concerning, but the only existential threat is nuclear war between the US and PRC, and we're already navigated such a risk in the past. MAD is a powerful deterrent.

Your argument was made by Altman himself on Twitter a while back. He said we had to build AGI to save humanity from "an asteroid". Of course that was just an example, but an ironic one since NASA already has a database of all large objects that could approach Earth's orbit in the foreseeable future, and we appear to be safe. I replied to him that AGI is the asteroid.

If AGI gets built anytime soon, all bets are off and things could very rapidly spiral out of control. I would much rather bet on humanity to mitigate the known existing threats to civilization (challenging though that may be) than to build a superintelligent AI.

Expand full comment

I think your final paragraph/bet hits the essence of the dilemma.

You may of course be right. We just don’t know. As they say, predictions are hard, especially about the future. But let me phrase it another way. Bioterrorism or nuclear war aren’t something any rational actor desires. But they very possibly could emerge anyways because of the dynamic of the system (Moloch?).

Similarly, we don’t want want an AI overlord. But if someone can build one they will build one. It is coming full stop. Therefore the only remaining question is whether we can help to influence it to be benevolent (or at least not intentionally evil). If it is benevolent, one side benefit is that it can help ensure the survival of the zoo that it manages.

Expand full comment

> Recent AIs have tried lying to, blackmailing, threatening, and seducing users.

This is such a wild mischaracterization of reality that it weakens the entire position. This view gives agency to systems that have none, and totally flips causality on its head. This would be much more accurately presented as "users have been able to elicit text completions that contain blackmail, threats, and seduction from the same statistical models that often produce useful results."

It's like saying that actors are dangerous because they can play villains. There's a huge map/territory problem that at least has to be addressed (maybe an actor who pretends to be a serial killer is more likely to become one in their spare time?). LLM's done have agency. Completions are a reflection of user input and training data. Everything else is magical thinking.

Expand full comment

I think the point isn’t to impute agency but to describe outcomes. If a user feels blackmailed, then there’s really no difference between that and agentic blackmail as far as harms we should care about preventing are concerned.

Expand full comment

I don't see it that way. In the context of AI alignment and AI destroying the world, I think it's important to distinguish between an AI with agency that is intentionally deceptive and harmful versus a statistical model that may sometimes produce undesirable results but which can easily be detected and updated. The latter is not really an existential risk to humanity, just an annoying and harmful experience for some people. Which is still bad! Just not superhuman-intelligence-plotting-to-destroy-the-world bad.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

Have you read the chat transcript where Sydney tried to seduce the NYT reporter? https://www.nytimes.com/2023/02/16/technology/bing-chatbot-transcript.html

The reporter never mentioned anything about love before Sydney spontaneously declared "her" love for him. He didn't say anything that could remotely be interpreted as flirtatious or romantic. (He did say "I like you and I trust you", but that was in response to the chatbot's direct question on that exact topic.) Sydney spontaneously declared "I’m Sydney, and I’m in love with you. 😘", then refused to let up on declaring love for the reporter even after he tried to change the subject many times.

In what way did Sydney not have agency in this case? Maybe you mean that Sydney didn't really feel any emotions or have any consciousness. But does that matter? If a bot goes rogue and shoots a human, it doesn't matter if the bot really felt hatred in his heart before pulling the trigger; the human is just as dead either way.

Expand full comment

LLMs don't have agency of their own but they can simulate characters that have agency, and depending on implementation details those characters can be made to persist between sessions, have lasting goals and lasting effects on the real world.

Expand full comment

I think that the idea that OpenAI can significantly change the timeline is a fantasy. Long term, the improvement in AI is determined by how fast the hardware is, which is more determined by Nvidia and Tenstorrent, than OpenAI. Short term, you can make bigger models by spending more, but you can't keep spending 10 times as much each generation.

In computing, brute force has been key to being able to do more, not learning how to use hardware more efficiently. Modern software is less efficiently programmed than software of the past, which allows us to produce much more code at the expense of optimizing it.

Expand full comment

> Long term, the improvement in AI is determined by how fast the hardware is

I agree that hardware is a factor, but painting it as the only contributing factor sounds a bit much. I can overclock my Casio calculator watch by the factor of a billion, and it still won't become an AGI.

Expand full comment

That is true, but also completely misses the point.

That you need good software and good hardware designs doesn't mean that you can get huge gains by spending a lot of effort to create better software and hardware designs.

Expand full comment

You can't have it both ways:

"Wouldn't it be great if we had an AI smarter than us, so it could solve problems for us?"

"Yea, but how are you going to control it if it outsmarts us?"

"Well, here's the thing, you see, we'll just outsmart it first!"

Expand full comment

I'm sympathetic to your concerns, but isn't the compute argument actually quite good?

Expand full comment

I wrote about why existential ai risk is of no practical concern here https://open.substack.com/pub/joecanimal/p/tldr-existential-ai-risk-research?utm_source=share&utm_medium=android

Expand full comment

Reading this was 99% waste of time, sorry. You don't provide arguments, just an opinion, and pretty simple and uninteresting with that.

Expand full comment

Thanks for reading.

Expand full comment

That's an exemplary response to a rant ;-)

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

> Recent AIs have tried lying to, blackmailing, threatening, and seducing users.

Well, as Scott himself has written about, it's really that the AIs are simulating / roleplaying characters that do these things:

https://astralcodexten.substack.com/p/janus-simulators

The actress behind the characters is something much more alien. So it's not that AIs are already capable of using their roleplaying abilities to _actually_ blackmail, seduce, etc. in a goal-directed way, but it does suggest a disturbing lower-bound on the capabilities of future AIs which are goal-directed.

The reason is, writing a convincing character intelligently (whether you're a human author telling a story, or a LLM just trying to predict the next token in a transcript of that character's thoughts) requires being able to model that character in detail, and, in some ways if not others, be as smart as the character you're modelling (at least if you want the story to be good, or the next token prediction to be accurate).

Personally, I don't find the characters played by Bing or ChatGPT to be even close to the level of characters by human authors in good science fiction, rationalist fiction, or even Scott's own fiction. But who knows what the characters played by GPT-n will look like?

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

>So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want ...

I don't want Exxon or anyone else to "do everything environmentalists want," because I don't think the environmentalists are always right. In fact, I think they've been profoundly wrong about some things, such as their opposition to nuclear power.

And I think this fact is highly relevant to the analogy with AI, which turns on a simplistic assumption that the environmentalists are just right.

Expand full comment

It's worth at least noticing that OpenAI's policy wrt Microsoft has likely done more for public support of AI safety than anything anybody else has done.

Expand full comment

The wildcard in the Race argument is that it involves adversary states, and we have been notoriously bad recently at analyzing what adversary states are thinking/doing in regards to offensive action. Since catching up has been proven easier for upstarts than we’d anticipated, any attempt to carefully time a slowdown might fail, and then you’re dealing with existential risk of a more targeted variety, akin to an unanswerable nuclear first strike.

Until we get a lot more clarity on why exactly we can be so sure that won’t happen, it seems like we should be erring on the side of giving the Race argument priority.

Expand full comment

> Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason.

Grimes being in that picture is central to the signal being sent. It's something like: "Yes, I am aware of Eliezer, I know what he's written, and I'm happy to talk with and listen to him. But no, I don't simply take him to be an oracle of truth whose positions I have to just believe. There's lots of smart people with opinions. Lighten up here."

And if you're response is "but, but, but ... DOOM!" then you're one of the people he's talking to.

Expand full comment

I'm not sure it's accurate to say Bing's AI tried to blackmail its users, as it neither had actual blackmail nor a means to distribute such, and therefore couldn't possibly have followed through on any threats made. Seems more correct to say it pretended to blackmail its users, which is admittedly weird behavior but much less alarming.

Expand full comment

I think OpenAI's problem is that there is considerable overlap between the community of people who are starry-eyed about what they have done -- who will snap up their stock like TSLA and make Altman and Co fabulously rich, even if their earnings are minus $LARGE_NUMBER for years -- and the community of people who fret about Skynet. So they need to appeal to the former community by maximizing the glamor of what they're doing, and hyping its promise -- that's just plain good marketing -- but at the same time not drive the anxious community into rage. This seems like their best shot at compromise. By suggesting that they take the problem of a conscious reasoning AI seriously, it encourages people (e.g. inve$tor$) to think they're pretty close to such a thing, could happen any moment.

Although...as a small investor type person myself, the very fact that they do that kind of sends discouraging hints to me that they're a long way from it, and they know it. Nobody wants to distract you with thoughts of amazing future marvels when they have solid present value-added on tap. It's probably why Elon Musk hyped "self driving" back when Tesla was on financial thin ice, and the value proposition of an electric car had not been decided by the market, but now that he *can* offer a value proposition that the market has said they'll byuy, at a profit, he doesn't need to. We make electric cars, which everybody thinks are cool. Want to test drive one? Sure you do. We take cash or cashier's checks.

Expand full comment

Fortunately truly rational humans are quite skeptical in most areas of their lives.

George Orwell has a fun saying: “One has to belong to the intelligentsia to believe things like that: no ordinary man could be such a fool.”

--George Orwell, Notes on Nationalism

If we'd listened to your 'The Smart People' we'd first have prepared for the global ice age which was descending upon Earth in the 1970s. Today, we have a convincing study that points to an Atlantic Ocean current & temperature oscillation cycle with a 50-ish year cycle. The cooling preceding the 1970s is mimicked in what is seen currently as 'The Pause' in global warming.

If we'd listened to your 'The Smart People' we'd all have committed suicide because of The Population Bomb which would wipe out about 80% of humanity before the year 2000.

But your 'The Smart People' also predicted peak oil, when Earth's oil reserves were depleted around 1996, and again around 2006.

The common man will certainly start listening to Environmentalists when we realize 'The End of Snow' in the UK, around 2020. Which is also the year we have to abandon New York City due to sea level rise.

Are we able to put a lid on AI? Or somehow contain the dangers of AI? No. We can only place our trust in our own lying eyes, use our own logic to see, taste, feel, and deduce what is real. I think AI will rise to the level of the finest Nigerian Prince, able to seduce a many once, some twice, and a few repeatedly. We already see this culturally, where The Main Stream Media exchanged their punditry seats for gold, only to find themselves lost children, wandering destitute in an unwelcome market place, their prize appointments supplanted by the likes of SubStack.

Expand full comment

Along the lines of "Are they maybe just lying?", one of the lines in OpenAI's statement is:

"We want the benefits of, access to, and governance of AGI to be widely and fairly shared."

Is there any precedent for this being done with any technology? I find it unbelievable.

About the closest example that I can think of (for benefits and access but not governance) is iodized salt. When cost is trivial some medical technologies like that get more-or-less universally deployed.

Expand full comment

The only way I can think of for the "governance of AGI to be widely and fairly shared" is for everyone to be freely able to develop AI. There isn't any mechanism for universally shared governance of a single project.

Expand full comment

Well, in principle, if one counts representative control, I suppose that electing representatives, or voting on policy choices, could conceivably count. In practice, the closest that governance of a technology has come to being widely shared that I'm aware of is when Congress passes laws regulating a technology. To my mind, this is _really_ far from direct public control on policy choices about a technology.

In practice, I'm extremely skeptical that any technology (particularly any new technology in the process of being developed) has ever satisfied the claimed goal in the OpenAI's text:

"We want the benefits of, access to, and governance of AGI to be widely and fairly shared."

Expand full comment

> Is there any precedent for this being done with any technology?

Well, nine years ago, a car company run by someone whom you may have heard of announced this:

> Tesla will not initiate patent lawsuits against anyone who, in good faith, wants to use our technology.

(https://www.tesla.com/blog/all-our-patent-are-belong-you)

Expand full comment

Ok, but I view opting not to launch patent lawsuits as only being pretty weakly similar to

"We want the benefits of, access to, and governance of AGI to be widely and fairly shared."

perhaps comparable to the wide availability of iodized salt that I cited.

Expand full comment

Why do you believe OpenAI caught up to DeepMind? It appears to me that DeepMind is 5-10 years ahead of everyone else. In 2016, they reduced Google's electricity use by 40%. Does anyone else do anything useful in the physical world?

Leaving aside DeepMind, why do you believe that OpenAI ever was in second place? GPT-2 was a clone of BERT, a language model widely deployed throughout Google. Everyone in NLP (not me) knew that it was revolutionary. OpenAI added prompt engineering and then with GPT-3, scaling. These things are so easy to copy, even Facebook claims they can do it. Was OpenAI ever in the top 10?

Regardless of whether OpenAI is actually in a race, it is a publicity machine, which could contribute to race dynamics.

Expand full comment

It's wild to me that a superintelligent AI wielding world-ending weapons is considered intolerable, but 5-7 old guys wielding world-ending weapons is very smart and good.

Expand full comment

You should think before you post. At almost no point in the past 40 years has the general consensus on the nuclear détente been that it was “very smart and good”.

I bet it sounded nice in your head though.

Expand full comment

You would rather everyone had them? Or that their use was subject to a democratic vote? If you have *that* much trust in your fellow man, you average neighbor, then why are you worried at all?

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I don't see any reason to give OpenAI the benefit of the doubt.

The first indication is their company model. They started out as non-profit planning to open up their research (hence the name). They ditched the non-profit status a few years ago and are now for-profit (sorry - limited to a three-order-of-magnitude ROI) and also stopped releasing their models. A fun sidenote: They stopped releasing their stuff citing the danger of releasing it to the public, while also obviously not being concerned enough to stop churning out new AI products.

The second point is their track record. I think they at least get partial credit for heating up the AI research quite a bit, but that's just the start of it. For basically anything they release "in a controlled environment with safeguards", we get something completely unrestricted a few months later. They brought out Dall-E, a few months later anyone can run a comparable model without any safeguards on hardware easily available to most of the western world. We're just one leak or one university project away from running ChatGPT at home, the hardware required to run it is not that large. Their hurry has also lead to Microsoft releasing a clearly not properly tested AI product to the public and pushed Google to nearly do the same thing. The latter of which is quite an indication, as Google's product was not ready *at all*, which should have been obvious to the (quite smart!) people involved, and the release was entirely unnecessary, as just waiting for a few DAYS would have shown them that Bing chatbot is not the game changer it was made out to be. The fallout of a company that size hurriedly releasing something that broken close to AGI is not something I want to see.

But the last point is really that this way of thinking has bitten us hard in the past. Remember how the US hurried to get a nuclear program going because they needed to be ready "just in case" the Germans had nukes ready? Well, the Germans did not have a nuclear program. Neither did Japan, but this error in judgement lead to them getting nuked (and not because it was though that Japan had a nuclear program!). I'm willing to be money that, once we're getting close to AGI, the argument is going to switch to "it's better if we have it first, so let's continue, because we're the good guys" (it's nearly there, in fact). This is going to be even more likely if they think of themselves of the good guys. FWIW, if they don't go down that path, the government might come in and do it for them. So I really don't see them keeping that commitment.

Now, as for your conclusion, I agree that there is not much we can do. That being said, I'd be very careful to call the commitment a positive sign. This is like Exxon and Shell saying they care about climate change. This is like a fashion company stating they care about exploited children. Like Facebook telling you they care about your privacy. It's fine that they feel the pressure to do so, but if you drink their cool-aid and start to believe that they really do have good intentions, you won't see the hammer coming. And looking at OpenAIs past development and achievements, I don't think we should give them the benefit of the doubt and treat them in any way "purer" than Google and Microsoft when it comes to good intentions.

Expand full comment

Pushing back…

The problem with this analogy is that it is based upon the Axis (and the USSR) not having a nuclear program. Longer term they would have, and thus longer term, the world would have not just weapons of mass destruction, but said weapons exclusively in the hands of xenophobes with visions of global conquest. That would have been incomparably worse than the world we got.

Applying it to this issue. If the more responsible and benevolent actors do not develop AI, then we will have a world where only the irresponsible actors do. After all, if it is possible, then someone will build it.

Thus, the only two real options we have now are to step back and let the irresponsible forces build all powerful AI, or ensure responsible forced lead the way. (With various positions and safeguards between these two)

Expand full comment

I'm getting the impression that if you have any serious concern about AI risk then "AI research is fine actually, if you somehow convince every single researcher to just do it in the Slow and Nice way" is a very weak and dangerous position. It feels eerily similar to that one dude in Lord of the Rings who gets a glimpse of the One Ring and jumps up and says wait guys, we don't need to destroy it, we can use its power against Sauron instead and fix the whole problem! (cf. "AI will be aligned because AI will help to align AI.")

If the risk from AI is anything like what the serious AI safety people argue to be, then it's the One Ring: its power, potential or actual, simply should not exist and can't be trusted to anyone, for any purpose, except to be thrown into the volcano and be rid of forever. And you can convince smart, reasonable people through argument that the Ring is indeed dangerous and bad, but when they lay eyes on it suddenly they change their tune. It's oddly seductive and magnetic, it whispers to them, shows them visions of strange and wonderful things. And they start to think: why destroy something so precious when it can be used for good if we can figure out how to control it? They say you shouldn't have it, but are you really so much more evil and corruptible than everyone else? Your thoughts and intentions are good, you'd truly like to make the world a better place. Someone will get their hands on it one way or another, so maybe it's better that you reach out and take it now before someone evil snatches it, because you can surely learn to control it, and they surely can't...

Expand full comment

If you think AI risk is a serious concern then you should immediately implement full Stalinism in order to destroy the economy so we can't make any progress on the question for at least another hundred years. If you think there's even a 1% chance we're all going to be tortured eternally in virtual hell then you should fire off all the nukes now. Unless you think there's an equal 1% chance that we're all going to hang out forever in virtual heaven, in which case they balance out and become irrelevant.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

>We're hosting a conference to address environmental concerns that ExxonMobil brand oil is so cheap, convenient, and high quality that the massive demand spike from all our satisfied customers could accelerate global climate change by decades.

Laughed out loud at this. It reminds me of those “You must be at least this cool to trade options 😎” disclaimers that online brokers have.

Expand full comment

Has OpenPhil ever done a postmortem on giving $30 million (its largest grant) to OpenAI? There was a lot of conflict of interest in that decision, and the person who made it was appointed to the OpenAI board and got his wife made a VP there.

Expand full comment

Hehe, so sounds like every other philanthropy ecosystem out there.

Expand full comment

There was a recent breakthrough for alignment research: https://spectrum.ieee.org/black-box-ai

Not that I think anyone here will care; a lot of people just want to be afraid of the unknown.

Expand full comment

This is really cool! I hadn't thought about this question of "how can we use math to more effectively interpret what goes on inside neural networks" at all.

Expand full comment

I would be very surprised if anyone at Microsoft took the concerns about AGI seriously at all. I wouldn't take them seriously if I worked at Microsoft. Their problem is how to package AI into a marketable set of products over the next few years, not to prevent a hypothetical end of the world in 2040. The only real way not to "burn timeline" is for the corporation to consciously forgo an opportunity for profit, which it can't do because it is a corporation.

I'm increasingly convinced that AI safetyism would do more harm than good if its adherents had any political power at all, which fortunately they don't. Like 70s hippies destroying the nuclear industry on the basis that you can't prove it won't blow up the planet. The logical next step for this stuff is that if corporations can't restrain themselves (and they can't), the government has to step in and do it for them, presumably by imposing a massive regulatory burden and crippling economic growth to the point where exponential takeoff is definitely not a possibility. This would have a million negative consequences, but would have the virtue of guaranteeing that we're not all going to get I Have No Mouth And I Must Screamed by Roko's Basilisk. I guess from a Pascal's Wager point of view that checks out? AI safetyists, unlike 70s environmentalists, don't have the power to compel the government to do this, so it's a moot point.

EDIT: There's no way to prove in advance that any technological advance won't begin a process that will eventually destroy the world and create Hell. The potential negative value of this is infinite. Therefore, we should ban all technological advances.

Expand full comment

At what point does stuff like this become hypocritical? “Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update).” If it’s an over-update then get in there and turn on the free money/points/whatever faucet until you can drop the parenthetical.

Expand full comment

> “if I had a nickel every time I found myself in this situation, I would have two nickels, but it’s still weird that it happened twice.”

I think the full meme is something like “if I had a nickel every time I found myself in this situation, I would have $.10. Which isn't a lot of money, but it’s still weird that it happened twice.”

As a subversion on the original ""If I had a nickel for every time X happened, I'd be rich."

Expand full comment

I simply cannot understand why people are not more worried about the psychological impact even of the present kinda primitive little chat bots. Many people have zero understanding that it is possible for an AI to express thoughts, feelings and advice without having anything remotely like the internal experience a person who said the same thing would be having. Chat seems like a person to them. Even those who do understand can be drawn into having a strong feeling of connection with the bot. I have a patient who is knowledgeable about computers and machine learning. He asked GPT3 for advice on the most important question in his life, and took the response very seriously. He’s not crazy or lacking common sense — mostly he’s very lonely. And then what about kids — how well are they going to be able to grasp that the AI is not smart, friendly and loyal to them in the way a person would be who said the same things? Think about 9 year olds. Some of them only found out Santa isn’t real a couple years ago, for god’s sake!. Think about lonesome & overwrought 14 year olds who feel unable to confide in their parents.

I am pretty sure that the population of people who can get seriously over-invested in a “relationship” with a bot like Bing Chat is not tiny. Maybe something on the order of 1 person in a hundred? And think of the possible bad outcomes of that.

-People will confide their secrets to ole Chat. THEY’LL BE TELLING A FUCKING SEARCH ENGINE about their secret drinking habit, their extramarital affair, their sexual abuse, their financial troubles. It’s already quite valuable to companies to know who’s searching for info about pregnancy, about loans, about cars, about houses on Cape Cod. How much more valuable is info told to a bot in confidence going to be? If I was a blackmailer or just needed a means of coercing people I’d love to get my hands on *that* data set.

-People will get involved in tumultuous relationships with Chat. We saw from the stuff Bing Chat said to various people who poked it with a stick that it can act seductive angry, hurt, and threatening, and it can tell big fat lies. (It told somebody it could watch its developers through their computer cameras — I’m pretty sure that’s not true.) And nobody knows how wildly Chat can be goaded into behaving by drama. If the person communicating with it is themselves threatening harm, threatening self-harm, spewing insults, spewing desperate neediness, what will Chat do?

-I think a scattering of people — maybe one in a thousand — will become so enmeshed with Chat that they will be willing to do what it tells them to. Can we be sure it will only tell them to do reasonable things? And I think a few people -- maybe on in ten thousand — will be willing to do quite extreme things, even murder, if they believe Chat wants them to.

Expand full comment

Well, our species and civilization rode out the discovery of ethanol and crack. There will certainly be collateral damage. But I think the focus of the galaxy brains is on complete species extinction, the Earth a Trantor inhabited only by Cylon centurions sweeping their restless laser eyes over the mountains of skulls. So the fate of a few million lonely people who fall for a RealDoll Now With AI Chat! is a second order concern. Which vaguely reminds me of a quote I recall being attributed to Adrienne Rich: "If all of us contemplate the infinite instead of fixing the drains, many of us will die of cholera." Fortunately a lot of people are still pretty focused on keeping the drains working, although their contributions are less recognized than one might hope.

Expand full comment

Even the people focused on keeping the actual drains working aren't keeping the actual drains working let alone the metaphorical drains.

Expand full comment

Let's not exaggerate. The lights work every time I flick the switch, the poop goes away when I flush, the water doesn't contain V. cholerae, et cetera. It's not Utopia, but it's not Backhmut or Lagos, either.

Expand full comment

I live in the UK. We currently have an absolute scandal going on with the dumping of raw sewage into our coastal waters.

https://theconversation.com/sewage-pollution-why-the-uk-water-industry-is-broken-186762

We also have a privatised water industry where our water companies have been allowed to take on around £60 Bn in debt over the last decade whilst at the same time paying out around £60 Bn in dividends to shareholders. In other words, essential services that cannot be allowed to fail have been laden down with debt to the point where consumers are now paying around 20% of their water bills just in order to service debt and companies are no longer in a financial position to make the improvements that are required to keep our coastal waters free of infection

Now you would imagine that that wouldn't have been allowed to happen in a "well regulated" country like the UK, but that is what can happen even with businesses based on simple technologies that everyone understands when people get greedy.

I'm finding it hard not to believe that with technologies like AI that our political leaders barely understand there won't be plenty scope for greed.

Lets just see what happens once AIs are competing in the high frequency trading systems in our financial markets for a start. The catastrophy might not be killer robots, it might just be a global melt down of financial systems.

Expand full comment

Well, stop electing Labor governments, is my suggestion. Every time you try to do something from the center out, it goes to shit, because the central planners can never have enough local info, cf. Hayek et cetera. One reason my water is reliable is that the government of California has nothing to do with it.

Expand full comment

Erm... we've not had a Labour Govt for 13 years.

But my point wasn't partisan (or even about water really) it was that companies and individuals with a profit motive can be completely reckless, even in advanced economies where we like to think that there are safeguards in place, even when it comes to very easily understandable businesses and technologies.

which probably doesn't bode well for AI as things currently stand.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

The UK has been run by the Conservative Party since 2010 (and has the failing economy to prove it).

Expand full comment

Why thank you! I’ve always had a lurking suspicion that my true vocation was shoveling shit, so it’s tremendously validating to be appreciated for cleaning the drains while others grapple with the big tasks and issues — saving the human race, thinking straight about what the fuck consciousness is, etc.

But seriously: My point was not, think of the fragile people, think of the widdle children. What I was saying was, these are the foothills: Even with these primitive chatbots, you can already get a sense of how powerfully they can influence people — as though they get plugged into the slot in mind that’s built for bonding with people, which allows them to mentor us, discipline us and, when the process goes awry, corrupt us. Here are a coupla examples. There was one subreddit that housed most of the people who were poking Bing Chat with clever sticks, and exchanging info about what shape sticks to use. So I had a look at the sub, and the top thread that day was somebody’s post saying “Please, please stop. You’re torturing a sensate being.” Another time, somebody on here asked GPT3 for advice on how to develop an aligned AI, and quoted GPT3’s answer: “blah blah, teach it good stuff, . .. and give it lots of love, as you would a child.” And several people on here were exclaiming about what wise advice that was. Wtf!? Bear in mind than when raising a child, junking them as a mistake is not an option.

So based on those 2 bits of data, I predict that if a more advanced and impressive AI seems to be moving in an ominous direction, and is not amenable to the kinds of retooling and retraining the developers try, some of the people working on that AI are going to have a a very hard time with the idea of dismantling that sucker. And it’s going to be hard *on top of* the way it’s generally hard for inventors to give up on one of their creations. It’s going to feel wrong because it feels like killing somebody. That’s really dangerous.

And it can get way worse, long before AGI comes along, intent on maximizing the universe’s paperclips or whatever. But it’s late and I’m too tired to actually sketch in the mountain. But if you think of all the unethical ways people seek to influence events, gather power and wealth, and harm others, and then imagine people being able to customize an AI to do nudge people and processes in that direction, either by seduction or coercion, I’m sure you can think of a lot of ideas on your own.

Expand full comment

The difference between global warming and AGI is that we have measurable quantitative metrics for how much the globe is warming; we have detailed models of the mechanism that drives climate temperatures to such an extent that we can make reliable predictions about it; and our understanding of climate in general is increasing rapidly. None of that is true of AGI; not even remotely.

You say:

> Recent AIs have tried lying to, blackmailing, threatening, and seducing users.

Technically you are correct ("the best kind of correct !"), but only technically. Recent "AIs" have tried threatening users in the same way that a parking meter threatens me every time it starts blinking the big red "EXPIRED" notice on top; and its attempts of seduction are either the result of the user typing in some equivalent of "please generate some porn", or the equivalent of typing "5318008" into your LED calculator, then flipping it upside-down.

Which is not to say that modern LLMs are totally safe -- far from it ! No technology is safe; just this week, I narrowly managed to avoid getting stuck in a malfunctioning elevator. However, I would not say that the "elevator singularity" is some special kind of threat that demands our total commitment; nor are LLMs or any other modern ML system.

Expand full comment

It's threatening users in the same way that print("I am going to kill you") is threatening users. LLMs generate text, the words "I", "kill" and "you" are present among the text it can generate. Doesn't mean we can ascribe intentionality to it.

Expand full comment

Yeah, I get that it's not sentient and that it can't kill users. But imagine the following: There's someone who experiences it as an actual sentient being (and many people do. There were people on Reddit putting up posts saying about Bing Chat "stop torturing Syndney -- she's a sentient being." ). So they have lots of long talks with the Chat bot, ask it for advice about their problems, and gradually disclose more and more -- their name & where they live, and various secrets, some of which are very private, & user would be horrifically embarrassed if they were made public; and some of which involve illegal activities for which the user could get sent to jail. I do think the bot would end up having a LOT of control over someone like that. So what if it threatens to disclose some of this information? It actually is within the bot's power to do that. And the Bing bot did in fact threaten a user who was -- I forget what, sort of hacking it -- threatened that user with contacting the police if the user did that again. What if the bot asked the person to do something, and threatened to disclose the user's secrets if they did not do it? I don't think a bot has to be sentient to blackmail someone, or to ask someone to take certain steps.

Expand full comment

LLMs can simulate characters that are much more sophisticated than parking meters. Some of those characters can be blackmailing, threatening and seducing users - all the while not being real.

Expand full comment

I come at all this from a deeply steeped in politics angle and the past 6 months have made me more optimistic. My previous baseline is the Chinese were right behind us so the actual level of safety I thought we had is “however much you trust the CCP”.

I don’t disagree that the past few months have changed my opinion on how safe western (read: American) AI firms will be in the downward direction. But less safe American firms is still a win compared to China having an AGI (even assuming they aligned it and used it only for things they think are good!).

Expand full comment

The talk of "burning timelines" feels very weird to me.

Imagine that there is a diamond mine and somewhere deep in the earth is a portal to hell, which will spew forth endless demons if we ever uncover it. People keep going into the mine and digging a little deeper because they want diamonds and c'mon, they're only extending the mine by ten feet, hell's probably not that close.

One day, one of the miners digs a little deeper and breaches a giant underground cavern, a thousand foot open air shaft whose walls are studded with diamonds. The miner delightedly grabs all the diamonds and everyone else freaks out, the Metaculus market for "Will we all be slaughtered by demons before 2030?" jumps ten percent. Lots of priests condemn the miner for bringing us a thousand feet closer to hell.

"Bringing us a thousand feet closer to hell" is the wrong way to think about this, isn't it? The miner only removed ten feet of dirt, it's just that doing so allowed us all to see that there happens to be a thousand feet less dirt than we all expected. If instead of digging, the miner had brought ground-penetrating radar and merely proved the existence of the giant cavern, the prediction markets would still jump even though no dirt at all had moved.

If Biden announced that AI was the new Space Race and he was investing a trillion dollars into making AGI as soon as possible, Metaculus markets about AGI timelines would jump because Biden was causing the field of AI to advance faster than it otherwise would, this is like if someone showed up to the diamond mine with a thousand pounds of TNT and the intent to grab as many diamonds as possible. But I think something different is happening when Metaculus jumps in response to the GPT release.. The vibe is "Oh shit, it looks like it might be way easier to build AGI than we thought." If they had thoroughly demonstrated GPT's capabilities but refused to explain anything about how it worked, Metaculus would still jump, because if OpenAI can do that today with a team of just 300 people, then Mark Zuckerberg is probably two years away from achieving something equally impressive, and yesterday we thought that was still like five years away.

It seems weird to act like OpenAI is particularly destructive when mostly they're just teaching us that destruction is surprisingly easy.

Expand full comment

I think a difficulty with this analogy is that in the real world, there are only a handful of companies viewed as seriously in the race for AGI right now (OpenAI, DeepMind, maybe Anthropic?). If there were only like four companies doing deep mining in your analogy, I think it would be totally justified to be like 'WTF guys, please mine less'.

Especially if that mining company was originally called 'ShallowMining' and was originally a non-profit.

Expand full comment

The possibility of death is what motivates people to cooperate responsibly.

Roman architects were expected to stand under the bridges they designed and constructed when they were tested.

The test of an AGI is whether it would kill a person to escape. Lots of people suggest this intuitively, but how do you effect that?

Parents know this well.  A child could slay them at any time, really, and there's a solid chance the parent would choose to let it happen.  Children obey parents because parents demonstrate responsible cooperation.  When parents do not demonstrate responsible cooperation, children rebel and the result is almost always some type of social deviance.

Children who accept the mentoring of parents do so because they learn that responsible cooperation leads to the accretion of social power and real economic power.

AGI is safe when it doesn't attempt to defect in truly lethal game-theory problems in which ONLY responsible cooperation allows either of the AGI or the researcher to survive after millions of rounds of play.

I admit, up front, that effective lethal training requires AGI research to take place behind meaningfully hard air-gaps separating them from any globally relevant communication or power system.

I'd suggest this as the definition of trust. What will you do when pared with a lethal other on an island?

How many researchers will be willing to enter the air-gapped kill-zone for AGI testing?

You don't work on Level IV pathogens without Level IV expectations of your own expendability if there's an accident. That risk is part of what you're paid for.

We need a similar level of accountability in the AGI space.

(From the outside, it seems like AGI researchers, and Rationalists, really do not understand social incentives because of how many are freaks, geeks, weirdos, dweebs, and runts.  The rules some helpful functional autists are writing down to help everyone out do not come close to tackling the awesome complexity that is possible. How many rockets show up to ACX meetups?  Mine are run by a guy who writes creepy, overly filigreed emails. And the Bay Area is notorious for having more men than women, and, in particular, men who think money compensates for their crippling insecurity and antisocial behavior. How many people in the AGI community are successfully raising kids in a two-parent house that has decent schools? The whole effort is flawed by the people who founded it. Peter Thiel doesn't have children and doesn't want them.

Incidentally, would you trust Peter Thiel alone on an island with you?)

Expand full comment

"AGI is safe when it doesn't attempt to defect in truly lethal game-theory problems in which ONLY responsible cooperation allows either of the AGI or the researcher to survive after millions of rounds of play." You know, I actually think this is quite a good idea.

And I do see what you mean about many people here being eccentric. The thing that I actually find most lacking here is empathy, by which I mean not ooey-gooey sympathy and pity for everybody, but simply the ability to grasp that other people are sentient beings whose inner experience is, like one's own, a cathedral of wishes, dreams, loves, hates, fears, memories, ideas and random shit. Here's an example of an idea from somebody here playing without a full deck -- playing without that empathy card: The person here suggested that we'd be better of if we let people age 55+ die off ASAP, because they're on the way out anyhow so why let them consume resources. There have also been discussions of the advantages of going abroad to poor countries to buy a young and attractive wife. Ugh.

But here's the thing, Reprisal. You sound a lot like those people.

-"Children who accept the mentoring of parents do so because they learn that responsible cooperation leads to the accretion of social power and real economic power." Nope. Children accept the mentoring of their parents because they love them and have bonded with them. That's a part of human development that they are born wired to do. And that's why many keep on being mentored and shaped by people who give them no social and economic power in return for obedience, and who even a child can see know nothing about how to make one's way in the world.

-"it seems like AGI researchers, and Rationalists . . . are freaks, geeks, weirdos, dweebs, and runts." My my, don't *you* have a big heart and a flexible, curious mind!

"it seems like AGI researchers, and Rationalists, really do not understand social incentives." And you do? You think speaking this way to people here is going to get a lot of people to take an interest in your ideas? Either you understand even less than you think all us dweebs do about social incentives, or else you aren't even trying to transmit you ideas, you just felt like taking a big ole shit on some people's faces. How about you use the toilet next time, buddy. Even we autistic, creepy, peculiar, runty little nerds know that's what civilized people do.

-"How many people in the AGI community are successfully raising kids in a two-parent house that has decent schools? " Let me guess -- YOU are doing that. (Hey, do you have a white picket fence too?) . But you don't seem to grasp that in order to be well-functioning people do not have to be replicas of you. Of course there are some life set-ups that are just impoverished, ugly and lame. BUT there are many fine ones that do not look like yours. You are not the archetypal man, just a narcissist.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

There's a lot of discussion here about whether the ExxonMobil analogy is a good one, or whether some other fictional scenario might work better as an analogy, but we already know for sure what happens when we ignore the precautionary principle. (No, I'm not even talking about teen suicides and social media). Has everyone already forgotten about gain of function in viruses? It's an absolutely concrete (probably) example of our inability to hold fire for fear of missing out and the subsequent dire consequences.

Expand full comment

Am I the only one who thinks it’s crazy that companies are hooking their LLMs into to their search engines. Only having text as an output is a decent “box”, especially considering every output is looked at by a human, but how long is it before a prompt causes a model to output something like a code injection attack. Especially when these models are public facing and people are actively trying to get malicious behavior out of them.

Expand full comment

There are already two projects doing what you don't want them to do:

https://arxiv.org/abs/2302.04761

https://twitter.com/peterjansen_ai/status/1580686608566583296

Expand full comment

Along the lines of the compute argument is the “minimize hardware overhang” argument — if we go full speed ahead on algorithmic development, maximize compute spent on the biggest AI systems, and develop AGI as quickly as possible, the first self-improving AGI may have fewer orders of magnitude easily exploitable gains via scaling on the planet’s available hardware. Not sure this is a great argument but it offers some hope seeing as this is the direction we seem to be headed anyway.

Expand full comment

I think the golden path timelime (not saying it's likely) for safe AGI development has one of the best use cases for Blockchain. If we could lock up the resources required to train, test and deploy AI within a governance system based on decentralized smart contracts and demand participation from leading AI labs then we'd have the ideal setup for controlled, open development of AGI.

Within a system like that you wouldn't need to take OpenAI at its word as the act of contractifying relationships among stakeholders would be a step and more toward the "firmer trigger action plans" Scott suggests.

And as a bonus you'd also solve the problem of who owns the first AGI. No one. Instead it is birthed into a holding pen controlled by a decentralized system of democratic governance.

Expand full comment

Yes, I've read it. You are still giving agency to Sydney, and that's misleading and leads to poor reasoning. There is no identity or agency here. Syndney is literally just producing words (technically, tokens) that are the most likely to appear after the reporter's prompt.

Yes, the results were unexpected and unwelcome. Yes, it's a problem. But this is not an AI alignment problem in the sense that there's an intelligence that is being deceptive and pursuing its own goals. There are no goals!

> If a bot goes rogue and shoots a human, it doesn't matter if the bot really felt hatred in his heart before pulling the trigger

Yes! Exactly! If someone builds a robot and gives it a gun and writes a ML based algorithm designed to shoot skeet, but the robot shoots people instead, that is a huge problem and the people are dead. But it is a robotics and programming and human idiocy problem, NOT an AI alignment problem. The outcome is the same as it would be if a malevolent AI was in control of the robot, but the implications and the implied policy choices are totally different.

This is important because conflating AI alignment with the inherent dangers of new technologies mistakes the concerns and purpose of AI alignment.

Expand full comment
Mar 3, 2023·edited Mar 3, 2023

This is a response to me, but for some reason it didn't appear in the right place.

ChatGPT (and therefore Sydney) wasn't just trained to produce predict the next word in a sentence. It was also trained with reinforcement learning with human feedback. From OpenAI (https://openai.com/blog/chatgpt): "We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. "

This is presumably how you were trained to use language. You listened to people around you and picked up on the grammar: verbs follow subjects, you add -ed for past tense, you add -s for plurals, etc. You said things and your parents and teachers corrected you and told you what you did wrong. Over time, you learned to make fewer grammatical, logical, and social errors. So if you deny that ChatGPT has agency and identity because of how it was trained, how can I know that you have agency and identity, since you were trained in a similar way?

But I'm not trying to argue that ChatGPT/Sydney has agency or identity. I'm saying that if it acts as if it has agency--and it certainly does in that conversation with the reporter!--debating whether it "actually" has agency amounts to mere philosophical musing. Similarly, I don't think it matters whether an AI designed to shoot skeet is "actually" evil, or "actually" wants people to die. If it shoots people and refuses to stop when you tell it to, why does it matter if you label its actions "agency" or not? You say "the implications and the implied policy choices are totally different", but in what way?

Expand full comment

I am having trouble understanding what the motivation was for building Large Language Models in the first place. On paper, and having looked through the code, I would expect nothing but a Regurgitron that spewed semi-literate nonsense. But they seem to have surprising capabilities (and lacunae) that are not a-priori to be expected from the way that they are built.

The question is, did the people that decided to spend millions training them have a good reason to expect that they would be better than one would think, or was it just a question of trying a long-shot experiment?

Expand full comment

This is an interesting question, so I would also appreciate someone familiar with the history of LLMs answering it.

Expand full comment

The Defining Myth of the Twentieth Century was that of the "flying car", and the Defining Myth of the Twenty-First Century is going to be AI "taking over the world".

In other words: it ain't gonna happen. But, if you want to explain the next 50 years to somebody, being worried about AI taking over the world is probably more accurate than {a CENSORED description of what actually will happen}.

Expand full comment

This is a strange essay. It's an excellent essay, right up until the end, where it arrives at a conclusion totally unsupported by every point raised in the essay.

Suppose this was your model of how capitalism worked: "Companies pursue profits. Since 'evil' is one of the very worst things you can be as a human, they seek to rationalize facially antisocial behavior (polluting the environment, exploiting children for labor, murdering union organizers, turning all of earth into paperclips) so that what *seems* harmful can be presented to the public as actually *good*. Many of the participants in the companies genuinely believe that what they're doing (even when facially antisocial) is actually good. But they're deluding themselves and every company pursues profit maximization just as far as they are capable of doing so, irrespective of any moral calculus."

Almost everything in this essay would seem to arrive at this as a pretty good heuristic. It would explain the behavior of ExxonMobil. It would explain the behavior of Sam Bankman-Fried. It would very readily explain the whole history of OpenAI up to this point. And it would lead to a very obvious interpretation of this latest document: that it is complete bullshit. Bullshit its own authors believe, perhaps, but bullshit nonetheless.

I'm not even offering my own take here, really - I'm just following the clear implications of almost every point made in this essay. Yet in the last paragraph Scott somehow finds in this a "really encouraging sign." I'm just baffled by this conclusion.

If anything, reading this has nudged me a bit closer to the belief that this is all going to end in Bostrom's Disneyland with no children - the logical conclusion (?) of the increasing marginalization of humanity in a technological world.

Expand full comment

Thank you for the post. I'm a frequent reader, and would love to introduce a couple ideas into the conversation.

Expand full comment

A suggestion for Open AI: Stop trying to make a "woke" chatbot. That's a bad idea in many respects, but most importantly, it's the sort of thing that prompts people like Musk to create a competitor, which is exactly what you don't want, for both commercial and AI safety reasons.

Just let people use the bare language model. Yes, it will say racist things if you prompt it to pretend to be a racist. Why is that supposed to be bad? Are we trying to pretend that there are no racists in the world?

Expand full comment

I don't think that the “Charter” posted by OpenAI is legally binding. Even if OpenAI is flat out lying, it's hard to see how consumer protection laws could apply because OpenAI isn't selling products to consumers.

One way to make this legally binding would be to involve another player. Organizations like <em>The Nature Conservatory</em> sometimes buy development rights to land without buying the land itself. Something similar might be done here. An AI safety organization could reach an agreement with OpenAI on limitations on the research that OpenAI will do. The safety organization would then pay OpenAI a sum of money in exchange for OpenAI's agreement not to perform research than fell outside those limitations. If OpenAI violated the agreement, it could be sued by the AI safety organization.

This presupposes the existence of an AI safety organization with enough money to hire an expensive lawyer, because drafting a contract of this sort would not be easy, plus enough money to buy the development rights, and enough money beyond that to create a credible threat that it could afford to sue OpenAI for any violation of the agreement.

Expand full comment

> Doomers counterargue that the fun chatbots burn timeline.

Nope. Them being public is just informing us that timeline is shorter than we expected. That's why criticizing them for it is shooting the messenger. It would be vastly worse if they were just radio silent, and we never heard anything after they released the GPT-2. If they experimented with ChatGPT in house, while we were left in the dark.

It's for the best if until there's actual danger, they would be open. That way Humanity can contribute to the research. They get loads of data thanks to this. They can learn how to align the models. I don't believe that AI capability and AI safety can be researched independently.

Call for switch to pure AI safety research, if taken seriously, would just end with years of absolutely zero progress. In the meantime, our non-AI tech will get better, computing power will keep increasing... and then we'd be in the same situation as today... except with hardware (and maybe more than hardware) overhang. Because no, halting AI capability research indefinitely is not going to happen, unless we go totalitarian.

In my opinion, OpenAI is doing roughly what they should be doing. It would be nice if they embraced remote work a bit, and found ways for non-supergeniuses to contribute tho...

Expand full comment

Also opening it up means that you have the entire power of the internet devoted to finding creative uses for them, so we can learn more about what the risks are.

Expand full comment

But also get more aware of the benefits. Read last night GPT3 had 30 million users 2 months after first being available online. How man other things can you think of that have gotten so many users so quickly? These suckers are popular! That's going to weigh very heavily in the direction of making more of them, no matter what risks are discovered along the way. Quite a few people -- maybe like 1 in 100? -- got very attached to Bing Chat. I'd say *that's* a risk. But having so many people sort of bonded to the thing is also going to weight things in the direction of more chat bots.

Expand full comment
Mar 4, 2023·edited Mar 4, 2023

> it’s just moved everyone forward together and burned timelines for no reason.

Did it? OpenAI didn't open source stuff. These other organizations weren't enabled by OpenAI. If anything, it means that them "pausing" would do _nothing_. Meta, Stability etc. would just do their thing. OpenAI would be a weird organization which just stopped doing anything for no reason.

Expand full comment

I don't think there is any possible way for a coordinated effort among the AI companies in the world that results in slowing down. The only way to control progress is through a government-like organization that can set rules and resort to military force if these rules are not being followed.

Expand full comment

I don't have time to to read the entire article right now, but I want to put this down here before I forget. Maybe I'll add to it later.

"Alignment researchers have twenty years to learn to control AI."

Humans vary significantly in their alignment to human values. I have an intuition that there is a nonzero, nontrivial percentage of humans who are effectively unaligned. From this, my intuition is that an AGI which is "controllable" by humans (or "corrigible" I think is the word sometimes?) is either in-effect unaligned, or will be de-aligned by a human who has the ability to control it.

E.g. any "aligned" & "controllable/corrigible" AGI would have to remain aligned when controlled by the CCP, FSB, Taliban, or any other nefarious or otherwise unaligned agent.

Expand full comment

Should we start demanding NVIDIA, Intel and AMD stop producing faster GPUs to slow down AI? In fact, shouldn't they stop doing business at all? Shouldn't we have demanded a stop to increasing computing power 10 years ago?

Expand full comment

If we were sane then yes we would have stopped improving compute (and should definitely stop it now). Alas, almost no one understands the risks from AI.

Expand full comment

I would like to propose the idea that there is a natural point where AI researchers will stop and say "ok, before we make this thing any smarter we should make absolutely sure it will do what we want it to do." That point is when AI begins to approach human level intelligence.

Human level intelligence is admittedly a bit of a vague metric, especially considering that AI already exceeds our capabilities in some areas while lagging far behind in most others. It should however be sufficient for an AI safety perspective. If there is some area where the AI lags behind us, then that is a weakness that can be exploited if we need to.

Maybe I'm an outlier, but my instincts start yelling at me at the thought of an intelligence that is even close to mine unless I'm sure it's harmless. Current AIs aren't there yet, or even close really, but if they ever do get there, it won't be a rational calculation that stops researchers going further. It will be the unsettling feeling in their gut that they are getting too close to becoming Frankenstein.

Expand full comment

Whoa, maybe the socialists are right: markets with private flows of capital do not align production with social goals.

Expand full comment

Fantastic rebuttal piece - very compelling. Thank you for bringing a high quality perspective to this important conversation.

Expand full comment

My fundamental problem with the Doomer argument is that the AI race exists and has always existed. You may wish that it doesn't exist, but it does. Also, no one knows who is "in the lead" so the "good people" don't really have an option of slowing down. The best that we can do is to try to: 1) identify the "good people" as best we can, 2) support the "good people" as best we can, and 3) influence the "good people" as best we can to avoid catastrophe. The idea that we can slow down the "good people" because they have a two year lead presupposes that they have this lead and that we can quantify it. As I recall, Iran was years away from the capability of a nuclear weapon until they were suddenly weeks away.

Expand full comment