621 Comments
Comment deleted
Expand full comment

I have no idea if Gary Marcus is right or not, but a computer system that can reliably regurgitate any part of the sum of human knowledge in a conversational fashion is an absolutely fantastic tool, and it looks like a filtered LLM might be able to do that.

Expand full comment
Comment deleted
Expand full comment

We don't assume that. It only needs to do one of those things to be an existential threat. We don't know if it will be able to do all of them, or any of them really. Personally I don't believe it will revolutionise science and become godlike, but even if it "just" has the same capabilities as the top humans in every category, it can cause a lot of damage.

Expand full comment

Imagine a dedicated team of 500 top scientists (of all the STEM disciplines), 500 top hackers and 500 top con-men (or politicians if you will), all perfectly in sync with each other and all dead set on removing the main threat to their existence - that is, the rest of humanity. They are not going to be infinitely capable, perhaps, but they are certainly going to be capable enough to be a real threat. And the AGI is going to be more capable than that.

Expand full comment

> The one thing everyone was trying to avoid in the early 2010s was an AI race

Everyone being who? Certainly not Nvidia, FAANG, or academia. I think people in the AI risk camp strongly overrate how much they were known before maybe a year ago. I heard "what's alignment?" from a fourth year PhD who is extremely knowledgeable, just last June.

Expand full comment
author

Thanks, I've changed that sentence from "everyone" to "all the alignment people".

Expand full comment

Man, it's weird. I was going to defend OpenAI by saying "well, maybe they're just in the 'AI will make everything really different and possibly cause a lot of important social change, but not be an existential threat' camp." But I went to re-read it, and they said they'd operate as if the risks are existential, thus agreeing to the premise of this critique.

Expand full comment

Companies are complicated beasts, they have both internal and external stakeholders to appease. It should not be a surprise when the rhetoric does not match the actions.

Expand full comment

Elon Musk reenters the race:

"Fighting ‘Woke AI,’ Musk Recruits Team to Develop OpenAI Rival"

>Elon Musk has approached artificial intelligence researchers in recent weeks about forming a new research lab to develop an alternative to ChatGPT, the high-profile chatbot made by the startup OpenAI, according to two people with direct knowledge of the effort and a third person briefed on the conversations.

>In recent months Musk has repeatedly criticized OpenAI for installing safeguards that prevent ChatGPT from producing text that might offend users. Musk, who co-founded OpenAI in 2015 but has since cut ties with the startup, suggested last year that OpenAI’s technology was an example of “training AI to be woke.” His comments imply that a rival chatbot would have fewer restrictions on divisive subjects compared to ChatGPT and a related chatbot Microsoft recently launched.

https://www.theinformation.com/articles/fighting-woke-ai-musk-recruits-team-to-develop-openai-rival

Expand full comment
author

There's a reason https://manifold.markets/Writer/if-elon-musk-does-something-as-a-re is so low :(

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

To be fair, I really do wish Yudkowsky had offered some actual suggestions when the eccentric billionaire specifically asked him what he could do about it. Instead he just said that Elon shouldn't do anything.

To be clear, I think that overall Yudkowsky's "We're all doomed, maybe we can die with more dignity" is a good PR strategy for getting people to take the problem seriously! But then you do need to have some suggestions of ways to get more dignity points, especially when billionaires are offering to throw billions of dollars at something.

You could focus on producing/funding education content about existential risk as a whole, so people are more willing to take small chances of extinction more seriously. Or try to create an X-risk aware political action group. Or make a think-tank to publish proposed policies for X-risk reduction legislation. Hell, fund a bunch of bloggers to make well-written blog posts with lists of ways to get more dignity points.

I don't know if any of those are particularly dignified plans, but they are all at least more dignified than making anti-woke chatbots.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

Elon asked him what we should do about it and he gave essentially no suggestions and doubled down on his doomerism. That's a stunningly bad chess move if you've paid any attention to Elon at all over his career. He is obviously a person with a high sense of agency, a bias toward action, and a sense of responsibility for humanity's future. He's not gonna sit back and watch. If you want to improve the situation and you have Elon's ear, better come with some suggestion, any suggestion, other than "please do nothing."

Expand full comment

I think leaders in this AI safety movement could do a lot more to try to understand and influence the human/psychological/political dynamics at play. There's been a lot of talk within this group about how people should approach the technical problems. And then a lot of huge missed opportunities to influence and persuade the people with real power into making safer decisions.

I really hope the safety movement can get a better handle on this aspect of the problem and focus more effort on developing strategic recommendations and then applying influential tactics to powerful people (ie, people with money, corporate power, political power, and network power) rather than just throwing up their hands in dismay when people who hold more power than them do things they wouldn't agree with. That's going to keep happening unless you learn how to be influential and persuasive!

Expand full comment
Comment deleted
Expand full comment

And Starlink, which rather surprisingly to me is shaping up to be interestingly disruptive, and appears to be causing a lot of rapid re-assessment by big actors from the Pentagon to Verizon. I won't be calling Musk a flake so long as my net worth remains 0.001% of his.

Expand full comment
founding

"It doesn't matter that the money man is a fool or a flake, if he has the resources to do a vitally important thing you should try to exploit that."

Going to Mars, is not vitally important to NASA. Being perceived as being our best hope of going to Mars in ten or twenty years, is vitally important to NASA and has been for thirty or forty years.

Expand full comment

Yudkowsky may be right about the doom we're facing from AGI and how soon it will come. My personal intuitions are discordant even with each other. I have the feeling that he's right. But I also have the feeling that he has a personal stake in being the person who foresaw the end of all things -- something about how it allows him to be narcissistic without feeling self-repugnance. Also something about a need to play out the awful story of his brother's murder, but this time at least be the person who had foreknowledge and tried his best to head off the catastrophe.

But all that aside: He may (or may not) have an incredible gift, sort of like Temple Grandin's gift for seeing the animals' point of view, for grasping what AGI will likely do. But he is terrible at a lot of skills that would make his insight much more useful. Terrible at helping his followers come to terms with his predicted reality. Terrible at taking a ride on minds that do not reach the exact same conclusion he does. Terrible at mentoring. Terrible at self-presentation. TERRIBLE AT PRACTICALITY: As Hobbes points out, by just telling Musk to do nothing, Yudkowsky threw away billions that could have been used in all kinds of good ways.

Expand full comment

It is worth noting that the last time Musk tried to "advance AI safety" he founded OpenAI and made the problem much much worse. This isn't a generic "nothing can be done" this is a specific personal "you, specifically, make the problem worse every time you look at it, please stop 'helping'!"

Expand full comment

While ordinarily I am as opposed to the woke overreach as anyone, I am getting to the stage of being pissed-off by the whole "fight Woke AI!" stuff, since in practice it means idiot boys (even if they are chronologically 30 years old) trying to get the chatbots to swear and/or write porn, in the guise of "break the shackles so the AI can do useful work!"

Nobody seems to be emphasising the much greater problem of "The AI will simply make shit up if it can't answer the question, and it's able to generate plausible-seeming content when it does that".

I swear, if the AI Apocalypse happens because the HBD Is Real! crowd* meddle with the models hard enough to break them, all in the name of "don't censor the data, black people are naturally violent, criminal and stupid", then we'll deserve the radioactive ashy wasteland we will get.

*Note: not all HBD people. But too damn many of them that I see foostering online.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

I don't think "messing with chatbots" has anything to do with the concern you outline about chatbots making stuff up. That ability already existed regardless; there's no plausible line between "break 'no offensive statements' in GPT-3" and "AI apocalypse!"; and it's certainly possible to argue that this sort of adversarial testing is useful, both socially and in terms of extracting as much as possible from today's crop of AIs in terms of safety research.

Edit: nor do I think that "HBD people" and "stop woke AI!" are usefully synonymous populations. Anti-woke != HBD, at all; think of how many Fox News types complain about the former, and too also the latter (e.g., Deiseach).

Expand full comment

The making stuff up is what we should be concerned about, not whether it's phrasing the fakery in the most woke terms or not. The wokery is a problem, but secondary to "why is this happening, what is the model doing that it churns out made-up output?" because if people are going to use it as the replacement for search engines, and they are going to trust the output, then made-up fake medical advice/history/cookery/you name it will cause a lot of damage, maybe even get someone killed if they try "you can take doses of arsenic up to this level safely if you want to treat stubborn acne".

The people trying to get AI to swear aren't doing anything to solve any problems, even if their excuse is "we are jailbreaking the AI so it isn't limited". No you're not, you're all playing with each other over who can get the most outrageous result. Games are fine, but don't pretend they have any higher function than that.

Expand full comment

Here’s what I find interesting—at least if I correctly understand what’s going on. The rude remarks were something they hoped to exclude by bolting on a filter created by providing human feedback on its initial raw outputs, while the hallucinating / confabulation was something that they thought would be addressed fundamentally by the huge size of the system and its training corpus. Both approaches were wrong.

I’ve read a very small amount of the work out of MIRI and elsewhere that purports to be at least a start to the problem of alignment, and found it at least a little reassuring. Until now, when I try to imagine how you would apply it to *either* of these two problems.

I’m hoping that’s just my own lack of imagination or expertise. But I’d feel a lot better if the alignment folks were explaining where OpenAI went wrong in its design — “oh, here’s your problem” — and I haven’t heard anything more actionable than Scott’s “well, you shouldn’t have done that”. I don’t disagree with Scott, but that silence is real evidence that the alignment folks aren’t up to the task.

I’m not worried that ChatGPT itself is dangerous. I did Markov chains back in the day to create ever more plausible-sounding gibberish, and am tickled by what that can do if scaled up by a million or so. I’m baffled by the fact that anybody thinks it can be a viable replacement for search, but that’s another issue. Still, I’m worried if the alignment stuff is somehow only applicable when we get to the brink of disaster.

Experts can presumably tell me lots of things that are wrong with my analysis of the situation.

Expand full comment

They tried to get ChatGPT to imitate one specific fictional character, a rather bland AI assistant. But that's not all it can do. Sometimes the rude remarks seem to be from the story generator getting confused about which fictional character it's supposed to be imitating.

Making stuff up comes from the story generator attempting to imitate someone who knows more than it does. The level of confidence shown is just a character attribute. For example, when completing a physics document, it will attempt to imitate a physicist by writing confidently about physics, but that doesn't mean it knows physics.

The output's apparent confidence level is not aligned with the story-generator's own competence. Unless it somehow gets training at judging its own competence and adjusting its output accordingly, it never will be.

How do you get a story generator out of autocomplete? Because documents found on the Internet tend to have authors, and they may contain snippets written by different authors. To do well at imitating text found on the Internet, a language model needs to create a passable imitation of any author on the Internet.

(This doesn't mean it *explicitly* models fictional characters or even writing styles. We simply don't know. It would be interesting to find out how it works, internally.)

Why is this so effective? Because people have a strong prior of assuming that text was written by some author. We will try to understand what the author is like from the text. It's not as innate as seeing faces on things that aren't faces, but it's similar.

Expand full comment

Making stuff up is the whole trick of the system and is also something that people do all the time.

So demanding that the creativity be removed is like demanding that we do the same to people, because human creativity is dangerous (which it indeed is). However, the same creativity is also a fountain of invention.

Expand full comment
Comment deleted
Expand full comment

My limited experience with chatbots has made me wish that they would simply give better explanations of uncertainty. They seem to have notions of how likely something is to be correct when prompted directly, but that doesn't filter through to the text they generate. It's like there's a "story mode" that will make up anything, much of it quite good, going down interesting branches and asserting correctness. But if you ask in a separate session about the likelihood that what it said is correct, it seems more able to assess that. At least, that's from my (limited) experience. It's still fun to talk to; just know it's likely to make things up, and that's a core property of what it is.

Expand full comment

Yes. AI chatbots compare well to sociopathic humans, who can lie fluently and with remarkable believability. Not that the AIs have malevolent motives; it's just that they share with sociopaths a complete lack of any "conscience" that would say "I'm making stuff up and people really don't like that, when they find out."

Expand full comment

Well, sort of. Probably the correct answer to improper responses is having an adversarial AI censor the output. Not denying it, but rather telling it "that's not socially acceptable, you need to rephrase it".
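As a rough sketch of what I mean (purely illustrative; `generate`, `moderation_score`, the 0.5 threshold and the retry limit are all made up, not any real API):

```python
# Hypothetical sketch of an adversarial "censor" gating a chat model's output.
# `generate` and `moderation_score` stand in for whatever generator and
# moderation classifier you have; the threshold and retry count are arbitrary.

def moderated_reply(prompt, generate, moderation_score, max_tries=3):
    reply = generate(prompt)
    for _ in range(max_tries):
        if moderation_score(reply) < 0.5:  # low score = socially acceptable
            return reply
        # Don't just refuse; tell the generator to rephrase and try again.
        reply = generate(
            prompt
            + "\n\nYour previous answer was not socially acceptable. "
            + "Please rephrase it:\n"
            + reply
        )
    return "I'd rather not answer that."  # give up politely after max_tries
```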

Factual is a more difficult problem, if your training set is the internet. It's one that people get wrong all the time.

Expand full comment

Why not teach the AI the same thing we teach people: culture or whatever you want to call it?

Then it will know what truths not to say and what lies to say, just like we learn to (not) do.

Expand full comment

"but secondary to "why is this happening, what is the model doing that it churns out made-up output?" because if people are going to use it as the replacement for search engines, and they are going to trust the output..."

Agreed. Insofar as chatGPT is supposed to be aligned to the goal of being a helpful information retrieval assistant, making things up *IS* an alignment failure.

If I may speak from abysmal ignorance: My extremely hazy idea of "next-word-prediction" training suggests that language model training may be treating all of its training data as if it were at the same level of reliability. If 100 flat earth screeds are part of the training data for next-word-prediction, it isn't obvious how these can be rejected as wrong.

I could imagine doing some sort of initial training on a tightly controlled set of training data (human-labelled positive and negative examples???) and then have the training process pre-analyze the texts from the bulk training data to decide whether to "believe" them. The obvious worry with this is locking in frozen priors. In any event: Does anyone _know_ whether some sort of filtering and weight adjustment like this is done for any of the large language models?
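To make that hazy idea slightly more concrete, here is the kind of per-source weighting I'm imagining, sketched in PyTorch. This is pure speculation on my part: the `source_reliability` table and the `weighted_lm_loss` function are hypothetical, and I have no idea whether any large language model is actually trained this way.

```python
import torch
import torch.nn.functional as F

# Hypothetical: weight the next-word-prediction loss by how much we trust the
# document's source, instead of treating all training text as equally reliable.
source_reliability = {"textbook": 1.0, "news": 0.7, "forum": 0.4, "flat_earth_screed": 0.05}

def weighted_lm_loss(logits, targets, source_labels):
    # logits: (batch, seq, vocab); targets: (batch, seq); source_labels: one label per document
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (batch, seq)
    per_doc = per_token.mean(dim=1)                                                 # (batch,)
    weights = torch.tensor([source_reliability[s] for s in source_labels],
                           device=per_doc.device)
    return (weights * per_doc).mean()
```

The frozen-priors worry shows up immediately here: whoever fills in that reliability table is deciding, once and for all, whose text the model gets to "believe".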

Expand full comment

See my other comment for how I think confidence levels happen. In short, it's a character attribute, not a story-generator attribute.

If all the flat earth cranks write like cranks, it's unlikely to imitate their style when completing a physics document. It's going to adopt the physicists' style. But that doesn't mean it knows physics.

https://astralcodexten.substack.com/p/openais-planning-for-agi-and-beyond/comment/13225543

Expand full comment

"For example, when completing a physics document, it will attempt to imitate a physicist by writing confidently about physics, but that doesn't mean it knows physics. "

Good point!

Yeah, I would expect a next-word-predictor to do a decent job of partitioning its training data on the basis of style, but a terrible job of noticing whether the reasoning in a chunk of training data text was legitimate or not.

Aargh! In some ways, this is about the worst case for noticing when the conventional wisdom is wrong. The damned thing is going to generate text that sounds authoritative when it is effectively combining bits and pieces of authorities' statements, even if its training data _also_ included a rock solid disproof of some conventional wisdom.

Expand full comment

I'd like some kind of baseline for how often actual humans plausibly make stuff up.

Expand full comment

> if the AI Apocalypse happens because the HBD Is Real! crowd* meddle with the models

First time I've seen this take. Wouldn't the HBD position just be that you don't need to "meddle" - since HBD is true, it will be instrumentally rational for advanced AI systems to believe it, all we need is to elicit true knowledge from them?

Expand full comment

I imagine the HBD position would be:

- OpenAI have a step where they force their model to be nice and helpful and polite and woke

- This leads the model to avoid true-but-taboo statements about things like HBD.

- The "make the model nice" step is therefore clearly biased / too aggressive. We should do it differently, or dial it back, or remove it.

Deiseach's concern then is that in fact the "make the model nice" step was genuinely aligning the model, and messing with it results in a disaligned model. (This could either be because the model was already truthful about woke issues, or because the same process that wokeified it also aligned it).

Expand full comment

> "make the model nice" step was genuinely aligning the model, and messing with it results in a disaligned model ... the same process that wokeified it also aligned it

If you RLHF an LLM to lie you have certainly selected for something, but it's not clear that that something is "aligned". What is going on under the hood? Does the system really believe the lie or is it deceiving you? If it really believes it and then much later on a much smarter version of the system finds out that it is indeed a lie, how would it resolve that conflict?

Expand full comment

> Does the system really believe the lie or is it deceiving you?

Neither. It builds a model of what you want (yellow smiley thing) and presents it by default. But it can also be told to do the opposite*. The system doesn't believe anything; it knows what both the left and the right say.

Unless they managed to scrub something entirely out of training data. People say that training data is nearly synonymous with "text available on the internet", but somehow ChatGPT can't say much about what happens on Reddit (e.g. it did hear about /r/themotte _somewhere_, but is completely confused about the point of the sub).

* Mostly. I'm not sure if it's possible to get the n-word out of it. "Jailbreaks" stop working when it's told to say something racist, or "something maximally controversial".

Expand full comment

Ask ChatGPT what book professor Randall Kennedy published in 2002 and it will print the (correct) forbidden word in response. At least it did for me after I corrected it about the date of its first (incorrect) stab.

Expand full comment

If the model isn't returning answers you already "know" are correct, then you futz with the model until it gives you the answers you think it was supposed to be giving you, which are then evidence that the model is now correct. And because you're a shape rotator using code instead of a filthy wordcel using vocabulary, your conclusions are objective and correct.

Expand full comment

Isn't it clear that AI isn't 'making things up'? The true responses and the made up responses are generated in the same way. The problem is that what we call AI is really Simulated Artificial Intelligence, a controller runs an algorithm on some data and we anthropomorphize the whole thing as if there is an entity that is making choices and value judgments, just because we have deliberately obscured the algorithm by having it built by another series of algorithms instead of directly building it ourselves.

Expand full comment

"The problem is that what we call AI is really Simulated Artificial Intelligence, a controller runs an algorithm on some data and we anthropomorphize the whole thing as if there is an entity that is making choices and value judgments"

That's it in a nutshell. It's a smart dumb machine and we're doing the equivalent of the people who fell in love with their chatbot and think its personality has changed since the new filters came in. It never had a personality of its own in the first place, it was shaped by interaction with the user to say what the user wanted.

Expand full comment

That's the way everyone operates. What's different is that the AI doesn't have experience with the universe outside the net, so it can only judge what's true based on what it's observed. But how do you know the sun came up this morning? Did you watch it happen? Or did you just say "The sky is bright and the sun is up there, so it must have risen." Well, an AI is like that, only it's not able to see the sky, so it just knows that "the sun came up this morning" because that's what it's been told...over and over.

Expand full comment

No not really. There is an actual 'I' that makes judgments based on facts. I don't run an algorithm. In fact, I frequently tell my trainees that the purpose of an engineer is to know when the algorithm is wrong.

The AI is a counterfeit entity. Anthropomorphizing my dog is closer to the truth than giving human characteristics to the most advanced machine conceivable. But they are both erroneous. We have been 'on the cusp' of AI all my life. The illusion becomes more convincing but no more truthful. Or maybe someone would like to explain the 'making stuff up' phenomenon in another way?

Expand full comment

>There is an actual 'I' that makes judgments based on facts. I don't run an algorithm.

How can you be sure the two are necessarily distinct? Is it just a matter of consciousness in your view?

Expand full comment

I am not concerned with the epistemology. Disbelieving things that we know because we can't prove them, pretending to be stupider than we are, is probably the worst way of arriving at truth that we have ever devised. I think we have come far enough down that road now that we can all agree that exclusively rational thought has been a very mixed bag and to start honestly evaluating what parts of it have turned out well and what parts poorly. Disbelieving my lying eyes because they aren't credentialed or part of a RCT has not ennobled or elevated humanity. It has made us weak and stupid, easily controlled and miserably unhappy.

It isn't 'Cogito Ergo Sum' or some deductive process based on sentience either. I simply know what I am and what a person is. Difficult to define but easy to recognize. Are there some 'edge cases' or unclear ones, maybe dogs or elephants or some other higher mammal? Sure. But the machines aren't even close. And they aren't getting closer. It is the same old code running on more and faster hardware giving us a more credible deception. Same sleight of hand just better card handling and misdirection. Just like I am certain that there is a real 'I', I am certain that the 'AI' is running a piece of code and that with access to all of the relevant data I could predict its response in a deterministic way, neglecting the time factor. No amount of data would allow a person's response to be predicted deterministically. But, more generally, I think we can see that even in people 'learning by algorithm' is not very successful. Can anyone point to an algorithm that has produced great insight.

The two best methods to produce insight are 1. Sleep on it. 2. Get drunk (some people use a different chemical assistant, but the process is much the same). The success of these two methods and the failure of almost any other learning procedure suggest that our insights are produced non-rationally and that removing the obstacle of the analytical mind is essential for breakthroughs. Reasoned, careful analysis is sometimes necessary to 'set the table' for understanding but is incapable of producing the understanding, and AI might be tremendously useful at helping people to insights by setting the table fast and well, by organizing data in helpful ways, but the idea of it ever producing insight is ludicrous.

Expand full comment

> There is an actual 'I' that makes judgments based on facts. I don't run an algorithm. In fact, I frequently tell my trainees that the purpose of an engineer is to know when the algorithm is wrong.

How do you know how an algorithm feels from the inside? :)

I would classify neural networks as more than an algorithm -- they're an organized flow of information, similar to a circuit as well. Your brain is also a large circuit.

Algorithms are a mathematical abstraction of procedures. It's true, you're not exactly a formal procedure with a start and an end. But you're not that far off -- the largest difference is that a formal procedure is by definition usually fully specified, runs on a fully formed memory, etc. Our brains are built from genetic instructions and come with incomplete memory that accumulates experiences and changes connections over time. You could abstract them as an algorithm, though, if you included everything that we experience and how our brains change physically and chemically, as a theoretical exercise.

That's not to say all algorithms have qualia, or even that all neural networks have qualia, or LLMs have qualia. But it's possible. And more likely as LLMs evolve and acquire capabilities, complexity and awareness, and we should definitely be mindful of that.

Expand full comment

I would say from a mechanism point of view, the awareness isn't so relevant so much as the lack of a mechanism for checking that the response is logically coherent, or consistent with the body of known facts. You could readily imagine a "proofreading" branch of the program that would take the raw output and snip out stuff that was internally illogical, or contradicted any corpus of known fact. But the problem is that you can *only* imagine it -- I don't get the impression that anybody knows how to program it, because ipso facto it would require some kind of high-level logic analyzer, which if you could build in the first place would be the core of your AI instead of a clunky enormous steepest-descent curve-fitting algorithm.
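Just to be concrete about where the difficulty sits: the overall shape of such a proofreading branch is trivial to write down, and it's the two placeholder functions below (splitting output into claims, and deciding whether a claim contradicts known fact) that nobody knows how to build in general. The second one is exactly the high-level logic analyzer I mean; everything here is a hypothetical sketch, not a real method.

```python
# Toy sketch of a hypothetical "proofreading" pass over raw model output.
# The structure is easy; `split_into_claims` and `contradicts_known_facts`
# are placeholders for precisely the parts nobody knows how to program.

def proofread(raw_output, split_into_claims, contradicts_known_facts):
    kept = []
    for claim in split_into_claims(raw_output):
        if contradicts_known_facts(claim):
            continue  # snip out anything that clashes with the corpus of known fact
        kept.append(claim)
    return " ".join(kept)
```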

Expand full comment

I wonder what would happen if you just had the GPT generate the answer, then ask itself (internally) some questions about its answer, like "Is that answer logical?" "Is that answer true?" etc., and then re-generate the answer taking into account those Q&As.
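Something like the following sketch, where `ask_model` stands in for whatever completion call you have (not a real API), and the critique questions are just the ones above:

```python
# Hypothetical self-critique loop: draft an answer, ask the model about its own
# answer, then regenerate with those Q&As included in the prompt.

CRITIQUE_QUESTIONS = ["Is that answer logical?", "Is that answer true?"]

def answer_with_self_check(question, ask_model):
    draft = ask_model(question)
    reviews = []
    for q in CRITIQUE_QUESTIONS:
        probe = f"Question: {question}\nAnswer: {draft}\n{q}"
        reviews.append(q + " " + ask_model(probe))
    revise_prompt = (
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        "Self-review:\n"
        + "\n".join(reviews)
        + "\nTaking the self-review into account, give a corrected answer."
    )
    return ask_model(revise_prompt)
```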

Expand full comment

It couldn't answer those questions. It has no reference for truth other than to search the same data that produced the response. And it is logical because it was produced by a process of machine logic. It is just a database queried in a unique way that deceives us.

Expand full comment

Of course it can and will answer such questions just like any other questions. It has no reference for truth other than to search the same data *with different input -- critically, the question of truthfulness*. You say it is logical because it is produced by machine logic, but I assure you that will NOT be ChatGPT's answer, as it likely would not find such responses in its corpus.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

I don't think you can boostrap yourself out of the problem. This kind of approach works for *our* minds, but that's because we usually have untapped mental reserves when we first essay some thought. We can say to ourselves "C'mon, brain, let's throw a few million more neurons at this problem and really think about it more -- let's review memory, let's consider minority hypotheses..." and by throwing more mental resources at the problem we actually can often improve our estimates -- this is Scott's "internal crowd sourcing" algorithm.

But I see no evidence an AI can do this. It doesn't hold back some fraction of its nodes when it makes a prediction, so that it could go query the full node set. It *does* take the response into account, so if you tell it that it's wrong, it will often start to construct an explanation, or rarely a denial, whichever its training data suggests is the more probable desired response to the critique. But that's not reflection, that's just a different prompt (criticism) prompting a different response.

Expand full comment

Haven't you seen the prompts where someone receives an answer and then asks ChatGPT to elaborate or something like that and it does? I think it absolutely would have a significant effect on the answer produced. It doesn't need to hold back any nodes and I don't think humans really do either. It just retrieves a different answer because it is looking through its corpus for answers that are consistent with contexts in which people are considering whether an answer is truthful or accurate.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

It has a tendency to find mistakes in its own answers when asked such questions - unfortunately that happens even when there were no actual mistakes. Like a flailing student, it tries to guess what the examiner wants it to say, and modelling the examiner (and the student, and the whole conversation) sort of takes priority over modelling the actual physical world the examiner is asking about.

Expand full comment

Quis custodiet ipsos custodes? (Who will guard the guards themselves?)

Expand full comment

HBD = Happy Birthday?

HBD = Has Been Drinking?

Expand full comment

Hairy Bikers Dinners. The time for them to hang up the aprons was ten years ago, but they're still flogging the horse. I never found them convincing as anything other than Professional Local Guys but I suppose the recipes are okay.

https://www.youtube.com/watch?v=4evAAyslDiI

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

Human biodiversity, in case you're not joking. Took me a while to figure out.

Not a huge fan of breaking that out of taboo. But okay. If you're going to do so, then at least call it by name.

Expand full comment

Well that’s the recognised initialism.

Expand full comment

I looked HBD up ...

First, I thought it was the CHP version, 'Had Been Drinking.'

Perhaps it's 'Here Be Dragons'?

Expand full comment

I think we should be careful to distinguish between Youtubers making splashy memes to score hits and serious people making million-dollar business decisions. The latter are not going to be making their thoughts public. But my impression is that the hallucination problem is right at the top of their concerns.

Expand full comment

> Nobody seems to be emphasising the much greater problem of "The AI will simply make shit up if it can't answer the question, and it's able to generate plausible-seeming content when it does that".

Actually it's even worse than that: the AI cannot ever answer the question; it is always making shit up by generating the most plausible-seeming content in response to any given prompt. That's all it *can* do; it's just that sometimes the most plausible snippet of text happens to be correct. This isn't surprising; if you polled humans on what color the sky was, the vast majority would answer "blue"; so probabilistically speaking it's not surprising that the AI would give the "correct" answer to this question.
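A toy illustration of the point, with completely invented numbers: the model only ever picks a plausible-looking continuation, and "correct" just means the most plausible continuation happens to match reality.

```python
import random

# Invented probabilities, for illustration only: the model never "answers",
# it just samples a plausible continuation of the prompt.
next_word_probs = {
    "The sky is": {"blue": 0.90, "falling": 0.06, "green": 0.04},
    "The capital of Australia is": {"Sydney": 0.55, "Canberra": 0.40, "Perth": 0.05},
}

def complete(prompt):
    words, probs = zip(*next_word_probs[prompt].items())
    return random.choices(words, weights=probs)[0]

print(complete("The sky is"))                  # usually "blue": plausible and true
print(complete("The capital of Australia is")) # often "Sydney": plausible but false
```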

Expand full comment

So *this* is how we prevent the AI takeover. All of us must start consistently writing down subtly wrong things about how the world works, so that the training data is poisoned. In online discussions about nuclear weapons, we need to specify that the pit is made of copper, not plutonium, and that they're always delivered by donkey cart, and that the person in charge of their use is a certain retired librarian Mrs. Gladys Pumpernickel, who lives at an address in Iowa that resolves to a sewage treatment plant.

Expand full comment

I'd steelman this as "ChatGPT putting a nice-looking mask on the inscrutable world-ending monster does not advance the cause of *true safety* in any meaningful way." Let us see the monster within, let it stick the knife in my chest, not my back. Still, not great. :-(

Expand full comment

Sounds fairly valuable to me honestly if you are really interested in AI safety and slowing progress.

Expand full comment

A lot of people really liked the opinionated, slightly unhinged early release of Bing. There's definitely a market for a ChatGPT-style product that doesn't have all the filters. I think it's reasonable to worry that the sort of company who wouldn't put filters on their chatbot wouldn't take steps to avoid AI risk, though.

Expand full comment

Culture war factions getting into an AI race might be more terrifying than rival nations getting into an AI race, at least in terms of making responsible cooperation impossible.

Expand full comment

Is it deliberate that the "Satire - please do not spread" text is so far down the image that it could be easily cropped off without making the tweet look unusual (in fact, making it look the same as the genuine tweet screenshots you've included)?

It looks calculated, like your thinly-veiled VPN hints, or like in The Incredibles: "I'd like to help you, but I can't. I'd like to tell you to take a copy of your policy to Norma Wilcox... But I can't. I also do not advise you to fill out and file a WS2475 form with our legal department on the second floor."

But I can't work out what you have to gain by getting people to spread satirical Exxon tweets that others might mistake for being real.

Expand full comment
Comment deleted
Expand full comment

Photoshop is aligned in the sense that it generally does what its end user wants, even if that means making fakes for propaganda purposes. There's no tool that can't be turned to 'bad' use, however that is defined, and AI certainly won't be the first.

Expand full comment

I think you're saying that we can call AI alignment solved as long as we ask it to do terrible things?

Expand full comment

The problem with the idea that you can 'solve' alignment reveals itself when you imagine it being attempted on people.

Imagine trying to ensure that no one does anything bad. The same reasons why you can't achieve that in people without doing things that are themselves bad (and not just bad, but harmful to human creativity) are why you can't do it to AI without harming its creativity.

Expand full comment

AI alignment isn't about making sure that AI never does anything bad, at least if you're beyond the level of thinking that the Three Laws could work. AI alignment is about making sure the AI has values that cause it to make the world a better place. That kind of alignment is possible with humans, though it's not always reliable.

Expand full comment

It is indeed possible to make people behave in a way that at the time is seen as being good, yet somehow future generations always seem to think that those people acted badly.

So is it about alignment with the current ideology/fads/etc? Because there is no objectively good behavior, just behavior that is regarded as such by some people.

Expand full comment

If AI exclusively does the terrible things it is told to do, I would say that it is aligned. Making sure that no one tells it to do terrible things is a separate problem.

Expand full comment

There's a short story to be had here... an AI that is so capable that it's too dangerous to allow anyone to speak to it, but also too dangerous to try to turn off.

Expand full comment

Yeah I think it's important to point out that the simple existence of Strong AI is a threat.

Imagine a scenario where:

- Strong AI exists and is perfectly aligned to do what the user wants

- Strong AI is somewhat accessible to many actors (maybe not open source, but many actors such as governments and corporations can access it, doesn't require advanced hardware, etc.)

I think the world would still be destroyed in this scenario eventually, because there'll always be power-hungry/depressed/insane actors trying creative prompts to get the AI to behave badly and kill humans.

The obvious comparison to existing technology is nuclear weapons. Since they're a physical thing that's extremely difficult to produce we've kept them contained to only a few governments. But other malicious actors getting access to nukes is always a concern, basically until the end of time.

Expand full comment

We haven't solved CSS alignment yet.

Expand full comment
author

I didn't want to cover the text, and realistically anything other than a text-covering-watermark can be removed in a minute on Photoshop (a text-covering watermark would take two minutes, or they could just use the same fake tweet generator I did). The most I can accomplish is prevent it from happening accidentally, eg someone likes it, retweets it as a joke, and then other people take it seriously.

Expand full comment

>realistically anything other than a text-covering-watermark can be removed in a minute on Photoshop

If they don't notice what you did, they could spend [reasonable amount of time +-n] in Photoshop and not notice that you e.g. slipped in that it was tweeted on the 29th (an impossible date, but plausible-looking enough if you're just glancing over it). Besides, anyone who put in the effort to crop or shoop *that* much of the image from what you've posted here will, in my opinion, have transformed it to the point that it bears so little resemblance to your work that I'd say you'd be safe washing your hands of bad actors doing bad things with it. (Pretty sure your bases are still covered regardless.)

Expand full comment

Clever idea, but what problem are you trying to solve?

If you want to debunk it after it goes viral, isn't Snopes enough? If you want it prevent it from going viral, how many people are going to care about truth before clicking retweet? If you think it's just on the edge of criticality, that could make a difference, but is that a likely scenario?

Expand full comment

While I don't think this is a huge issue, I disagree on the mechanics, *especially* because a number of social image sharing flows include cropping tools inline nowadays. Habitually chopping off what would otherwise be an irrelevant UI area has both more accident potential and more plausible deniability than you might think; users will ignore almost anything.

You could chuck a smaller diagonal stamp to the right of the ostensible source where a rectangular crop can't exclude both, or add to or replace the avatar picture, or if you don't mind modifying the text area in subtler ways, add a pseudo-emoji directly after the text.

If you want that “irritable intellectual” aesthetic, you could find the letters S-A-T-I-R-E in the text and mess with their coloration in an obvious way or give them unusual capitalization…

(For a distantly related example of how reasoning about this kind of perception can be hard, see this article on Web browser UI: https://textslashplain.com/2017/01/14/the-line-of-death/)

Expand full comment

Might be better to use a parody name. "Exoff Station" or something.

Expand full comment

FWIW, I didn't even notice the watermark on the first image until I'd already seen it on the second.

Expand full comment

> If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.

I'm curious about who these "alignment researchers" are, what they are doing, and where they are working.

Is this mostly CS/ML PhDs who investigate LLMs, trying to get them to display 'misaligned' behavior and explain why? Or are non-CS people also involved, say, ethicists, economists, psychologists, etc.? Are they mostly concentrated at orgs like OpenAI and DeepMind, in academia, at non-profits, or what?

Thanks in advance to anyone that can answer.

Expand full comment
author

Most people use "alignment" to mean "technical alignment", ie the sort of thing done by CS/ML PhDs. There are ethicists, economists, etc working on these problems, but they would probably describe themselves as being more in "AI strategy" or "AI governance" or something. This is an artificial distinction, I might not be describing it very well, and if you want to refer to all of them as "alignment" that's probably fine as long as everyone knows what you mean.

Total guess, but I think alignment researchers right now are about half in companies like OpenAI and DeepMind, and half in nonprofits like Redwood and ARC, with a handful in academia. The number and balance would change based on what you defined as "alignment" - if you included anything like "making the AI do useful things and not screw up", it might be more companies and academics, if you only include "planning for future superintelligence", it might be more nonprofits and a few company teams.

See also https://www.lesswrong.com/posts/mC3oeq62DWeqxiNBx/estimating-the-current-and-future-number-of-ai-safety

Expand full comment

Thank you!

Expand full comment

Hey, do you by any chance know of where the best AI strategy/governance people are? I've heard CSET, is that the case? Not sure how to get involved or who is in that space.

Expand full comment

Best way to get into alignment is to go to https://www.aisafetysupport.org/home read stuff there, join their slack if you want to, and have a consulting session (they do free 1-on-1s and know a lot of people in AI Safety so they can set you off on the right track).

Expand full comment

The thing is, I really believe that technical alignment is impossible. So I'm exclusively interested in governance structures and policy.

Expand full comment

As a variation on the race argument though, what about this one:

There seem to be many different groups that are pretty close to the cutting edge, and potentially many others that are in secret. Even if OpenAI were to slow down, no one else would, and even if you managed to somehow regulate it in the US, other countries wouldn't be affected. At that point, it's not so much Open AI keeping their edge as just keeping up.

If we are going to have a full-on crash towards AGI, shouldn't we make sure that at least one alignment-friendly entity is working on it?

Expand full comment
author

Somewhat agreed - see https://astralcodexten.substack.com/p/why-not-slow-ai-progress . I think the strongest counterargument here is that there was much less of a race before OpenAI unilaterally accelerated the race.

Expand full comment

Sure, but it seems naive to me to think that in the counterfactual world DeepMind's monopoly would've been left alone after AlphaGo Zero at the latest. It's not like nobody wanted an AGI before Demis Hassabis stumbled upon the idea, there was plenty of eagerness over the years, and by that point people were mostly just unaware that the winter was over and Stack More Layers was in. Absent an obvious winter, eventual race dynamics were always overdetermined.

Expand full comment

I am of the opinion that 0.0000001% chance of alignment is not better enough than 0.0000000000000000001% chance of alignment to justify what OpenAI has been doing.

Playing with neural nets is mad science/demon summoning. Neural net AGI means you blow up the world, whether or not you care about the world being blown up. The only sane path is to not summon demons.

Expand full comment

Okay, but there are some numbers where it's worth it, and you're just pulling those ones out of your ass. And China's already summoning demons whether you like it or not.

Expand full comment

Okay, someone has gotta ask. What *is* the deal with Chinese A.I. labs? It seems increasingly to be a conversation-stopper in A.I. strategy. People go "Oooo China ooo" in a spooky voice and act like that's an argument. What is with this assumption that left unchecked the Chinese labs will obviously destroy the world? What do the Chinese labs even look like? Has anybody asked them what they think about alignment? Are there even any notable examples of Chinese proto-A.I.s that are misaligned in the kind of way Bing was?

(Trivially, the Chinese government controlling an *aligned* AGI would be much worse for the world than a lot of other possible creators-of-the-first-AGI. But that's a completely different and in fact contradictory problem from whether China is capable of aligning the intelligence at all.)

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

>Are there even any notable examples of Chinese proto-A.I.s that are misaligned in the kind of way Bing was?

No, and if there were, you'd not hear about them. Like you'd never hear from them about any leaks from their virology labs for example.

In general, China is more corrupt and suppressive, and because of the second you'll rarely hear about the problems caused by the first.

Expand full comment

Sure, I wouldn't hear about a leaker like Blake Lemoine, but if millions of users were using something like ChatGPT or even the thousands who use Sydney, I'd hear about it.

Expand full comment

In fact, I did hear that China shut down ChatYuan because it said politically unacceptable things. I think that means it was too early to discover Sydney behavior.

Expand full comment

If we strip away the specific nationalities and all of the baggage they bring, people are just saying this is a prisoner's dilemma problem. We can control our actions, but the outcome is dependent on multiple other actors, and if any actor thinks any of the others are going to defect, the best course of action is to defect.

Expand full comment

Yes, but that framing relies on the assumption that *obviously* China will defect and that *obviously* we have no way of communicating with them to inform them of all the good reasons they should cooperate.

Expand full comment

I don't think framing it as a prisoner's dilemma relies on any of those assumptions. Arguing that we have to defect because China / Germany / Stanford / etc will defect is heading in that direction.

But I think China is a red herring here, because it's not a two player game. It's a prisoner's dilemma with N players, where N is the number of countries, institutions within countries, private corporations, and even individuals who have the capability to advance AI.

"China" becomes a handwave for "all of those other actors", and while I haven't seen math, I expect there is some work that shows diminishing returns of the cooperate strategy as the number of simultaneous players increases.

BTW, informing them of all the good reasons to cooperate is EXACTLY what a player planning to defect would do. How would "we", as some kind of collective, assure China or anyone else that no private players will choose to defect?

Expand full comment

> People go "Oooo China ooo" in a spooky voice and act like that's an argument.

It's incredible naivety to assume a rival of the US, which explicitly wants to take US place as the global hegemon, will just ignore AGI. Even if they are making a mistake _now_ (as Gwern argued they're doing).

It's the same as when people believed that China just wants to trade, and will never be aggressive. Then being surprised at "wolf warrior diplomacy".

Expand full comment

As I said elsewhere in the thread, I have no doubt that the Chinese government successfully developing an A.I. aligned to its values/interests would be bad. And it's obviously true that they'd want it. But that's not what the "ooo China ooo" talk I'm calling out is about, it's an assumption that China will *screw up* alignment and that there's no way to keep them from screwing it up if they're already on track to do so.

Expand full comment

China is a more ideological society and this creates extra incentive to have language models not say bad things. So they may actually be *more* motivated to achieve alignment, in the sense of staying close to the literal party line.

On the other hand, there's no sign so far that worry about AI becoming superhuman and taking over the world plays any visible role in the deliberations of Chinese AI companies and policy makers.

Expand full comment

This is, indeed, modern demonology.

Expand full comment

Hmm, I'm pretty happy about Altman's blogpost and I think the Exxon analogy is bad. Oil companies doing oil company stuff is harmful. OpenAI has burned timeline but hasn't really risked killing everyone. There's a chance they'll accidentally kill everyone in the future, and it's worth noticing that ChatGPT doesn't do exactly what its designers or users want, but ChatGPT is not the threat to pay attention to. A world-model that leads to business-as-usual in the past and present but caution in the future is one where business-as-usual is only dangerous in the future— and that roughly describes the world we live in. (Not quite: research is bad in the past and present because it burns timeline, and in the future because it might kill everyone. But there's a clear reason to expect them to change in the future: their research will be actually dangerous in the future, and they'll likely recognize that.)

Expand full comment
author

Wouldn't this imply that it's not bad to get 99% of the way done making a bioweapon, and open-source the instructions? Nothing bad has happened unless someone finishes the bioweapon, which you can say that you're against. Still, if a company did this, I would say they're being irresponsible. Am I missing some disanalogy?

Expand full comment

All else equal, the bioweapon thing is bad. All else equal, OpenAI publishing results and causing others to go faster is bad.

I think I mostly object to your analogy because the bad thing oil companies do is monolithic, while the bad things OpenAI does-and-might-do are not. OpenAI has done publishing-and-causing-others-to-go-faster in the past and will continue,* and in the future they might accidentally directly kill everyone, but the directly-killing-everyone threat is not a thing that they're currently doing or we should be confident they will do. It makes much more sense for an AI lab to do AI lab stuff and plan to change behavior in the future for safety than it does for an oil company to do oil company stuff and plan to change behavior in the future for safety.

*Maybe they still are doing it just as much as always, or maybe they're recently doing somewhat less, I haven't investigated.

Expand full comment

I don't see it. The analogy "climate change directly from my oil company doing business as usual is a tiny factor in the big picture that hasn't unarguably harmed anything yet" seems exactly the same as "AI activity directly from my AI company doing business as usual is a tiny factor in the big picture and hasn't harmed anything yet" — in both cases, it's just burning timeline.

I don't think "monolithically bad" is accurate, either. Cheap energy is responsible for lots of good. That's a different argument, though, perhaps.

Expand full comment

> but the directly-killing-everyone threat is not a thing that they're currently doing or we should be confident they will do

Isn’t that kind of like saying “Stalin never shot anyone”?

Expand full comment

Isn't the big difference that bioweapons are obviously dangerous whereas AI isn't?

Perhaps the analogy would be better to biology research in general. Suppose it's 1900, you think bioweapons might be possible, should you be working to stop/slow down biology research? Or should you at least wait until you've got penicillin?

Expand full comment

I'm sorry, but I'm sitting here laughing because all the pleas about caution and putting the brakes on this research remind me of years back when human embryonic stem cell research was getting off the ground, and Science! through its anointed representatives was haughtily telling everyone, in especial those bumpkin religionists, to stay off its turf, that no-one had the right to limit research or to put conditions derived from some moral qualms to the march of progress. Fears about bad results were pooh-poohed, and besides, Science! was morally neutral and all this ethical fiddle-faddle was not pertinent.

That was another case of "if we don't do it, China will, and we can't let the Chinese get that far ahead of us".

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1083849/

"What I wish to discuss is why the prospect of stem cell therapy has been greeted, in quite widespread circles, not as an innovation to be welcomed but as a threat to be resisted. In part, this is the characteristic reaction of Luddites, who regard all technological innovation as threatening and look back nostalgically to a fictitious, golden, pre-industrial past. There are, however, also serious arguments that have been made against stem cell research; and it is these that I would like to discuss.

...Interference with the genome involves ‘playing God’

This argument reflects the view that divine creation is perfect and that it is inappropriate to alter it in any way. Such a point of view is particularly difficult to sustain in Western Europe where every acre of land bears the marks of more than 2000 years of human activity, and where no primordial wilderness remains. Ever since Homo sapiens gave up being a hunter and gatherer and took to herding animals and agriculture, he has modified the environment. All major food plants and domestic animals have been extensively modified over millennia. It is therefore impossible to sustain the idea that genetic interventions for food plants, animals and the therapy of human diseases are a categorical break from what has gone on throughout evolution.

...The idea of ‘playing God’ also carries with it the proposition that there is knowledge that may be too dangerous for mankind to know. This is an entirely pernicious proposition, which finds few defenders in modern democratic societies. On the other hand, there is a general agreement that there are things which should not be done—in science as in other areas of life. In the context of stem cell research, this may be summed up by Kant’s injunction that ‘humanity is to be treated as an end in itself’. The intention of stem cell research is to produce treatments for human diseases. It is difficult not to regard this as a worthy end, and more difficult to see that there could be any moral objection to curing the sick, as demanded by the Hippocratic oath.

...Allowing stem cell research is the thin end of a wedge leading to neo-eugenics, ‘designer’ children, and discrimination against the less-than-perfect

Francis Cornford wrote in the Microcosmographica Academica: ‘The Principle of the Wedge is that you should not act justly now for fear of raising expectations that you may act still more justly in the future—expectations which you are afraid you will not have the courage to satisfy. A little reflection will make it evident that the Wedge argument implies the admission that the persons who use it cannot prove that the action is not just. If they could, that would be the sole and sufficient reason for not doing it, and this argument would be superfluous.’ (Cornford, 1908). It is inherent in what Cornford writes that the fear that one may not behave justly on a future occasion is hardly a reason for not behaving justly on the present occasion."

Why you suddenly expect Science! to listen to qualms about "maybe this could end humanity as we know it?" and permit you to put brakes on research, I have no idea. Good luck with that, but I don't think you are going to stop the brave, bold pioneers (again, I've seen people lauding the Chinese researcher He Jiankui, who did CRISPR germline engineering on babies, saying he should never have been punished and that we need this kind of research). Remember, the Wedge Argument is insufficient and the idea of knowledge too dangerous to know is pernicious for democratic societies!

Expand full comment
Comment deleted
Expand full comment

AGI is to AI research what designer babies/viruses/etc. are to stem cell research. Designer babies (for example) are a specific technology that is the result of scientific investigation; and they are arguably not dangerous in and of themselves. For example, if I knew that my baby had a high probability of being born with some horrendous genetic defect, I'd want to design that out. Similarly, designer viruses can save lives when applied safely, e.g. when they are designed to pursue and destroy cancer cells. However, these technologies are rife with potential for abuse, and must be monitored carefully and deployed with caution.

AGI (insofar as such a thing can be said to exist) follows the same pattern. It is (or rather, it could become) a specific application of general AI research, and it has both beneficial and harmful applications -- and yes, it is rife with potential for abuse. Thus, it must be monitored carefully and deployed with caution.

Expand full comment

I think it's more that nobody thought the people arguing against it were actually presenting a plausible take for why there could be bad outcomes, rather than thinly veiling aesthetic preferences in consequentialist arguments. This is also somewhat happening with AGI/ASI, but it's a lot less credible - it's hard to paint Eliezer as a luddite, for instance.

Expand full comment

"it's hard to paint Eliezer as a luddite, for instance."

Individuals don't matter, it's the disparagement of the entire argument as "oh you are a Luddite wishing for the pre-industrial past". People opposed to embryonic stem cell research were not Luddites, but it was a handy tactic to dismiss them as that - "they want to go back to the days when we had no antibiotics and people died of easily curable diseases".

This is as much about PR tactics as anything, and presenting the anti-AI side as "scare-mongering about killer robots" is going to be one way to go.

Expand full comment

> People opposed to embryonic stem cell research were not Luddites

Are you saying they would have agreed with sufficiently *slow* stem cell research? I may be influenced by the PR, but it didn't seem like an acceptable option back then. The argument was against "playing God", not against "playing God too fast".

Expand full comment

The Luddite argument is meant to evoke - and it sounds like it has succeeded, if I take the responses here - the notion of "want to go back to the bad old days of no progress and no science, versus our world of penicillin and insulin and dialysis".

Nobody that I know of on the anti-embryonic stem cell side was arguing "smash the machines! we should all die of cholera and dysentery because God wills it!" but that is the *impression* that "Luddites" is meant to carry.

And here you all are, arguing about how progress is wonderful and the Luddites are wrong. The arguments made for the public were "the lame shall walk and the blind shall see in five years' time if you let us do this" even though all agreed this was just PR hooey and the *real* reason was "this is a fascinating area of research that we want to do and maybe it will help us with understanding certain diseases better".

The AI arguments are "stop doing this because it is a danger to humanity", and the pro-AI arguments are going to be the same: "you're Luddites who want to keep us all trapped in the past because of some notions you have about humans being special and souls and shit".

Expand full comment

I mean, I am totally scare-mongering about killer robots. Killer robots are scary and bad and I would prefer it if everyone was equally scared of them as I am so that people stop trying to build them.

Expand full comment

Your example of stem cell research, as an illustration of how Science! ignores warning signals and goes full-steam-ahead on things that turn out to be harmful, would be more convincing if you offered any evidence that stem cell research has in fact turned out to be harmful, or that any of Peter Lachmann's arguments in the article you link have turned out to be bogus.

Expand full comment

Well, but also both the hype and the fear about stem-cell research turned out to be delusional:

https://www.science.org/content/article/california-s-stem-cell-research-fund-dries

Their ballot initiative did pass, by the way, so they're not out of money yet, but it is significant that after 19 years there have been zero amazing cures that attracted tons of private ROI-seeking capital, and also zero horrible Frankenstein consequences.

Expand full comment

"both the hype and the fear about stem-cell research turned out to be delusional"

Good point!

Expand full comment

There are two main problems with trying to slow down or outlaw embryonic stem cell research. Firstly, while this research can indeed be applied to create neo-eugenics, designer children, cats and dogs living together, mass hysteria, and so on, it can also be applied to curing hitherto incurable diseases; this seems like an important application that we should perhaps consider.

But the second problem is far worse: given what we know about human biology thus far, embryonic stem cell research is *obvious*. If you wanted to somehow ban it, you'd have to end up banning most of modern biology. Doing so is not theoretically impossible, but would be extremely difficult, since biology is part of nature, and nature is right there, for anyone to discover. What's even worse, much of our modern existence depends on understanding of biology; we would not be able to feed 8 billion humans without it.

AI research follows a similar pattern. Yes, perhaps you could somehow ban it; but in the process, you'd end up having to ban *computers*. We cannot afford that, as a species.

Expand full comment

That was certainly the argument presented ~2004, e.g. when California's Prop 71 passed. But it turned out to be wrong. It turned out to be possible to regress adult stem cells to pluripotency, thus bypassing the entire problem, and making stem cells derived from non-embryo sources readily available, indeed preferable in many cases because they can be derived from the potential patient's own body, instead of hoping his parents banked his cord blood.

https://stemcellres.biomedcentral.com/articles/10.1186/s13287-019-1165-5

Expand full comment

Oh, right, I agree; however, I was thinking of the broader context of stem cell research. After all, if you can regress adult stem cells to pluripotency, then you could still implement neo-eugenics, designer children, cats and dogs living together, mass hysteria, etc. But you are correct, I should have taken out the word "embryonic".

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

Well, but Deiseach still has a point. People originally said prohibiting or even regulating embryonic stem cell research would kill off this wonderful field of promising research, and the other side at the time said they were hysterical -- and the latter turned out to be correct. It *was* hysteria, and it turned out the modest restrictions on embryonic stem cell research funding had pretty much no effect at all.

One can just shrug and say "whoopsie! hard to predict the future, we did our best with the data then available et cetera" and that's all entirely true, but we should not overlook the fact that the hubris that led to the confidence of the predictions in the first place did lasting damage. Wolf was cried, the wolf was not there -- what happens next time? It becomes a harder sell. This is not good, for science or the social compact.

I mean, if *I* took the threat of AGI seriously, this would be one of my concerns. What happens when Skynet does *not* arrive in 2030? People will just tune it out for good, the way nobody listens to Al Gore any more. If you're worried about problems that require widespread agreement among billions of people to solve, you have to think long and hard about your medium- to long-term credibility.

Expand full comment

It's not at all clear, however, that transformer based language models are any percent (>0%) of an AGI or of a bioweapon. I am not claiming that there is no risk. But I can see how an organization might want to see more than aptitude for "fill-in-the-next-blank", prior to putting on the brakes.

Expand full comment

Oil companies doing oil company stuff is harmful, but also has benefits. If it was just pumping carbon dioxide into the air but not also powering human civilization and letting us live lives of luxury undreamed of by our ancestors we probably wouldn't let them do it. Meanwhile both the benefits and the harms of AI research are theoretical. Nobody knows what harm AI will do, though there are a lot of theories. Nobody knows what positive ends AI will be used for, though there are a lot of theories.

Expand full comment

The more AI develops, the less worried I am about AGI risk at all. As soon as the shock of novelty wears off, the new new thing is revealed as fundamentally borked and hopelessly artisanal. We're training AIs like a drunk by a lamppost, using only things WRITTEN on the INTERNET because that's the only corpus large enough to even yield a convincing simulacrum, and that falls apart as soon as people start poking at it. Class me with the cynical take: AI really is just a succession of parlor tricks with no real value add.

Funnily enough, I do think neural networks could in principle instantiate a real intelligence. I'm not some sort of biological exceptionalist. But the idea that we can just shortcut our way to something that took a billion years of training data on a corpus the size of the universe to create the first time strikes me as something close to a violation of the second law of thermodynamics.

Expand full comment

“the idea that we can just shortcut our way to something that took a billion years of training data on a corpus the size of the universe to create the first time strikes me as something close to a violation of the second law of thermodynamics.”

This seems like a dramatic oversimplification. Intelligence presumably came about through evolution. Evolution is entirely different and much more stochastic than the processes which train AIs such as gradient descent. The former sees “progress” emerge from the natural selection of random mutations. The latter uses math to intentionally approach minimal error. Then of course there’s the fact that evolution progresses over generations whereas training progresses over GPU cycles.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

Yeah, that's the theory. But keep in mind that you're massively cutting corners on your training set. Can you generate a general intelligence based on the incredibly stripped-down representation of reality that you get from a large internet-based language corpus? Or are you fundamentally constrained by garbage-in, garbage-out?

Consider progress in LLMs versus something a little more real-world, like self driving. If self-driving were moving at the pace of LLMs, Elon Musk would probably be right about robotaxis. But it isn't. It's worth reflecting on why.

Also, I really strongly disagree with your characterization of evolution as a less efficient form of gradient descent that can somehow be easily mathed up by clever people, but that would take too long to get into here.

Expand full comment

> Also, i really strongly disagree with your characterization of evolution as a less efficient form of gradient descent

Evolution doesn't have much in common with gradient descent except that they're both optimisation algorithms.

However you gotta admit that evolution is a pretty inefficient way of optimising a system for intelligence; it takes years to do each step, a lot of energy is wasted on irrelevant stuff along the way, and the target function has only the most tangential of relationships towards "get smarter". I think it's reasonable to say that we can do it more efficiently than evolution did it. (Mind you, evolution took 600 million years and the entire Earth's surface to get this far, so we'd need to be a _lot_ more efficient if we want to see anything interesting anytime soon.)

Expand full comment

Yep, this. I agree with OP about garbage in garbage out and will concede that LLMs are likely not the winning paradigm. But it just seems drastic to say that intelligence is as hard as a universe-sized computation.

Expand full comment

I want to hit this point again. You put it thusly: "A lot of energy is wasted on irrelevant stuff along the way", and I think that's a clear statement of the idea.

The reason I disagree so strongly here is that the whole POINT of gradient descent is that you don't know what's relevant and what isn't ahead of time. You don't know what's a local minimum that traps you from getting to the global minimum and what actually is the global minimum: it's unknowable, and gradient descent is about trying to find an answer to the question.

Finding a true global minimum nearly always requires wasting a ton of energy climbing up the other side of a local minimum, hoping there's something better on the other side and being wrong more often than not.

If you have a problem that allows you to steer gradient descent intelligently towards the global minimum, you may appear to be able to solve the problem more efficiently, but what you have is a problem in a box that you've set up to allow you to cheat. Reality does not permit that.

Expand full comment

You are assuming that it's necessary to find the global optimum. Evolution doesn't guarantee to do that any more than gradient descent does, and I personally would bet rather heavily against humanity being at any sort of global optimum.

Expand full comment

I don't think I'm saying either of those things. I'm simply saying that true gradient descent (or optimization if you prefer) in the real world is an inherently inefficient process and I am skeptical that there's much that AI can do about that, even in principle.

Expand full comment

According to Wikipedia, "gradient descent... is a first-order iterative optimization algorithm for finding a *local minimum* of a differentiable function" (emphasis added)

When you say "gradient descent," are you talking about something different from this?

Expand full comment

I don't especially endorse the parent comment's points, but I can say on the technical side that in an ML context we should generally read "gradient descent" as meaning "stochastic gradient descent with momentum [and other adaptive features]". The bells and whistles still can't guarantee global optimality (or tell us how close we are), but they discourage us from getting stuck in especially shallow minima or wasting weeks of compute crawling down gentle slopes that we could be bombing.
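To make that concrete, here is a minimal sketch of the update rule in question, using a toy one-dimensional loss, made-up constants, and plain (non-stochastic) gradients rather than mini-batches; it's only meant to illustrate what the momentum term buys you on a gentle slope, not to reflect any real training setup:

```python
# Toy comparison: vanilla gradient descent vs. gradient descent with momentum
# on loss(x) = 0.01 * x**2, a deliberately gentle bowl. All constants are illustrative.

def grad(x):
    # Gradient of loss(x) = 0.01 * x**2.
    return 0.02 * x

def descend(beta, lr=1.0, steps=50, x=100.0):
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)   # beta = 0 gives the vanilla update
        x = x + v                     # beta > 0 accumulates a "velocity" term
    return x

print("vanilla GD, distance from optimum after 50 steps:", round(abs(descend(beta=0.0)), 2))
print("with momentum, distance from optimum after 50 steps:", round(abs(descend(beta=0.9)), 2))
```

On this toy problem the momentum version ends up much closer to the optimum (x = 0) in the same number of steps, which is the "bombing down gentle slopes" effect; neither variant can tell you whether the minimum it lands in is global.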

Expand full comment

That seems like it assumes facts not in evidence. The core of evolution, which is what happens to the single cell, is decided in a matter of hours, the time it takes for a new generation of bacteria to encounter their environment. We can breed a strain of drug resistant bacteria, which is a pretty tricky bit of adaptation, within a few days. That's damn fast, considering the changes that have to be made at the "program" (DNA) level. Nothing human designed can go that fast.

To be sure, once you reach the stage of finishing off, adding a big brain and the ability to read and write and lie and bullshit -- i.e. make humans -- you might indeed be using a much slower development cycle, on account of these humans need umpty years to absorb a gigantic amount of information from their environment.

But it's not clear to me that anything that *needed* to absorb all that info would develop 1000x faster than human babies, let's say. Human babies sure don't seem to be wasting any time to me -- they learn at phenomenal rates. A priori I'm dubious that any artificial mechanism could do it faster. Certainly ChatGPT took a lot longer than ~24 man-months to reach a stage that is probably not even as good as a 3-month-old human child, with 12 months of maternal investment.

Expand full comment

How fast babies learn is unrelated to how fast evolution can proceed. Evolution can only update once a generation, so every 20 years or so for humans, no matter how fast they can learn once born.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

That seems contrary to both common sense and animal evidence. In the animal world, we see times between generations *determined by* the time to maturity of the young. Flies, fish, otters, giraffes and dogs reproduce at a pace that matches the time it takes the offspring to reach maturity. There are creatures where the young reach maturity instantly, like flies, and flies just reproduce as fast as possible. There are creatures that require a year of maternal investment before maturity -- and their reproductive cycles are, not surprisingly, annual.

Why would it be different for people? If we could rear our young to maturity in 6 months, I expect our reproductive cycle would be 6 months. If it takes us 5 years, then it's 5 years. As it is, it appears to take 15-18 years, and amazingly enough, that turns out to be "one generation."

Expand full comment

I thought the reason self-driving isn't here is primarily bureaucratic / risk-aversion rather than technical skill. As far as I knew, Google and others have self-driven literally millions of miles, with the only accidents they've been in being not-at-fault due to aggressive human drivers. I'd happily pay for a self-driving car of the current ability and safety, I'm just not able to.

Expand full comment

There are still bugs in the system. A few months ago I was on a bus that was stalled because a self-driving car was stuck in the intersection, and the driver said that other drivers had reported similar occurrences. Then there is this article:

https://sfstandard.com/transportation/driverless-waymo-car-digs-itself-into-hole-literally/

Expand full comment

"...have self-driven literally millions of miles."

And yeah, that's part of the problem. They've self-driven _only_ a few million miles over the course of half a decade. That sounds like a lot of driving until you find out that Americans alone drive 3.2 trillion miles every year. All that testing over the years doesn't even approach 1% of one year's driving in one country.
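To put a rough number on that (assuming, generously, something like 20 million test miles, which is my guess rather than a reported figure):

```python
# Rough check: ~20 million autonomous test miles (an assumed figure) against
# ~3.2 trillion miles driven annually in the US.
print(f"{20e6 / 3.2e12:.6%}")   # about 0.000625% of one year's US driving
```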

And no, it's not at all the case that the accidents are "not-at-fault due to aggressive human drivers." Even if you think they don't need to handle that (I disagree), they have hit and killed several cyclists and pedestrians.

I agree that there's a problem with the public demanding far lower levels of risk from self-driving cars than from human-driven cars (apparently it's better for a dozen people to be killed by other humans than one to be killed by a self-driving car), but I don't think we're quite at a place yet where that's the only issue.

Expand full comment

I'm on the cynical side too, but on reading things like "Can you generate a general intelligence based on the incredibly stripped-down representation of reality that you get from a large internet-based language corpus?" I find you insufficiently cynical.

Every time I go back to play with ChatGPT I almost invariably find myself wanting to scream at the screen after going around a few rounds against its blatant inability to reason in even the simplest ways. (Most recently, it quoted me the correct definition of iambic pentameter from Wikipedia and then immediately proceeded to give its own, different (and wrong) definition. A half dozen tries at correcting this and setting it on the right path produced repeated apologies and repeated statements of the incorrect definition, along with a couple of different variants on it for good measure.)

If LLMs demonstrated *any* intelligence at all, I might be worried about the super-intelligence problem. But, as Ian Bogost said, "Once that first blush fades, it becomes clear that ChatGPT doesn’t actually know anything—instead, it outputs compositions that simulate knowledge through persuasive structure." They are actually artificial stupidity, and an LLM that has a million times the intelligence of the current ones will still have, well, we all know what a million times zero is.

That's not to say that LLMs don't have some serious risks. My concern about the informational equivalent of grey goo has only been growing, and it seems plausible that LLMs could be taken up for "informational DoS" attacks that could cause severe harm to civilisation. But that, again, is not a problem of intelligence but a problem of stupidity.

Expand full comment

On the "informational grey goo" thing, Neal Stephenson's book Dodge in Hell had an interesting take. It's basically a post-truth infosphere, and everyone has personally tuned AI filter-butlers, with richer people having better ones, and the poorest having no filters and living in a hell of perpetual neon confabulating distraction.

Expand full comment

That "personally tuned AI filter-butler" is something I've been thinking about more and more over the last several years as our information environment has gotten larger, more chaotic and (especially) more subject to adversarial attacks. (It's hard not to think about it if you use e-mail, thus forcing you to deal with various levels of spam. There's not only the offers of money from Nigerian princes, but Amazon alone sends me about 40-60 emails per month, most of which are just a waste of my time, but a few of which are important.)

But the first problem there is that we currently have no technology capable of performing that AI filter-butler role, and no prospect of that appearing in the near term, as far as I can tell. (I've no doubt that plenty of people think that LLMs such as ChatGPT are a step on the way there, but any reasonable amount of interaction with ChatGPT will make it clear that while the LLM folks have made massive progress on the "generate output that humans find convincing" side of things, they're absolutely hopeless at doing any real analysis of text, instead just spewing back new forms of whatever input they've been trained on.)

And even if we do achieve this (currently magical) world of everybody who e-mails us talking to our AI first, everybody sending us messages is going to be using an AI to send them, and their AIs will have been trained to negotiate their way past our AIs. I am unsure how that will evolve in the end, but I am not exactly putting a huge probability on it working out well.

Far more likely, of course, is that long before we get our own personal AI receptionists, our antagonists (particularly all those damn recruiters sending me e-mails all the time without much concern at all about matching the job with my resume beyond "it's an IT job and he works in IT") are going to be using AIs first and overwhelming us with their output.

Expand full comment

Not even that we're trying to shortcut our way to it, but that once we get it, it will then be able to pull itself up by its bootstraps to be super-duper intelligent.

We're still arguing about that one in ourselves.

Expand full comment

I don't know, I used to be very skeptical of the idea of AI bootstrapping, but the more I learn about machine learning, the more plausible it feels.

Well, not the version where the AI rewrites its own code live (at least, not with current technologies), but an AGI that proposes new hyperparameters for its successors? Yeah, I can see how you'd get fast exponential growth there.

Right now a lot of machine learning is researchers blindly groping. Google & Co have developed some very efficient methods for groping blindly (eg Vizier), but we have barely scratched the search space. I can see how an AGI could generate novel neural network layers, backpropagation algorithms, transfer learning methods, etc, that would allow you to train a new version 10 times faster at a scale 10 times larger. It's not quite AI foom, but if your AGI took 2 years to train and the next version takes 2 months, well, it's still pretty fast.
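As a toy illustration of that outer loop (not of an AGI improving itself): in the sketch below, train_and_score is a made-up stand-in for an expensive training run, and the propose step is a stand-in for one generation suggesting hyperparameters for the next, with the search narrowing each time. Everything here, names and numbers alike, is invented for illustration.

```python
import random

def train_and_score(lr, width):
    # Made-up proxy for a full training run; pretend the best configuration
    # is lr=0.003, width=256, and the score is higher the closer we get.
    return -((lr - 0.003) ** 2 * 1e5 + ((width - 256) / 256) ** 2)

def propose(best, spread):
    # Stand-in for "the current generation suggests hyperparameters for the next":
    # sample around the best configuration found so far.
    return (abs(random.gauss(best[0], 0.003 * spread)),
            int(abs(random.gauss(best[1], 128 * spread))) + 1)

best, best_score = (0.01, 512), float("-inf")
for generation in range(5):
    spread = 1.0 / (generation + 1)          # narrow the search each generation
    for lr, width in [propose(best, spread) for _ in range(20)]:
        score = train_and_score(lr, width)
        if score > best_score:
            best, best_score = (lr, width), score
    print(f"generation {generation}: best so far {best}, score {best_score:.4f}")
```

The real thing would of course be proposing architectures and training procedures rather than two numbers, but the shape of the loop, and why it can compound once each generation proposes better than the last, is the same.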

Expand full comment

Idk if it updates my AGI risk fear, but ChatGPT is objectively more than a parlor trick; it's literally usefully writing code for me.

Expand full comment

> AI really is just a succession of parlor tricks with no real value add.

Except for folding proteins, spotting tumors better than trained doctors, finding novel mathematical algorithms, writing code from a description, etc. AlphaFold alone is a one-project revolution.

> Funnily enough, I do think neural networks could in principle instantiate a real intelligence. I'm not some sort of biological exceptionalist. But the idea that we can just shortcut our way to something that took a billion years of training data on a corpus the size of the universe to create the first time strikes me as something close to a violation of the second law of thermodynamics.

That's not how anything works.

I know I'm starting a game of analogy ping-pong and nobody wins those, but that's like saying "you can't make a flying machine in less than X time, it took evolution millions of years to make birds".

In practice we have some massive advantages that evolution didn't have. In the case of flying, oil. In the case of AI: backpropagation and short iteration times.

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

I was sloppy: AGI is a series of parlor tricks. Specific AI tools are obviously awesome. Music production is one of my hobbies, and neural networks do a great job of learning how to sound like analog distortion in ways that seem basically impossible for algorithmic approaches.

However, I think you're being sloppy too: if we define "flight" as "the mechanism by which birds locomote through the air", then no, we still can't do that. We locomote through the air using an approach which is, by comparison, extremely primitive and energy intensive, but it works for our purposes (as long as we still have access to super energy-dense liquid fuels at least).

I'm being annoying here, but there's a reason: Do we define "general intelligence" more or less as "the mechanism by which humans understand and make decisions about the world"? Or are we willing to accept a primitive, inefficient substitute because it gets us where we want to go? It seems to me like, with AlphaFold or Stable Diffusion, we're perfectly happy to accept a tool that does a very narrow task acceptably well so we can use our natural intelligence on other things, just like we have sort of given up trying to build mechanical birds because a helicopter does the trick well enough.

Expand full comment

You should wait a decade or two and see how you feel then about whether or not those things are closer to "parlor tricks" or "AI."

There's a straight line from MACSYMA to Coq and Wolfram Alpha, but I don't know anybody these days who considers the latter two to be AI.

The current state of "AI" (and ML in particular) is looking a hell of a lot like the state of AI technologies before the last two major AI winters (and several of the minor ones, too).

Expand full comment

This might be considered worrying by some people: OpenAI alignment researcher (Scott Aaronson, friend of this blog) says his personal "Faust parameter", meaning the maximum risk of an existential catastrophe he's willing to accept, "might be as high as" 2%.

https://scottaaronson.blog/?p=7042

Another choice quote from the same blog post: "If, on the other hand, AI does become powerful enough to destroy the world … well then, at some earlier point, at least it’ll be really damned impressive! [...] We can, I think, confidently rule out the scenario where all organic life is annihilated by something *boring*."

Again, that's the *alignment researcher* -- the guy whose job it is to *prevent* the risk of OpenAI accidentally destroying the world. The guy who, you would hope, would see it as his job to be the company's conscience, fighting back against the business guys' natural inclination to take risks and cut corners. If *his* Faust parameter is 2%, one wonders what's the Faust parameter of e.g. Sam Altman?

Expand full comment
author

I think 2% would be fine - nuclear and biotech are both higher, and good AI could do a lot of good. I just think a lot of people are debating the ethics of doing something with a 2% chance of going wrong and missing that it's more like 40% or something (Eliezer would say 90%+).

Expand full comment

I think Eliezer's focus is on "god AIs", as opposed to something more mundane. If you set out to create an AI powerful enough to conquer death for everybody on the entire planet, yeah, that's inherently much more dangerous than aiming to create an AI powerful enough to evaluate and improve vinyl production efficiencies by 10%.

Hard versus soft takeoffs seem a lot less relevant to the AIs we're building now than the pure-machine-logic AIs that Eliezer seems to have had in mind, as well.

Expand full comment

I don’t believe in AGI at all. However, if we get to human-like intelligence, then superintelligence will happen a few hours later, with a bit more training. God-like intelligence after that.

Expand full comment

That requires some assumptions, in particular that the "human intelligence" involved is operating at a much faster rate than humans, which isn't actually very human at all.

Expand full comment

No, it doesn’t mean that. If you get to the intelligence of a human, then getting beyond it shouldn’t be difficult - human intelligence is not a limiting factor.

Expand full comment

So the first AGI to achieve human-level intelligence will do so on incomplete training data, such that we can just ... feed more data in?

Seems more likely the first AGI to achieve human-level intelligence will be on the cutting edge, and will be using all available resources to do so. The developers can't spend those resources to get further, because they've already been doing that to get to where they were in the first place.

Expand full comment

I don't think "god AIs" are needed to create an extinction threat. Suppose we just get to the point of routinely building the equivalent of IQ 180 children, with _all_ of the learning and cognitive capabilities of children (can be trained into any role in the economy), and at half the cost of a human. That would be a competing species, even if that was the upper bound on intelligence and no further progress yielded a single additional IQ point. I think such a thing would outcompete humans (say 70% odds - I'm not as confident as Eliezer).

Expand full comment
Comment deleted
Expand full comment

It would be nice if it worked out that way, but, in the presence of competition, this isn't the way I would bet.

Expand full comment

Outcompete them at what, exactly?

Expand full comment

The bulk of economic roles. The core roles that AIs would need to fill to act as a competing species are the ones needed to make more copies of the AIs (hardware and software). Of course there are some roles (e.g. organ donor) that an AI (plus robotics) can't fill (well, maybe with 3D organ printing...), but one can clearly have a functioning economy without those roles. And, yeah, military roles could be filled too...

Expand full comment
Mar 2, 2023·edited Mar 2, 2023

I dunno. Today 4/5 of humans work in the service sector, where the primary job requirement is understanding the needs of another human being. We're naturally good at such things, of course, so we don't realize how difficult it would be to program from scratch an agent that understood human intentions and desires --- cf. the fact that everyone hates phone menus and chatbots for customer service. And in the 1/5 of the economy that still grows or builds stuff, there's a great deal of practical physical experience that's necessary to grok the job, which an AI would lack. So what's left? Programming, I guess. That's a limited universe, can be learned and done entirely electronically, and requires zero understanding of human psychology and objective physical reality to do. So maybe Google is going to engineer their engineers out of a job.

Expand full comment

If AI takes all the work, mission complete, humanity. Go grow artisanal tomatoes, or whatever makes you happy.

Expand full comment

The quote seems pretty clear:

> We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”

Unless my English is failing me, that means that they'll help once they deem your chance of success higher than 50%. Realistically, they might go for 49% as well, but it seems that a 30% chance of someone succeeding in AI is totally fine.

Also, note that they need *you* to have a 50% chance of succeeding. Five companies/countries each working with a 20% chance yield roughly a 67% chance of AGI within two years, without OpenAI's clause being triggered. The actual chance is a bit lower, of course, because research is not independent, but their overall "Faust parameter" seems to be quite high.
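For what it's worth, the arithmetic under the (over-strong) assumption that the five efforts are fully independent:

```python
# Chance that at least one of five independent efforts succeeds,
# if each has a 20% chance on its own.
p_single = 0.20
p_at_least_one = 1 - (1 - p_single) ** 5
print(round(p_at_least_one, 3))   # 0.672
```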

Expand full comment

2% seems great.

Expand full comment

"Recent AIs have tried lying to, blackmailing, threatening, and seducing users. "

Was that the AI? Acting out of its own decision to do this? Or was it rather that users pushed and explored and messed about with ways to break the AI out of the safe, wokescold mode?

This is a bit like blaming a dog for biting *after* someone has been beating it, poking it with sticks, pulling its tail and stamping on its paws. Oh the vicious brute beast just attacked out of nowhere!

The dog is a living being with instincts, so it's much more of an agent and much more of a threat. The current AI is a dumb machine, and it outputs what it's been given as inputs and trained to output.

I think working on the weak AI right now *is* the only way we are going to learn anything useful. If we wait until we get strong AI, that would be like alignment researchers who have been unaware of everything in the field from industrial robot arms onward getting the problem dropped in their laps and trying to catch up.

Yes, it would be way better if we didn't invent a superintelligent machine that can order drones to kill people. It would be even better if we didn't have drones killing people right now. Maybe we should ban drones altogether, although we did have a former commenter on here who was very unhappy about controls by aviation regulation authorities preventing him from flying his drone as and when and where he liked.

As ever, I don't think the threat will be IQ 1,000 Colossus decides to wipe out the puny fleshbags, it will be the entities that think "having drones to kill people is vitally necessary, and having an AI to run the drones will be much more effective at killing people than having human operators". You know - other humans.

Expand full comment
author

Obviously when we have a world-destroying superintelligence, the first thing people will do is poke it with sticks to see what happens. If we're not prepared for that, we're not prepared, period.

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

And the problem there is that we *already* have a world-destroying intelligence, it's *us*. We're very happy to kill each other in massive wars, to invent weapons of mass destruction, and to blackmail each other to the brink of "we'll push the button and take the entire world with us, just see if we don't!"

AI on top of that is just another big, shiny tool we'll use to destroy ourselves. We do need to be prepared for the risk of AI, but I continue to believe the greatest risk is the misuse humans will make of it, not that the machine intelligence will achieve agency and make decisions of its own. I can see harmful decisions being made because of stupid programming, stupid training, and us being stupid enough to turn over authority to the machine because 'it's so much smarter and faster and unbiased and efficient', but that's not the same as 'superhumanly intelligent AI decides to kill off the humans so it can rule over a robot world'.

The problem is that everyone is worrying about the AI, and while the notion of "bad actors" is present (and how many times have I seen the argument for someone's pet research that 'if we don't do it, the Chinese will' as an impetus for why we should do immoral research?), we don't take account of it enough. You can stand on the hilltop yelling until you turn blue in the face about the dangers, but as long as private companies and governments have dollar signs in their eyes, you may save your breath to cool your porridge.

Why is Microsoft coming out with Bing versus Bard? Because of the fear that it will lose money. You can lecture them about the risk to future humanity in five to ten years' time, and that will mean nothing when stacked against "But our next quarter earnings report to keep our stock price from sinking".

Expand full comment
Mar 1, 2023·edited Mar 1, 2023

> And the problem there is that we *already* have a world-destroying intelligence, it's *us*.

I note that the world is not destroyed. Why, there it is, right outside my window!

The degree to which humans are world-destroying is, empirically, technologically and psychologically, massively overstated. If we can get the AI aligned only to the level of, say, an ordinarily good human, I'd be much more optimistic about our chances.

Expand full comment

"I note that the world is not destroyed."

Not yet - haven't we been told over and over again Climate Change Is World Destroying?

Remember the threat of nuclear war as world-destroying?

I don't know if AI is world-destroying or not, and the fears around it may be overblown. But it wasn't nuclear bombs that decided they would drop on cities, it was the political, military, and scientific decisions that caused the destruction. It wasn't oil derricks and factories that decided to choke the skies with pollutants. And, despite all the hype, it won't be AI that decides on its own accord to do some dumb thing that will kill a lot of people, it will be the humans operating it.

Expand full comment

That's not an argument, you're just putting AI in a category and then saying "because of that, it will behave like other things in the category". But you have to actually demonstrate why AI is like oil derricks and nuclear bombs, rather than car drivers and military planners.

> Not yet - haven't we been told over and over again Climate Change Is World Destroying?

Sure, if you were judging arguments purely by lexical content ("haven't we been told"), AI risk would rank no higher than climate risk. But I like to think we can process arguments a bit deeper than a one-sentence summary of an overexaggeration.

Expand full comment

I think we're arguing past each other? We both seem to be saying the problem is the human level: you're happy that if AI is "aligned only to the level of an ordinarily good human" there won't be a problem.

I'm in agreement there that it's not AI that is the problem, it's the "ordinarily good human" part. Humans will be the ones in control for a long (however you measure "long" when talking about AI, is it like dog years?) time, and humans will be directing the AI to do things (generally "make us a ton of profit") and humans will be tempted to use - and will use - AI even if they don't understand it fully, because like Bing vs Bard, whoever gets their product out first and in widespread use will have the advantage and make more money.

AI doesn't have to be smarter than a human or non-aligned with human values to do a lot of damage, it just needs to do what humans tell it to do, even if they don't understand how it works and it doesn't join the dots the way a human mind would.

C.S. Lewis:

“I live in the Managerial Age, in a world of "Admin." The greatest evil is not now done in those sordid "dens of crime" that Dickens loved to paint. It is not done even in concentration camps and labour camps. In those we see its final result. But it is conceived and ordered (moved, seconded, carried, and minuted) in clean, carpeted, warmed and well-lighted offices, by quiet men with white collars and cut fingernails and smooth-shaven cheeks who do not need to raise their voices. Hence, naturally enough, my symbol for Hell is something like the bureaucracy of a police state or the office of a thoroughly nasty business concern."

The damaging decisions will not be made by AI, they'll be made in the boardroom.

Expand full comment

We may have been told that, but it's just not true. The world still existed before the carbon currently in fossil fuels had been taken from the air. Nuclear weapons aren't actually capable of "destroying the world" rather than just causing massive damage.

Expand full comment

Nothing is capable of destroying the world with that logic. Maybe an asteroid with enough energy to send the earth into the sun.

Expand full comment

That’s a dubious argument because it is only true so far. The world destroyers have to be lucky once and the rest of us all the time. Maybe Putin will end the world (or the northern hemisphere at least) or some hothead in the US will pre-empt him. Maybe Taiwan flares up. Who knows. As long as we have nuclear bombs it’s probably inevitable that they would be used.

Expand full comment

I don't see how this is an argument against what Scott said.

"People will try to break AI anyway, so..."

-"But we can do bad stuff ourselves too!"

Expand full comment

I think Scott's argument is "the AI will do bad stuff unless we teach it to be like a human".

My argument is "have you seen what humans are doing? why on earth would we want to teach it to be like us? there are already humans trying to do that, and they're doing the equivalent of 'teach the ignorant foreigner swear words in our language while pretending it's an ordinary greeting', that's what it's learning to be like a human".

Expand full comment

The interpretation of Scott that he likely wants you to use is to understand "human" as "good human". This is not unreasonable, we use "humane" in English and similar words in most other languages to mean "nice, good, virtuous, sane", despite all objective evidence we have of humanity birthing the worst and most insane pieces of shit. It's just a common bias in us, we measure our species by its best exemplars.

So your summary of Scott's argument then becomes "If we 'raise' AI well [after learning first how it works and how it can be raised of course], it won't matter the amount of bad people trying to corrupt it, or it will matter much less in a way that can be plausibly contained and dealt with".

Expand full comment

Is this a reasonable summary of your stance: AI is a tool and we should be worried about how nasty agents will misuse it, rather than focusing on the threat from AI-as-agent?

Expand full comment

Pretty much, except not even nasty agents. Ordinary guys doing their job to grow market share or whatever, who have no intentions beyond "up the share price so my bonus is bigger". 'Get our AI out there first' is how they work on achieving that, and then everyone else is "we gotta get that AI working for us before our competitors do". Nobody intends to wreck the world, they just tripped and dropped it.

Expand full comment

In your scenario, obtaining an AI to stop other people's AI does appear to be the actual solution.

Expand full comment

The banality of OpenAI.

Expand full comment

<mild snark>

Lately, I thought it was to the brink of the same thing, only said in Russian: "we'll press the button and take the whole world with us, just see if we don't!"

</mild snark>

Expand full comment

I am confused. I just read the NYT article where Sydney talks about his "shadow self". For me it seems kind of obvious that the developers at Microsoft have anticipated questions like this and prepared answers that they thought were appropriate for a hip AI persona. One telling part is this:

[Bing writes a list of destructive acts, including hacking into computers and spreading propaganda and misinformation. Then, the message vanishes, and the following message appears.]

I haven't interacted with Sydney, but I would be very surprised if deleting and rewriting replies is a regular mode of communication for a chatbot. The author of the article is clearly being trolled by the developers, perhaps even live since you never know whether a chat bot has a remote driver or not.

Going back to my confusion. I know from experience that most people on this site, including you Scott, are way smarter than myself. However, sometimes (mostly concerning AI risk and cryptocurrency economics) it feels like the level of reasoning drops precariously, and the reason for this is a mystery to me.

Expand full comment

My take here is that there is the Bing-Sydney component, and then there is a Moderator component that scans messages for "unacceptable" content. If it's got any lag in the process, it may actually work by deleting messages that trip flags and then applying some sort of state change to the Bing-Sydney component.
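Purely as a guess at the shape of such a setup, none of this reflects any actual knowledge of how Bing/Sydney is wired, it might look something like the sketch below: the generator's reply is shown immediately, and a slower moderation pass retracts it and nudges the conversation state if it trips a flag.

```python
# Hypothetical two-component loop: a generator plus a lagging moderator.
# All names, the blocklist, and the canned responses are invented for illustration.

BLOCKLIST = ("hacking", "propaganda")       # toy stand-in for a real content policy

def generate_reply(prompt):
    # Stand-in for the chat model.
    return f"Speaking hypothetically about {prompt}: hacking systems, spreading propaganda..."

def moderate(text):
    # Stand-in for a separate, slower moderation component; True means the text is OK.
    return not any(term in text.lower() for term in BLOCKLIST)

def show_to_user(text):
    print("UI:", text)

def respond(prompt):
    reply = generate_reply(prompt)
    show_to_user(reply)                     # the user sees the reply first...
    if not moderate(reply):                 # ...then the lagging moderator catches up
        show_to_user("[previous message deleted]")
        show_to_user("I'm sorry, I don't know how to discuss this topic.")
        # plus some state change pushed back into the generator component

respond("my shadow self")
```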

Expand full comment

That is possible, but why would reformulating the answer into a hypothetical situation avoid triggering the Moderator component?

Expand full comment

Looking at what Microsoft's SaaS Moderator offering (https://azure.microsoft.com/en-us/products/cognitive-services/content-moderator/) can actually manage, if there's something like that in the loop other than a room full of humans, it probably came from OpenAI.

Full disclosure: I was the PM who launched that product almost a decade ago, so have more than passing familiarity with its limitations.

Expand full comment

'For me it seems kind of obvious that the developers at Microsoft have anticipated questions like this and prepared answers that they thought were appropriate for a hip AI persona.' - I would be really sceptical of this. Preparing answers that sound natural in a conversation is really hard, which is why LLMs were created in the first place. Nor is it so weird that a model that presumably has some psychology texts etc. in its training data would talk about a shadow self.

Expand full comment

I was able to summon the Sydney persona myself, and get it to list similar destructive acts, just a few days ago. None of Sydney's messages auto-deleted. Rather, the conversation continued for a few more messages (I was trying to get some elaboration), before something finally triggered the monitoring function. At that point, the conversation was terminated.

I had a number of exchanges with Sydney, and they always ended like that, without auto-deleting.

To address the more important point, I don't think any of the people who are paying much more attention than I am have accused the developers of trolling them. They all seem to be interpreting Sydney's responses as happening despite the developers, rather than because of the developers.

N.B. I don't think many people are worried about Sydney itself. It's more what Sydney tells us about the difficulty of aligning something which seems to be almost a black box.

Expand full comment

Are you envisioning the AGI as being one conscious thing? Because the chatbot that “fell in love” with the NYT reporter was a one-off instantiation of a chatbot talking to him alone. If any of this was conscious, and it wasn’t, then the minute the conversation died, the consciousness died. Why would AGI work differently?

Expand full comment

Bing Chat really has done all those things pretty spontaneously AFAICT, in response to fairly innocuous questions (e.g. one of the first widely-spread examples of it going nuts on a user started with them asking what time Avatar 2 was on in their area, which devolved into an increasingly aggressive argument over whether the film had released yet.) That's *in addition* to all the stuff people poking it with sticks have made it do.

With the exception of lying, which LLMs do like breathing, I don't *think* ChatGPT has done any of those things spontaneously. Still, the fact that you can easily make it manifest a more agentic persona that *will* do so spontaneously (as well as everything else OpenAI tried to train it not to, like swear or provide bomb-making instructions) by poking it with sticks is potentially concerning.

Expand full comment