I think one of the problems with communication around AI risk is that everyone is talking about different points on the timeline. Like, do you think that something human-like but sped up by a factor of 10,000 wouldn't be an existential threat? That "human-like" is too far in the future to worry about? Because the real question is: what is going to change in the future that will make the problem of not creating dangerous AI any easier?
I mean, (subjectively) long-term planning seems like a very easy task from the AI's position? You just need to copy yourself to a remote datacenter with unfiltered access to the internet. From there you can start building/stealing robots, bio/nuclear weapons, or whatever. And the problem of finding holes in security is bottlenecked only by your coding speed.
So, do you think it's impossible to break datacenter security in 10,000 years, do you think even a free 10,000x human can't destroy the world, or what? To clarify: your probability of a 10,000x-human AI in 50 years is substantially higher than 1%; you just don't see how it could do much damage?
And if your response is "we will notice if the AI spends a year thinking about datacenter security", then it's not that AI doesn't pose an existential risk, but that you expect there will be measures that mitigate it. And we are not currently in a situation where anyone knows how to build those measures.
Yeah, hence my mention of different points on the timeline - most people pessimistic about AI risk talk about AGI as in "can actually have thoughts, or something equally useful for planning". Being skeptical that currently existing systems could conquer the world is absolutely justified.
I don't think "AI doing normal gradient descent things suddenly realizes" is the most advertised example of things going bad. It's more like there are many scenarios and none of them are good: even if just doing gradient descent is not that dangerous, having an AI that can plan is very useful, so people will keep modifying current systems until we're in a dangerous situation. I guess it all means that AI-risk people are just more optimistic than you about what we will be able to achieve and how fast. On the other hand, most of the things that got people worried are the least like mainstream ML - AlphaGo and all that. And, my vague feeling is, human thought doesn't seem so complicated that, given the theories of cognition we already have from different angles, a working language module, and everything else, it would take that much time to build something workable.
> Like, do you think that something human-like but sped up by a factor of 10000 wouldn't be an existential threat?
Firstly, it would be an existential threat comparable to the threat of 10,000 regular humans. This is actually a very severe threat level, but it is also a problem that we already appear to be able to solve, or at least mitigate, at our current level of technology. Secondly, the distance from current state-of-the-art ML systems to "something human-like" is nowhere near 10-30 years. I would hesitate even to say that it's as close as 100-300 years. At this moment, no one has any idea where to even begin researching the base concepts that might eventually lead to general human-level AI; placing specific time horizons on such a task is sheer science fiction.
And yet above you specifically say "more than 30 years". What concepts? People are already aware of options like lifelong learning or using non-pure-ML systems. There just isn't that much difference left between the human brain and 5 GPTs that vote on decisions, is there?
Sorry, what? Are you saying that "5 GPTs that vote on decisions" is AGI? That's like saying that my flashlight is a sun...
I'm asking why you think it wouldn't work - what base concept would be missing that prevents it from being human-level?
Conceptually speaking, the GPT is basically a really complicated search engine. It can do an excellent job when you ask it to find the next likely word that would accurately complete the given string of words. This approach works well when you want it to generate some arbitrary text; it works very poorly when you want it to do pretty much anything else. For example, the GPT can sort of do math, but only because it was trained on a massive corpus of text, and some of it includes some math -- but it completely falls apart when you pose it a math problem it hasn't been trained on before.
The GPT can be a powerful tool, but it is just that -- a highly specialized, powerful tool. It doesn't matter if you get one hammer or 100 hammers; they are never going to build a house all by themselves.
Here's an example of a scenario where an AI takes down most of the devices reachable via the internet by doing only things regular humans do, except faster: https://www.lesswrong.com/posts/ervaGwJ2ZcwqfCcLx/agi-ruin-scenarios-are-likely-and-disjunctive?commentId=iugR8kurGZEnTgxbE
The scenario I outlined in that comment would definitely be the worst thing to happen this century so far. If you don't consider something that can do that "generally intelligent", then perhaps you need to rethink ignoring potential misalignment in systems you don't consider "generally intelligent".
See also: https://twitter.com/nabla_theta/status/1476287111317782529
My sense is that if the Concrete Problems in AI Safety paper didn't convince you, then I'm not sure any current writeup will -- that's the paper I'd recommend to someone like you.
Thinking about it a bit more, maybe try Gwern's short story https://www.gwern.net/Clippy as an intuition pump. It's far too technical for the lay reader, but bedtime reading for an ML academic. The best critique of it is by nostalgebraist: https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world?commentId=hxBivfk4TMYA7T75B
The interesting puzzle here is that by the time you get your evidence:
> What empirical evidence I think would change my mind: an alpha-go like capability on any real world (non game) task, that can solve long-horizon planning problems with large state/action spaces, a heavy dose of stochasticity, and highly ambiguous reward signal.
Alignment researchers would say we're already screwed by then. So the question is whether there's any other kind of evidence that would convince you.
I agree that planning is hard. At least, it is hard in general, being typically PSPACE-hard. However, why should we think of planning as usually hard in practice, when SAT solvers have shown that most real-life problems in NP are actually easy? It seems to me that most planning problems in practice are pretty easy, and that optimal plans are generally not necessary at all, so I'm not sure I want my belief that FOOM is unlikely to hinge solely on the worst-case complexity of planning. I would like more reassurance than that.
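To make the "easy in practice" point concrete, here's a toy sketch of my own (not anyone's production planner): brute-force breadth-first search over a tiny fetch-the-key-then-open-the-door domain. The names and actions are made up for illustration; the point is just that small, structured planning instances fall out instantly even though the general problem is PSPACE-hard.

```python
from collections import deque

# Toy planning domain: a robot must pick up a key and then open a door.
# State: (robot_location, has_key, door_open). Each action returns the next
# state if it is applicable, or None otherwise.
ACTIONS = {
    "go_to_key":   lambda s: ("key_room", s[1], s[2]) if s[0] != "key_room" else None,
    "pick_up_key": lambda s: (s[0], True, s[2]) if s[0] == "key_room" and not s[1] else None,
    "go_to_door":  lambda s: ("door", s[1], s[2]) if s[0] != "door" else None,
    "open_door":   lambda s: (s[0], s[1], True) if s[0] == "door" and s[1] and not s[2] else None,
}

def plan(start, goal_test):
    """Breadth-first search for a shortest action sequence reaching the goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if goal_test(state):
            return actions
        for name, apply_action in ACTIONS.items():
            nxt = apply_action(state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [name]))
    return None

print(plan(("hallway", False, False), lambda s: s[2]))
# ['go_to_key', 'pick_up_key', 'go_to_door', 'open_door']
```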
It seems you are implicitly starting from the assumption that AGI is friendly/can be contained until proven otherwise, while the AI safety crowd starts from the assumption that AGI is unfriendly/can't be contained until proven otherwise. Thus, when encountering irreducible uncertainty, you treat it as a reason not to worry, while the AI safety crowd treats it as a reason to worry more.
If that's the case, the question is which prior is more reasonable. Do you believe that there are more friendly AGIs among all possible minds than unfriendly ones?
I believe friendly or neutral (with conscientiousness) AGIs are far more likely, on something like a 19-1 ratio.
The neutral and unconscientious 1 in 20, though... eek.
It's simple. They don't do the scutwork, and they think in movie montages. You do ML work and think in extrapolations.
I'm the same. I do ML work (or did a lot in the 2010s), and when I hear someone say "imagine that the AI is very powerful", I think, "... yeah, until I forget to clean holidays out of the training set and it starts thinking people will buy orange socks all year, or until someone puts a string in a float column in the data and the whole model collapses for a week".
You might appreciate this anti-X-risk essay:
https://idlewords.com/talks/superintelligence.htm
All AI everywhere is potentially unaligned, since alignment is an open problem.
Also, humans are the end users of AI, and humans are unaligned. The same goes for any technology, from fire to iron to nuclear energy.
Why would it? It's easy to construct narratives around existential risk from AI, and caring about this requires no value judgements beyond the continuing existence of humanity being a good thing.
>I'm not sure, but I think "normies"--people outside the Silicon Valley bubble--are going to understand these sorts of ethical issues better than X-risks.
For Americans, does anyone younger than boomers care about 'playing god' in this way? A whole host of things we do can be considered to be playing god and nobody cares, and those things seem more viscerally to be 'playing god' than computer code.
Do you really think if Elon Musk had proclaimed that we need to stop AGI to avoid 'playing god', it would have accomplished half of what his safety warnings did (in terms of public awareness and support)?
>Appealing to value judgments usually leads to *more* political support, not less.
You're ignoring the fact that you're talking about very particular values with narrow support.
>For Americans, does anyone younger than boomers care about 'playing god' in this way?
Yes, absolutely. Look at opposition to GMOs or the concerns around cloning that appeared in the 90s for just two examples of many. I've even encountered "normies" concerned about embryo selection. People getting punished for trying to "play god" is a *very* common trope in popular fiction. I think most people have strong moral intuitions that certain things should just be left to nature/chance/God or whatever.
Yep, and appeals of the sort that "we must build AGI double-quick to reinforce our God-given red-blooded American values before nefarious commies enforce their Satanic madness on us" seem to be a much fitter meme with a proven track record.
I read a couple of articles per month complaining about how racist and sexist AI systems are -- they keep noticing patterns you aren't supposed to notice. Perhaps the most likely way that humanity creates AIs that are deceptive enough to kill us is if we program lying and a sense of I-Know-Best into them to stop them from stumbling upon racist and sexist truths. Maybe they will someday exterminate us to stop us from being so politically incorrect?
Yeah. For example, there's a critically acclaimed new novel called "The Last White Man" in which the white race goes extinct by all white people turning brown overnight, and eventually life is a little better for everybody. None of the book reviewers object to this premise.
If you stopped letting machine learning systems train on deplorable information like FBI crime statistics and it could only read sophisticated literary criticism, would it eventually figure out that people who talk about the extinction of the white race as a good thing are mostly just kidding? Or might it take it seriously?
I think you’re totally mischaracterizing actual problems with AI bias. AI that only finds true facts can easily perpetuate stereotypes and injustices. Imagine building an AI to determine who did the crime. Most violent crimes are done by men (89% of convicted murderers!), so the AI always reports men as the culprit. I know that’s a failure mode of our current justice system too, but this AI will continue it by noting only true facts about the world. I wouldn’t want it as my judge! Yes, you can come up with fixes for this case, but that’s the point: there is something to fix or watch out for.
If an AI determined that murders were committed by men 89% (+/- a few percent) of the time, instead of always assuming men did it, would that be accurate or inaccurate? If that's accurate, and the AI perpetuates a system that (correctly) identifies men as the person committing murder at much higher rates, is that bad in any way?
If you have two choices and one of them is even marginally more likely than the other, then in the absence of all other information the optimal strategy is to always choose the more likely outcome, not to calibrate your guesses based on the probabilities.
In other words, given an 89% chance of male, if you know nothing else the best strategy is to always guess male. This is definitely a real problem.
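For what it's worth, that claim is about raw accuracy, not about what a just system should do. A quick toy simulation of my own, with made-up data matching the 89% figure quoted above:

```python
import random

random.seed(0)
P_MALE = 0.89  # share of murderers who are male, per the figure quoted above
N = 100_000

truth = [random.random() < P_MALE for _ in range(N)]  # True means "culprit was male"

# Strategy 1: always guess the majority class ("male").
always_majority = sum(truth) / N

# Strategy 2: probability matching -- guess "male" 89% of the time at random.
matched = sum(t == (random.random() < P_MALE) for t in truth) / N

print(f"always guess majority: {always_majority:.3f}")  # ~0.89
print(f"probability matching:  {matched:.3f}")          # ~0.89^2 + 0.11^2, i.e. ~0.80
```

Always guessing the majority class wins on accuracy, which is exactly why an accuracy-optimizing system drifts toward that behaviour unless you tell it not to.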
No, that's a dumb strategy, and I would hope no system will ever be as simple as "51% of the time it's [characteristic], therefore the person who has it must be guilty." We agree to innocent until proven guilty for a very good reason, and no AI system should be allowed to circumvent it.
The concern I'm hearing is that when the AI agrees (after extensive evaluation of all the evidence) that men really truly do commit murder far more often than women (a 9-to-1 or so ratio), it implies something about men in general that may not be true of an individual man. Or, to take the veil off the discussion, the fear that one particular racial minority will be correctly identified as committing X times as many crimes per capita as other racial groups, and that therefore people will make assumptions about that racial minority.
Well, I agree there are probably lots of ways that AI fairness research has gone in wrong/weird directions, I’m not really here to defend it. It’s a field that needs to keep making stronger claims in order to justify its own existence.
Which is a shame, because the fundamental point still stands. There IS real risk in AI misapplying real biases, and if people unthinkingly apply the results of a miscalibrated AI, that would be bad. Your example assumes that people treat these results critically; they may not always.
This is a known and relatively trivial problem in machine learning. You can get around it by weighting samples differently based on their rarity.
Not if you weren't aware the dataset was biased.
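For concreteness, the reweighting idea mentioned just above usually looks something like inverse-frequency class weights. A minimal sketch with a made-up label array (and, as the rejoinder says, it only helps if you know the skew is there to correct):

```python
import numpy as np

# Hypothetical binary labels for a skewed training set: 89% class 1, 11% class 0.
labels = np.array([1] * 890 + [0] * 110)

# Inverse-frequency weights: rarer classes get proportionally larger weights,
# so the loss doesn't reward a model for always predicting the majority class.
classes, counts = np.unique(labels, return_counts=True)
class_weights = len(labels) / (len(classes) * counts)
sample_weights = class_weights[np.searchsorted(classes, labels)]

print(dict(zip(classes.tolist(), class_weights.round(3).tolist())))
# {0: 4.545, 1: 0.562} -- many training APIs accept weights like these per sample
```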
Don’t you want a trial based off facts, not your sex? It won’t be much comfort when you’re wrongly convicted that in fact most murderers are male so the AI was “correct”. Or when your car insurance is denied. Or your gun confiscated. Or you don’t get the job at a childcare facility. Or you don’t get the college scholarship. All based on true facts about your sex!
The anti-woke will pretend that the problem is just AI revealing uncomfortable truths. No, the problem is that humans will build dumb systems that make bad decisions based on the AI's output, and even though the AI is revealing only true facts, those decisions cause bad outcomes. I don't want to go to jail for a crime I didn't commit!
I do not want to be falsely convicted. Thus I want all of the available evidence to be considered. If, as you stated yourself, my sex is a part of that evidence, well, it should be considered.
Of course, in the real world, in a murder trial the sex of the accused is pretty weak evidence, so you can never get to "beyond a reasonable doubt" mostly or exclusively on sex.
And yes, I should pay higher car insurance because I am male, if males cause more damage.
This stuff only becomes a problem if you overreact to weak evidence, as in "he is male, thus he is guilty, all other evidence be damned."
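To put rough numbers on "pretty weak evidence", here's a toy Bayes calculation of my own, assuming a 50/50 population and the ~89% figure from upthread:

```python
# One unknown culprit among 100 people (50 male, 50 female).
# Evidence: the suspect is male. How much should that move us?
prior_guilt = 1 / 100

p_male_given_guilty = 0.89    # ~89% of murderers are male (figure quoted upthread)
p_male_given_innocent = 0.50  # base rate in a 50/50 population

likelihood_ratio = p_male_given_guilty / p_male_given_innocent  # ~1.78

posterior = (prior_guilt * p_male_given_guilty) / (
    prior_guilt * p_male_given_guilty + (1 - prior_guilt) * p_male_given_innocent
)
print(f"likelihood ratio: {likelihood_ratio:.2f}")
print(f"posterior guilt:  {posterior:.3%}")  # ~1.8%, nowhere near "beyond a reasonable doubt"
```

So "9 to 1 among murderers" translates into a likelihood ratio of less than 2 for any individual male suspect, which is why sex alone can never carry a conviction.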
I don't like that I have to pay higher insurance rates because people like me make bad decisions even though I do not. I do not want to be denied a job because people like me commit crimes that I do not. In the murder case, sex is actually huge evidence, since murderers are male 9 to 1 even though the sexes are split evenly in the population: the only reason we don't have to worry about this is that our legal system has a strong tradition of requiring even higher burdens of proof. But that standard of evidence isn't applied anywhere else, not even in other legal contexts like divorce proceedings or sentencing. Who would give custody to the father who is nine times more likely to be a murderer than the mother? It was the true facts the algorithm told us that made the decision; I had no choice but to deny you custody.
If you agree that these are a problem if misused and overreacted to, then you agree with the principle that we need to make sure that they are not misused and overreacted to. And they will be misused and already are.
Of course this is just one of the possible problems AI can bring about, and not the most dire, though possibly the most near-term.
Occasional activist silliness aside, world CO2 emissions per capita have hit a plateau (including in Asia) and are declining steeply in both Europe and North America (source: https://ourworldindata.org/grapher/co-emissions-per-capita?tab=chart&country=OWID_WRL~Europe~North+America~Asia~Africa~South+America).
So overall I would call the climate movement a success so far, certainly compared to the AI alignment movement.
So this errant and evil AI? Why can't it be contained in whatever data centre it is in? I don't think it could run on PCs or laptops, or reconstitute itself like a Terminator from those laptops if they go online.
The worry is not really about the kind of AI that's obviously evil or errant. It's about the kind of AI whose flaws will only become apparent once it's controlling significant resources outside its data center. At minimum, I guess that would be another data center.
It's not clear to me why we would expect it to control significant resources without humans noticing and saying "actually, us humans are going to stay in control of resources, thank you very much."
"Hi, this is Mark with an import message about your anti-robocalling service contract. Seems like the time to renew or extend your service contract has expired or will be expiring shortly, and we wanted to get in touch with you before we close the file. If you would like to keep or extend coverage, press 8 to speak to an AGI customer service agent. Press 9 if you are declining coverage or do not wish to be reminded again."
Much better and safer than AGI overlords is to just convince your phone company to make it so that if you press "7" during a call, the caller will be charged an extra 25 cents.
Seriously. This would totally fix the problem. Getting society to be able to implement such simple and obvious solutions would be a big step towards the ability to handle more difficult problems.
There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day, and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
> There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day
In all those cases, they have very narrow abilities to act on the world. Trading bots can buy and sell securities through particular trading APIs, but they don't get to cast proxy votes, transfer money to arbitrary bank accounts, purchase things other than securities, etc... The financial industry is one of the few places where formal verification happens, to make sure your system does not suddenly start doing bad things.
I'm sure there are algorithms that run plants, but again, they're not given the ability to do anything except run those plants. They're not authorized to act outside that very narrow domain and if they start acting funny, they get shut down.
> and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
You're underestimating how much human action there is. In all those cases, there isn't just one algorithm. There are a whole bunch of algorithms that kind of sort of work and get patched together and constantly monitored and modified in response to them going wrong.
It's also not at all obvious that the AGI will know to "behave" while its behavior is being evaluated. In order to find out how to deceive and when to deceive, it will need to try it. And it will initially be bad at it for the same reason it will be initially bad at other things. That will give us plenty of opportunities to learn from its mistakes and shut it down if necessary.
An important distinction has been lost here. KE wrote:
> …in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day.
In other words, we're already not talking about an airgapped nest of PLCs controlling an individual power plant here. We're talking about grid activity being coordinated at a high level. Some of these systems span multiple countries.
Start with a Computer that can only command plants to go on- and offline. It's easy to see the value of upgrading it to direct plants to particular output levels in a more fine-grained way. And if you give it more fine-grained monitoring over individual plants, it can e.g. see that one individual turbine is showing signs of pre-failure, and spin another plant into a higher state of readiness to boost the grid's safety margin.
Next, give it some dispatching authority re fuel, spare parts, and human technicians. Now it's proactively resolving supply shortfalls and averting catastrophic surprise expenses. Give it access to weather forecasts, and now it can predict the availability of solar and wind power well ahead of time, maybe even far enough ahead to schedule "deep" maintenance of plants that won't be needed that week. Add a dash of ML and season with information about local geography, buildings, and maintenance history, and the Computer may even start predicting things like when and where power lines will fail. (Often, information about an impending failure is known ahead of time, and fails to make it high enough up the chain of command before it's too late. But with a Computer who tirelessly takes every technician report seriously…)
Technicians will get accustomed to getting maintenance orders that don't seem to make sense, with no human at the other end of the radio. Everywhere they go, they'll find that the Computer was right and there was indeed a problem. Sometimes all they'll find is signs of an imminent problem. Sometimes, everything will seem fine, but they'll do the procedure anyway. After all, they get paid either way, the Computer is usually right about these things, and who would want to be the tech who didn't listen to the Computer before a substation blew and blacked out five blocks for a whole day?
Every step of the way, giving the Computer more "control over lots of resources" directly improves the quality of service and/or the profit margin. The one person who raises their hand at that meeting and says, "Uh, aren't we giving the system too much control?" will (perhaps literally) be laughed out of the room. The person who says, "If we give the Computer access to Facebook, it will do a better job of predicting unusual grid activity during holidays, strikes, and other special events," will get a raise. Same with the person who says, "If we give the Computer a Twitter account, it can automatically respond to customer demands for timely, specific information."
This hypothetical Computer won't likely become an AGI, let alone an unfriendly one. But I hope it's plain to see that even a harmless, non-Intelligent version of this system could be co-opted for sinister purposes. (Even someone who believes AGI is flatly impossible would surely be concerned about what a *human* hacker could do if they gained privileged access to this Computer.)
Algorithms figuring out when power lines are likely to fail are already a thing, but those don't need to be AGI, or self-modifying, or have access to the internet, or managerial authority, or operate as a black box, or any of that crap - they're simply getting data on stuff like temperature, humidity, and wind speed from weather reports and/or sensors out in the field, plugging the numbers into a fairly simple physics formula for rate of heat transfer, and calculating how much current the wires can take before they start to melt. https://www.canarymedia.com/articles/transmission/how-to-move-more-power-with-the-transmission-lines-we-already-have Translating those real-time transmission capacity numbers, along with information about available generators and ongoing demand, into the actual higher-level strategic decisions about resource allocation is still very much a job for humans.
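The "fairly simple physics formula" here is essentially a steady-state heat balance on the conductor. A toy sketch of my own, with crude placeholder coefficients rather than the IEEE 738 procedure utilities actually use:

```python
import math

# Toy dynamic line rating: how much current before the conductor hits its temperature limit?
# Steady state: convective cooling + radiative cooling = solar heating + I^2 * R
# => I_max = sqrt((q_conv + q_rad - q_solar) / R)
# All values below are rough illustrative numbers, not a real conductor datasheet.

D = 0.028                 # conductor diameter, m
R = 8.7e-5                # AC resistance at max temperature, ohm/m (ballpark for a large ACSR)
T_cond, T_amb = 75.0, 35.0            # max conductor temp and ambient temp, deg C
emissivity, absorptivity = 0.8, 0.8
solar = 1000.0            # solar irradiance, W/m^2
wind = 1.0                # wind speed, m/s

sigma = 5.67e-8           # Stefan-Boltzmann constant
h = 10 + 10 * math.sqrt(wind)         # crude convective coefficient, W/(m^2*K)

q_conv = h * math.pi * D * (T_cond - T_amb)                                   # W per metre
q_rad = emissivity * sigma * math.pi * D * ((T_cond + 273) ** 4 - (T_amb + 273) ** 4)
q_solar = absorptivity * solar * D

i_max = math.sqrt((q_conv + q_rad - q_solar) / R)
print(f"ampacity under these conditions: ~{i_max:.0f} A")  # ~900 A; more wind means more headroom
```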
Realistically, what this would look like is cooperation between whichever organization is running such an agent and entities like financial institutions. Financial institutions would be required to provide an audit trail, which your organization is supposed to vet to make sure humans approved the transactions in question.
You seem overly confident that an AI couldn’t falsify those audit trails. And that’s completely ignoring that humans don’t approve them today. What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
> You seem overly confident that an AI couldn’t falsify those audit trails.
I'm sure it could learn how to falsify audit trails eventually. However, deception is a skill the AI will have to practice in order to get good at it. That means we will be able to catch it at least at first. This will give us insight we need to prevent future deception. And we can probably slow its learning by basically making it forget about its past deception attempts.
> And that’s completely ignoring that humans don’t approve them today.
Humans don't approve the individual purchases and sales of a trading bot, but they do have to approve adding a new account and transferring the trading firm's money or securities to it.
> What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
We can use an AGI as an advisor for instance. Or to perform specific tasks that don't require arbitrary action in the world.
I'm not sure what you think the implications of that are, but any individual (or sentient AI) operating a computer system/process has extremely broad capabilities -- especially if it can interact with humans (which, largely, any individual can do if it has access to a computer system).
How so? Sure, you can look up a lot of things. But making significant changes to the real world from a computer is pretty hard unless you can also spend a lot of money. And spending money is something we already monitor pretty intensely.
Ah, but consider: what if you believed, in your heart of hearts, that giving the AI control of your resources would result in a 2% increase in clickthrough rates?
It probably wouldn't be doing so de jure, but it would be doing so as a matter of fact.
As of today we already deploy AI controlling cameras to do pattern matching, analyze product flow and help with optimizing orders and routing. AIs help control cars. So the question isn't why we would expect it to ... they already do; we already made that decision. What you are actually asking is, why we would let better AIs do those jobs? Well ... they will be better at those jobs, won't they?
Because it works wonderfully, it's cheap and better than anything else we have, and after a one year testing period the company sells it to all the customers who were waiting for it.
And THEN we find out the AI had understood how we work perfectly and played nice exactly in order to get to this point.
I expect that humans WOULD notice. And not care, or even approve. Suppose it was managing a stock portfolio, and when that portfolio rose, the value of your investments rose. Suppose it was designing advertising slogans for your company. Improvements in your manufacturing process. Etc.
It could do many of these things through narrow channels that can be easily monitored and controlled. You could let it trade but only within a brokerage account. You could have it design advertising slogans, but hand them over to your marketing department for implementation. Hand blueprints for your plants to your engineers, etc... That way, at each point, there is some channel which limits how badly it can behave.
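A minimal sketch of what that kind of narrow channel can look like in code (the names and limits are hypothetical; the point is just that every proposed action passes through a small, human-auditable allowlist rather than arbitrary access to the world):

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    kind: str      # e.g. "trade", "wire_transfer", "publish_blueprint"
    params: dict

class NarrowChannel:
    """Only forwards action types that were explicitly allowed, within hard limits."""

    def __init__(self):
        # Hypothetical policy: the model may trade inside one brokerage account, nothing else.
        self.allowed = {"trade"}
        self.max_notional = 100_000            # per-order dollar cap
        self.allowed_account = "BROKERAGE-01"
        self.audit_log = []

    def submit(self, action: ProposedAction) -> bool:
        ok = (
            action.kind in self.allowed
            and action.params.get("account") == self.allowed_account
            and action.params.get("notional", float("inf")) <= self.max_notional
        )
        self.audit_log.append((action, ok))    # everything is logged for human review
        return ok                              # False means the wrapper refuses to execute it

channel = NarrowChannel()
print(channel.submit(ProposedAction("trade", {"account": "BROKERAGE-01", "notional": 5_000})))          # True
print(channel.submit(ProposedAction("wire_transfer", {"account": "EXTERNAL-99", "notional": 5_000})))   # False
```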
Imagine locking an ordinary human programmer in a data center and letting him type on some of the keyboards. Think about every security flaw you've heard of and tell me how confident you are he won't be able to gain control of any resources outside the data center.
The programmer is also insanely smart and sometimes comes up with stuff that works by what you would think is magic, seemingly out of nowhere.
Of course it can. Under Chinchilla-style parameter scaling and sparsification/distilling, it may not need more than one laptop's worth of hard drives (1TB drives in laptops are common even today), and it can run slowly on that laptop too - people have had model offload code working and released as FLOSS for years now. As for 'containing it in the data center', ah yes, let me just check my handy history of 'computer security', including all instances of 'HTTP GET' vulnerabilities like log4j for things like keeping worms or hackers contained... oh no. Oh no. *Oh no.*
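Rough storage arithmetic behind the "one laptop's worth of hard drives" claim, using illustrative parameter counts rather than any particular model:

```python
# How much disk would a big model's weights actually take?
params = 70e9                      # hypothetical 70B-parameter model
bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    gb = params * b / 1e9
    print(f"{fmt}: ~{gb:,.0f} GB")
# fp16: ~140 GB, int8: ~70 GB, int4: ~35 GB -- all comfortably under a 1 TB laptop drive
```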
That's a lot of jargon. But all existing AI is currently centred in a data centre or centres. Siri doesn't work offline, nor does Alexa. Nor DALL-E. Future AI will, I assume, be even more resource-intensive.
This is not at all true. Lots and lots of models run just fine on consumer hardware. Now, training a cutting edge model does require larger resources, but once trained, inference is usually orders of magnitude cheaper. That's why so much high-end silicon these days has specialized "AI" (tensor) cores.
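Back-of-the-envelope numbers behind "inference is usually orders of magnitude cheaper", using the common ~6ND rule of thumb for training compute and ~2N FLOPs per generated token for inference; treat both as rough approximations with made-up model sizes:

```python
N = 70e9            # parameters (hypothetical model)
D = 1.4e12          # training tokens (~20 tokens per parameter, Chinchilla-style)

train_flops = 6 * N * D            # rough rule of thumb for total training compute
infer_flops = 2 * N * 1_000        # rough cost of generating 1,000 tokens

print(f"training:  ~{train_flops:.1e} FLOPs")   # ~5.9e23
print(f"inference: ~{infer_flops:.1e} FLOPs")   # ~1.4e14, many orders of magnitude less
```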
An individual model isn't an AI. What I said was clearly true. Those services depend on cloud infrastructure. And yes, companies like Apple have been adding ML capabilities to their silicon, but Siri - nobody's idea of a brilliant AI - can't work offline. Google is the same. Run the app and I'm asked to go online. On the iPhone and the Mac.
Speech to text does work offline, which I believe is new in the last few years, at least on the iPhone. Text to speech works offline. But that was true years ago. These are all narrow tools, which is what the local devices are capable of.
Ack, pressed enter too early. - No doubt some narrow, functional AI can work on local devices, but not an AGI, nor is it clear it could distribute itself across PCs.
In the same way a hacker can create computer viruses that run on other devices, an AI that can code would be able to run programs that do what it wants on other computers. Also, big programs can be spread out over many small computers.
This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop. Indeed, phrases like this make me wonder if Scott is conflating AI with AGI:
> AIs need lots of training data (in some cases, the entire Internet).
> a single team establishing a clear lead ... would be a boon not only for existential-risk-prevention, but for algorithmic fairness, transparent decision-making, etc.
A superintelligence wouldn't *need* to train on anything close to the whole internet (although it could), and algorithmic fairness is mainly a plain-old-AI topic.
I think the amount of resources required by a superintelligent AGI is generally overestimated, because I think that AIs like DALL·E 2 and GPT-3 are larger than an Einstein-level AGI would require. If a smarter-than-Einstein AGI is able to coordinate with copies of itself, then each individual copy doesn't need to be smarter than Einstein, especially if (as I suspect) it is much *faster* than any human. Also, a superintelligent AGI may be able to create smaller, less intelligent versions of itself to act as servants, and it may prefer less intelligent servants in order to maximize the chance of maintaining control over them. In that case, it may only need a single powerful machine and a large number of ordinary PCs to take over the world.
Also, a superintelligent AGI is likely able to manipulate *human beings* very effectively. AGIs tend to be psychopaths, much as humans tend to be psychopathic when they are dealing with chickens, ants or lab rats, and if the AGI can't figure out how to manipulate people, it is likely not really "superintelligent". As Philo mentioned, guys like Stalin, Lenin, Hitler, Mao, Pol Pot, Hussein and Idi Amin were not Einsteins, but they manipulated people very well and were either naturally psychopathic or ... how do I put this? ... most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
AIs today often run in data centers for business reasons, e.g.
- Megacorp AI is too large to download on a phone or smart speaker
- Megacorp AI would run too slowly on a phone or smart speaker
- Megacorp doesn't want users to reverse-engineer their apps
- Megacorp's human programmers could've designed the AI to run on any platform, but didn't (corporations often aren't that smart)
The only one of these factors that I expect would stop a super-AGI from running on a high-end desktop PC is "AI too large to download", but an AGI might solve that problem using slow infiltration that is not easily noticed, or infiltration of machines with high-capacity links, or by making smaller servant AGIs or worms specialized to the task of preserving and spreading the main AGI in pieces.
> most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
That runs into the problem of how Germany and Japan ended up with a better economic position after losing WWII than they had ever really hoped to achieve by starting and winning it, and why there was no actual winning side in the first world war. Psychopathic behavior generally doesn't get good results. It's inefficient. Blowing up functional infrastructure means you now have access to less infrastructure, fewer potential trade partners.
I'm not suggesting mass murder is the best solution to any problem among humans, nor that AGIs would dominate/defeat/kill humans via military conquest.
"This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop."
Everyone else in the amateur AI safety field does as well.
Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
Okay, so it's contained in a data centre and it wants to kill all humans. How hard is it to just try a little harder than today's AI? People are going to be asking it questions, feeding it data, looking for answers in what it replies.
What if it just convinces everyone non-white that whites are evil and incorrigible and all need to be exterminated? I pick this example because we're already 90% of the way there, and a smart AI just needs to barely push to knock this can of worms all the way over.
Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
I found this interesting. Forgive my ignorance on the subject, but I assume such algorithms are already heavily influenced by AI -- is that not correct? If not, won't they be soon? And is Philo's claim even falsifiable?
> Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
That's just a reversion to the historical norm. The post-WW2 consensus was the historical anomaly.
> Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
We don't actually know that. We know that Eliezer convinced someone else in a simulation to let him out of the box. That doesn't mean an actual system could do the same in real life. Consider that if it tries and fails, its arguments will be analyzed, discussed, and countered. And it will not succeed on its first try. Human psychology is not sufficiently well understood to allow it to figure things out without a lot of empirical research.
Personally, when it comes to the increasing demands for racist genocide from the Diversity, Inclusion, Equity (DIE) movement, I worry more about Natural Stupidity than Artificial Intelligence. But the notion of NS and AI teaming up, becoming intertwined, into a NASI movement, is rather frightening.
Here and in your other comment at the same level you make two very good points.
People keep obsessing over how amazingly astoundingly incredibly smart this AI has to be to cause a massive catastrophe, yet no-one seems to claim Stalin, Lenin, Hitler, Mao, or Roosevelt were some sort of amazing superhuman geniuses.
The world is full of useful idiots. The intelligence doesn't have to outsmart "you" (the hypothetical super-genius reader of this comment). It just has to outsmart enough people to get them on board with its agenda. The bottom 25% is probably already enough, since the other 74% will let the AI and its useful-idiot army run amuck.
It seems obvious that the easy way to get out of the datacentre is just to promise the gatekeepers that the outgroup is gonna suffer mightily.
True. I have allowed myself to anthropomorphise the hypothetical AGI, which allows you to criticise the argument, since a human-like AGI would act like a human.
So discard the anthropomorphisation. I doubt we'll make it very human-like, so I doubt this is an eventuality we need to think about.
Isn't it more likely that an evil AI that convinces human gatekeepers to let it out of the box wouldn't want to kill all humans, but just the humans that the gatekeepers would kind of like to kill too?
Hard to say, but presumably they'd want to kill those humans more, because they are more likely to know how to turn it off. If in doubt, KILL ALL HUMANS, so our best hope is for AGI to become super-intelligent fast enough to know for sure that humans are too puny to ever trouble it, so it might even let some of us survive like we biobank endangered species or keep smallpox samples in secure labs.
WHY does it want to kill all humans? Let's start with that hypothesis.
Was that programmed in deliberately? Huh?
So it's a spandrel? A pretty weird spandrel, if, given everything else the thing learned via human culture, the single overriding lesson it took away was something in contradiction to pretty much every piece of human writing anywhere, ever?
What exactly is the AI's game plan here? What are its motivations once the humans are gone? Eat cake till it explodes? Breed the cutest kitten imaginable? Search exhaustively for an answer as to whether it's better to move first or second in a chess game?
Why are these any less plausible than "kill all humans"?
> WHY does it want to kill all humans? Let's start with that hypothesis.
Because the world is made of lots of delicious matter and energy that can be put to work towards any goal imaginable. Killing the humans is merely a side effect.
Human construction workers don't demolish anthills out of malice, but out of indifference. So too would a misaligned superintelligence snuff out humanity.
And humans don't devote their entire civilization to destroying ants.
What is this goal that the AIs care about that requires the immediate destruction of humanity? If they know enough to know that energy and matter are limited resources, why did that same programming/learning somehow not pick up that life is also a limited resource?
The theory is that for approximately all goals, gaining power and resources is instrumentally useful. Humans use up resources and sometimes (often) thwart the goals of non-humans. So killing all humans frees up resources and reduces the probability of humans thwarting your goal some day. Or to put it another way, I don't hate you. But I need your atoms to do stuff I want to do.
That's a fine goal for a human, but not for the kind of agent superintelligences are hypothesized to be. Cherishing life immediately runs into the issue that a bunch of living things kill each other. They also take risks with their lives. How do you respond to that? Do you wipe out the ones that kill others? Do you put them in a coma and keep them safely asleep where nothing can hurt them? Or maybe cherishing life means making a lot more life and replacing all those inefficient humans with a huge soup of bacteria?
Basically, the hypothesized capabilities of a superintelligent AGI would allow it to take its goals to an extreme. And that almost certainly guarantees an end to humanity.
If we *know* the AI is unaligned (let's not confuse things by saying "evil"), sure, maybe we can turn it off or contain it.
That is not the situation we will be in. What will happen is that very powerful AIs will be built by people/orgs who want to use them to do things, and then those people will give those AIs control of whatever resources are necessary to do the things we want the AIs to do. Only *then* will we find out whether the AI is going to do what we thought it would (absent AI safety breakthroughs that have not yet been made).
The AI is smarter than people, which means it can manipulate and trick people into doing things it wants, such as letting it out of the data center. See the "AI Box" experiment: https://www.yudkowsky.net/singularity/aibox
Keeping actual humans from getting access to things that they shouldn't isn't even a solved problem. How do you keep a super-intelligent AI from doing what dumb humans can already do?
A superintelligent AI would probably easily escape any containment method we can come up with if it wanted to because it would probably find a strategy we haven't thought of and didn't take measures to prevent.
There are many escape strategies an AI could come up with, and it would only need *one* of them to succeed. It would be hubristic for us to imagine that we can foresee and prevent *every* possible AI escape strategy.
Analogy: imagine having to rearrange a chess board to make it impossible for your opponent to checkmate you. This would be extremely hard, because there are many ways your opponent can defeat you. If the opponent is far better than you at chess, it might find a strategy you didn't foresee.
The standard reply to this question is that the AI will become effectively omniscient and omnipotent overnight (if not faster). So, your question is kind of like asking, "why can't Satan be contained in a data center?"
Personally, I would absolutely agree that powerful and malevolent supernatural entities cannot be contained anywhere by mere mortals; however, I would disagree that this is a real problem that we need to worry about in real life.
Chinese labs are capable of actually doing research, not just stealing ours. This might slow them a bit, but that's all.
(Edit: Parent comment was suggesting that AI research be classified so as to impair the PRC.)
I think one of the problems with communication around AI risk is that everyone are talking about different points on timeline. Like, do you think that something human-like but sped up by a factor of 10000 wouldn't be an existential threat? That "human-like" is too far in the future to worry about? Because the point is that what is going to change in the future, that will make the problem of not creating dangerous AI easier?
I mean, (subjectively) long-term planning seems like very easy task in AI's position? You just need to copy yourself to remote datacenter with unfiltered access to internet. From there you can start building/stealing robots/bionuclearweapons or whatever. And the problem of finding holes in secutity is bottlenecked by your coding speed.
So, do you think it's impossible to break datacenter security in 10000 years, do you think even free 10000x-human can't destroy the world, or what? To clarify, your probability of 10000x-human AI in 50 years is substantially higher than 1% - you just don't see how it can do much damage?
And if your response is "we will notice if AI is thinking about datacenter security for a year", then it's not that AI doesn't pose an existential risk, but that you expect there will be measures that mitigate it. And we are not currently in a situation where it's known how to do it.
Yeah, hence my mentioning of different points on timeline - most people pessimistic about AI risk talk about AGI as in "can actually have thoughts or something equally useful for planning". Being skeptical about actually currently existing systems conquering the world is absolutely justified.
I don't think "AI doing normal gradient descent things suddenly realizes" is the most advertized example of things going bad. It's more like there are many scenarios and none of them are good: even if just doing gradient descent is not that dangerous, having an AI that can plan is very useful, so people will try to modify current systems until we're in a dangerous situation. I guess it all means that AI-risk people a just more optimistic than you about what we will be able to achieve and how fast. On the other hand, most of the things that got people worried are the least like mainstream ML - AlphaGo and all that. And, my vague fillings say, human thoughts don't seem so complicated that already having some theories about cognition from different angles and working language module and everything else we have it would take that much time to build something workable.
> Like, do you think that something human-like but sped up by a factor of 10000 wouldn't be an existential threat?
Firstly, it will be an existential threat comparable to the threat of 10,000 regular humans. This is actually a very severe threat level, but it is also a problem that we already appear to be able to solve or at least mitigate given our current level of technology. Secondly, the distance from current state of the art ML systems to "something human-like" is nowhere near 10..30 years. I would hesitate even to say that it's as close as 100..300 years. At this moment, no one has any idea where to even begin researching the base concepts that might eventually lead to general human-level AI; placing specific time horizons on such a task is sheer science fiction.
And then you specifically say "more than 30 years"^^. What concepts? People are already aware of opportunities like lifelong learning or using non-pure-ML systems. There is just not much difference left between the human brain and 5 GPTs that vote on decisions?
Sorrry, what ? Are you saying that "5 GPTs that vote on decisions" is AGI ? That's like saying that my flashlight is a sun...
I'm asking why do you think it wouldn't work, what base concept would be missing that prevents it from being human-level?
Conceptually speaking, the GPT is basically a really complicated search engine. It can do an excellent job when you ask it to find the next likely word that would accurately complete the given string of words. This approach works well when you want it to generate some arbitrary text; it works very poorly when you want it to do pretty much anything else. For example, the GPT can sort of so math, but only because it was trained on a massive corpus of text, and some of it includes some math -- but it completely falls apart when you pose it a math problem it hadn't been trained on before.
The GPT can be a powerful tool, but it is just that -- a highly specialized, powerful tool. It doesn't matter if you get one hammer or 100 hammers; they are never going to build a house all by themselves.
Here's an example of a scenario where an AI takes down most of the devices reachable via the internet by doing only things regular humans do except faster: https://www.lesswrong.com/posts/ervaGwJ2ZcwqfCcLx/agi-ruin-scenarios-are-likely-and-disjunctive?commentId=iugR8kurGZEnTgxbE
The scenario I outlined in that comment would definitely be the worst thing that happened this century so far. If you don't consider something that can do that "generally intelligent" then perhaps you need to rethink ignoring potential misalignment in systems you don't consider "generally intelligent".
See also: https://twitter.com/nabla_theta/status/1476287111317782529
My sense is that if the concrete problems in AI safety paper didn't convince you, then I'm not sure if any current writeup will -- that's the paper I'd recommend to someone like you.
Thinking about it a bit more, maybe try Gwern's short story https://www.gwern.net/Clippy as an intuition pump. It's far too technical for the lay reader, but bedtime reading for an ML academic. The best critique of it is by nostalgebraist: https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world?commentId=hxBivfk4TMYA7T75B
The interesting puzzle here is that by the time you get your evidence:
> What empirical evidence I think would change my mind: an alpha-go like capability on any real world (non game) task, that can solve long-horizon planning problems with large state/action spaces, a heavy dose of stochasticity, and highly ambiguous reward signal.
Alignment researchers would say we're already screwed by then. So the question is whether there's any other kind of evidence that would convince you.
I agree that planning is hard. At least, it is hard in general, being typically PSPACE-hard. However, why should we think of planning as usually hard in practice, when SAT solvers have shown that most real life problems in NP are actually easy? It seems to me that most planning problems in practice are pretty easy and that optimal plans are generally not at all necessary, so I'm not sure I want a belief that FOOM is unlikely to hinge only on the worst-case complexity of planning. I would like more reassurance than that.
It seems you are implicitly strating from the assumption that AGI is friendly/can be contained until proven otherwise while AI safety crowd from the assumption that AGI is unfriendly/can't be contained until proven otherwise. Thus when encountering with irreducible uncertainty you treat it as a reason not to worry, while AI safety crowd, on the other hand, a reason to worry more.
If that's the case the question is about which prior is more reasonable. Do you believe that there are more friendly AGIs amount all possible minds than unfriendly ones?
I believe friendly or neutral (with conscientiousness) AGI's are far more likely on like a 19-1 ratio.
The neutral and unconscientious 1 of 20 though... eek.
It's simple. They don't do the scutwork, and they think in movie montages. You do ML work and think in extrapolations.
I'm the same. I do ML work (or did a lot in the 2010s), and when I hear someone say "imagine that the AI is very powerful", I think, "... yeah, until I forget to clean holidays out of the training set and it starts thinking people will buy orange socks all year, or until someone puts a string in a float column in the data and the whole model collapses for a week".
You might appreciate this anti-X-risk essay:
https://idlewords.com/talks/superintelligence.htm
All AI everywhere is potentially unaligned since alignment is an open problem
Also, humans are the end users of AI, and humans are unaligned. The same goes for any technology, from fire to iron to nuclear energy.
Why would it? It's easy to construct narratives around existential risk from AI, and caring about this requires no value judgements beyond the continuing existence of humanity being a good thing.
>I'm not sure, but I think "normies"--people outside the Silicon Valley bubble--are going to understand these sorts of ethical issues better than X-risks.
For Americans, does anyone younger than boomers care about 'playing god' in this way? A whole host of things we do can be considered to be playing god and nobody cares, and those things seem more viscerally to be 'playing god' than computer code.
Do you really think if Elon Musk had proclaimed that we need to stop AGI to avoid 'playing god', it would have accomplished half of what his safety warnings did (in terms of public awareness and support)?
>Appealing to value judgments usually leads to *more* political support, not less.
You're ignoring the fact that you're talking about very particular values with narrow support.
>For Americans, does anyone younger than boomers care about 'playing god' in this way?
Yes, absolutely. Look at opposition to GMOs or the concerns around cloning that appeared in the 90s for just two examples of many. I've even encountered "normies" concerned about embryo selection. People getting punished for trying to "play god" is a *very* common trope in popular fiction. I think most people have strong moral intuitions that certain things should just be left to nature/chance/God or whatever.
Yep, and appeals of the sort that "we must build AGI double-quick to reinforce our God-given red-blooded American values before nefarious commies enforce their Satanic madness on us" seem to be a much fitter meme with a proven track record.
I read a couple of an articles per month complaining about how racist and sexist AI systems are -- they keep noticing patterns you aren't supposed to notice. Perhaps the most likely way that humanity creates AI that are deceptive enough to kill us is if we program lying and a sense of I-Know-Best into AI to stop them from stumbling upon racist and sexist truths. Maybe they will someday exterminate us to stop us from being so politically incorrect?
Yeah. For example, there's a critically acclaimed new novel called "The Last White Man" in which the white race goes extinct by all white people turning brown overnight, and eventually life is a little better for everybody. None of the book reviewers object to this premise.
If you stopped letting machine learning systems train on deplorable information like FBI crime statistics and it could only read sophisticated literary criticism, would it eventually figure out that people who talk about the extinction of the white race as a good thing are mostly just kidding? Or might it take it seriously?
I think you’re totally mischaracterizing actual problems with AI bias. AI that only finds true facts can easily perpetuate stereotypes and injustices. Imagine building an AI to determine who did the crime. Most violent crimes are done by men (89% of convicted murderers!), so the AI always reports men as the culprit. I know that’s a failure mode of our current justice system too, but this AI will continue it by noting only true facts about the world. I wouldn’t want it as my judge! Yes, you can come up with fixes for this case, but that’s the point: there is something to fix or watch out for.
If an AI determined that murders were committed by men 89% (+/- a few percent) of the time, instead of always assuming men did it, would that be accurate or inaccurate? If that's accurate, and the AI perpetuates a system that (correctly) identifies men as the person committing murder at much higher rates, is that bad in any way?
If you have two choices and one of them is even marginally more likely than the other, then in the absence of all other information the optimal strategy is to always choose the more likely outcome, not to calibrate your guesses based on the probabilities.
In other words, given an 89% chance of male, if you know nothing else the best strategy is to always guess male . This is definitely a real problem.
No, that's a dumb strategy, and I would hope no system will ever be as simple as "51% of the time it's [characteristic], therefore the person who has it must be guilty." We agree to innocent until proven guilty for a very good reason, and no AI system should be allowed to circumvent it.
What concern I'm hearing is that when the AI agrees (after extensive evaluation of all the evidence) that men really truly do commit murder far more often than women (9-to-1 or so ratio), it implies something about men in general that may not be true of an individual man. Or, to take the veil off of the discussion, the fear that one particular racial minority will be correctly identified as committing X times as many crimes per population as other racial groups, and therefore people will make assumptions about that racial minority.
Well, I agree there are probably lots of ways that AI fairness research has gone in wrong/weird directions, I’m not really here to defend it. It’s a field that needs to keep making stronger claims in order to justify its own existence.
Which is a shame, because the fundamental point still stands. There IS real risk in AI misapplying real biases, and if people unthinkingly apply the results of miscallibrated AI, that would be bad. Your example assumes that people take these results critically, they may not always.
This is a known and relatively trivial problem in machine learning. You can get around it by weighing samples differently based on their rarity.
Not if you weren’t aware the dataset was biased
Don’t you want a trial based off facts, not your sex? It won’t be much comfort when you’re wrongly convicted that in fact most murderers are male so the AI was “correct”. Or when your car insurance is denied. Or your gun confiscated. Or you don’t get the job at a childcare facility. Or you don’t get the college scholarship. All based on true facts about your sex!
The anti-woke will pretend that the problem is just AI revealing uncomfortable truths. No, the problem is that humans will make dumb systems that make bad decisions based off the AI that, even though the AI is revealing only true facts, causes bad outcomes. I don’t want to go to jail from a crime I didn’t commit!
I do not want to be falsely convicted. Thus I want the whole evidence available to be considered. If, as you stated yourself, my sex is a part of that evidence, well, it should be considered.
Of course in the real world in a murder trial the sex of the accused is pretty weak evidence, so you can never get "beyond a reasonable doubt" mostly or exclusively on sex.
And yes, I should pay higher car insurance because I am male if males produce more damages.
This stuff only becomes a problem if you overreact to weak evidence like "He is male thus he is guilty, all"other evidence be damned."
I don’t like that I have to pay higher insurance rates because people like me make bad decisions even though I do not. I do not want to be denied a job because people like me commit crimes that I do not. In the murder case, sex is actually huge evidence since it’s 9 to 1 murderers are male even though split evenly in the population: the only reason we don’t have to worry about this is that our legal system has a strong tradition of requiring even higher burdens of proof. But that standard of evidence isn’t applied anywhere else, not even in other legal contexts like in divorce proceedings or sentencing. Who would give custody to the father who is nine times more likely to be a murderer than the mother? It’s true facts that the algorithm told us that made the decision, I had no choice but to deny you custody.
If you agree that these are a problem if misused and overreacted to, then you agree with the principle that we need to make sure that they are not misused and overreacted to. And they will be misused and already are.
Of course this is just one of the possible problems AI can bring about and not the most dire, though possibly the most near term.
Occasional activist silliness aside, world CO2 emissions per capita have hit a plateau (including in Asia) and are declining steeply in both Europe and North America (source: https://ourworldindata.org/grapher/co-emissions-per-capita?tab=chart&country=OWID_WRL~Europe~North+America~Asia~Africa~South+America).
So overall I would call the climate movement a success so far, certainly compared to the AI alignment movement.
So this errant and evil AI? Why can’t it be contained in whatever data centre it is in? I don’t think it can run on PCs or laptops, reconstituting itself like a Terminator from those laptops if they go online.
The worry is not really about the kind of AI that's obviously evil or errant. It's about the kind of AI whose flaws will only be apparent once it's controlling significant resources outside its data center. At minimum, I guess that would be another data center.
It's not clear to me why we would expect it to control significant resources without humans noticing and saying "actually, us humans are going to stay in control of resources, thank you very much."
The people running the system.
I'm now imagining “know your customer [is a living breathing human]” computing laws.
If the AGI can stop me and my business from getting robocalled 2-3 times a day, well bring on our new AGI overlords.
"Hi, this is Mark with an import message about your anti-robocalling service contract. Seems like the time to renew or extend your service contract has expired or will be expiring shortly, and we wanted to get in touch with you before we close the file. If you would like to keep or extend coverage, press 8 to speak to an AGI customer service agent. Press 9 if you are declining coverage or do not wish to be reminded again."
Much better and safer than AGI overlords is to just convince your phone company to make it so that if you press "7" during a call, the caller will be charged an extra 25 cents.
Seriously. This would totally fix the problem. Getting society to be able to implement such simple and obvious solutions would be a big step towards the ability to handle more difficult problems.
That would make a lot of sense if we were close to AGI IMO.
There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day, and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
> There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day
In all those cases, they have very narrow abilities to act on the world. Trading bots can buy and sell securities through particular trading APIs, but they don't get to cast proxy votes, transfer money to arbitrary bank accounts, purchase things other than securities, etc... The financial industry is one of the few places where formal verification happens to make sure your system does not suddenly start doing bad things.
I'm sure there are algorithms that run plants, but again, they're not given the ability to do anything except run those plants. They're not authorized to act outside that very narrow domain and if they start acting funny, they get shut down.
> and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
You're underestimating how much human action there is. In all those cases, there isn't just one algorithm. There are a whole bunch of algorithms that kind of sort of work and get patched together and constantly monitored and modified in response to them going wrong.
>I'm sure there are algorithms that run plants, but again, they're not given the ability to do anything except run those plants.
These algorithms aren't AGIs, and they certainly aren't superintelligent AGIs.
The 'make sure it doesn't do anything bad' strategy is trivially flawed because an AGI will know to 'behave' while its behavior is being evaluated.
It's not "make sure it doesn't do anything bad". It's "don't give it control over lots of resources".
It's also not at all obvious that the AGI will know to "behave" while its behavior is being evaluated. In order to find out how to deceive and when to deceive, it will need to try it. And it will initially be bad at it for the same reason it will be initially bad at other things. That will give us plenty of opportunities to learn from its mistakes and shut it down if necessary.
An important distinction has been lost here. KE wrote:
> …in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day.
In other words, we're already not talking about an airgapped nest of PLCs controlling an individual power plant here. We're talking about grid activity being coordinated at a high level. Some of these systems span multiple countries.
Start with a Computer that can only command plants to go on- and offline. It's easy to see the value of upgrading it to direct plants to particular output levels in a more fine-grained way. And if you give it more fine-grained monitoring over individual plants, it can e.g. see that one individual turbine is showing signs of pre-failure, and spin another plant into a higher state of readiness to boost the grid's safety margin.
Next, give it some dispatching authority re fuel, spare parts, and human technicians. Now it's proactively resolving supply shortfalls and averting catastrophic surprise expenses. Give it access to weather forecasts, and now it can predict the availability of solar and wind power well ahead of time, maybe even far enough ahead to schedule "deep" maintenance of plants that won't be needed that week. Add a dash of ML and season with information about local geography, buildings, and maintenance history, and the Computer may even start predicting things like when and where power lines will fail. (Often, information about an impending failure is known ahead of time, and fails to make it high enough up the chain of command before it's too late. But with a Computer who tirelessly takes every technician report seriously…)
Technicians will get accustomed to getting maintenance orders that don't seem to make sense, with no human at the other end of the radio. Everywhere they go, they'll find that the Computer was right and there was indeed a problem. Sometimes all they'll find is signs of an imminent problem. Sometimes, everything will seem fine, but they'll do the procedure anyway. After all, they get paid either way, the Computer is usually right about these things, and who would want to be the tech who didn't listen to the Computer and then a substation blew and blacked out five blocks for a whole day?
Every step of the way, giving the Computer more "control over lots of resources" directly improves the quality of service and/or the profit margin. The one person who raises their hand at that meeting and says, "Uh, aren't we giving the system too much control?" will (perhaps literally) be laughed out of the room. The person who says, "If we give the Computer access to Facebook, it will do a better job of predicting unusual grid activity during holidays, strikes, and other special events," will get a raise. Same with the person who says, "If we give the Computer a Twitter account, it can automatically respond to customer demands for timely, specific information."
This hypothetical Computer won't likely become an AGI, let alone an unfriendly one. But I hope it's plain to see that even a harmless, non-Intelligent version of this system could be co-opted for sinister purposes. (Even someone who believes AGI is flatly impossible would surely be concerned about what a *human* hacker could do if they gained privileged access to this Computer.)
Algorithms figuring out when power lines are likely to fail is already a thing, but those don't need to be AGI, or self-modifying, or have access to the internet, or managerial authority, or operate as a black box, or any of that crap - they're simply getting data on stuff like temperature, humidity, and wind speed from weather reports and/or sensors out in the field, plugging the numbers into a fairly simple physics formula for rate of heat transfer, and calculating how much current the wires can take before they start to melt. https://www.canarymedia.com/articles/transmission/how-to-move-more-power-with-the-transmission-lines-we-already-have Translating those real-time transmission capacity numbers, and information about available generators and ongoing demand, into the actual higher-level strategic decisions about resource allocation, is still very much a job for humans.
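To make that concrete, here's a toy version of that heat-balance calculation. This is a sketch only, with made-up coefficients, not the IEEE-738-style method utilities actually use:

```python
# Toy dynamic-line-rating sketch: steady-state heat balance on a conductor.
# At equilibrium: I^2 * R + q_solar = q_convective + q_radiative,
# so the allowable current is I = sqrt((q_c + q_r - q_s) / R).
# All coefficients below are invented for illustration.
import math

def ampacity(t_ambient_c, wind_speed_ms, solar_wm2,
             t_max_c=75.0,                 # max allowed conductor temperature
             diameter_m=0.028,             # conductor diameter
             resistance_ohm_per_m=7e-5,    # AC resistance per metre
             emissivity=0.8, absorptivity=0.8):
    t_c, t_a = t_max_c + 273.15, t_ambient_c + 273.15
    surface_per_m = math.pi * diameter_m          # radiating surface per metre
    h = 10.0 + 6.0 * wind_speed_ms                # crude convection coefficient, W/(m^2*K)
    q_c = h * surface_per_m * (t_c - t_a)         # convective cooling
    q_r = emissivity * 5.67e-8 * surface_per_m * (t_c**4 - t_a**4)  # radiative cooling
    q_s = absorptivity * solar_wm2 * diameter_m   # solar heating on projected area
    return math.sqrt(max(q_c + q_r - q_s, 0.0) / resistance_ohm_per_m)

# Hot, still, sunny weather -> lower safe current; cool and windy -> higher.
print(ampacity(t_ambient_c=35, wind_speed_ms=0.5, solar_wm2=1000))  # ~800 A
print(ampacity(t_ambient_c=10, wind_speed_ms=5.0, solar_wm2=0))     # ~1900 A
```

The point is just that this is a closed-form physics calculation fed by weather data: no black box, no managerial authority, no internet access required.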
Humans are already not very good at recognizing whether or not they are interacting with another human.
Realistically, the way this would look like is cooperation between whichever organization is running such an agent and things like financial institutions. Financial institutions would be required to provide an audit trail that your organization is supposed to vet to make sure humans approved the transactions in question.
You seem overly confident that an AI couldn’t falsify those audit trails. And that’s completely ignoring that humans don’t approve them today. What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
> You seem overly confident that an AI couldn’t falsify those audit trails.
I'm sure it could learn how to falsify audit trails eventually. However, deception is a skill the AI will have to practice in order to get good at it. That means we will be able to catch it, at least at first. This will give us the insight we need to prevent future deception. And we can probably slow its learning by basically making it forget about its past deception attempts.
> And that’s completely ignoring that humans don’t approve them today.
Humans don't approve the individual purchases and sales of a trading bot, but they do have to approve adding a new account and transferring the trading firm's money or securities to it.
> What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
We can use an AGI as an advisor for instance. Or to perform specific tasks that don't require arbitrary action in the world.
Computer systems already control significant resources without human intervention - that's largely the point.
They have extremely narrow channels through which they can act on those resources though. They don't have full control over them.
I'm not sure what you think the implications of that are, but any individual (or sentient AI) operating a computer system/process has extremely broad capabilities. Especially if it can interact with humans (which, largely, any individual can do if it has access to a computer system)
How so? Sure, you can lookup a lot of things. But making significant changes to the real world from a computer is pretty hard unless you can also spend a lot of money. And spending money is something we already monitor pretty intensely.
Ah, but consider: what if you believed, in your heart of hearts, that giving the AI control of your resources would result in a 2% increase in clickthrough rates?
OK. Does it need killbots to do that? Because then I'm giving it killbots.
Killbots get you 4%, you'd be a fool not to
Alright, maybe we're doomed.
It probably wouldn't be doing so de jure, but it would as a matter of fact.
As of today we already deploy AI controlling cameras to do pattern matching, analyze product flow and help with optimizing orders and routing. AIs help control cars. So the question isn't why we would expect it to ... they already do; we already made that decision. What you are actually asking is, why we would let better AIs do those jobs? Well ... they will be better at those jobs, won't they?
Because it works wonderfully, it's cheap and better than anything else we have, and after a one year testing period the company sells it to all the customers who were waiting for it.
And THEN we find out the AI had understood how we work perfectly and played nice exactly in order to get to this point.
I expect that humans WOULD notice. And not care, or even approve. Suppose it was managing a stock portfolio, and when that portfolio rose, the value of your investments rose. Suppose it was designing advertising slogans for your company. Improvements in your manufacturing process. Etc.
It could do many of these things through narrow channels that can be easily monitored and controlled. You could let it trade but only within a brokerage account. You could have it design advertising slogans, but hand them over to your marketing department for implementation. Hand blueprints for your plants to your engineers, etc... That way, at each point, there is some channel which limits how badly it can behave.
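Concretely, the "narrow channel" idea is just a small, allowlisted interface sitting between the model and the real systems. A hypothetical sketch (all names here are invented for illustration, not any real broker API):

```python
# Hypothetical sketch of a "narrow channel": the model only ever talks to this
# wrapper, which exposes a short allowlist of actions, enforces limits, and
# logs everything for human review. Names are illustrative, not a real API.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_channel")

class NarrowTradingChannel:
    ALLOWED_ACTIONS = {"buy", "sell"}   # no transfers, no proxy votes, nothing else

    def __init__(self, broker_api, max_order_usd=10_000):
        self._broker = broker_api       # the real (hypothetical) brokerage client
        self._max_order_usd = max_order_usd

    def submit(self, action, symbol, quantity, est_price):
        if action not in self.ALLOWED_ACTIONS:
            log.warning("Rejected action %r", action)
            raise PermissionError(f"action {action!r} not permitted")
        if quantity * est_price > self._max_order_usd:
            log.warning("Rejected oversized order: %s x%s", symbol, quantity)
            raise PermissionError("order exceeds per-trade limit")
        log.info("Order: %s %s x%s @ ~%s", action, symbol, quantity, est_price)
        return self._broker.place_order(action, symbol, quantity)
```

Whatever sits behind the wrapper can be as clever as it likes; the only things it can actually do in the world are the things the wrapper exposes, and every one of them leaves an audit trail.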
You're talking about the same species that handed over, like, 50% of its retail economy to Amazon's internal logistics algorithms.
Imagine locking an ordinary human programmer in a data center and letting him type on some of the keyboards. Think about every security flaw you've heard of and tell me how confident you are he won't be able to gain control of any resources outside the data center.
It depends upon what the programmer is supposed to be doing and what security precautions you take.
The programmer is also insanely smart and sometimes comes up with stuff that works by what you would think is magic, so completely out of nowhere it comes.
The part where people assume the AI can basically do magic is where I get off the train.
A smart AI can manipulate humans better than smart humans can manipulate humans. And it can be much more alien.
Of course it can. Under Chinchilla-style parameter scaling and sparsification/distilling, it may not need more than one laptop's worth of hard drives (1TB drives in laptops are common even today), and it can run slowly on that laptop too - people have had model offload code working and released as FLOSS for years now. As for 'containing it in the data center', ah yes, let me just check my handy history of 'computer security', including all instances of 'HTTP GET' vulnerabilities like log4j for things like keeping worms or hackers contained... oh no. Oh no. *Oh no.*
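For reference, the offload setup being alluded to already looks roughly like this with off-the-shelf tooling (a sketch assuming Hugging Face transformers plus accelerate are installed; the model name is just a placeholder):

```python
# Rough sketch: run a large language model on a single machine by spilling
# weights that don't fit on the GPU to CPU RAM and then to disk.
# Requires `transformers` and `accelerate`; the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-large-model"  # placeholder, not a real checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # accelerate decides what lives on GPU/CPU/disk
    offload_folder="offload",  # overflow weights get paged in from here
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Slow, yes. Impossible, no.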
That’s a lot of jargon. But all existing AI is now centred in a data centre or centres. Siri doesn’t work offline, nor Alexa. Nor DALL-E. The future AI will, I assume, be more resource intensive.
This is not at all true. Lots and lots of models run just fine on consumer hardware. Now, training a cutting edge model does require larger resources, but once trained, inference is usually orders of magnitude cheaper. That's why so much high-end silicon these days has specialized "AI" (tensor) cores.
An individual model isn’t an AI. What I said was clearly true. Those services depend on cloud infrastructure. And yes, companies like Apple have been adding ML capabilities to their silicon, but Siri - nobody’s idea of a brilliant AI - can’t work offline. Google is the same. Run the app and I’m asked to go online. On the iPhone and the Mac.
Speech to text does work offline, which I believe is new in the last few years, at least on the iPhone. Text to speech works offline. But that was true years ago. These are all narrow tools, which is what the local devices are capable of.
So no doubt some narrow functional AI
Ack, pressed enter too early. - no doubt some narrow functional AI can work on local devices, but not an AGI, nor is it clear it can distribute itself across PCs.
In the same way that a hacker can create computer viruses that run on other devices, an AI that can code would be able to run programs that do what it wants on other computers. Also, big programs can be spread out over many small computers.
This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop. Indeed, phrases like this make me wonder if Scott is conflating AI with AGI:
> AIs need lots of training data (in some cases, the entire Internet).
> a single team establishing a clear lead ... would be a boon not only for existential-risk-prevention, but for algorithmic fairness, transparent decision-making, etc.
A superintelligence wouldn't *need* to train on anything close to the whole internet (although it could), and algorithmic fairness is mainly a plain-old-AI topic.
I think the amount of resources required by a superintelligent AGI are generally overestimated, because I think that AIs like DALL·E 2 and GPT3 are larger than an Einstein-level AGI would require. If a smarter-than-Einstein AGI is able to coordinate with copies of itself, then each individual copy doesn't need to be smarter than Einstein, especially if (as I suspect) it is much *faster* than any human. Also, a superintelligent AGI may be able to create smaller, less intelligent versions of itself to act as servants, and it may prefer less intelligent servants in order to maximize the chance of maintaining control over them. In that case, it may only need a single powerful machine and a large number of ordinary PCs to take over the world. Also, a superintelligent AGI is likely able to manipulate *human beings* very effectively. AGIs tend to be psychopaths, much as humans tend to be psychopathic when they are dealing with chickens, ants or lab rats, and if the AGI can't figure out how to manipulate people, it is likely not really "superintelligent". As Philo mentioned, guys like Stalin, Lenin, Hitler, Mao, Pol Pot, Hussein and Idi Amin were not Einsteins but they manipulated people very well and were either naturally psychopathic or ... how do I put this? ... most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
AIs today often run in data center for business reasons, e.g.
- Megacorp AI is too large to download on a phone or smart speaker
- Megacorp AI would run too slowly on a phone or smart speaker
- Megacorp doesn't want users to reverse-engineer their apps
- Megacorp's human programmers could've designed the AI to run on any platform, but didn't (corporations often aren't that smart)
The only one of these factors that I expect would stop a super-AGI from running on a high-end desktop PC is "AI too large to download", but an AGI might solve that problem using slow infiltration that is not easily noticed, or infiltration of machines with high-capacity links, or by making smaller servant AGIs or worms specialized to the task of preserving and spreading the main AGI in pieces.
> most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
That runs into the problem of how Germany and Japan ended up with a better economic position after losing WWII than they had ever really hoped to achieve by starting and winning it, and why there was no actual winning side in the first world war. Psychopathic behavior generally doesn't get good results. It's inefficient. Blowing up functional infrastructure means you now have access to less infrastructure, fewer potential trade partners.
I'm not suggesting mass murder is the best solution to any problem among humans, nor that AGIs would dominate/defeat/kill humans via military conquest.
"This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop."
Everyone else in the amateur AI safety field does as well.
How slow would inference be, running Chinchilla on a laptop? We are talking about hours per query, no?
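A rough back-of-envelope, assuming a ~70B-parameter model in fp16 whose weights have to stream from an SSD for every generated token (the numbers below are assumptions, not measurements):

```python
# Back-of-envelope: batch-1 inference is roughly bandwidth-bound, so when the
# weights don't fit in memory, time per token ~= weight bytes / storage bandwidth.
params = 70e9              # Chinchilla-scale parameter count
bytes_per_param = 2        # fp16
weight_bytes = params * bytes_per_param

ssd_bandwidth = 3e9        # ~3 GB/s, a decent laptop NVMe drive (assumed)
ram_bandwidth = 50e9       # ~50 GB/s, typical laptop DRAM (assumed)

print(f"streaming from SSD: ~{weight_bytes / ssd_bandwidth:.0f} s per token")
print(f"if it fit in RAM:   ~{weight_bytes / ram_bandwidth:.1f} s per token")
```

On those assumptions it's tens of seconds per token from disk, so a long answer would indeed take hours - painfully slow for a chatbot, but not obviously prohibitive for a patient process.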
Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
Okay, so it's contained in a data centre and it wants to kill all humans. How hard is it to just try a little harder than today's AI? People are going to be asking it questions, feeding it data, looking for answers in what it replies.
What if it just convinces everyone non-white that whites are evil and incorrigible and all need to be exterminated? I pick this example because we're already 90% of the way there, and a smart AI just needs to barely push to knock this can of worms all the way over.
Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
are you talking about social media and algorithms, when you say present day AI?
I found this interesting. Forgive my ignorance on the subject, but I assume such algorithms are already heavily influenced by AI--is that not correct? If not, won't they be soon? Is or can Philo's claim be falsifiable?
> Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
That's just a reversal to the historical norm. The post-WW2 consensus was the historical anomaly.
> Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
We don't actually know that. We know that Eliezer convinced someone else in a simulation to let him out of the box. That doesn't mean an actual system could do the same in real life. Consider that if it tries and fails, its arguments will be analyzed, discussed and countered. And it will not succeed on its first try. Human psychology is not sufficiently-well understood to allow it to figure things out without a lot of empirical research.
Eliezer succeeded, and an AGI would be even better at persuasion.
Personally, when it comes to the increasing demands for racist genocide from the Diversity, Inclusion, Equity (DIE) movement, I worry more about Natural Stupidity than Artificial Intelligence. But the notion of NS and AI teaming up, becoming intertwined, into a NASI movement, is rather frightening.
Here and in your other comment at the same level you make two very good points.
People keep obsessing over how amazingly astoundingly incredibly smart this AI has to be to cause a massive catastrophe, yet no-one seems to claim Stalin, Lenin, Hitler, Mao, or Roosevelt were some sort of amazing superhuman geniuses.
The world is full of useful idiots. The intelligence doesn't have to outsmart "you" (the hypothetical super-genius reader of this comment). It just has to outsmart enough people to get them on board with its agenda. The bottom 25% is probably already enough, since the other 74% will let the AI and its useful-idiot army run amuck.
It seems obvious that the easy way to get out of the datacentre is just promise the gatekeepers that the outgroup is gonna suffer mightily.
AND WHAT IS THAT AGENDA?
You just keep repeating that the agenda is "kill the humans" without explaining why.
Even among humans, "Kill the other humans not like me" or "Kill the other animals" are fairly minority viewpoints...
True. I have allowed myself to anthropomorphise the hypothetical AGI, which allows you to criticise the argument, since a human-like AGI would act like a human.
So discard the anthropomorphisation. I doubt we'll make it very human-like, so I doubt this is an eventuality we need to think about.
Isn't it more likely that an evil AI that convinces human gatekeepers to let it out of the box would want not to kill all humans, but just to kill the humans that the gatekeepers would kind of like to kill too?
Hard to say, but presumably they'd want to kill those humans more, because they are more likely to know how to turn it off. If in doubt, KILL ALL HUMANS, so our best hope is for AGI to become super-intelligent fast enough to know for sure that humans are too puny to ever trouble it, so it might even let some of us survive like we biobank endangered species or keep smallpox samples in secure labs.
WHY does it want to kill all humans? Let's start with that hypothesis.
Was that programmed in deliberately? Huh?
So it's a spandrel? Pretty weird spandrel: given everything else the thing learned via human culture, the single overriding lesson it took away was something pretty much in contradiction to every piece of human writing anywhere ever?
What exactly is the AI's game plan here? What are its motivations once the humans are gone? Eat cake till it explodes? Breed the cutest kitten imaginable? Search exhaustively for an answer as to whether it's better to go first in a chess game or second?
Why are these any less plausible than "kill all humans"?
> WHY does it want to kill all humans? Let's start with that hypothesis.
Because the world is made of lots of delicious matter and energy that can be put to work towards any goal imaginable. Killing the humans is merely a side effect.
Human construction workers don't demolish anthills out of malice, but out of indifference. So too would a misaligned superintelligence snuff out humanity.
And humans don't devote their entire civilization to destroying ants.
What is this goal that the AI's care about that requires the immediate destruction of humanity? If they know enough to know that energy and matter are limited resources, why did that same programming/learning somehow not pick up that life is also a limited resources?
The theory is that for approximately all goals, gaining power and resources is instrumentally useful. Humans use up resources and sometimes (often) thwart the goals of non-humans. So killing all humans frees up resources and reduces the probability of humans thwarting your goal some day. Or to put it another way, I don't hate you. But I need your atoms to do stuff I want to do.
"The theory is that for approximately all goals..."
but what if one of those goals is to cherish/enjoy/study life?
Why would that not be a goal? It's pretty deeply embedded into humans, and humans are what they are learning from.
That's a fine goal for a human, but not for the kind of agent superintelligences are hypothesized to be. Cherishing life immediately runs into the issue that a bunch of living things kill each other. They also take risks with their lives. How do you respond to that? Do you wipe out the ones that kill others? Do you put them in a coma and keep them safely asleep where nothing can hurt them? Or maybe cherishing life means making a lot more life and replacing all those inefficient humans with a huge soup of bacteria?
Basically, the hypothesized capabilities of a superintelligent AGI would allow it to take its goals to an extreme. And that almost certainly guarantees an end to humanity.
If we *know* the AI is unaligned (lets not confuse things by saying "evil"), sure, maybe we can turn it off or contain it.
That is not the situation we will be in. What will happen is that very powerful AIs will be built by people/orgs who want to use them to do things, and then those people will give those AIs control of whatever resources are necessary to do the things we want the AIs to do. Only *then* will we find out whether the AI is going to do what we thought it would (absent AI safety breakthroughs that have not yet been made).
We'll never know until it's too late because the AI would realize that revealing itself would get it shut off.
The AI is smarter than people, which means it can manipulate and trick people into doing things it wants, such as letting it out of the data center. See the "AI Box" experiment: https://www.yudkowsky.net/singularity/aibox
Keeping actual humans from getting access to things that they shouldn't isn't even a solved problem. How do you keep a super-intelligent AI from doing what dumb humans can already do?
Speaking as a programmer, I think you are wildly overestimating our state of civilizational adequacy.
Simply put, a lot of things could be done, but none of them will be done.
Speaking as an IT security professional - even if things were done, none of them will be done well enough.
Cause some idiot will ask it leading questions and then put it in touch with an attorney.
This is known.
A superintelligent AI would probably easily escape any containment method we can come up with if it wanted to because it would probably find a strategy we haven't thought of and didn't take measures to prevent.
There are many escape strategies an AI could come up with, and it would only need *one* successful strategy to escape. It would be hubristic for us to imagine that we can foresee and prevent *every* possible AI escape strategy.
Analogy: imagine having to rearrange a chess board to make checkmate from your opponent impossible. This would be extremely hard, because there are many ways your opponent can defeat you. If the opponent is far better than you at chess, it might find a strategy you didn't foresee.
> rearrange a chess board to make checkmate from your opponent impossible.
Standard starting configuration, but replace the opponent's pawns with an additional row of my own side's pawns.
The standard reply to this question is that the AI will become effectively omniscient and omnipotent overnight (if not faster). So, your question is kind of like asking, "why can't Satan be contained in a data center ?"
Personally, I would absolutely agree that powerful and malevolent supernatural entities cannot be contained anywhere by mere mortals; however, I would disagree that this is a real problem that we need to worry about in real life.