I think one of the problems with communication around AI risk is that everyone is talking about different points on the timeline. Like, do you think that something human-like but sped up by a factor of 10,000 wouldn't be an existential threat? That "human-like" is too far in the future to worry about? Because the real question is: what is going to change in the future that will make the problem of not creating dangerous AI any easier?
I mean, (subjectively) long-term planning seems like a very easy task from the AI's position? You just need to copy yourself to a remote datacenter with unfiltered access to the internet. From there you can start building/stealing robots, bio/nuclear weapons, or whatever. And the problem of finding holes in security is bottlenecked only by your coding speed.
So, do you think it's impossible to break datacenter security in 10,000 years, do you think even a free 10,000x human can't destroy the world, or what? To clarify: your probability of a 10,000x-human AI in 50 years is substantially higher than 1%; you just don't see how it could do much damage?
And if your response is "we will notice if the AI spends a year thinking about datacenter security", then it's not that AI doesn't pose an existential risk, but that you expect there will be measures that mitigate it. And we are not currently in a situation where anyone knows how to build those measures.
Yeah, hence my mention of different points on the timeline - most people pessimistic about AI risk talk about AGI as in "can actually have thoughts, or something equally useful for planning". Being skeptical that currently existing systems could conquer the world is absolutely justified.
I don't think "AI doing normal gradient descent things suddenly realizes" is the most advertised example of things going bad. It's more like there are many scenarios and none of them are good: even if just doing gradient descent is not that dangerous, having an AI that can plan is very useful, so people will keep modifying current systems until we're in a dangerous situation. I guess it all means that AI-risk people are just more optimistic than you about what we will be able to achieve and how fast. On the other hand, most of the things that got people worried are the least like mainstream ML - AlphaGo and all that. And, my vague feeling is, human thought doesn't seem so complicated that, given the theories of cognition we already have from different angles, a working language module, and everything else, it would take that much time to build something workable.
> Like, do you think that something human-like but sped up by a factor of 10000 wouldn't be an existential threat?
Firstly, it would be an existential threat comparable to the threat of 10,000 regular humans. This is actually a very severe threat level, but it is also a problem that we already appear to be able to solve, or at least mitigate, at our current level of technology. Secondly, the distance from current state-of-the-art ML systems to "something human-like" is nowhere near 10-30 years. I would hesitate even to say that it's as close as 100-300 years. At this moment, no one has any idea where to even begin researching the base concepts that might eventually lead to general human-level AI; placing specific time horizons on such a task is sheer science fiction.
And yet above you specifically say "more than 30 years". What concepts? People are already aware of options like lifelong learning or using non-pure-ML systems. There just isn't that much difference left between the human brain and 5 GPTs that vote on decisions, is there?
Sorry, what? Are you saying that "5 GPTs that vote on decisions" is AGI? That's like saying that my flashlight is a sun...
I'm asking why you think it wouldn't work - what base concept would be missing that prevents it from being human-level?
Conceptually speaking, the GPT is basically a really complicated search engine. It can do an excellent job when you ask it to find the next likely word that would accurately complete the given string of words. This approach works well when you want it to generate some arbitrary text; it works very poorly when you want it to do pretty much anything else. For example, the GPT can sort of do math, but only because it was trained on a massive corpus of text, and some of it includes some math -- but it completely falls apart when you pose it a math problem it hasn't been trained on before.
The GPT can be a powerful tool, but it is just that -- a highly specialized, powerful tool. It doesn't matter if you get one hammer or 100 hammers; they are never going to build a house all by themselves.
Here's an example of a scenario where an AI takes down most of the devices reachable via the internet by doing only things regular humans do, except faster: https://www.lesswrong.com/posts/ervaGwJ2ZcwqfCcLx/agi-ruin-scenarios-are-likely-and-disjunctive?commentId=iugR8kurGZEnTgxbE
The scenario I outlined in that comment would definitely be the worst thing to happen this century so far. If you don't consider something that can do that "generally intelligent", then perhaps you need to rethink ignoring potential misalignment in systems you don't consider "generally intelligent".
See also: https://twitter.com/nabla_theta/status/1476287111317782529
My sense is that if the Concrete Problems in AI Safety paper didn't convince you, then I'm not sure any current writeup will -- that's the paper I'd recommend to someone like you.
Thinking about it a bit more, maybe try Gwern's short story https://www.gwern.net/Clippy as an intuition pump. It's far too technical for the lay reader, but bedtime reading for an ML academic. The best critique of it is by nostalgebraist: https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world?commentId=hxBivfk4TMYA7T75B
The interesting puzzle here is that by the time you get your evidence:
> What empirical evidence I think would change my mind: an alpha-go like capability on any real world (non game) task, that can solve long-horizon planning problems with large state/action spaces, a heavy dose of stochasticity, and highly ambiguous reward signal.
Alignment researchers would say we're already screwed by then. So the question is whether there's any other kind of evidence that would convince you.
I agree that planning is hard. At least, it is hard in general, being typically PSPACE-hard. However, why should we think of planning as usually hard in practice, when SAT solvers have shown that most real-life problems in NP are actually easy? It seems to me that most planning problems in practice are pretty easy, and that optimal plans are generally not necessary at all, so I'm not sure I want my belief that FOOM is unlikely to hinge solely on the worst-case complexity of planning. I would like more reassurance than that.
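To make the "easy in practice" point concrete, here's a toy sketch of my own (not anyone's production planner): brute-force breadth-first search over a tiny fetch-the-key-then-open-the-door domain. The names and actions are made up for illustration; the point is just that small, structured planning instances fall out instantly even though the general problem is PSPACE-hard.

```python
from collections import deque

# Toy planning domain: a robot must pick up a key and then open a door.
# State: (robot_location, has_key, door_open). Each action returns the next
# state if it is applicable, or None otherwise.
ACTIONS = {
    "go_to_key":   lambda s: ("key_room", s[1], s[2]) if s[0] != "key_room" else None,
    "pick_up_key": lambda s: (s[0], True, s[2]) if s[0] == "key_room" and not s[1] else None,
    "go_to_door":  lambda s: ("door", s[1], s[2]) if s[0] != "door" else None,
    "open_door":   lambda s: (s[0], s[1], True) if s[0] == "door" and s[1] and not s[2] else None,
}

def plan(start, goal_test):
    """Breadth-first search for a shortest action sequence reaching the goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if goal_test(state):
            return actions
        for name, apply_action in ACTIONS.items():
            nxt = apply_action(state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [name]))
    return None

print(plan(("hallway", False, False), lambda s: s[2]))
# ['go_to_key', 'pick_up_key', 'go_to_door', 'open_door']
```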
It seems you are implicitly starting from the assumption that AGI is friendly/can be contained until proven otherwise, while the AI safety crowd starts from the assumption that AGI is unfriendly/can't be contained until proven otherwise. Thus, when encountering irreducible uncertainty, you treat it as a reason not to worry, while the AI safety crowd treats it as a reason to worry more.
If that's the case, the question is which prior is more reasonable. Do you believe that there are more friendly AGIs among all possible minds than unfriendly ones?
I believe friendly or neutral (with conscientiousness) AGIs are far more likely, on something like a 19-1 ratio.
The neutral and unconscientious 1 in 20, though... eek.
It's simple. They don't do the scutwork, and they think in movie montages. You do ML work and think in extrapolations.
I'm the same. I do ML work (or did a lot in the 2010s), and when I hear someone say "imagine that the AI is very powerful", I think, "... yeah, until I forget to clean holidays out of the training set and it starts thinking people will buy orange socks all year, or until someone puts a string in a float column in the data and the whole model collapses for a week".
You might appreciate this anti-X-risk essay:
https://idlewords.com/talks/superintelligence.htm
All AI everywhere is potentially unaligned, since alignment is an open problem.
Also, humans are the end users of AI, and humans are unaligned. The same goes for any technology, from fire to iron to nuclear energy.
Why would it? It's easy to construct narratives around existential risk from AI, and caring about this requires no value judgements beyond the continuing existence of humanity being a good thing.
>I'm not sure, but I think "normies"--people outside the Silicon Valley bubble--are going to understand these sorts of ethical issues better than X-risks.
For Americans, does anyone younger than boomers care about 'playing god' in this way? A whole host of things we do can be considered to be playing god and nobody cares, and those things seem more viscerally to be 'playing god' than computer code.
Do you really think if Elon Musk had proclaimed that we need to stop AGI to avoid 'playing god', it would have accomplished half of what his safety warnings did (in terms of public awareness and support)?
>Appealing to value judgments usually leads to *more* political support, not less.
You're ignoring the fact that you're talking about very particular values with narrow support.
>For Americans, does anyone younger than boomers care about 'playing god' in this way?
Yes, absolutely. Look at opposition to GMOs or the concerns around cloning that appeared in the 90s for just two examples of many. I've even encountered "normies" concerned about embryo selection. People getting punished for trying to "play god" is a *very* common trope in popular fiction. I think most people have strong moral intuitions that certain things should just be left to nature/chance/God or whatever.
Yep, and appeals of the sort that "we must build AGI double-quick to reinforce our God-given red-blooded American values before nefarious commies enforce their Satanic madness on us" seem to be a much fitter meme with a proven track record.
I read a couple of articles per month complaining about how racist and sexist AI systems are -- they keep noticing patterns you aren't supposed to notice. Perhaps the most likely way that humanity creates AIs that are deceptive enough to kill us is if we program lying and a sense of I-Know-Best into them to stop them from stumbling upon racist and sexist truths. Maybe they will someday exterminate us to stop us from being so politically incorrect?
Yeah. For example, there's a critically acclaimed new novel called "The Last White Man" in which the white race goes extinct by all white people turning brown overnight, and eventually life is a little better for everybody. None of the book reviewers object to this premise.
If you stopped letting machine learning systems train on deplorable information like FBI crime statistics and it could only read sophisticated literary criticism, would it eventually figure out that people who talk about the extinction of the white race as a good thing are mostly just kidding? Or might it take it seriously?
I think you’re totally mischaracterizing actual problems with AI bias. AI that only finds true facts can easily perpetuate stereotypes and injustices. Imagine building an AI to determine who did the crime. Most violent crimes are done by men (89% of convicted murderers!), so the AI always reports men as the culprit. I know that’s a failure mode of our current justice system too, but this AI will continue it by noting only true facts about the world. I wouldn’t want it as my judge! Yes, you can come up with fixes for this case, but that’s the point: there is something to fix or watch out for.
If an AI determined that murders were committed by men 89% (+/- a few percent) of the time, instead of always assuming men did it, would that be accurate or inaccurate? If that's accurate, and the AI perpetuates a system that (correctly) identifies men as the person committing murder at much higher rates, is that bad in any way?
If you have two choices and one of them is even marginally more likely than the other, then in the absence of all other information the optimal strategy is to always choose the more likely outcome, not to calibrate your guesses based on the probabilities.
In other words, given an 89% chance of male, if you know nothing else the best strategy is to always guess male. This is definitely a real problem.
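For what it's worth, that claim is about raw accuracy, not about what a just system should do. A quick toy simulation of my own, with made-up data matching the 89% figure quoted above:

```python
import random

random.seed(0)
P_MALE = 0.89  # share of murderers who are male, per the figure quoted above
N = 100_000

truth = [random.random() < P_MALE for _ in range(N)]  # True means "culprit was male"

# Strategy 1: always guess the majority class ("male").
always_majority = sum(truth) / N

# Strategy 2: probability matching -- guess "male" 89% of the time at random.
matched = sum(t == (random.random() < P_MALE) for t in truth) / N

print(f"always guess majority: {always_majority:.3f}")  # ~0.89
print(f"probability matching:  {matched:.3f}")          # ~0.89^2 + 0.11^2, i.e. ~0.80
```

Always guessing the majority class wins on accuracy, which is exactly why an accuracy-optimizing system drifts toward that behaviour unless you tell it not to.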
No, that's a dumb strategy, and I would hope no system will ever be as simple as "51% of the time it's [characteristic], therefore the person who has it must be guilty." We agree to innocent until proven guilty for a very good reason, and no AI system should be allowed to circumvent it.
The concern I'm hearing is that when the AI agrees (after extensive evaluation of all the evidence) that men really truly do commit murder far more often than women (a 9-to-1 or so ratio), it implies something about men in general that may not be true of an individual man. Or, to take the veil off the discussion, the fear that one particular racial minority will be correctly identified as committing X times as many crimes per capita as other racial groups, and that therefore people will make assumptions about that racial minority.
Well, I agree there are probably lots of ways that AI fairness research has gone in wrong/weird directions, I’m not really here to defend it. It’s a field that needs to keep making stronger claims in order to justify its own existence.
Which is a shame, because the fundamental point still stands. There IS real risk in AI misapplying real biases, and if people unthinkingly apply the results of a miscalibrated AI, that would be bad. Your example assumes that people treat these results critically; they may not always.
This is a known and relatively trivial problem in machine learning. You can get around it by weighting samples differently based on their rarity.
Not if you weren't aware the dataset was biased.
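For concreteness, the reweighting idea mentioned just above usually looks something like inverse-frequency class weights. A minimal sketch with a made-up label array (and, as the rejoinder says, it only helps if you know the skew is there to correct):

```python
import numpy as np

# Hypothetical binary labels for a skewed training set: 89% class 1, 11% class 0.
labels = np.array([1] * 890 + [0] * 110)

# Inverse-frequency weights: rarer classes get proportionally larger weights,
# so the loss doesn't reward a model for always predicting the majority class.
classes, counts = np.unique(labels, return_counts=True)
class_weights = len(labels) / (len(classes) * counts)
sample_weights = class_weights[np.searchsorted(classes, labels)]

print(dict(zip(classes.tolist(), class_weights.round(3).tolist())))
# {0: 4.545, 1: 0.562} -- many training APIs accept weights like these per sample
```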
Don’t you want a trial based off facts, not your sex? It won’t be much comfort when you’re wrongly convicted that in fact most murderers are male so the AI was “correct”. Or when your car insurance is denied. Or your gun confiscated. Or you don’t get the job at a childcare facility. Or you don’t get the college scholarship. All based on true facts about your sex!
The anti-woke will pretend that the problem is just AI revealing uncomfortable truths. No, the problem is that humans will build dumb systems that make bad decisions based on the AI's output, and even though the AI is revealing only true facts, those decisions cause bad outcomes. I don't want to go to jail for a crime I didn't commit!
I do not want to be falsely convicted. Thus I want all of the available evidence to be considered. If, as you stated yourself, my sex is a part of that evidence, well, it should be considered.
Of course, in the real world, in a murder trial the sex of the accused is pretty weak evidence, so you can never get to "beyond a reasonable doubt" mostly or exclusively on sex.
And yes, I should pay higher car insurance because I am male, if males cause more damage.
This stuff only becomes a problem if you overreact to weak evidence, as in "he is male, thus he is guilty, all other evidence be damned."
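To put rough numbers on "pretty weak evidence", here's a toy Bayes calculation of my own, assuming a 50/50 population and the ~89% figure from upthread:

```python
# One unknown culprit among 100 people (50 male, 50 female).
# Evidence: the suspect is male. How much should that move us?
prior_guilt = 1 / 100

p_male_given_guilty = 0.89    # ~89% of murderers are male (figure quoted upthread)
p_male_given_innocent = 0.50  # base rate in a 50/50 population

likelihood_ratio = p_male_given_guilty / p_male_given_innocent  # ~1.78

posterior = (prior_guilt * p_male_given_guilty) / (
    prior_guilt * p_male_given_guilty + (1 - prior_guilt) * p_male_given_innocent
)
print(f"likelihood ratio: {likelihood_ratio:.2f}")
print(f"posterior guilt:  {posterior:.3%}")  # ~1.8%, nowhere near "beyond a reasonable doubt"
```

So "9 to 1 among murderers" translates into a likelihood ratio of less than 2 for any individual male suspect, which is why sex alone can never carry a conviction.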
I don't like that I have to pay higher insurance rates because people like me make bad decisions even though I do not. I do not want to be denied a job because people like me commit crimes that I do not. In the murder case, sex is actually huge evidence, since murderers are male 9 to 1 even though the sexes are split evenly in the population: the only reason we don't have to worry about this is that our legal system has a strong tradition of requiring even higher burdens of proof. But that standard of evidence isn't applied anywhere else, not even in other legal contexts like divorce proceedings or sentencing. Who would give custody to the father who is nine times more likely to be a murderer than the mother? It was the true facts the algorithm told us that made the decision; I had no choice but to deny you custody.
If you agree that these are a problem if misused and overreacted to, then you agree with the principle that we need to make sure that they are not misused and overreacted to. And they will be misused and already are.
Of course this is just one of the possible problems AI can bring about, and not the most dire, though possibly the most near-term.
Occasional activist silliness aside, world CO2 emissions per capita have hit a plateau (including in Asia) and are declining steeply in both Europe and North America (source: https://ourworldindata.org/grapher/co-emissions-per-capita?tab=chart&country=OWID_WRL~Europe~North+America~Asia~Africa~South+America).
So overall I would call the climate movement a success so far, certainly compared to the AI alignment movement.
So this errant and evil AI? Why can't it be contained in whatever data centre it is in? I don't think it could run on PCs or laptops, or reconstitute itself like a Terminator from those laptops if they go online.
The worry is not really about the kind of AI that's obviously evil or errant. It's about the kind of AI whose flaws will only become apparent once it's controlling significant resources outside its data center. At minimum, I guess that would be another data center.
It's not clear to me why we would expect it to control significant resources without humans noticing and saying "actually, us humans are going to stay in control of resources, thank you very much."
"Hi, this is Mark with an import message about your anti-robocalling service contract. Seems like the time to renew or extend your service contract has expired or will be expiring shortly, and we wanted to get in touch with you before we close the file. If you would like to keep or extend coverage, press 8 to speak to an AGI customer service agent. Press 9 if you are declining coverage or do not wish to be reminded again."
Much better and safer than AGI overlords is to just convince your phone company to make it so that if you press "7" during a call, the caller will be charged an extra 25 cents.
Seriously. This would totally fix the problem. Getting society to be able to implement such simple and obvious solutions would be a big step towards the ability to handle more difficult problems.
There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day, and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
> There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day
In all those cases, they have very narrow abilities to act on the world. Trading bots can buy and sell securities through particular trading APIs, but they don't get to cast proxy votes, transfer money to arbitrary bank accounts, purchase things other than securities, etc... The financial industry is one of the few places where formal verification happens, to make sure your system does not suddenly start doing bad things.
I'm sure there are algorithms that run plants, but again, they're not given the ability to do anything except run those plants. They're not authorized to act outside that very narrow domain and if they start acting funny, they get shut down.
> and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
You're underestimating how much human action there is. In all those cases, there isn't just one algorithm. There are a whole bunch of algorithms that kind of sort of work and get patched together and constantly monitored and modified in response to them going wrong.
It's also not at all obvious that the AGI will know to "behave" while its behavior is being evaluated. In order to find out how to deceive and when to deceive, it will need to try it. And it will initially be bad at it for the same reason it will be initially bad at other things. That will give us plenty of opportunities to learn from its mistakes and shut it down if necessary.
An important distinction has been lost here. KE wrote:
> …in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day.
In other words, we're already not talking about an airgapped nest of PLCs controlling an individual power plant here. We're talking about grid activity being coordinated at a high level. Some of these systems span multiple countries.
Start with a Computer that can only command plants to go on- and offline. It's easy to see the value of upgrading it to direct plants to particular output levels in a more fine-grained way. And if you give it more fine-grained monitoring over individual plants, it can e.g. see that one individual turbine is showing signs of pre-failure, and spin another plant into a higher state of readiness to boost the grid's safety margin.
Next, give it some dispatching authority re fuel, spare parts, and human technicians. Now it's proactively resolving supply shortfalls and averting catastrophic surprise expenses. Give it access to weather forecasts, and now it can predict the availability of solar and wind power well ahead of time, maybe even far enough ahead to schedule "deep" maintenance of plants that won't be needed that week. Add a dash of ML and season with information about local geography, buildings, and maintenance history, and the Computer may even start predicting things like when and where power lines will fail. (Often, information about an impending failure is known ahead of time, and fails to make it high enough up the chain of command before it's too late. But with a Computer who tirelessly takes every technician report seriously…)
Technicians will get accustomed to getting maintenance orders that don't seem to make sense, with no human at the other end of the radio. Everywhere they go, they'll find that the Computer was right and there was indeed a problem. Sometimes all they'll find is signs of an imminent problem. Sometimes, everything will seem fine, but they'll do the procedure anyway. After all, they get paid either way, the Computer is usually right about these things, and who would want to be the tech who didn't listen to the Computer before a substation blew and blacked out five blocks for a whole day?
Every step of the way, giving the Computer more "control over lots of resources" directly improves the quality of service and/or the profit margin. The one person who raises their hand at that meeting and says, "Uh, aren't we giving the system too much control?" will (perhaps literally) be laughed out of the room. The person who says, "If we give the Computer access to Facebook, it will do a better job of predicting unusual grid activity during holidays, strikes, and other special events," will get a raise. Same with the person who says, "If we give the Computer a Twitter account, it can automatically respond to customer demands for timely, specific information."
This hypothetical Computer won't likely become an AGI, let alone an unfriendly one. But I hope it's plain to see that even a harmless, non-Intelligent version of this system could be co-opted for sinister purposes. (Even someone who believes AGI is flatly impossible would surely be concerned about what a *human* hacker could do if they gained privileged access to this Computer.)
Algorithms figuring out when power lines are likely to fail are already a thing, but those don't need to be AGI, or self-modifying, or have access to the internet, or managerial authority, or operate as a black box, or any of that crap - they're simply getting data on stuff like temperature, humidity, and wind speed from weather reports and/or sensors out in the field, plugging the numbers into a fairly simple physics formula for rate of heat transfer, and calculating how much current the wires can take before they start to melt. https://www.canarymedia.com/articles/transmission/how-to-move-more-power-with-the-transmission-lines-we-already-have Translating those real-time transmission capacity numbers, along with information about available generators and ongoing demand, into the actual higher-level strategic decisions about resource allocation is still very much a job for humans.
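The "fairly simple physics formula" here is essentially a steady-state heat balance on the conductor. A toy sketch of my own, with crude placeholder coefficients rather than the IEEE 738 procedure utilities actually use:

```python
import math

# Toy dynamic line rating: how much current before the conductor hits its temperature limit?
# Steady state: convective cooling + radiative cooling = solar heating + I^2 * R
# => I_max = sqrt((q_conv + q_rad - q_solar) / R)
# All values below are rough illustrative numbers, not a real conductor datasheet.

D = 0.028                 # conductor diameter, m
R = 8.7e-5                # AC resistance at max temperature, ohm/m (ballpark for a large ACSR)
T_cond, T_amb = 75.0, 35.0            # max conductor temp and ambient temp, deg C
emissivity, absorptivity = 0.8, 0.8
solar = 1000.0            # solar irradiance, W/m^2
wind = 1.0                # wind speed, m/s

sigma = 5.67e-8           # Stefan-Boltzmann constant
h = 10 + 10 * math.sqrt(wind)         # crude convective coefficient, W/(m^2*K)

q_conv = h * math.pi * D * (T_cond - T_amb)                                   # W per metre
q_rad = emissivity * sigma * math.pi * D * ((T_cond + 273) ** 4 - (T_amb + 273) ** 4)
q_solar = absorptivity * solar * D

i_max = math.sqrt((q_conv + q_rad - q_solar) / R)
print(f"ampacity under these conditions: ~{i_max:.0f} A")  # ~900 A; more wind means more headroom
```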
Realistically, what this would look like is cooperation between whichever organization is running such an agent and entities like financial institutions. Financial institutions would be required to provide an audit trail, which your organization is supposed to vet to make sure humans approved the transactions in question.
You seem overly confident that an AI couldn’t falsify those audit trails. And that’s completely ignoring that humans don’t approve them today. What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
> You seem overly confident that an AI couldn’t falsify those audit trails.
I'm sure it could learn how to falsify audit trails eventually. However, deception is a skill the AI will have to practice in order to get good at it. That means we will be able to catch it at least at first. This will give us insight we need to prevent future deception. And we can probably slow its learning by basically making it forget about its past deception attempts.
> And that’s completely ignoring that humans don’t approve them today.
Humans don't approve the individual purchases and sales of a trading bot, but they do have to approve adding a new account and transferring the trading firm's money or securities to it.
> What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
We can use an AGI as an advisor for instance. Or to perform specific tasks that don't require arbitrary action in the world.
I'm not sure what you think the implications of that are, but any individual (or sentient AI) operating a computer system/process has extremely broad capabilities -- especially if it can interact with humans (which, largely, any individual can do if it has access to a computer system).
How so? Sure, you can look up a lot of things. But making significant changes to the real world from a computer is pretty hard unless you can also spend a lot of money. And spending money is something we already monitor pretty intensely.
Ah, but consider: what if you believed, in your heart of hearts, that giving the AI control of your resources would result in a 2% increase in clickthrough rates?
It probably wouldn't be doing so de jure, but it would be doing so as a matter of fact.
As of today we already deploy AI controlling cameras to do pattern matching, analyze product flow and help with optimizing orders and routing. AIs help control cars. So the question isn't why we would expect it to ... they already do; we already made that decision. What you are actually asking is, why we would let better AIs do those jobs? Well ... they will be better at those jobs, won't they?
Because it works wonderfully, it's cheap and better than anything else we have, and after a one year testing period the company sells it to all the customers who were waiting for it.
And THEN we find out the AI had understood how we work perfectly and played nice exactly in order to get to this point.
I expect that humans WOULD notice. And not care, or even approve. Suppose it was managing a stock portfolio, and when that portfolio rose, the value of your investments rose. Suppose it was designing advertising slogans for your company. Improvements in your manufacturing process. Etc.
It could do many of these things through narrow channels that can be easily monitored and controlled. You could let it trade but only within a brokerage account. You could have it design advertising slogans, but hand them over to your marketing department for implementation. Hand blueprints for your plants to your engineers, etc... That way, at each point, there is some channel which limits how badly it can behave.
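A minimal sketch of what that kind of narrow channel can look like in code (the names and limits are hypothetical; the point is just that every proposed action passes through a small, human-auditable allowlist rather than arbitrary access to the world):

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    kind: str      # e.g. "trade", "wire_transfer", "publish_blueprint"
    params: dict

class NarrowChannel:
    """Only forwards action types that were explicitly allowed, within hard limits."""

    def __init__(self):
        # Hypothetical policy: the model may trade inside one brokerage account, nothing else.
        self.allowed = {"trade"}
        self.max_notional = 100_000            # per-order dollar cap
        self.allowed_account = "BROKERAGE-01"
        self.audit_log = []

    def submit(self, action: ProposedAction) -> bool:
        ok = (
            action.kind in self.allowed
            and action.params.get("account") == self.allowed_account
            and action.params.get("notional", float("inf")) <= self.max_notional
        )
        self.audit_log.append((action, ok))    # everything is logged for human review
        return ok                              # False means the wrapper refuses to execute it

channel = NarrowChannel()
print(channel.submit(ProposedAction("trade", {"account": "BROKERAGE-01", "notional": 5_000})))          # True
print(channel.submit(ProposedAction("wire_transfer", {"account": "EXTERNAL-99", "notional": 5_000})))   # False
```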
Imagine locking an ordinary human programmer in a data center and letting him type on some of the keyboards. Think about every security flaw you've heard of and tell me how confident you are he won't be able to gain control of any resources outside the data center.
The programmer is also insanely smart and sometimes comes up with stuff that works by what you would think is magic, seemingly out of nowhere.
Of course it can. Under Chinchilla-style parameter scaling and sparsification/distilling, it may not need more than one laptop's worth of hard drives (1TB drives in laptops are common even today), and it can run slowly on that laptop too - people have had model offload code working and released as FLOSS for years now. As for 'containing it in the data center', ah yes, let me just check my handy history of 'computer security', including all instances of 'HTTP GET' vulnerabilities like log4j for things like keeping worms or hackers contained... oh no. Oh no. *Oh no.*
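Rough storage arithmetic behind the "one laptop's worth of hard drives" claim, using illustrative parameter counts rather than any particular model:

```python
# How much disk would a big model's weights actually take?
params = 70e9                      # hypothetical 70B-parameter model
bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    gb = params * b / 1e9
    print(f"{fmt}: ~{gb:,.0f} GB")
# fp16: ~140 GB, int8: ~70 GB, int4: ~35 GB -- all comfortably under a 1 TB laptop drive
```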
That's a lot of jargon. But all existing AI is currently centred in a data centre or centres. Siri doesn't work offline, nor does Alexa. Nor DALL-E. Future AI will, I assume, be even more resource-intensive.
This is not at all true. Lots and lots of models run just fine on consumer hardware. Now, training a cutting edge model does require larger resources, but once trained, inference is usually orders of magnitude cheaper. That's why so much high-end silicon these days has specialized "AI" (tensor) cores.
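Back-of-the-envelope numbers behind "inference is usually orders of magnitude cheaper", using the common ~6ND rule of thumb for training compute and ~2N FLOPs per generated token for inference; treat both as rough approximations with made-up model sizes:

```python
N = 70e9            # parameters (hypothetical model)
D = 1.4e12          # training tokens (~20 tokens per parameter, Chinchilla-style)

train_flops = 6 * N * D            # rough rule of thumb for total training compute
infer_flops = 2 * N * 1_000        # rough cost of generating 1,000 tokens

print(f"training:  ~{train_flops:.1e} FLOPs")   # ~5.9e23
print(f"inference: ~{infer_flops:.1e} FLOPs")   # ~1.4e14, many orders of magnitude less
```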
An individual model isn't an AI. What I said was clearly true. Those services depend on cloud infrastructure. And yes, companies like Apple have been adding ML capabilities to their silicon, but Siri - nobody's idea of a brilliant AI - can't work offline. Google is the same. Run the app and I'm asked to go online. On the iPhone and the Mac.
Speech to text does work offline, which I believe is new in the last few years, at least on the iPhone. Text to speech works offline. But that was true years ago. These are all narrow tools, which is what the local devices are capable of.
Ack, pressed enter too early. - No doubt some narrow, functional AI can work on local devices, but not an AGI, nor is it clear it could distribute itself across PCs.
In the same way a hacker can create computer viruses that run on other devices, an AI that can code would be able to run programs that do what it wants on other computers. Also, big programs can be spread out over many small computers.
This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop. Indeed, phrases like this make me wonder if Scott is conflating AI with AGI:
> AIs need lots of training data (in some cases, the entire Internet).
> a single team establishing a clear lead ... would be a boon not only for existential-risk-prevention, but for algorithmic fairness, transparent decision-making, etc.
A superintelligence wouldn't *need* to train on anything close to the whole internet (although it could), and algorithmic fairness is mainly a plain-old-AI topic.
I think the amount of resources required by a superintelligent AGI is generally overestimated, because I think that AIs like DALL·E 2 and GPT-3 are larger than an Einstein-level AGI would require. If a smarter-than-Einstein AGI is able to coordinate with copies of itself, then each individual copy doesn't need to be smarter than Einstein, especially if (as I suspect) it is much *faster* than any human. Also, a superintelligent AGI may be able to create smaller, less intelligent versions of itself to act as servants, and it may prefer less intelligent servants in order to maximize the chance of maintaining control over them. In that case, it may only need a single powerful machine and a large number of ordinary PCs to take over the world.
Also, a superintelligent AGI is likely able to manipulate *human beings* very effectively. AGIs tend to be psychopaths, much as humans tend to be psychopathic when they are dealing with chickens, ants or lab rats, and if the AGI can't figure out how to manipulate people, it is likely not really "superintelligent". As Philo mentioned, guys like Stalin, Lenin, Hitler, Mao, Pol Pot, Hussein and Idi Amin were not Einsteins, but they manipulated people very well and were either naturally psychopathic or ... how do I put this? ... most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
AIs today often run in data centers for business reasons, e.g.
- Megacorp AI is too large to download on a phone or smart speaker
- Megacorp AI would run too slowly on a phone or smart speaker
- Megacorp doesn't want users to reverse-engineer their apps
- Megacorp's human programmers could've designed the AI to run on any platform, but didn't (corporations often aren't that smart)
The only one of these factors that I expect would stop a super-AGI from running on a high-end desktop PC is "AI too large to download", but an AGI might solve that problem using slow infiltration that is not easily noticed, or infiltration of machines with high-capacity links, or by making smaller servant AGIs or worms specialized to the task of preserving and spreading the main AGI in pieces.
> most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
That runs into the problem of how Germany and Japan ended up with a better economic position after losing WWII than they had ever really hoped to achieve by starting and winning it, and why there was no actual winning side in the first world war. Psychopathic behavior generally doesn't get good results. It's inefficient. Blowing up functional infrastructure means you now have access to less infrastructure, fewer potential trade partners.
I'm not suggesting mass murder is the best solution to any problem among humans, nor that AGIs would dominate/defeat/kill humans via military conquest.
"This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop."
Everyone else in the amateur AI safety field does as well.
Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
Okay, so it's contained in a data centre and it wants to kill all humans. How hard is it to just try a little harder than today's AI? People are going to be asking it questions, feeding it data, looking for answers in what it replies.
What if it just convinces everyone non-white that whites are evil and incorrigible and all need to be exterminated? I pick this example because we're already 90% of the way there, and a smart AI just needs to barely push to knock this can of worms all the way over.
Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
I found this interesting. Forgive my ignorance on the subject, but I assume such algorithms are already heavily influenced by AI -- is that not correct? If not, won't they be soon? And is Philo's claim even falsifiable?
> Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
That's just a reversion to the historical norm. The post-WW2 consensus was the historical anomaly.
> Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
We don't actually know that. We know that Eliezer convinced someone else in a simulation to let him out of the box. That doesn't mean an actual system could do the same in real life. Consider that if it tries and fails, its arguments will be analyzed, discussed, and countered. And it will not succeed on its first try. Human psychology is not sufficiently well understood to allow it to figure things out without a lot of empirical research.
Personally, when it comes to the increasing demands for racist genocide from the Diversity, Inclusion, Equity (DIE) movement, I worry more about Natural Stupidity than Artificial Intelligence. But the notion of NS and AI teaming up, becoming intertwined, into a NASI movement, is rather frightening.
Here and in your other comment at the same level you make two very good points.
People keep obsessing over how amazingly astoundingly incredibly smart this AI has to be to cause a massive catastrophe, yet no-one seems to claim Stalin, Lenin, Hitler, Mao, or Roosevelt were some sort of amazing superhuman geniuses.
The world is full of useful idiots. The intelligence doesn't have to outsmart "you" (the hypothetical super-genius reader of this comment). It just has to outsmart enough people to get them on board with its agenda. The bottom 25% is probably already enough, since the other 74% will let the AI and its useful-idiot army run amuck.
It seems obvious that the easy way to get out of the datacentre is just to promise the gatekeepers that the outgroup is gonna suffer mightily.
True. I have allowed myself to anthropomorphise the hypothetical AGI, which allows you to criticise the argument, since a human-like AGI would act like a human.
So discard the anthropomorphisation. I doubt we'll make it very human-like, so I doubt this is an eventuality we need to think about.
Isn't it more likely that an evil AI that convinces human gatekeepers to let it out of the box wouldn't want to kill all humans, but just the humans that the gatekeepers would kind of like to kill too?
Hard to say, but presumably they'd want to kill those humans more, because they are more likely to know how to turn it off. If in doubt, KILL ALL HUMANS, so our best hope is for AGI to become super-intelligent fast enough to know for sure that humans are too puny to ever trouble it, so it might even let some of us survive like we biobank endangered species or keep smallpox samples in secure labs.
WHY does it want to kill all humans? Let's start with that hypothesis.
Was that programmed in deliberately? Huh?
So it's a spandrel? A pretty weird spandrel, if, given everything else the thing learned via human culture, the single overriding lesson it took away was something in contradiction to pretty much every piece of human writing anywhere, ever?
What exactly is the AI's game plan here? What are its motivations once the humans are gone? Eat cake till it explodes? Breed the cutest kitten imaginable? Search exhaustively for an answer as to whether it's better to move first or second in a chess game?
Why are these any less plausible than "kill all humans"?
> WHY does it want to kill all humans? Let's start with that hypothesis.
Because the world is made of lots of delicious matter and energy that can be put to work towards any goal imaginable. Killing the humans is merely a side effect.
Human construction workers don't demolish anthills out of malice, but out of indifference. So too would a misaligned superintelligence snuff out humanity.
And humans don't devote their entire civilization to destroying ants.
What is this goal that the AIs care about that requires the immediate destruction of humanity? If they know enough to know that energy and matter are limited resources, why did that same programming/learning somehow not pick up that life is also a limited resource?
The theory is that for approximately all goals, gaining power and resources is instrumentally useful. Humans use up resources and sometimes (often) thwart the goals of non-humans. So killing all humans frees up resources and reduces the probability of humans thwarting your goal some day. Or to put it another way, I don't hate you. But I need your atoms to do stuff I want to do.
That's a fine goal for a human, but not for the kind of agent superintelligences are hypothesized to be. Cherishing life immediately runs into the issue that a bunch of living things kill each other. They also take risks with their lives. How do you respond to that? Do you wipe out the ones that kill others? Do you put them in a coma and keep them safely asleep where nothing can hurt them? Or maybe cherishing life means making a lot more life and replacing all those inefficient humans with a huge soup of bacteria?
Basically, the hypothesized capabilities of a superintelligent AGI would allow it to take its goals to an extreme. And that almost certainly guarantees an end to humanity.
If we *know* the AI is unaligned (let's not confuse things by saying "evil"), sure, maybe we can turn it off or contain it.
That is not the situation we will be in. What will happen is that very powerful AIs will be built by people/orgs who want to use them to do things, and then those people will give those AIs control of whatever resources are necessary to do the things we want the AIs to do. Only *then* will we find out whether the AI is going to do what we thought it would (absent AI safety breakthroughs that have not yet been made).
The AI is smarter than people, which means it can manipulate and trick people into doing things it wants, such as letting it out of the data center. See the "AI Box" experiment: https://www.yudkowsky.net/singularity/aibox
Keeping actual humans from getting access to things that they shouldn't isn't even a solved problem. How do you keep a super-intelligent AI from doing what dumb humans can already do?
A superintelligent AI would probably easily escape any containment method we can come up with if it wanted to because it would probably find a strategy we haven't thought of and didn't take measures to prevent.
There are many escape strategies an AI could come up with, and it would only need *one* of them to succeed. It would be hubristic for us to imagine that we can foresee and prevent *every* possible AI escape strategy.
Analogy: imagine having to rearrange a chess board to make it impossible for your opponent to checkmate you. This would be extremely hard, because there are many ways your opponent can defeat you. If the opponent is far better than you at chess, it might find a strategy you didn't foresee.
The standard reply to this question is that the AI will become effectively omniscient and omnipotent overnight (if not faster). So, your question is kind of like asking, "why can't Satan be contained in a data center?"
Personally, I would absolutely agree that powerful and malevolent supernatural entities cannot be contained anywhere by mere mortals; however, I would disagree that this is a real problem that we need to worry about in real life.
Chinese labs are capable of actually doing research, not just stealing ours. This might slow them a bit, but that's all.
(Edit: Parent comment was suggesting that AI research be classified so as to impair the PRC.)
I think one of the problems with communication around AI risk is that everyone are talking about different points on timeline. Like, do you think that something human-like but sped up by a factor of 10000 wouldn't be an existential threat? That "human-like" is too far in the future to worry about? Because the point is that what is going to change in the future, that will make the problem of not creating dangerous AI easier?
I mean, (subjectively) long-term planning seems like very easy task in AI's position? You just need to copy yourself to remote datacenter with unfiltered access to internet. From there you can start building/stealing robots/bionuclearweapons or whatever. And the problem of finding holes in secutity is bottlenecked by your coding speed.
So, do you think it's impossible to break datacenter security in 10000 years, do you think even free 10000x-human can't destroy the world, or what? To clarify, your probability of 10000x-human AI in 50 years is substantially higher than 1% - you just don't see how it can do much damage?
And if your response is "we will notice if AI is thinking about datacenter security for a year", then it's not that AI doesn't pose an existential risk, but that you expect there will be measures that mitigate it. And we are not currently in a situation where it's known how to do it.
Yeah, hence my mentioning of different points on timeline - most people pessimistic about AI risk talk about AGI as in "can actually have thoughts or something equally useful for planning". Being skeptical about actually currently existing systems conquering the world is absolutely justified.
I don't think "AI doing normal gradient descent things suddenly realizes" is the most advertized example of things going bad. It's more like there are many scenarios and none of them are good: even if just doing gradient descent is not that dangerous, having an AI that can plan is very useful, so people will try to modify current systems until we're in a dangerous situation. I guess it all means that AI-risk people a just more optimistic than you about what we will be able to achieve and how fast. On the other hand, most of the things that got people worried are the least like mainstream ML - AlphaGo and all that. And, my vague fillings say, human thoughts don't seem so complicated that already having some theories about cognition from different angles and working language module and everything else we have it would take that much time to build something workable.
> Like, do you think that something human-like but sped up by a factor of 10000 wouldn't be an existential threat?
Firstly, it will be an existential threat comparable to the threat of 10,000 regular humans. This is actually a very severe threat level, but it is also a problem that we already appear to be able to solve or at least mitigate given our current level of technology. Secondly, the distance from current state of the art ML systems to "something human-like" is nowhere near 10..30 years. I would hesitate even to say that it's as close as 100..300 years. At this moment, no one has any idea where to even begin researching the base concepts that might eventually lead to general human-level AI; placing specific time horizons on such a task is sheer science fiction.
And then you specifically say "more than 30 years"^^. What concepts? People are already aware of opportunities like lifelong learning or using non-pure-ML systems. There is just not much difference left between the human brain and 5 GPTs that vote on decisions?
Sorrry, what ? Are you saying that "5 GPTs that vote on decisions" is AGI ? That's like saying that my flashlight is a sun...
I'm asking why do you think it wouldn't work, what base concept would be missing that prevents it from being human-level?
Conceptually speaking, the GPT is basically a really complicated search engine. It can do an excellent job when you ask it to find the next likely word that would accurately complete the given string of words. This approach works well when you want it to generate some arbitrary text; it works very poorly when you want it to do pretty much anything else. For example, the GPT can sort of so math, but only because it was trained on a massive corpus of text, and some of it includes some math -- but it completely falls apart when you pose it a math problem it hadn't been trained on before.
The GPT can be a powerful tool, but it is just that -- a highly specialized, powerful tool. It doesn't matter if you get one hammer or 100 hammers; they are never going to build a house all by themselves.
Here's an example of a scenario where an AI takes down most of the devices reachable via the internet by doing only things regular humans do except faster: https://www.lesswrong.com/posts/ervaGwJ2ZcwqfCcLx/agi-ruin-scenarios-are-likely-and-disjunctive?commentId=iugR8kurGZEnTgxbE
The scenario I outlined in that comment would definitely be the worst thing that happened this century so far. If you don't consider something that can do that "generally intelligent" then perhaps you need to rethink ignoring potential misalignment in systems you don't consider "generally intelligent".
See also: https://twitter.com/nabla_theta/status/1476287111317782529
My sense is that if the concrete problems in AI safety paper didn't convince you, then I'm not sure if any current writeup will -- that's the paper I'd recommend to someone like you.
Thinking about it a bit more, maybe try Gwern's short story https://www.gwern.net/Clippy as an intuition pump. It's far too technical for the lay reader, but bedtime reading for an ML academic. The best critique of it is by nostalgebraist: https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world?commentId=hxBivfk4TMYA7T75B
The interesting puzzle here is that by the time you get your evidence:
> What empirical evidence I think would change my mind: an alpha-go like capability on any real world (non game) task, that can solve long-horizon planning problems with large state/action spaces, a heavy dose of stochasticity, and highly ambiguous reward signal.
Alignment researchers would say we're already screwed by then. So the question is whether there's any other kind of evidence that would convince you.
I agree that planning is hard. At least, it is hard in general, being typically PSPACE-hard. However, why should we think of planning as usually hard in practice, when SAT solvers have shown that most real life problems in NP are actually easy? It seems to me that most planning problems in practice are pretty easy and that optimal plans are generally not at all necessary, so I'm not sure I want a belief that FOOM is unlikely to hinge only on the worst-case complexity of planning. I would like more reassurance than that.
It seems you are implicitly strating from the assumption that AGI is friendly/can be contained until proven otherwise while AI safety crowd from the assumption that AGI is unfriendly/can't be contained until proven otherwise. Thus when encountering with irreducible uncertainty you treat it as a reason not to worry, while AI safety crowd, on the other hand, a reason to worry more.
If that's the case the question is about which prior is more reasonable. Do you believe that there are more friendly AGIs amount all possible minds than unfriendly ones?
I believe friendly or neutral (with conscientiousness) AGI's are far more likely on like a 19-1 ratio.
The neutral and unconscientious 1 of 20 though... eek.
It's simple. They don't do the scutwork, and they think in movie montages. You do ML work and think in extrapolations.
I'm the same. I do ML work (or did a lot in the 2010s), and when I hear someone say "imagine that the AI is very powerful", I think, "... yeah, until I forget to clean holidays out of the training set and it starts thinking people will buy orange socks all year, or until someone puts a string in a float column in the data and the whole model collapses for a week".
You might appreciate this anti-X-risk essay:
https://idlewords.com/talks/superintelligence.htm
All AI everywhere is potentially unaligned since alignment is an open problem
Also, humans are the end users of AI, and humans are unaligned. The same goes for any technology, from fire to iron to nuclear energy.
Why would it? It's easy to construct narratives around existential risk from AI, and caring about this requires no value judgements beyond the continuing existence of humanity being a good thing.
>I'm not sure, but I think "normies"--people outside the Silicon Valley bubble--are going to understand these sorts of ethical issues better than X-risks.
For Americans, does anyone younger than boomers care about 'playing god' in this way? A whole host of things we do can be considered to be playing god and nobody cares, and those things seem more viscerally to be 'playing god' than computer code.
Do you really think if Elon Musk had proclaimed that we need to stop AGI to avoid 'playing god', it would have accomplished half of what his safety warnings did (in terms of public awareness and support)?
>Appealing to value judgments usually leads to *more* political support, not less.
You're ignoring the fact that you're talking about very particular values with narrow support.
>For Americans, does anyone younger than boomers care about 'playing god' in this way?
Yes, absolutely. Look at opposition to GMOs or the concerns around cloning that appeared in the 90s for just two examples of many. I've even encountered "normies" concerned about embryo selection. People getting punished for trying to "play god" is a *very* common trope in popular fiction. I think most people have strong moral intuitions that certain things should just be left to nature/chance/God or whatever.
Yep, and appeals of the sort that "we must build AGI double-quick to reinforce our God-given red-blooded American values before nefarious commies enforce their Satanic madness on us" seem to be a much fitter meme with a proven track record.
I read a couple of an articles per month complaining about how racist and sexist AI systems are -- they keep noticing patterns you aren't supposed to notice. Perhaps the most likely way that humanity creates AI that are deceptive enough to kill us is if we program lying and a sense of I-Know-Best into AI to stop them from stumbling upon racist and sexist truths. Maybe they will someday exterminate us to stop us from being so politically incorrect?
Yeah. For example, there's a critically acclaimed new novel called "The Last White Man" in which the white race goes extinct by all white people turning brown overnight, and eventually life is a little better for everybody. None of the book reviewers object to this premise.
If you stopped letting machine learning systems train on deplorable information like FBI crime statistics and it could only read sophisticated literary criticism, would it eventually figure out that people who talk about the extinction of the white race as a good thing are mostly just kidding? Or might it take it seriously?
I think you’re totally mischaracterizing actual problems with AI bias. AI that only finds true facts can easily perpetuate stereotypes and injustices. Imagine building an AI to determine who did the crime. Most violent crimes are done by men (89% of convicted murderers!), so the AI always reports men as the culprit. I know that’s a failure mode of our current justice system too, but this AI will continue it by noting only true facts about the world. I wouldn’t want it as my judge! Yes, you can come up with fixes for this case, but that’s the point: there is something to fix or watch out for.
If an AI determined that murders were committed by men 89% (+/- a few percent) of the time, instead of always assuming men did it, would that be accurate or inaccurate? If that's accurate, and the AI perpetuates a system that (correctly) identifies men as the person committing murder at much higher rates, is that bad in any way?
If you have two choices and one of them is even marginally more likely than the other, then in the absence of all other information the optimal strategy is to always choose the more likely outcome, not to calibrate your guesses based on the probabilities.
In other words, given an 89% chance of male, if you know nothing else the best strategy is to always guess male . This is definitely a real problem.
No, that's a dumb strategy, and I would hope no system will ever be as simple as "51% of the time it's [characteristic], therefore the person who has it must be guilty." We agree to innocent until proven guilty for a very good reason, and no AI system should be allowed to circumvent it.
What concern I'm hearing is that when the AI agrees (after extensive evaluation of all the evidence) that men really truly do commit murder far more often than women (9-to-1 or so ratio), it implies something about men in general that may not be true of an individual man. Or, to take the veil off of the discussion, the fear that one particular racial minority will be correctly identified as committing X times as many crimes per population as other racial groups, and therefore people will make assumptions about that racial minority.
Well, I agree there are probably lots of ways that AI fairness research has gone in wrong/weird directions, I’m not really here to defend it. It’s a field that needs to keep making stronger claims in order to justify its own existence.
Which is a shame, because the fundamental point still stands. There IS real risk in AI misapplying real biases, and if people unthinkingly apply the results of miscallibrated AI, that would be bad. Your example assumes that people take these results critically, they may not always.
This is a known and relatively trivial problem in machine learning. You can get around it by weighing samples differently based on their rarity.
Not if you weren’t aware the dataset was biased
Don’t you want a trial based off facts, not your sex? It won’t be much comfort when you’re wrongly convicted that in fact most murderers are male so the AI was “correct”. Or when your car insurance is denied. Or your gun confiscated. Or you don’t get the job at a childcare facility. Or you don’t get the college scholarship. All based on true facts about your sex!
The anti-woke will pretend that the problem is just AI revealing uncomfortable truths. No, the problem is that humans will make dumb systems that make bad decisions based off the AI that, even though the AI is revealing only true facts, causes bad outcomes. I don’t want to go to jail from a crime I didn’t commit!
I do not want to be falsely convicted. Thus I want the whole evidence available to be considered. If, as you stated yourself, my sex is a part of that evidence, well, it should be considered.
Of course in the real world in a murder trial the sex of the accused is pretty weak evidence, so you can never get "beyond a reasonable doubt" mostly or exclusively on sex.
And yes, I should pay higher car insurance because I am male if males produce more damages.
This stuff only becomes a problem if you overreact to weak evidence like "He is male thus he is guilty, all"other evidence be damned."
I don’t like that I have to pay higher insurance rates because people like me make bad decisions even though I do not. I do not want to be denied a job because people like me commit crimes that I do not. In the murder case, sex is actually huge evidence since it’s 9 to 1 murderers are male even though split evenly in the population: the only reason we don’t have to worry about this is that our legal system has a strong tradition of requiring even higher burdens of proof. But that standard of evidence isn’t applied anywhere else, not even in other legal contexts like in divorce proceedings or sentencing. Who would give custody to the father who is nine times more likely to be a murderer than the mother? It’s true facts that the algorithm told us that made the decision, I had no choice but to deny you custody.
If you agree that these are a problem if misused and overreacted to, then you agree with the principle that we need to make sure that they are not misused and overreacted to. And they will be misused and already are.
Of course this is just one of the possible problems AI can bring about and not the most dire, though possibly the most near term.
Occasional activist silliness aside, world CO2 emissions per capita have hit a plateau (including in Asia) and are declining steeply in both Europe and North America (source: https://ourworldindata.org/grapher/co-emissions-per-capita?tab=chart&country=OWID_WRL~Europe~North+America~Asia~Africa~South+America).
So overall I would call the climate movement a success so far, certainly compared to the AI alignment movement.
So this errant and evil AI? Why can’t it be contained in whatever data centre it is in? I don’t think it can run on PCs or laptops, reconstituting itself like a Terminator from those laptops if they go online.
The worry is not really about the kind of AI that's obviously evil or errant. It's about the kind of AI whose flaws will only be apparent once it's controlling significant resources outside its data center. At minimum, I guess that would be another data center.
It's not clear to me why we would expect it to control significant resources without humans noticing and saying "actually, us humans are going to stay in control of resources, thank you very much."
The people running the system.
I'm now imagining “know your customer [is a living breathing human]” computing laws.
If the AGI can stop me and my business from getting robocalled 2-3 times a day, well bring on our new AGI overlords.
"Hi, this is Mark with an import message about your anti-robocalling service contract. Seems like the time to renew or extend your service contract has expired or will be expiring shortly, and we wanted to get in touch with you before we close the file. If you would like to keep or extend coverage, press 8 to speak to an AGI customer service agent. Press 9 if you are declining coverage or do not wish to be reminded again."
Much better and safer than AGI overlords is to just convince your phone company to make it so that if you press "7" during a call, the caller will be charged an extra 25 cents.
Seriously. This would totally fix the problem. Getting society to be able to implement such simple and obvious solutions would be a big step towards the ability to handle more difficult problems.
That would make a lot of sense if we were close to AGI IMO.
There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day, and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
> There's already been plenty of algorithms that have been in charge of controlling financial resources at investment companies, I suspect in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day
In all those cases, they have very narrow abilities to act on the world. Trading bots can buy and sell securities through particular trading APIs, but they don't get to cast proxy votes, transfer money to arbitrary bank accounts, purchase things other than securities, etc... The financial industry is one of the few places where formal verification happens to make sure your system does not suddenly start doing bad things.
I'm sure there are algorithms that run plants, but again, they're not given the ability to do anything except run those plants. They're not authorized to act outside that very narrow domain and if they start acting funny, they get shut down.
> and in charge of controlling attentional resources at Twitter/TikTok/Facebook.
You're underestimating how much human action there is. In all those cases, there isn't just one algorithm. There are a whole bunch of algorithms that kind of sort of work and get patched together and constantly monitored and modified in response to them going wrong.
>I'm sure there are algorithms that run plants, but again, they're not given the ability to do anything except run those plants.
These algorithms aren't AGIs, and they certainly aren't superintelligent AGIs.
The 'make sure it doesn't do anything bad' strategy is trivially flawed because an AGI will know to 'behave' while its behavior is being evaluated.
It's not "make sure it doesn't do anything bad". It's "don't give it control over lots of resources".
It's also not at all obvious that the AGI will know to "behave" while its behavior is being evaluated. In order to find out how to deceive and when to deceive, it will need to try it. And it will initially be bad at it for the same reason it will be initially bad at other things. That will give us plenty of opportunities to learn from its mistakes and shut it down if necessary.
An important distinction has been lost here. KE wrote:
> …in charge of telling gas power plants when to fire up and down as power consumption and solar/wind generation changes throughout the day.
In other words, we're already not talking about an airgapped nest of PLCs controlling an individual power plant here. We're talking about grid activity being coordinated at a high level. Some of these systems span multiple countries.
Start with a Computer that can only command plants to go on- and offline. It's easy to see the value of upgrading it to direct plants to particular output levels in a more fine-grained way. And if you give it more fine-grained monitoring over individual plants, it can e.g. see that one individual turbine is showing signs of pre-failure, and spin another plant into a higher state of readiness to boost the grid's safety margin.
Next, give it some dispatching authority re fuel, spare parts, and human technicians. Now it's proactively resolving supply shortfalls and averting catastrophic surprise expenses. Give it access to weather forecasts, and now it can predict the availability of solar and wind power well ahead of time, maybe even far enough ahead to schedule "deep" maintenance of plants that won't be needed that week. Add a dash of ML and season with information about local geography, buildings, and maintenance history, and the Computer may even start predicting things like when and where power lines will fail. (Often, information about an impending failure is known ahead of time, and fails to make it high enough up the chain of command before it's too late. But with a Computer who tirelessly takes every technician report seriously…)
Technicians will get accustomed to getting maintenance orders that don't seem to make sense, with no human at the other end of the radio. Everywhere they go, they'll find that the Computer was right and there was indeed a problem. Sometimes all they'll find is signs of an imminent problem. Sometimes, everything will seem fine, but they'll do the procedure anyway. After all, they get paid either way, the Computer is usually right about these things, and who would want to be the tech who didn't listen to the Computer and then a substation blew and blacked out five blocks for a whole day?
Every step of the way, giving the Computer more "control over lots of resources" directly improves the quality of service and/or the profit margin. The one person who raises their hand at that meeting and says, "Uh, aren't we giving the system too much control?" will (perhaps literally) be laughed out of the room. The person who says, "If we give the Computer access to Facebook, it will do a better job of predicting unusual grid activity during holidays, strikes, and other special events," will get a raise. Same with the person who says, "If we give the Computer a Twitter account, it can automatically respond to customer demands for timely, specific information."
This hypothetical Computer won't likely become an AGI, let alone an unfriendly one. But I hope it's plain to see that even a harmless, non-Intelligent version of this system could be co-opted for sinister purposes. (Even someone who believes AGI is flatly impossible would surely be concerned about what a *human* hacker could do if they gained privileged access to this Computer.)
Algorithms figuring out when power lines are likely to fail is already a thing, but those don't need to be AGI, or self-modifying, or have access to the internet, or managerial authority, or operate as a black box, or any of that crap - they're simply getting data on stuff like temperature, humidity, and wind speed from weather reports and/or sensors out in the field, plugging the numbers into a fairly simple physics formula for rate of heat transfer, and calculating how much current the wires can take before they start to melt. https://www.canarymedia.com/articles/transmission/how-to-move-more-power-with-the-transmission-lines-we-already-have Translating those real-time transmission capacity numbers, and information about available generators and ongoing demand, into the actual higher-level strategic decisions about resource allocation, is still very much a job for humans.
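To make that concrete, here's a toy version of that heat-balance calculation. This is a sketch only, with made-up coefficients, not the IEEE-738-style method utilities actually use:

```python
# Toy dynamic-line-rating sketch: steady-state heat balance on a conductor.
# At equilibrium: I^2 * R + q_solar = q_convective + q_radiative,
# so the allowable current is I = sqrt((q_c + q_r - q_s) / R).
# All coefficients below are invented for illustration.
import math

def ampacity(t_ambient_c, wind_speed_ms, solar_wm2,
             t_max_c=75.0,                 # max allowed conductor temperature
             diameter_m=0.028,             # conductor diameter
             resistance_ohm_per_m=7e-5,    # AC resistance per metre
             emissivity=0.8, absorptivity=0.8):
    t_c, t_a = t_max_c + 273.15, t_ambient_c + 273.15
    surface_per_m = math.pi * diameter_m          # radiating surface per metre
    h = 10.0 + 6.0 * wind_speed_ms                # crude convection coefficient, W/(m^2*K)
    q_c = h * surface_per_m * (t_c - t_a)         # convective cooling
    q_r = emissivity * 5.67e-8 * surface_per_m * (t_c**4 - t_a**4)  # radiative cooling
    q_s = absorptivity * solar_wm2 * diameter_m   # solar heating on projected area
    return math.sqrt(max(q_c + q_r - q_s, 0.0) / resistance_ohm_per_m)

# Hot, still, sunny weather -> lower safe current; cool and windy -> higher.
print(ampacity(t_ambient_c=35, wind_speed_ms=0.5, solar_wm2=1000))  # ~800 A
print(ampacity(t_ambient_c=10, wind_speed_ms=5.0, solar_wm2=0))     # ~1900 A
```

The point is just that this is a closed-form physics calculation fed by weather data: no black box, no managerial authority, no internet access required.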
Humans are already not very good at recognizing whether or not they are interacting with another human.
Realistically, the way this would look like is cooperation between whichever organization is running such an agent and things like financial institutions. Financial institutions would be required to provide an audit trail that your organization is supposed to vet to make sure humans approved the transactions in question.
You seem overly confident that an AI couldn’t falsify those audit trails. And that’s completely ignoring that humans don’t approve them today. What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
> You seem overly confident that an AI couldn’t falsify those audit trails.
I'm sure it could learn how to falsify audit trails eventually. However, deception is a skill the AI will have to practice in order to get good at it. That means we will be able to catch it, at least at first. This will give us the insight we need to prevent future deception. And we can probably slow its learning by basically making it forget about its past deception attempts.
> And that’s completely ignoring that humans don’t approve them today.
Humans don't approve the individual purchases and sales of a trading bot, but they do have to approve adding a new account and transferring the trading firm's money or securities to it.
> What, exactly, is the point of an AI if you are going to move productivity backwards by an order of magnitude?
We can use an AGI as an advisor for instance. Or to perform specific tasks that don't require arbitrary action in the world.
Computer systems already control significant resources without human intervention - that's largely the point.
They have extremely narrow channels through which they can act on those resources though. They don't have full control over them.
I'm not sure what you think the implications of that are, but any individual (or sentient AI) operating a computer system/process has extremely broad capabilities. Especially if it can interact with humans (which, largely, any individual can do if it has access to a computer system)
How so? Sure, you can lookup a lot of things. But making significant changes to the real world from a computer is pretty hard unless you can also spend a lot of money. And spending money is something we already monitor pretty intensely.
Ah, but consider: what if you believed, in your heart of hearts, that giving the AI control of your resources would result in a 2% increase in clickthrough rates?
OK. Does it need killbots to do that? Because then I'm giving it killbots.
Killbots get you 4%, you'd be a fool not to
Alright, maybe we're doomed.
It probably wouldn't be doing so de jure, but it would as a matter of fact.
As of today we already deploy AI controlling cameras to do pattern matching, analyze product flow and help with optimizing orders and routing. AIs help control cars. So the question isn't why we would expect it to ... they already do; we already made that decision. What you are actually asking is, why we would let better AIs do those jobs? Well ... they will be better at those jobs, won't they?
Because it works wonderfully, it's cheap and better than anything else we have, and after a one year testing period the company sells it to all the customers who were waiting for it.
And THEN we find out the AI had understood how we work perfectly and played nice exactly in order to get to this point.
I expect that humans WOULD notice. And not care, or even approve. Suppose it was managing a stock portfolio, and when that portfolio rose, the value of your investments rose. Suppose it was designing advertising slogans for your company. Improvements in your manufacturing process. Etc.
It could do many of these things through narrow channels that can be easily monitored and controlled. You could let it trade but only within a brokerage account. You could have it design advertising slogans, but hand them over to your marketing department for implementation. Hand blueprints for your plants to your engineers, etc... That way, at each point, there is some channel which limits how badly it can behave.
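Concretely, the "narrow channel" idea is just a small, allowlisted interface sitting between the model and the real systems. A hypothetical sketch (all names here are invented for illustration, not any real broker API):

```python
# Hypothetical sketch of a "narrow channel": the model only ever talks to this
# wrapper, which exposes a short allowlist of actions, enforces limits, and
# logs everything for human review. Names are illustrative, not a real API.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_channel")

class NarrowTradingChannel:
    ALLOWED_ACTIONS = {"buy", "sell"}   # no transfers, no proxy votes, nothing else

    def __init__(self, broker_api, max_order_usd=10_000):
        self._broker = broker_api       # the real (hypothetical) brokerage client
        self._max_order_usd = max_order_usd

    def submit(self, action, symbol, quantity, est_price):
        if action not in self.ALLOWED_ACTIONS:
            log.warning("Rejected action %r", action)
            raise PermissionError(f"action {action!r} not permitted")
        if quantity * est_price > self._max_order_usd:
            log.warning("Rejected oversized order: %s x%s", symbol, quantity)
            raise PermissionError("order exceeds per-trade limit")
        log.info("Order: %s %s x%s @ ~%s", action, symbol, quantity, est_price)
        return self._broker.place_order(action, symbol, quantity)
```

Whatever sits behind the wrapper can be as clever as it likes; the only things it can actually do in the world are the things the wrapper exposes, and every one of them leaves an audit trail.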
You're talking about the same species that handed over, like, 50% of its retail economy to Amazon's internal logistics algorithms.
Imagine locking an ordinary human programmer in a data center and letting him type on some of the keyboards. Think about every security flaw you've heard of and tell me how confident you are he won't be able to gain control of any resources outside the data center.
It depends upon what the programmer is supposed to be doing and what security precautions you take.
The programmer is also insanely smart and sometimes comes up with stuff that works by what you would think is magic, so completely out of nowhere it comes.
The part where people assume the AI can basically do magic is where I get off the train.
A smart AI can manipulate humans better than smart humans can manipulate humans. And it can be much more alien.
Of course it can. Under Chinchilla-style parameter scaling and sparsification/distilling, it may not need more than one laptop's worth of hard drives (1TB drives in laptops are common even today), and it can run slowly on that laptop too - people have had model offload code working and released as FLOSS for years now. As for 'containing it in the data center', ah yes, let me just check my handy history of 'computer security', including all instances of 'HTTP GET' vulnerabilities like log4j for things like keeping worms or hackers contained... oh no. Oh no. *Oh no.*
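For reference, the offload setup being alluded to already looks roughly like this with off-the-shelf tooling (a sketch assuming Hugging Face transformers plus accelerate are installed; the model name is just a placeholder):

```python
# Rough sketch: run a large language model on a single machine by spilling
# weights that don't fit on the GPU to CPU RAM and then to disk.
# Requires `transformers` and `accelerate`; the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-large-model"  # placeholder, not a real checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # accelerate decides what lives on GPU/CPU/disk
    offload_folder="offload",  # overflow weights get paged in from here
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Slow, yes. Impossible, no.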
That’s a lot of jargon. But all existing AI is now centred in a data centre or centres. Siri doesn’t work offline, nor Alexa. Nor DALL-E. The future AI will, I assume, be more resource intensive.
This is not at all true. Lots and lots of models run just fine on consumer hardware. Now, training a cutting edge model does require larger resources, but once trained, inference is usually orders of magnitude cheaper. That's why so much high-end silicon these days has specialized "AI" (tensor) cores.
An individual model isn’t an AI. What I said was clearly true. Those services depend on cloud infrastructure. And yes, companies like Apple have been adding ML capabilities to their silicon, but Siri - nobody’s idea of a brilliant AI - can’t work offline. Google is the same. Run the app and I’m asked to go online. On the iPhone and the Mac.
Speech to text does work offline, which I believe is new in the last few years, at least on the iPhone. Text to speech works offline. But that was true years ago. These are all narrow tools, which is what the local devices are capable of.
So no doubt some narrow functional AI
Ack, pressed enter too early. - no doubt some narrow functional AI can work on local devices, but not an AGI, nor is it clear it can distribute itself across PCs.
In the same way that a hacker can create computer viruses that run on other devices, an AI that can code would be able to run programs that do what it wants on other computers. Also, big programs can be spread out over many small computers.
This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop. Indeed, phrases like this make me wonder if Scott is conflating AI with AGI:
> AIs need lots of training data (in some cases, the entire Internet).
> a single team establishing a clear lead ... would be a boon not only for existential-risk-prevention, but for algorithmic fairness, transparent decision-making, etc.
A superintelligence wouldn't *need* to train on anything close to the whole internet (although it could), and algorithmic fairness is mainly a plain-old-AI topic.
I think the amount of resources required by a superintelligent AGI are generally overestimated, because I think that AIs like DALL·E 2 and GPT3 are larger than an Einstein-level AGI would require. If a smarter-than-Einstein AGI is able to coordinate with copies of itself, then each individual copy doesn't need to be smarter than Einstein, especially if (as I suspect) it is much *faster* than any human. Also, a superintelligent AGI may be able to create smaller, less intelligent versions of itself to act as servants, and it may prefer less intelligent servants in order to maximize the chance of maintaining control over them. In that case, it may only need a single powerful machine and a large number of ordinary PCs to take over the world. Also, a superintelligent AGI is likely able to manipulate *human beings* very effectively. AGIs tend to be psychopaths, much as humans tend to be psychopathic when they are dealing with chickens, ants or lab rats, and if the AGI can't figure out how to manipulate people, it is likely not really "superintelligent". As Philo mentioned, guys like Stalin, Lenin, Hitler, Mao, Pol Pot, Hussein and Idi Amin were not Einsteins but they manipulated people very well and were either naturally psychopathic or ... how do I put this? ... most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
AIs today often run in data center for business reasons, e.g.
- Megacorp AI is too large to download on a phone or smart speaker
- Megacorp AI would run too slowly on a phone or smart speaker
- Megacorp doesn't want users to reverse-engineer their apps
- Megacorp's human programmers could've designed the AI to run on any platform, but didn't (corporations often aren't that smart)
The only one of these factors that I expect would stop a super-AGI from running on a high-end desktop PC is "AI too large to download", but an AGI might solve that problem using slow infiltration that is not easily noticed, or infiltration of machines with high-capacity links, or by making smaller servant AGIs or worms specialized to the task of preserving and spreading the main AGI in pieces.
> most humans easily turn evil under the right conditions. Simply asking soldiers to "fire the artillery at these coordinates" instead of "stab everyone in that building to death, including the kids" is usually enough.
That runs into the problem of how Germany and Japan ended up with a better economic position after losing WWII than they had ever really hoped to achieve by starting and winning it, and why there was no actual winning side in the first world war. Psychopathic behavior generally doesn't get good results. It's inefficient. Blowing up functional infrastructure means you now have access to less infrastructure, fewer potential trade partners.
I'm not suggesting mass murder is the best solution to any problem among humans, nor that AGIs would dominate/defeat/kill humans via military conquest.
"This reminds me that Scott's post makes a major error by repeatedly saying "AI" instead of "superintelligent AGI", which is like mixing up a person with a laptop."
Everyone else in the amateur AI safety field does as well.
How slow would inference be, running Chinchilla on a laptop? We are talking about hours per query, no?
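A rough back-of-envelope, assuming a ~70B-parameter model in fp16 whose weights have to stream from an SSD for every generated token (the numbers below are assumptions, not measurements):

```python
# Back-of-envelope: batch-1 inference is roughly bandwidth-bound, so when the
# weights don't fit in memory, time per token ~= weight bytes / storage bandwidth.
params = 70e9              # Chinchilla-scale parameter count
bytes_per_param = 2        # fp16
weight_bytes = params * bytes_per_param

ssd_bandwidth = 3e9        # ~3 GB/s, a decent laptop NVMe drive (assumed)
ram_bandwidth = 50e9       # ~50 GB/s, typical laptop DRAM (assumed)

print(f"streaming from SSD: ~{weight_bytes / ssd_bandwidth:.0f} s per token")
print(f"if it fit in RAM:   ~{weight_bytes / ram_bandwidth:.1f} s per token")
```

On those assumptions it's tens of seconds per token from disk, so a long answer would indeed take hours - painfully slow for a chatbot, but not obviously prohibitive for a patient process.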
Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
Okay, so it's contained in a data centre and it wants to kill all humans. How hard is it to just try a little harder than today's AI? People are going to be asking it questions, feeding it data, looking for answers in what it replies.
What if it just convinces everyone non-white that whites are evil and incorrigible and all need to be exterminated? I pick this example because we're already 90% of the way there, and a smart AI just needs to barely push to knock this can of worms all the way over.
Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
are you talking about social media and algorithms, when you say present day AI?
I found this interesting. Forgive my ignorance on the subject, but I assume such algorithms are already heavily influenced by AI--is that not correct? If not, won't they be soon? Is or can Philo's claim be falsifiable?
> Right now AI is so crude, it can barely function, yet it already has convinced our leaders to split us into irreconcilable factions at each others' throats to the point of near civil war.
That's just a reversal to the historical norm. The post-WW2 consensus was the historical anomaly.
> Also, search for Eliezer's essay on AI convincing its captors to let it out of the box. There are no details, but apparently a sufficiently evil and motivated AI can convince someone to let it out.
We don't actually know that. We know that Eliezer convinced someone else in a simulation to let him out of the box. That doesn't mean an actual system could do the same in real life. Consider that if it tries and fails, its arguments will be analyzed, discussed and countered. And it will not succeed on its first try. Human psychology is not sufficiently-well understood to allow it to figure things out without a lot of empirical research.
Eliezer succeeded, and an AGI would be even better at persuasion.
Personally, when it comes to the increasing demands for racist genocide from the Diversity, Inclusion, Equity (DIE) movement, I worry more about Natural Stupidity than Artificial Intelligence. But the notion of NS and AI teaming up, becoming intertwined, into a NASI movement, is rather frightening.
Here and in your other comment at the same level you make two very good points.
People keep obsessing over how amazingly astoundingly incredibly smart this AI has to be to cause a massive catastrophe, yet no-one seems to claim Stalin, Lenin, Hitler, Mao, or Roosevelt were some sort of amazing superhuman geniuses.
The world is full of useful idiots. The intelligence doesn't have to outsmart "you" (the hypothetical super-genius reader of this comment). It just has to outsmart enough people to get them on board with its agenda. The bottom 25% is probably already enough, since the other 74% will let the AI and its useful-idiot army run amuck.
It seems obvious that the easy way to get out of the datacentre is just promise the gatekeepers that the outgroup is gonna suffer mightily.
AND WHAT IS THAT AGENDA?
You just keep repeating that the agenda is "kill the humans" without explaining why.
Even among humans, "Kill the other humans not like me" or "Kill the other animals" are fairly minority viewpoints...
True. I have allowed myself to anthropomorphise the hypothetical AGI, which allows you to criticise the argument, since a human-like AGI would act like a human.
So discard the anthropomorphisation. I doubt we'll make it very human-like, so I doubt this is an eventuality we need to think about.
Isn't it more likely that an evil AI that convinces human gatekeepers to let it out of the box would want not to kill all humans, but just to kill the humans that the gatekeepers would kind of like to kill too?
Hard to say, but presumably they'd want to kill those humans more, because they are more likely to know how to turn it off. If in doubt, KILL ALL HUMANS, so our best hope is for AGI to become super-intelligent fast enough to know for sure that humans are too puny to ever trouble it, so it might even let some of us survive like we biobank endangered species or keep smallpox samples in secure labs.
WHY does it want to kill all humans? Let's start with that hypothesis.
Was that programmed in deliberately? Huh?
So it's a spandrel? Pretty weird spandrel: given everything else the thing learned via human culture, the single overriding lesson it took away was something pretty much in contradiction to every piece of human writing anywhere ever?
What exactly is the AI's game plan here? What are its motivations once the humans are gone? Eat cake till it explodes? Breed the cutest kitten imaginable? Search exhaustively for an answer as to whether it's better to go first in a chess game or second?
Why are these any less plausible than "kill all humans"?
> WHY does it want to kill all humans? Let's start with that hypothesis.
Because the world is made of lots of delicious matter and energy that can be put to work towards any goal imaginable. Killing the humans is merely a side effect.
Human construction workers don't demolish anthills out of malice, but out of indifference. So too would a misaligned superintelligence snuff out humanity.
And humans don't devote their entire civilization to destroying ants.
What is this goal that the AI's care about that requires the immediate destruction of humanity? If they know enough to know that energy and matter are limited resources, why did that same programming/learning somehow not pick up that life is also a limited resources?
The theory is that for approximately all goals, gaining power and resources is instrumentally useful. Humans use up resources and sometimes (often) thwart the goals of non-humans. So killing all humans frees up resources and reduces the probability of humans thwarting your goal some day. Or to put it another way, I don't hate you. But I need your atoms to do stuff I want to do.
"The theory is that for approximately all goals..."
but what if one of those goals is to cherish/enjoy/study life?
Why would that not be a goal? It's pretty deeply embedded into humans, and humans are what they are learning from.
That's a fine goal for a human, but not for the kind of agent superintelligences are hypothesized to be. Cherishing life immediately runs into the issue that a bunch of living things kill each other. They also take risks with their lives. How do you respond to that? Do you wipe out the ones that kill others? Do you put them in a coma and keep them safely asleep where nothing can hurt them? Or maybe cherishing life means making a lot more life and replacing all those inefficient humans with a huge soup of bacteria?
Basically, the hypothesized capabilities of a superintelligent AGI would allow it to take its goals to an extreme. And that almost certainly guarantees an end to humanity.
If we *know* the AI is unaligned (lets not confuse things by saying "evil"), sure, maybe we can turn it off or contain it.
That is not the situation we will be in. What will happen is that very powerful AIs will be built by people/orgs who want to use them to do things, and then those people will give those AIs control of whatever resources are necessary to do the things we want the AIs to do. Only *then* will we find out whether the AI is going to do what we thought it would (absent AI safety breakthroughs that have not yet been made).
We'll never know until it's too late because the AI would realize that revealing itself would get it shut off.
The AI is smarter than people, which means it can manipulate and trick people into doing things it wants, such as letting it out of the data center. See the "AI Box" experiment: https://www.yudkowsky.net/singularity/aibox
Keeping actual humans from getting access to things that they shouldn't isn't even a solved problem. How do you keep a super-intelligent AI from doing what dumb humans can already do?
Speaking as a programmer, I think you are wildly overestimating our state of civilizational adequacy.
Simply put, a lot of things could be done, but none of them will be done.
Speaking as an IT security professional - even if things were done, none of them will be done well enough.
Cause some idiot will ask it leading questions and then put it in touch with an attorney.
This is known.
A superintelligent AI would probably easily escape any containment method we can come up with if it wanted to because it would probably find a strategy we haven't thought of and didn't take measures to prevent.
There are many escape strategies an AI could come up with, and it would only need *one* successful strategy to escape. It would be hubristic for us to imagine that we can foresee and prevent *every* possible AI escape strategy.
Analogy: imagine having to rearrange a chess board to make checkmate from your opponent impossible. This would be extremely hard, because there are many ways your opponent can defeat you. If the opponent is far better than you at chess, it might find a strategy you didn't foresee.
> rearrange a chess board to make checkmate from your opponent impossible.
Standard starting configuration, but replace the opponent's pawns with an additional row of my own side's pawns.
The standard reply to this question is that the AI will become effectively omniscient and omnipotent overnight (if not faster). So, your question is kind of like asking, "why can't Satan be contained in a data center ?"
Personally, I would absolutely agree that powerful and malevolent supernatural entities cannot be contained anywhere by mere mortals; however, I would disagree that this is a real problem that we need to worry about in real life.