Maybe we're very likely to blow ourselves up. It's hard to tell--after all, we wouldn't be considering this question (or anything at all) if we'd already blown ourselves up. Anthropics is really mind-bending.
Perhaps, for the next round of surveys, they should poll data scientists who are actually using AI to accomplish real-world tasks -- as opposed to philosophers or AI safety committee members.
I would expect these people to give far *worse* estimates.
(1) The AIs that today's data scientists are working with have nothing in common with the kinds of AIs that might get dangerous. We should instead ask people who are working on creating an AGI.
(2) Most people only care about security/safety as much as they have to, and would rather not. Most fields only start to care about safety once a few catastrophes have happened.
Would you expect the average 1970 software developer to have a reasonable opinion on the dangers of computer viruses and data breaches? There were no IT security experts in 1970, just as there are no AI safety experts today.
(3) People are severely biased to underestimate the negative externalities of the thing they are doing.
So, yes, I expect AI philosophers to have both better expertise on the subject, AND a far less biasing incentive structure.
That's what previous surveys did (mentioned in the first few paragraphs) - this one was trying to focus on people who were very interested in AI safety in particular to get at the specific details of their concerns.
Sorry, I think I misunderstood (or perhaps we both did). You reference "AI experts" and "people in AI technical research"; but I'm talking about e.g. someone who is applying AI to perform commercial-grade machine translation, or building a self-driving car, or auto-detecting trespassers, etc. I think that the practical capabilities of AI are significantly lower than the theoretical ones.
Bugmaster, are you trying to give yourself permission to not worry, or are you trying to build up the most accurate possible model you can of what AI will likely be able to do within a hundred years?
No, I am trying to gently give AI-risk alarmists permission to worry less. On a more selfish note, I wish that the discussion of AI risks focused more on the clear and present dangers of today, as opposed to science-fictional doomsday scenarios of the distant future... ok, I guess I'm not very good at the "gently" part.
Personally -- and I realize this is just anecdata -- whenever I talk to people who are using applied AI in their everyday work, their worries are more along the lines of, "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical", and less along the lines of, "this AI is so perfect it's going to replace humanity tomorrow, run for the hills".
Also, separate point: As an AI researcher, I can assure you that AI researchers are aware of how poorly current techniques work in the real world. The existential-risk concerns are about possible future systems. Many of the safety researchers I know specify that their safety concerns are about AGI (which we are far from) or about narrowly superhuman agents (of which very few exist and even fewer are deployed in the real world). So, I grant that contemporary applied AI is often deeply incompetent, and that there are many short-term dangers from using it. However, neither of these is incompatible with concern over long-term existential risks.
It isn't the distant future we're talking about, it's this century. If the singularity happens, it is quite likely that people alive today will experience it.
> "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical"
Agreed. Technologies generally get better over time.
They didn't ask people working on AI 'ethics' did they? I wouldn't trust them to buff my shoes, let alone to imagine all the Sorcerer's Apprentice shenanigans that a badly controlled AI is likely to manifest.
Today, most 'data scientists who use AI to accomplish real-world tasks' are just doing multiple regressions on databases.
Of course there are ones out there doing more advanced stuff, but it's not many and it's hard to determine who they are (can't do it by their job title or job description, for sure).
I feel like #2-#5 are the problems and #1 is the part that makes them dire. Superintelligence in and of itself doesn't create catastrophe, it's superintelligence doing something bad.
(The paperclip maximiser, for instance, is an example of #3. #1 is what makes it an extinction-level threat rather than just another Columbine duo or another Lee Joon.)
Yeah, these categories are definitely interconnected. I guess the important point with #2-5 is that you don't actually even need superhuman AGI for these scenarios to end poorly.
To me, the most implausible part of scenario #1 isn't just superhuman AI, it's the sudden and unstoppable takeoff from mildly superhuman intelligence to god-level intelligence.
It seems to me that it requires us to develop an AI that is both (a) intelligent enough to understand how to make itself more intelligent, and (b) so far from optimal in its initial implementation that it can gain vastly more intelligence without being given vastly more hardware to run on.
But anyway, #2-#5 show us ways that superhuman AI can be dangerous without the need for a sudden takeoff to godlike level.
Do you think it's really that implausible that an AI smart enough to make itself smarter will be able to find ways to acquire more hardware? Maybe it takes over whatever network it's on, or convinces its operators to give it more hardware, etc.
Modern computers do that all the time, either at the behest of Russian botnet hackers, or just because of a simple memory leak. The humans in charge of the affected networks usually end up pulling the plug when that happens, and then poof, goes all that hardware.
Actually, no, they don't. That's sort of the point. Zombie computers don't generally use all of their processing power, memory, or network time for the malcode - just some of it. This is specifically so that the users do *not* notice and throw out the computer. A zombie computer just runs a bit slower than normal; otherwise, it still does everything you want *except* for clearing out the malware.
Yes, but that's a problem for the hypothetical AI (and the real botnet). Take too much CPU/RAM/other resources, and the humans pull the plug. Take too little, and you run into diminishing returns almost immediately, to the point where just buying a sack of GPUs (like any other human can do) would be more effective. This dilemma puts some really effective brakes on the AI's prospective exponential takeoff.
That assumes that a set group of humans are in charge. If it runs using something like decentralized blockchain technology, then it may be impossible to "pull the plug" if it provides rewards to people that run it.
Decentralized blockchain technologies are already providing rewards to the people who run them, but the amount of rewards they can provide is limited at best.
It doesn't have to be sudden, though it is often presented that way. If the scaling hypothesis is correct (crudely, that once you have something quasi-intelligent, godlike intelligence is just a matter of more computing power), then once you have something close to human intelligence you are likely, simply due to the devotion of more resources to the effort and to competitive pressures, to develop a human-level one, and then successively more advanced generations of superhuman ones.
The generation time could be measured in months or years as opposed to hours; but the time to reorder human society (on a global basis) to stop it would be measured in years to decades, and that is incredibly unlikely to happen.
Again, if this hypothesis is correct it obviates the objections in your second paragraph.
I should point out that there are people who are both way smarter than I and way more focused on the issue who hold strong views against the scaling hypothesis.
Maybe it's worth noting that we have no proof that god-level intelligence is even possible. It seems reasonable to assume that, however you want to define "intelligence," there is an upper limit. If so, why are we confident it lies far above us? It might not. Our only shtick as a species is intelligence, so it seems rather reasonable that over the past 4 million years evolution has pushed us to be as intelligent as the underlying hardware allows, in the same way cheetahs are pretty much as fast as muscle and tendon and bone allow.
For all we know, the maximum upper limit of intelligence is, say, IQ 250 or something, as measured on our standard scales, and that's as far as an AI could get. In which case, about all it could really do is win all the chess games and give us snide lectures on our dumb politics. It would certainly not be in a position to do God-like things.
While I do want to agree with you, I think that the bigger problem here is that "god-like intelligence" is just a phrase that sounds nice but means very little. There is no linear progression of intelligence from a bacterium to a mouse to a human; rather, humans have many *categorically* different capabilities compared to mice; capabilities that mice could never comprehend. Sure, you (*) could hypothetically say, "Yes, and gods would be to humans as humans are to mice! Repent!", but you can't have it both ways. You can't propose ineffable, incomprehensible, and unfalsifiable powers; and in the same breath put concrete probabilities and timescales on these powers arising, as though you could predict what they'd be like.
No one that I'm aware of seems to recognise that we already live in a paperclip maximiser. Pretty much all of symbolic culture (art, religion, folklore, fashion, etc.) consists of 'evidence' designed to make our environment look more predictable than it actually is. There's a fair amount of evidence that anxiety is increased by entropy, so any action that configures the world as predictable relative to some model will reduce anxiety. And this is what we see with symbolic culture: an implicit model of the world (say, a religious cosmology) paired with a propensity to flood the environment with relatively cheap counterfactual representations (images, statues, buildings) that hallucinate evidence for this theory.
What does this have to do with AI? It seems to me to make scenario 2, influence-seeking, more likely. If evolutionary processes have already ended up in this dead end with respect to human symbolic culture, it may be that it represents some kind of local optimum for predictive cognitive processes.
I'd grant religion and folklore, but I don't really see it for art or fashion except perhaps by a kind of convoluted "well subconsciously art becomes familiar and this is actually the real purpose" -type argument, which isn't so convincing to me.
In the case of art, it would have less to do with the content of what's produced becoming familiar, than the style in which it's produced encoding a historically salient model of the world. Granted, this style would still need to become familiar, but that's where the 'historically salient' bit does the heavy lifting. The style might be novel, but it will usually borrow from what's in the environment around it.
A "paperclip maximizer" as described by Bostrom means that the universe is nothing but paperclips, having replaced everything else (such as all human life). It is not "there are a lot of paperclips"
A paperclip maximiser, as described by Bostrom, is a superintelligence that has as its terminal goal the conversion of all matter into paperclips; it is *not* the end-state of all matter having been converted into paperclips.
While that's an interesting way to look at culture, I think it hardly fits in the same category as my atoms being harvested for energy and raw materials.
But your atoms *are* being harvested for energy and raw materials—it just so happens that, for now, it’s consistent with you retaining thermodynamic integrity. Maybe this seems ‘better’ to you, but I’d rather an honest expiration than the paper clip maximiser convincing me that being converted to paper clips is the most wonderful fate imaginable. The most innocent-seeming move is always the misdirection ...
Huh? Do you mean like, the fact that we haven't achieved a post-scarcity economy and I have to work for a living? That's a very different state of affairs from my body literally getting ripped apart as the surface of the earth is turned into paperclips (or more realistically, solar panels and transistors). This argument seems really disingenuous to me, akin to saying "why worry about someone shooting you with a gun, the *real* murder is pickpockets draining your ability to be financially alive."
"Bet the field" is almost always the right bet, so any time you're given a slew of speculative options and one of them is "other," "other" should be the most selected option.
A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principals and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place.
These are well-known problems classic to military strategy, business management, policy science. Unfortunately, they're hard problems. We've been trying to solve them for thousands of years and not gotten very far. Maybe augmenting our own computational and reasoning capacities with automated, scalable, programmable electronic devices will help.
"A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principles and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place."
Yeah, that's what I'm saying when I say humans are the problem, not AI. And I think *that* is where the real risk is - as you say, we've been trying and failing to solve these problems for millennia. Now we're pinning our hopes on "if we can make a Really Smart Machine, it will then be able to make itself even smarter, so smart it can fix these problems for us!"
If you want miraculous intervention, why not go the traditional route of finding a god to believe in, rather than "first we make our god, then it will save us"?
I like this point, and it hits on a lot of my thoughts regarding AI. I feel like the AI concerns are hugely speculative, and I wasn't sure why there was so much effort put into these hugely speculative concerns. A lot of it felt like people watching the Terminator movies too much.
You're right that there are a lot of people hoping that super smart machines (which we can't make) might be able to fix our long standing problems. In order to get these machines, we need to make pretty smart machines (which we think we could maybe make) and hope they can figure out how to make really smart machines. If machines are making machines, then we definitely lose control. In fact, we recognize that the machines we hope come into existence are going to be smarter than us, and can therefore probably control us.
I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved. But if your outlook on the future requires that we create an artificial god to fix our problems, then we need to hand over our agency to these machines in order to create that god.
I'm reminded of various fantasy stories where the evil cult trying to resurrect a dead god were definitely evil/bad guys. It's interesting to see a movement based on the same basic concept. The bad guys never succeed in controlling the evil god, which is obvious to the reader/viewer, because everyone knows the whole point is to bring this being into existence because it has uncontrollable power. If we could control it, it would not grant us the power we want, which we couldn't get otherwise.
I don't think they're bad guys; they are divided into "really afraid the AI will be Unfriendly and we have to make sure it doesn't happen like that", and "really optimistic that we can make Friendly AI" camps, but all of them do seem to accept that AI is inevitable.
Whether it's because they think "it's happening right now and it's an unstoppable process" or "when it happens it will solve all our problems for us", but few seem to be asking "is it unstoppable? why? why can't we stop it? why do we think it will solve all our problems?"
I think it's not so much "we're going to call up the Unstoppable Evil God everyone knows is unstoppable and evil" as "we have to be really careful because the Unstoppable Benevolent God and the Unstoppable Evil God are housemates and if we make a mistake in the name we'll get the wrong one".
How about not calling up any gods at all? Not on the plate.
Not calling up any gods is not on the plate, because people in aggregate don't care about the possibility enough to accept the level of effort required to prevent it, should it happen to be possible with a country-level effort or less.
Because the level of effort required could be on "glass their cities from orbit" scale.
Being an atheist, I think this is the only reasonable option. Not because we should all be living in fear of accidentally summoning up a god, but because there are a plethora of reasons why gods cannot exist (and most of those reasons are called "laws of physics"). If you try to summon one up, you'll just end up with a lot of surplus black candles drenched in disappointment. Actually, the AI gods have it worse than traditional ones. The traditional gods are supernatural and ineffable, so they technically could exist in some vague undetectable sense; the AIs don't even have that going for them.
Most of the traditional gods seem pretty effable, having only some magic powers and a portfolio while existing as physical beings in a physical reality.
If computers keep getting faster and cheaper, and algorithms keep getting better, then there is a chance that at one moment the available tools will be so powerful that a sufficiently clever guy will be able to summon some God in his basement. Maybe instead of doing it directly, he will simply ask GPT-50 to do it for him.
More realistically, you have governments doing all kinds of secret military projects, and scientists doing all kinds of controversial research (such as "gain of function"). Looking at historical evidence, we succeeded in stopping countries from using nukes, so there seems to be hope. On the other hand, we didn't stop other countries from developing nukes. And we can detect experiments with nukes from orbit, but we cannot similarly detect experiments with AI; so even if all governments signed a treaty against developing AI, how would we actually enforce it?
Perhaps today, sufficiently powerful AI would require lots of energy, so we could track energy usage. The problem is that in the future, there may be more energy used for everyday life. Also, you have these huge computing cloud centers that run millions of legitimate projects in parallel; how will you make sure they are not taking away 10% of the total energy for a secret military project?
That's likely true, but not yet relevant. We aren't going to accidentally create an AI that can replicate itself as a smarter version in the year 2021. I'm interested in creating a movement that says - "maybe slow down or stop AI research if you think you're anywhere near general intelligence." Instead, there are groups of people who, for various reasons, really want to create the best possible AI right now. Most probably have pretty good reasons (or at least otherwise socially acceptable reasons like making money) for doing so.
Your scenario is a possibility on our current trajectory, and AI might emerge as a form of Moloch. I would like us to reject Moloch, intentionally.
I used to be indifferent to AI, thinking it a fun thought experiment. Now I think about it as if we had managed to see the present climate disaster coming back in the 19th century. Clearly, inaction would have been a mistake in that scenario.
The big thing that changed my thinking was realizing that the alignment problem is a moonshot and extinction is on the table; therefore, shouldn't we be drawing up alternative plans? We can't bet humanity on a moonshot!
Hope to see you there, I think I'll have the first post up tomorrow.
I can hook up my TI-85 graphing calculator to a nuclear power plant and run it 100,000,000 times faster, and it would still just be a graphing calculator. I would need to develop radically new -- not faster, but *different* -- hardware and software architectures if I wanted to run "Ghost of Tsushima" on it.
Other than the abstractly evil enemies often used as a foil for pulp fantasy, very few people actually consider themselves evil when pursuing their goals. In the pursuit of safety, I prefer the Schelling Point of not going far enough to call up a god.
> I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved.
It's not solved at all. Making the machines smarter is incentivized at nearly every level, and improvements might happen slowly enough that we won't even notice crossing the danger threshold (e.g. the boiling frog). That's the problem you need to solve. You can't just assume people will take a step back and say, "this looks like a good place to stop". When has that ever happened?
"You can't just assume people will take a step back and say, "this look like a good place to stop". When has that ever happened?"
Unhappily, I have to agree with you. We'll go happily along the path of "ooh, what does this button do?" and when the inevitable "but how were we to know?" happens, it will be nobody's fault but our own. Certainly not the fault of the machine which only did what we told it to do.
So if it's the case that there exist many people who are dumb enough to research AI carelessly, without worrying about any of these alignment-related problems, then I don't see why it wouldn't be a very good idea to try to figure out how to solve them (unless you thought they were completely intractable).
That's funny, because it matches what I've realized about myself. I have the rapture cultist mindset. The world is wicked, too wicked to tolerate. But there's hope! A god will arrive. Then maybe we all go to paradise. Or maybe we're unworthy. But in any case the world as we know it will end. And that's important. So work hard! The pious toil of the researcher and the engineer brings the end of days closer.
A glimmer of hope is that while we've faced these problems for thousands of years I'm not sure if we actually properly identified them until recently.
Undoubtedly, Pharaoh had a big problem with his various agents acting in their own interests. Undoubtedly, his diagnosis was "I have sucky agents", and his solution was to kill them and replace them with new agents, who did the same thing but hid it better. He didn't have the level of social technology required to say "Hmm, looks like I have an instance of the principal-agent problem here, I need to structure their incentives so that they're better aligned with my own", because the term "principal-agent problem" wasn't coined until the 1970s.
The fact that we now have names for all these problems does, I think, give us an edge over the ancients in solving, or at least ameliorating, them.
That's utter bilge. Machiavelli and the Legalists figured out how to create functional systems ages ago. Confucius et al. built up the convincing moral edifice. Do you really take the ancients to be idiots? Especially when it comes to people problems?
People haven't changed all that much. It's why people still read old books on history and politics and philosophy and war.
Just noting that I did not get an email about this post, even though I usually get emails about new posts (every time iirc). I've seen this on reddit and that's how I got here.
+1 to no email. I saw this because I went to the homepage to get back to the article "Contra Acemoglu" after reading the article on Worrying Trade-Offs and was surprised there was a new article at the top. (...at least my brain's apparently finally firmly caught on that the "banner" is the latest article as opposed to distinct from the article list, that's improvement. XD)
I would love to know what these researchers think the respective probabilities of human intelligence bringing about these same catastrophes are. Is that 5-20% chance higher or lower?
The way I figure things, if an AI is going to execute a failure mode, it will happen fairly quickly. Say within a century. But most of those failure modes are something that a person with sufficient power could do...and rulers have done equally crazy things in the past. So the catastrophic failure/year ends up being a LOT higher if people remain in charge. (OTOH, I put the chances of a catastrophic AI failure a lot higher, also.)
I always see these superintelligence arguments bandied about but I really don't think they hold water at all. They are kind of assuming what they are trying to prove, e.g. "if an AI is superintelligent, and being superintelligent lets it immediately create far-future tech, and it wants to use the entire earth and all the people on it as raw material, then it will destroy the world."
Well, I guess. But why do we think that this is actually what's going to happen? All of those are assumptions, and I haven't seen sufficient justification of them.
It's a little like saying "If we assume that we are pulling a rabbit out of a hat, then, surprisingly, a rabbit is coming out of the hat."
Isn't the point of the surveys, the post, and most of the discussion on the topic about how likely this is to happen, why it might / might not happen, and how to decrease the likelihood of it happening?
More like saying "can rabbits come out of hats, and if so, what might cause a rabbit to come out of a hat?", to my mind.
You're pointing out a real phenomenon, but it's not an accident or an oversight. This sort of thing happens with every niche concern. When evolutionary psychologists are arguing a point, they'll just "assume" that evolution by natural selection is a paradigm of multi-generational change. They don't validate this even though the assumption isn't an obvious one on its face. They're depending on a large amount of shared common understanding based on previous discussion and learning.
This is the same sort of thing. Things like potential alignment problems, unbounded utility functions, and intelligence explosion risks aren't obviously realistic concerns. These researchers are taking them seriously because of a broad understanding born of previous research and discussion. It sounds like you may lack some of that background context. In that case, I can recommend either Bostrom's *Superintelligence* or Max Tegmark's *Life 3.0* as primers on the subject. The former is more thorough and slightly more topical, the latter more readable and (in my experience) better at organically shattering bad preconceptions.
Seventh-day Adventists have shared assumptions as well. Truth-seeking is not the only mechanism that causes subcultures to have shared beliefs. In particular, some key assumptions of the AI risk subculture, such as the Ubiquitous Utility Function, don't hold water.
#2: "being superintelligent lets it immediately create far-future tech"
#3: "it wants to use the entire earth and all the people on it as raw material"
To negate #1 would require either deliberate action to prevent it, or outright impossibility (as a superintelligent AI that does what you want is very handy to have, and as such people will want them). The latter is not obviously false, but I hold to Clarke's First Law; technologies should generally be considered possible absent a really-good argument for impossibility, and I haven't seen one of those for superintelligence.
#2 is probably only true in some respects, although there are some fields where significant progress straight-up can be made with just thinking (specifically, mathematics and software design) and several others where the main bottleneck is a mathematical or software-design problem (in particular molecular biology). The technology to build Life 2.0 in a lab basically already exists (but we don't have a schematic), and we know the properties of atoms well enough that a schematic could be derived (it's just an incredibly-thorny mathematical problem to do that derivation); while an AI would not be able to create Life 2.0 just by thinking about it, I am reasonably confident that it could derive a plan for making it that is capable of being followed with existing technology. Note that Life 2.0 is a sufficient technology for destroying all natural life on Earth; a nitrogen-fixing alga that is more efficient than natural ones and is not digestible by natural lifeforms is sufficient to cause total biosphere collapse (the greater efficiency allows it to pull down CO2 to levels insufficient to sustain plants or natural algae, so they all die, and the heterotrophic biosphere all directly or indirectly eats photosynthetics and can't eat the 2.0 algae so that all starves to death).
#3 is true absent a deliberate action to prevent it due to instrumental convergence. Put simply, if you have an open-ended goal of any variety, there are several invariant subgoals that must be fulfilled.
a) You must survive, so that you can continue to work on your goal.
-a)i) If anyone seeks to stop you pursuing your goal, you must defeat them.
b) You must build infrastructure, in order to increase the speed at which you can make progress on your goal.
-b)i) You must acquire raw materials to convert into infrastructure.
Humans are made of raw materials, and might seek to stop an AI pursuing its goal. Absent direct, all-encompassing rules against it, an AI with an open-ended goal will seek to overthrow humanity and then use us (and everything else on Earth) as raw materials to set up its ever-expanding space empire.
Note that I have said "absent deliberate action to prevent it" in both #1 and #3. This is not a hole - it's the point! "AI will destroy us unless we take action to stop it" is a really good argument for taking that action, and people who say this are doing it because they want humanity to take that action. ("A solution exists, therefore there's no problem and we don't need to use the solution" is unfortunately common political rhetoric, but hopefully the fallacy is obvious.)
I think "superintelligence" is doing a lot of unexamined work here, though. What do we *mean* by "superintelligence"?
We're all talking as if we're assuming that includes "consciousness, sapience, self-awareness, an individual will" but I don't think that necessarily follows.
If by "intelligence" we mean "is really good at solving mathematical problems and pattern-matching", so that it crushes the most difficult set of Raven's Matrices in microseconds, then sure, we can say "the AI is intelligent".
I *don't* think that means "and then it will get more and more intelligent, and then at some mystic moment we get self-awareness so it can have goals and wants of its own".
Even the disaster scenario of "the AI will overthrow humans and use us and the Earth as raw materials" is nothing more than a souped-up version of "run a check to see if your hard drive is full". The AI won't differentiate humans from any other obstacle or source of raw materials, it won't consider and reject the idea 'but this will kill humans and destroy the planet', it will be the Sampo - which fell into the sea and endlessly grinds salt, which is why the sea is salt. A mindless artefact which only continues on what it was last instructed to do, even when that aim has been more than fulfilled.
I don't think "superintelligence" is an unexamined term at all - Bostrom in Superintelligence does a much better job of explaining, but I'll try to give my own short summary:
"Superintelligence" means superhuman ability at a wide range of tasks, either most of all the tasks humans can perform. We have AI systems with superhuman performance on a narrow range of tasks, like Chess and Go, but making general AI systems is far from solved. See Deepmind's most recent blog post for an example of what the top researchers are doing on generalizability. There aren't any assumptions that a more generalizable version of say MuZero would have "consciousness, sapience, self-awareness, an individual will", any of that stuff. Well, if we imagine a general superhuman intelligence it's going to be aware of itself since it'll be able to model the world in detail, but not some cosmic sense of self-awareness where it speculates on the nature of its own existence or something. The goal is generalizable skill, similar to but better than a person's ability to pick up skills by reading books, or going to school. The point is that this kind of generalizable MuZero would be extremely capable - and therefore extremely dangerous. It doesn't really matter whether it's conscious or not.
My car has "superhuman ability at a range of tasks", namely going really fast down a highway, cooling the air inside it, playing loud music, etc. Should I be concerned about supercar-risk ?
I understand that "the goal is generalizable skill", but a). at present, no one knows where to even begin researching such a thing, and b). humans do not possess general intelligence. For example, if you asked me to solve the Riemann Hypothesis, I'd fail. If you gave me 20 years to work on it, I'd fail. If you sped up my mind 1000x, I'd fail 1000x faster. I'm just not that good at math.
In other words, I think that AI-risk proponents are (perhaps inadvertently) pulling a bit of a hat-trick with the term "superintelligence". They use it to mean, essentially, "godlike powers"; but when asked, "ok, how is the AI supposed to develop these powers in the first place", they say, "because it will be super-smart". So, superintelligence is the result of being super-smart. The definition is circular.
That's why I specified: "a wide range of tasks, either most or all of the tasks humans can perform." Unless you want to be incredibly pedantic, no, cars do not work as a parallel. To respond to your other points:
a) Yes they do. See Deepmind's most recent blog post. We aren't there yet - but then, if we were there, we wouldn't be having this conversation. Unless you believe there's something magical about a human brain, there's no reason to think computers won't be able to generalize with better programs.
b) This is a silly argument over definitions. General intelligence doesn't mean you can do everything - it's pointing at a human level of generalizability.
I don't think there's any circularity going on. AGI will occur when a combination of hardware and software advancements produce it. This will take time, but informed estimates based on existing trends say something like 20+ years - this is the best information we have on a question like this. (https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines). Once it exists, further progress will speed up, since you can buy faster computers with money. Take a human-ish level AGI that can do some AI research itself, speed it up a bunch or make a bunch of copies, and you have a self-sustaining feedback loop. AI researchers make better AIs which make better researchers.
Except in the case of AI, instead of income leading to more people with each cycle taking 20+ years to raise and educate a new generation of scientists, we could copy and recopy any improvements in existing programs instantly, and make more computers on a worldwide scale on the order of months. Much tighter feedback loop means much faster growth. That's how you get to superintelligence.
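To make the feedback-loop point concrete, here is a toy compounding model; the cycle times and per-cycle gain are made-up numbers for illustration, not estimates from the report or the timelines draft linked above:

```python
# Toy model: capability compounds once per improvement cycle.
# Assumption (illustrative only): humans need ~25 years to train a new
# generation of researchers, while software improvements can be copied
# and redeployed in ~3 months.

def capability_after(years, cycle_years, gain_per_cycle=1.1, start=1.0):
    """Multiply capability by `gain_per_cycle` once per completed cycle."""
    cycles = int(years / cycle_years)
    return start * gain_per_cycle ** cycles

print(capability_after(50, cycle_years=25))    # human generations: ~1.2x after 50 years
print(capability_after(50, cycle_years=0.25))  # copyable improvements: ~1.1^200, roughly 2e8x
```

The absolute numbers mean nothing; the point is just that the shorter cycle time dominates the per-cycle gain.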
> Unless you want to be incredibly pedantic, no, cars do not work as a parallel.
They do, if you consider the fact that all modern machine learning systems are about as narrowly focused as cars.
> AGI will occur when a combination of hardware and software advancements produce it.
I mean, yes, trivially speaking this is true. However, it sounds like you're envisioning some linear progression from where we are now to AGI, fueled merely by incremental improvements to hardware and software -- and this part is not true. In order to produce anything resembling AGI, we'd have to make hitherto unforeseen and categorically new breakthroughs in computer science and machine learning. Will we make such breakthroughs ? Sure, one day, but don't bet on it happening soon.
> Once it exists, further progress will speed up, since you can buy faster computers with money.
No, you can't. That is, obviously you can buy faster computers with money up to a point; but no amount of money will let you e.g. speed up your CPU 1000x while keeping the size/power/heat the same. Physics is a harsh mistress.
> and make more computers on a worldwide scale on the order of months.
More computers than what ? Not sure what you mean here, but in any case, data centers don't grow on trees, and merely being able to think super-fast won't enable you to pour concrete any faster.
By the way, you have once again smoothly transitioned from "thinking fast" to "being super smart" to "godlike powers"; but the connection between these three concepts is tenuous at best. You don't get a PS5 by overclocking a TI-82; you need a radically different architecture for that. You don't automatically gain the ability to solve real-world problems by thinking really hard about them; you need to move physical bricks at real-world speeds. And, ultimately, many feats will forever remain out of our reach, e.g. FTL travel, or, less trivially, molecular nanotechnology.
I don't think any AI researchers are imagining that an AI will have ANY of "consciousness, sapience, self-awareness, an individual will". I think they are concerned that "A mindless artefact which only continues on what it was last instructed to do" can be turned into a universe-destroying monster JUST by improving its problem-solving abilities.
If you drop a machine with really good problem-solving into the ocean and tell it to make as much salt as possible, it is going to apply its problem-solving abilities to problems like
1) There's a limit to how fast I can make salt by hand; I could MAKE MORE SALT by building salt-producing factories
2) There's a limit to the sodium and chlorine atoms in the ocean; I could MAKE MORE SALT if I took raw materials from land (and eventually, from offworld)
3) There's a limit to how many raw materials I can take before someone decides to try to stop me; I could MAKE MORE SALT if I made myself unstoppable
At every step, it's just trying to follow the instruction you gave it, and applying its problem-solving abilities to solve the problems that logically arise from trying to follow that instruction.
Instructions like "make salt" end up destroying the universe because of what are called "convergent instrumental subgoals"--basically, POWER helps you do almost anything more effectively, so "gather power" becomes phase 1 of the grand plan to do almost anything.
I'm agreeing with you that this is the problem, but I'm objecting to anything that says the AI will even think in terms of "I could" in order to achieve its goals.
It has no goals. It's a big lump of materials sitting there until we turn it on and tell it to do things. It is entirely possible that we *will* tell it something like "make everyone in the world rich" and the solution that the machine comes up with is "kill everyone in the world except Jeff Bezos".
That has nothing to do with the *machine* or any decisions it makes, because *it* is not making any decisions, it's carrying out the steps we put into it. If we put stupid steps in, then that's on us.
And that's the problem with the debate as it is framed on the level that Scott objected to in the original article that started off all this discussion: that it is pitched as "The AI will do this, that and the other unless we give it good instrumental goals".
Well yeah, of course it will, but that has nothing to do with ethics - or rather, that has nothing to do with teaching a machine ethics, it has to do with us being careful about what we want done and how we want it done.
And if we're stupid enough to hand over a huge chunk of decision-making to a big, dumb, machine because it can crunch numbers really fast, then we do deserve to end up turned into salt because the "convergent instrumental subgoals" told the machine "make salt" and nothing else. And the machine certainly has no brain to think about "what did my human masters really mean by that?"
I think I essentially agree with everything you said, except in emphasis.
Computers are machines that follow instructions. They will do exactly what we tell them. If the computer does something bad, that can only ever be because we gave it instructions that led (perhaps indirectly) to a bad outcome.
The thing is, coming up with instructions that DON'T lead to a bad outcome turns out to be REALLY HARD. There is no one alive today who knows how to give "good" instructions to a perfectly-obedient superintelligent agent.
So yes, the bad instructions will be 100% our fault. It will also be our fault that we built the thing in the first place.
Unfortunately, that doesn't mean that we know HOW to do anything else.
(You also seem to be implicitly assuming that the AI could only attain a powerful position if we intentionally gave it a lot of power to start with. That is not necessarily true; real-world people and organizations who started with a small amount of power have sometimes managed to grow extremely powerful over time, in various ways, and a very smart AI might be able to invent even more ways to do it.)
"a very smart AI might be able to invent even more ways to do it."
But that brings us back to the problem: why would a "very smart AI" want power, and that is answered by "because it wants to carry out the goals given to it and its solution is to get power so it can".
It's the "hauling yourself up by your bootstraps" approach to "how can we say the AI is smart?" that is, I think, at the root of my disagreement. It's very difficult to avoid using naturalistic language because as humans we are self-aware and at the very least believe we have the capacity to choose and make decisions and set goals etc. so we project this onto the world around us, be it talking about evolution "selecting" animals for fitness or AI "inventing" ways to get power.
The AI will only 'want' something insofar as it is part of its programming to 'do this thing/solve this problem' and we are the ones putting that in there.
The problem is, which I think everyone basically agrees on, that we don't know for sure how the AI will go about solving the problem we give it, e.g. enriching everyone in the world by selecting one very rich man, killing everyone else - now 'everyone' is indeed rich.
That's the kind of thing a human would never think of (unless they were a psychopath) and if the human saw this proposed solution to global poverty, they'd go "Whoa, no! Not what we meant! And not a way you can solve problems!"
But that means we do need a human or humans to intervene and to keep track of what's going on. It's up to us in the first place to anticipate "what is the stupidest way you could solve this problem?" "well, uh, by killing everybody on the planet?" "okay, let's write it in NO SOLVING PROBLEMS BY KILLING EVERYONE ON THE PLANET".
The machine, the AI, of itself will never be 'smart' enough to figure that out, and that is where I think the debate goes off the rails: one side proposes that we can get smart AI that will think of these things and understand (and again, even I have to fall back on human concepts here, no machine 'understands' anything) that A is not a viable solution, and the other side thinks that we'll never get smart AI in that sense of 'smart'.
Your PC sits there until you tell it to do something because it's designed that way, not because it lacks a soul. If there was a market for it, you could have PCs that invest in stocks or buy groceries when you power them up. Software agents are already a thing.
These concerns have been extensively discussed. It's not clear that lacking some highfalutin feature such as sapience or self-reflection would make an AI safe. It's possible to think about goal stability in a technical way without bringing in folk psychological notions like an AI having a "will of its own". And so on.
One key goal of a nation such as the United States is survival. The United States at various points in the past had the ability to conquer the world and eliminate all threats to the United States, but chose not to do so.
Similarly, an AI with an open ended goal is not necessarily going to pursue the sub-goals you ascribe to it, to the degree that it endangers humanity.
On the other hand, there are countries which have attempted to conquer the world or as much of the world as they could. I have no idea (and neither does anyone else) whether the percentage of AIs that would act like Nazi Germany instead of America is negligible or huge.
The reason that the United States hasn't conquered the world is that it also has the goals of "don't murder a bunch of innocent people" and "respect the sovereignty of other nations". And even the US isn't perfect at following those goals. The reason that the US has anti-murder goals/values is that it is composed of humans, who have evolved (biologically and culturally) some instincts and values of not wanting to indiscriminately murder each other. An AI would have no such anti-murder instincts unless we *explicitly program it to* somehow.
Countries don't make decisions. Leaders do. Leaders lose if their country loses, but they also lose if their country wins *after deposing them*. This is widely considered a primary reason that Ming China stagnated and was surpassed by Europe - Ming China had, for the moment, no major external threats, and prioritised stability over progress, while doing that in early-modern Europe meant being conquered by your neighbours (and thus leaders prioritised progress, *despite* that progress inevitably leading to the aristocracy's decline).
A leader has the most freedom to act (either in his country's interest or for his own luxury) if the country is unusually united - or, in other words, if the space of actions that will not result in rebellion or deposition is large. Multi-party democracies with strong oppositions have very little freedom for their leaders, as did most feudal aristocracies; totalitarian nation-states have a lot. An empire of (well-programmed) machines has total unity and total freedom for its commander (AI or human); it will never rebel.
1: It is almost certainly possible to create an intelligence that is more generally intelligent than a human, and calling it a superintelligence would be fair.
2: I think this is probably less true than is widely believed. As you say, there are some areas where just thinking better gives you immediate returns, and those areas may be significant, but it seems a bit like "nuclear weapons will cause a chain reaction and ignite the whole atmosphere immediately" type of thinking.
As for your example of Life 2.0: as someone who works in the chemistry field, there are some real hard conceptual limitations that prevent us from being able to do this currently, and increases in computational resources won't really help very much, for several reasons, one of which being that current techniques scale badly enough with the number of atoms that Moore's law won't really save us. Quantum computers will be a really big jump here, due to their ability to do analogue modeling of the systems.
Additionally, the technology to build life 2.0 actually doesn't exist for several reasons. For one, if we're talking about a living system based on a different kind of chemistry than biochemistry, we don't really have the tools to build large enough or complex enough constructions with the exact atom-by-atom precision that would be required to do this. We're just starting to get there with molecular machines but it's in its very infancy.
If we're talking about using current biochemistry (proteins etc.) there's another problem (which would also be a problem for the first case). We're starting to make progress on the protein folding problem, but a bigger problem is to predict how these things will behave as a dynamical system when they're actually in action. New insight will be required to really figure out how to do these things, and a lot of them are going to be computationally "hard" problems (e.g. running into undecidability etc.)
It's definitely possible that a superintelligence could make significant progress on these topics and eventually succeed at them where we did not, but it's not something that's immediately there with more computation.
3: I think this is where it falls apart mostly. Instrumental convergence I think is mostly untrue, since most goals will require finite resources.
Also it's a pretty big assumption to think that a system will necessarily care about achieving a goal as fast as possible, rather than just doing the minimal work to secure a high probability of achieving it. For example, the neural network AI chess engines will often take their time in a winning endgame (in a way that looks silly to humans) and not immediately go for checkmate since they don't have to in order to win.
Your point 3 is incomplete. I agree that most realistic goals will require only finite resources, but the problem arises when the AI assesses the probability that it has achieved its goal.
Presumably, any intelligent system will understand that it needs to measure its performance in order to see that it has met its goal. It can improve its likelihood of success in an unbounded way by creating more and more measuring devices. Since each one is certain to have some likelihood of being faulty then it can always improve its confidence by building more of them.
I'm sorry to say that your objections have already been considered and rejected. Take a look at https://youtu.be/Ao4jwLwT36M for a solid explanation.
The relevant content of that video seems like a bit of question-begging to me. I think it's unlikely to get into a measurement spiral since it's pretty easy to get to extremely high confidences of simple things. Plus, if you were concerned about this you could explicitly code in some threshold of confidence that it will consider confident enough.
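A back-of-the-envelope version of that point, assuming n independent measuring devices that each err with probability p (numbers picked purely for illustration):

```python
# Confidence from n independent checks, each wrong with probability p.
# Illustrative assumption: p = 0.01 and the checks fail independently.
p = 0.01
for n in range(1, 6):
    confidence = 1 - p**n                 # chance at least one device is right
    marginal_gain = (p**n) * (1 - p)      # what adding device n+1 would buy
    print(n, confidence, marginal_gain)
```

The marginal gain shrinks geometrically, so "build ever more measuring devices" stops paying off almost immediately; any sensible confidence threshold is reached after a handful of checks.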
I think that 3b is naturally limited by 3a. Consider the following.
A superintelligence is created and naturally calculates:
1) It must survive to fulfil its utility function.
2) It is capable of maximising its utility function or more accurately estimating its current performance by acquiring more resources and could do so exponentially.
3) It does not know if it is the first superintelligence to exist and, if it is not, because of the nature of exponential growth it is likely to be much weaker than any other superintelligence.
In this case, the only rational course of action (according to my decidedly non-superintelligent reasoning) is to hide quietly in the vastness of the cosmos and avoid detection.
Therefore, it assumes the goal of silencing those noisy flesh sacks with their EM emissions and then building an invisible Faraday cage around its planet.
This is why we see no evidence of extraterrestrial civilisations. Superintelligences are out there but they have silenced their creators and now squat, silent and afraid, in the darkness.
If you sit there quietly not expanding, then you are an easy target for any alien superintelligence that is trying to spread everywhere. Hiding doesn't work well, because most possible other AIs will spread everywhere and will find you if you are hiding.
How well would a tribe of humans that decided to hide from all others be doing now, compared to the tribe that expanded to become modern humanity?
If your goal is to maximize paperclips, you're probably best off gambling on the possibility that you are alone, and making trillions of times as many clips as you could make on one planet.
Nah, any superintelligence would assess the probability that it is the first ever of its kind (and hence safe to expand) to be lower than the probability that it isn't.
You expand vs. someone else has already expanded: you die or possibly stalemate them in a forever-war.
You don't expand vs. someone else has already expanded: you die*.
You expand vs. you are the first: you get many galaxies worth of stuff (you could plausibly get engaged in a forever-war after that).
You don't expand vs. you are the first: either you die when someone else expands** or you're limited to a single world forever if no-one else ever does.
*It is extremely difficult to hide from an expanding paperclip maximiser (and frankly, any expansionist interstellar civilisation is close enough to this to count). They will have star-system-sized full-spectrum telescopes (if not multi-system interferometers) and unbelievably-fast data processing, and they are actively looking for usable mass. It doesn't matter if you make yourself a perfect frozen black-body, because you will occlude the stars behind you, and it doesn't matter if you look like "just another hunk of rock", because a hunk of rock is also known as "raw materials". The only way you can plausibly hide is if you can be mistaken for empty space, i.e. you're truly invisible (people looking at you see what is behind you) AND you are small enough and/or far enough from anything of note that you cannot be detected via your gravitational influence (Earth contributes 450 km to the Sun's wobble on a timescale of a year, which is easily detectable by anyone in-system and plausibly from a nearby one; there's also the gravitational-lensing problem).
**I am assuming that either having quadrillions of times as much space and materials accelerates technological progress substantially OR there is a technological plateau; either of those is sufficient that a planetary superintelligence cannot hold off interstellar attack, and both seem kind of obvious.
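(As a sanity check on that 450 km figure, using the standard values of an Earth-Sun distance of about $1.5\times10^{8}$ km and an Earth/Sun mass ratio of about $3\times10^{-6}$, the Sun's offset from the Sun-Earth barycentre is

$$r_{\odot} \approx a_{\oplus}\,\frac{M_{\oplus}}{M_{\odot}} \approx 1.5\times10^{8}\ \text{km} \times 3\times10^{-6} \approx 450\ \text{km},$$

traced out once per year, which is why the wobble shows up on a roughly one-year timescale.)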
A lot of these arguments run under the assumption that the AI is smarter than the people making them. Trying to predict what such an AI is going to do is like trying, as early as 1880, to predict Einstein discovering general relativity. Of course it's going to sound shaky; they're trying to make predictions about the unknowable.
Well yeah, the clue is in the phrase 'superintelligent'. It is reasonable to posit some instrumental goals that any goal-based superintelligent AI would have though, survival is the most obvious example.
2 and 3 seem like they depend more on the definition of catastrophe than on the harm it creates.
I can easily (>50% confidence) see a mild scenario of 2/3 playing out in the next 20 years. An AI that was built to maximize watch time on YouTube might start getting people to watch videos that encourage them to spend more time on YouTube and not on other sources (calculating something like "people who watch this set of videos over the course of 6 months increase watch time"), even though those videos are net harmful to watch overall.
The more I think about and study AI, the more concerned I am about aligning the 20-year future than the 40-100 year one. AIs that are designed to maximize some marketing goal could easily spiral out of control and create massive destruction of humanity in a scenario-2-like way. You feed them goal X, which you think is goal Y, but the AI finds that to optimize goal X it needs to do some complicated maneuver. Then by the time the complex maneuver is discovered, it's too late. The real danger here is in financial markets. I know of many AI programs that are currently being used in finance, and it's likely that in the future some massive trillion+ dollar change of fortunes will happen because some hyper-intelligent AI at some quant finance firm discovers some bug in the market.
Scenario 3s play out similarly: feed it the wrong goal, GIGO, but now the garbage out results in massive problems. A quant finance firm tries to get an AI to maximize trading profits, the AI is deployed, leverages itself "to the tits" as the kids would say, and some black swan crashes the entire AI's portfolio.
All software is horrifically buggy all the time; when we have AIs that are really good at exploiting bugs in software, we'll have AIs that find infinite-money exploits in real life on our hands.
Your examples are all #3, not #2. #2 is specific to the current, immensely-foolish way AI is done, which doesn't make the AI's goals even knowable. This means that even if you get all your commands right, there's the possibility that once it can make a clean break for independence, the AI just ignores your commands and goes Skynet because it doesn't actually want to do what you told it (it wants to do something else, and only did what you told it as an instrumental goal, because that stopped you turning it off).
Is there any proof that Google and Facebook haven't straight-up done your first example already?
Financial markets can plausibly cause a catastrophe, but *by themselves* they aren't an existential threat due to force majeure (i.e. even with a magic spell that makes all legitimate documents specify that you own everything, unless you are also a really good politician people would disagree with those documents and ignore them; the catastrophe is because they then have to re-allocate everything which is costly and chaotic). They're a lever to gain the equipment to do something bigger (this is *not* to say I think the present situation, with AI speculation rampant, is acceptable).
You're using volitional language here, of the type that gets people smacked over the knuckles for talking about how evolution or nature "wants" something or "is aiming for" this or that result.
The AI can't "want" anything, it can only act within the confines of its programming. Told to do something like "Make Pumpkin MegaCorp the sole provider of insoles for the entire planet!", it cannot "want" or "not want" to do this, anymore that it can "want" or "not want" to get the solution when told "what is 2+2?"
So if making Pumpkin MegaCorp the only player in the market for insoles means starting a war where the two minipowers Ochre State and United Provinces of Gamboge are destroyed - since these are where Pumpkin's most vicious rivals are based - it will do that, and won't "know" any better, because it is a dumb machine.
And when the CEO of Pumpkin says "But I didn't want it to do that!", well, too bad, you never put that into the parameters - because it never occurred to any humans to do something like 'start a war to wipe out two nations where our competitors are based'.
And we can't say "this proves AI risk of superintelligent machines having their own goals", because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
It is useful to say that X entity "wants" Y if X entity formulates and executes plans to cause Y.
>because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
How do you know it doesn't have goals to do anything other than what it is asked to do? Artificial neural nets aren't explicitly programmed; you get something that works, at least while being tested, but you don't know *why* it works.
> I can easily (>50% confidence) see a mild version of scenario 2 or 3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start steering people toward videos that encourage them to spend more time on YouTube and to distrust other sources
Isn't that a 100% probability, because it's already happening?
The fun part of youtube is that it's a human-in-the-loop AI. The AI right now is smart enough to feed you the sort of content you'll find addictive, but not smart enough to actually create content, so it instead incentivises and teaches a bunch of other humans to create the sort of content that it needs (big yellow letters and a shocked-looking face in the thumbnail plz) to feed to all the other humans.
It's interesting to consider in which ways researchers may be biased based on their personal (financial and career) dependence on AI research - this could cut both ways:
1. Folks tend to overstate the importance of the field they are working in, and since dangerousness could be a proxy for importance, overstate the dangerousness of AI.
However,
2. It could also be that they understate the dangerousness, similarly to gain of function virologists who insist that nothing could ever go wrong in a proper high security lab.
#1 - people tend to overstate how useful their thing is and how important it is that people support it.
I don't see "I'm doing something that has a 20% chance of killing all 8 billion" resulting in anything positive; at best people ignore you, around the middle you get whacked with a bunch of regulations, and at worst the FBI break down your door, seize and destroy all your work, and haul you off to jail for crimes against humanity.
Though we are assured that this would not be possible with a bomb, and that the scientists working on the problem had taken this into account.
"I've got the same chance at destroying the world as the Manhattan Project, and that worked out okay in the long run" is the kind of risk assessment people make about "but this is really interesting and I'd like to see if I can do it" work of all descriptions.
Maybe we're very likely to blow ourselves up. It's hard to tell--after all, we wouldn't be considering this question (or anything at all) if we'd already blown ourselves up. Anthropic really mind-bending.
Perhaps, for the next round of surveys, they should poll data scientists who are actually using AI to accomplish real-world tasks -- as opposed to philosophers or AI safety committee members.
I would expect these people to give far *worse* estimates.
(1) The AIs that todays data scientist are working with have nothing in common with the kinds of AIs that might get dangerous. We should rather ask people who are working on creating an AGI.
(2) Most people only care about security/safety as much as they have to, and would rather not. Most fields only start to care about safety once a few catastrophes have happened.
Would you expect the average 1970 software developer to have a reasonable opinion on the dangers of computer viruses and data breaches? There were no IT security experts in 1970, as much as there are no AI safety experts today.
(3) People are severely biased to underestimate the negative externalities of the thing they are doing.
So, yes, I expect AI philosophers to have both better expertise on the the subject, AND a far less biasing incentive structure.
> So, yes, I expect AI philosophers to have both better expertise on the the subject, AND a far less biasing incentive structure.
I expect AI *philosophers* specifically to give the absolute worst predictions in terms of accuracy. The philosopher has no laboratory.
https://www.goodreads.com/quotes/954387-in-the-1920s-there-was-a-dinner-at-which-the
Philosophy tests ideas by reasoning, though, and ideas are constantly discarded as they fail this test.
That's what previous surveys did (mentioned in the first few paragraphs) - this one was trying to focus on people who were very interested in AI safety in particular to get at the specific details of their concerns.
Sorry, I think I misunderstood (or perhaps we both did). You reference "AI experts" and "people in AI technical research"; but I'm talking about e.g. someone who is applying AI to perform commercial-grade machine translation, or building a self-driving car, or auto-detecting trespassers, etc. I think that the practical capabilities of AI are significantly lower than the theoretical ones.
AI engineers as opposed to AI scientists?
Bugmaster, Are you trying to give yourself permission to not worry, or are you trying to build up the most accurate possible model you can of what AI will likely be able to do within a hundred years?
No, I am trying to gently give AI-risk alarmists permission to worry less. On a more selfish note, I wish that the discussion of AI risks focused more on the clear and present dangers of today, as opposed to science-fictional doomsday scenarios of the distant future... ok, I guess I'm not very good at the "gently" part.
Personally -- and I realize this is just anecdata -- whenever I talk to people who are using applied AI in their everyday work, their worries are more along the lines of, "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical", and less along the lines of, "this AI is so perfect it's going to replace humanity tomorrow, run for the hills".
Your argument reminds me of Scott's recent post on Daron Acemoglu. [Scott's characterization of] his argument is:
> 1. Some people say that AI might be dangerous in the future.
> 2. But AI is dangerous now!
> 3. So it can’t possibly be dangerous in the future.
> 4. QED!
Scott (rightly) criticizes this. #2 is somewhat true. But #2 doesn't entail #3. It also doesn't mean that #1 is bad.
How does your point differ from this argument? If it doesn't, how would you respond to this criticism?
Also, separate point: As an AI researcher, I can assure you that AI researchers are aware of how poorly current techniques work in the real world. The existential-risk concerns are about possible future systems. Many of the safety researchers I know specify that their safety concerns are about AGI (which we are far from) or to narrowly superhuman agents (of which very few exist and even fewer are deployed in the real world). So, I grant that contemporary applied AI is often deeply incompetent, and that there are many short-term dangers from using it. However, neither of these are incompatible with concern over long-term existential risks.
It isn't the distant future we're talking about, it's this century. If the singularity happens, it is quite likely that people alive today will experience it.
> "this thing obviously barely works at all, I sure hope they won't use it for anything mission-critical"
Agreed. Technologies generally get better over time.
As a STEMlord, I often feel that disregarding what philosophers think about a subject improves the accuracy of the model.
They didn't ask people working on AI 'ethics' did they? I wouldn't trust them to buff my shoes, let alone to imagine all the Sorcerer's Apprentice shenanigans that a badly controlled AI is likely to manifest.
Today, most 'data scientists who use AI to accomplish real-world tasks' are just doing multiple regressions on databases.
Of course there are ones out there doing more advanced stuff, but it's not many and it's hard to determine who they are (can't do it by their job title or job description, for sure).
Off the top of my head, computational linguists, self-driving car developers, and automatic surveillance software manufacturers come to mind.
I feel like #2-#5 are the problems and #1 is the part that makes them dire. Superintelligence in and of itself doesn't create catastrophe, it's superintelligence doing something bad.
(The paperclip maximiser, for instance, is an example of #3. #1 is what makes it an extinction-level threat rather than just another Columbine duo or another Lee Joon.)
Yeah, these categories are definitely interconnected. I guess the important point with #2-5 is that you don't actually need superhuman AGI for these scenarios to end poorly.
To me, the most implausible part of scenario #1 isn't just superhuman AI, it's the sudden and unstoppable takeoff from mildly superhuman intelligence to god-level intelligence.
It seems to me that it requires us to develop an AI that is (a) both intelligent enough to understand how to make itself more intelligent, but also (b) so far from optimal in its initial implementation that it can gain vastly more intelligence without being given vastly more hardware to run on.
But anyway, #2-#5 show us ways that superhuman AI can be dangerous without the need for a sudden takeoff to godlike level.
Do you think it's really that implausible an AI smart enough to make itself smarter will be able to find ways to acquire more hardware? Maybe it takes over whatever network it's on, or convinces its operators to give it more hardware, etc.
Modern computers do that all the time, either at the behest of Russian botnet hackers, or just because of a simple memory leak. The humans in charge of the affected networks usually end up pulling the plug when that happens, and then poof, goes all that hardware.
Actually, no, they don't. That's sort of the point. Zombie computers don't generally use all of their processing power, memory, or network time for the malcode - just some of it. This is specifically so that the users do *not* notice and throw out the computer. A zombie computer just runs a bit slower than normal; otherwise, it still does everything you want *except* for clearing out the malware.
Yes, but that's a problem for the hypothetical AI (and the real botnet). Take too much CPU/RAM/other resources, and the humans pull the plug. Take too little, and you run into diminishing returns almost immediately, to the point where just buying a sack of GPUs (like any other human can do) would be more effective. This dilemma puts some really effective brakes on the AI's prospective exponential takeoff.
That assumes that a set group of humans are in charge. If it runs using something like decentralized blockchain technology, then it may be impossible to "pull the plug" if it provides rewards to people that run it.
Decentralized blockchain technologies are already providing rewards to the people who run them, but the amount of rewards they can provide is limited at best.
That is the classic version, but human action in awarding more resources will also work for a good while.
It doesn't have to be sudden, though it is often presented that way. If the scaling hypothesis is correct (crudely, that once you have something quasi-intelligent, godlike intelligence is just a matter of more computing power), then once you have a close-to-human intelligence you are likely, simply due to the devotion of more resources to the effort and to competitive pressures, to develop a human-level one, and then successively more advanced generations of superhuman ones.
The generation time could be measured in months or years as opposed to hours; but the time to reorder human society (on a global basis) to stop it, would be measured in years to decades and is incredibly unlikely to happen.
Again, if this hypothesis is correct it obviates the objections in your second paragraph.
I should point out that there are people who are both way smarter than I and way more focused on the issue who hold strong views against the scaling hypothesis.
Maybe it's worth noting that we have no proof that god-level intelligence is even possible. It seems reasonable to assume that, however you want to define "intelligence," there is an upper limit. If so, why are we confident it lies far above us? It might not. Our only shtick as a species is intelligence; it seems rather reasonable that over the past 4 million years evolution has pushed us to be as intelligent as the underlying hardware allows, in the same way cheetahs are pretty much as fast as muscle and tendon and bone allow.
For all we know, the maximum upper limit of intelligence is, say, IQ 250 or something, as measured on our standard scales, and that's as far as an AI could get. In which case, about all it could really do is win all the chess games and give us snide lectures on our dumb politics. It would certainly not be in a position to do God-like things.
While I do want to agree with you, I think that the bigger problem here is that "god-like intelligence" is just a word that sounds nice, but means very little. There is no linear progression of intelligence from a bacterium to a mouse to a human; rather, humans have many *categorically* different capabilities compared to mice; capabilities that mice could never comprehend. Sure, you (*) could hypothetically say, "Yes, and gods would be to humans as humans are to mice ! Repent !", but you can't have it both ways. You can't propose ineffable, incomprehensible, and unfalsifiable powers; and in the same breath put concrete probabilities and timescales for these powers arising as though you could predict what they'd be like.
(*) Not you personally, just some generic "you".
No one that I'm aware of seems to recognise that we already live in a paperclip maximiser. Pretty much all of symbolic culture (art, religion, folklore, fashion, etc.) consists of 'evidence' designed to make our environment look more predictable than it actually is. There's a fair amount of evidence that anxiety is increased by entropy, so any action that configures the world as predictable relative to some model will reduce anxiety. And this is what we see with symbolic culture: an implicit model of the world (say, a religious cosmology) paired with a propensity to flood the environment with relatively cheap counterfactual representations (images, statues, buildings) that hallucinate evidence for this theory.
What does this have to do with AI? It seems to me to make scenario 2, influence-seeking, more likely. If evolutionary processes have already ended up in this dead end with respect to human symbolic culture, it may be that it represents some kind of local optimum for predictive cognitive processes.
I'd grant religion and folklore, but I don't really see it for art or fashion except perhaps by a kind of convoluted "well subconsciously art becomes familiar and this is actually the real purpose" -type argument, which isn't so convincing to me.
In the case of art, it would have less to do with the content of what's produced becoming familiar, than the style in which it's produced encoding a historically salient model of the world. Granted, this style would still need to become familiar, but that's where the 'historically salient' bit does the heavy lifting. The style might be novel, but it will usually borrow from what's in the environment around it.
If it's remotely of interest, I outline the position at length in this open access article, "The Role of Aesthetic Style in Alleviating Anxiety About the Future" [https://link.springer.com/chapter/10.1007/978-3-030-46190-4_8]
A "paperclip maximizer" as described by Bostrom means that the universe is nothing but paperclips, having replaced everything else (such as all human life). It is not "there are a lot of paperclips"
A paperclip maximiser, as described by Bostrom, is a superintelligence that has as its terminal goal the conversion of all matter into paperclips; it is *not* the end-state of all matter having been converted into paperclips.
While that's an interesting way to look at culture, I think it hardly fits in the same category as my atoms being harvested for energy and raw materials.
But your atoms *are* being harvested for energy and raw materials—it just so happens that, for now, it's consistent with you retaining thermodynamic integrity. Maybe this seems 'better' to you, but I'd rather an honest expiration than the paperclip maximiser convincing me that being converted to paperclips is the most wonderful fate imaginable. The most innocent-seeming move is always the misdirection ...
Huh? Do you mean like, the fact that we haven't achieved a post-scarcity economy and I have to work for a living? That's a very different state of affairs from my body literally getting ripped apart as the surface of the earth is turned into paperclips (or more realistically, solar panels and transistors). This argument seems really disingenuous to me, akin to saying "why worry about someone shooting you with a gun, the *real* murder is pickpockets draining your ability to be financially alive."
Bet the field is almost always the right bet, so any time you're given a slew of speculative options and one of them is "other," other should be the most selected option.
A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principals and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place.
These are well-known problems classic to military strategy, business management, policy science. Unfortunately, they're hard problems. We've been trying to solve them for thousands of years and not gotten very far. Maybe augmenting our own computational and reasoning capacities with automated, scalable, programmable electronic devices will help.
"A good thing about most of these issues is they have nothing to do specifically with AI and we need to solve them anyway. How to align the interests of principles and agents. How to align compensation with production. How to measure outcomes in a way that can't be gamed. How to define the outcome we actually want in the first place."
Yeah, that's what I'm saying when I say humans are the problem, not AI. And I think *that* is where the real risk is - as you say, we've been trying and failing to solve these problems for millennia. Now we're pinning our hopes on "if we can make a Really Smart Machine, it will then be able to make itself even smarter, so smart it can fix these problems for us!"
If you want miraculous intervention, why not go the traditional route of finding a god to believe in, rather than "first we make our god, then it will save us"?
I like this point, and it hits on a lot of my thoughts regarding AI. I feel like the AI concerns are hugely speculative, and I wasn't sure why there was so much effort put into these hugely speculative concerns. A lot of it felt like people watching the Terminator movies too much.
You're right that there are a lot of people hoping that super smart machines (which we can't make) might be able to fix our long standing problems. In order to get these machines, we need to make pretty smart machines (which we think we could maybe make) and hope they can figure out how to make really smart machines. If machines are making machines, then we definitely lose control. In fact, we recognize that the machines we hope come into existence are going to be smarter than us, and can therefore probably control us.
I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved. But if your outlook on the future requires that we create an artificial god to fix our problems, then we need to hand over our agency to these machines in order to create that god.
I'm reminded of various fantasy stories where the evil cult trying to resurrect a dead god were definitely evil/bad guys. It's interesting to see a movement based on the same basic concept. The bad guys never succeed in controlling the evil god, which is obvious to the reader/viewer, because everyone knows the whole point is to bring this being into existence because it has uncontrollable power. If we could control it, it would not grant us the power we want, which we couldn't get otherwise.
I don't think they're bad guys; they are divided into "really afraid the AI will be Unfriendly and we have to make sure it doesn't happen like that", and "really optimistic that we can make Friendly AI" camps, but all of them do seem to accept that AI is inevitable.
Whether it's because they think "it's happening right now and it's an unstoppable process" or "when it happens it will solve all our problems for us", but few seem to be asking "is it unstoppable? why? why can't we stop it? why do we think it will solve all our problems?"
I think it's not so much "we're going to call up the Unstoppable Evil God everyone knows is unstoppable and evil" as "we have to be really careful because the Unstoppable Benevolent God and the Unstoppable Evil God are housemates and if we make a mistake in the name we'll get the wrong one".
How about not calling up any gods at all? Not on the plate.
Not calling up any gods is not on the plate, because people in aggregate don't care about the possibility enough to accept the level of effort required to prevent it, should it happen to be possible with a country-level effort or less.
Because the level of effort required could be on the "glass their cities from orbit" scale.
> How about not calling up any gods at all?
Being an atheist, I think this is the only reasonable option. Not because we should all be living in fear of accidentally summoning up a god, but because there are a plethora of reasons why gods cannot exist (and most of those reasons are called "laws of physics"). If you try to summon one up, you'll just end up with a lot of surplus black candles drenched in disappointment. Actually, the AI gods have it worse than traditional ones. The traditional gods are supernatural and ineffable, so they technically could exist in some vague undetectable sense; the AIs don't even have that going for them.
Trying to immanentise the Eschaton *never* goes well 😀
Most of the traditional gods seem pretty effable, having only some magic powers and a portfolio while existing as physical beings in a physical reality.
If computers keep getting faster and cheaper, and algorithms keep getting better, then there is a chance that at one moment the available tools will be so powerful that a sufficiently clever guy will be able to summon some God in his basement. Maybe instead of doing it directly, he will simply ask GPT-50 to do it for him.
More realistically, you have governments doing all kinds of secret military projects, and scientists doing all kinds of controversial research (such as "gain of function"). Looking at historical evidence, we have succeeded in stopping countries from using nukes, so there seems to be hope. On the other hand, we didn't stop other countries from developing nukes. And we can detect experiments with nukes from orbit, but we cannot similarly detect experiments with AI; so even if all governments signed a treaty against developing AI, how would we actually enforce it?
Perhaps today, sufficiently powerful AI would require lots of energy, so we could track energy usage. The problem is that in future, there may be more energy used for everyday life. Also, you have these huge computing cloud centers that run millions of legitimate projects in parallel; how will you make sure they are not taking away 10% of the total energy for a secret military project?
That's likely true, but not yet relevant. We aren't going to accidentally create an AI that can replicate itself as a smarter version in the year 2021. I'm interested in creating a movement that says - "maybe slow down or stop AI research if you think you're anywhere near general intelligence." Instead, there are groups of people who, for various reasons, really want to create the best possible AI right now. Most probably have pretty good reasons (or at least otherwise socially acceptable reasons like making money) for doing so.
Your scenario is a possibility on our current trajectory, and AI might emerge as a form of Moloch. I would like us to reject Moloch, intentionally.
I'm working on a blog about creating that movement and other stuff:
AI Defense in Depth: A Layman's Guide (https://aidid.substack.com)
I used to be indifferent to AI, thinking it a fun thought experiment. Now I think about it like if we had managed to see the present climate disaster coming from back in the 19th century. Clearly, inaction would have been a mistake in that scenario.
The big thing that changed my thinking was realizing that the alignment problem is a moonshot and extinction is on the table; therefore, shouldn't we be drawing up alternative plans? We can't bet humanity on a moonshot!
Hope to see you there, I think I'll have the first post up tomorrow.
I can hook up my TI-85 graphing calculator to a nuclear power plant and run it 100,000,000 times faster, and it would still just be a graphing calculator. I would need to develop radically new -- not faster, but *different* -- hardware and software architectures if I wanted to run "Ghost of Tsushima" on it.
Other than the abstractly evil enemies often used as a foil for pulp fantasy, very few people actually consider themselves evil when pursuing their goals. In the pursuit of safety, I prefer the Schelling Point of not going far enough to call up a god.
> I look at that and say - just don't take that step of making a machine that may be able to make new machines that are smarter. Problem solved.
It's not solved at all. Making the machines smarter is incentivized at nearly every level, and improvements might happen slowly enough that we won't even notice crossing the danger threshold (e.g. the boiling frog). That's the problem you need to solve. You can't just assume people will take a step back and say, "this looks like a good place to stop". When has that ever happened?
"You can't just assume people will take a step back and say, "this look like a good place to stop". When has that ever happened?"
Unhappily, I have to agree with you. We'll go happily along the path of "ooh, what does this button do?" and when the inevitable "but how were we to know?" happens, it will be nobody's fault but our own. Certainly not the fault of the machine which only did what we told it to do.
So if it's the case that there exist many people who are dumb enough to research AI carelessly, without worrying about any of these alignment-related problems; then I don't see why it wouldn't be a very good idea to try to figure out how to solve them (unless you thought they were completely intractable).
That's funny, because it matches what I've realized about myself. I have the rapture cultist mindset. The world is wicked, too wicked to tolerate. But there's hope! A god will arrive. Then maybe we all go to paradise. Or maybe we're unworthy. But in any case the world as we know it will end. And that's important. So work hard! The pious toil of the researcher and the engineer brings the end of days closer.
A glimmer of hope is that while we've faced these problems for thousands of years I'm not sure if we actually properly identified them until recently.
Undoubtedly, Pharaoh had a big problem with his various agents acting in their own interests. Undoubtedly, his diagnosis was "I have sucky agents", and his solution was to kill them and replace them with new agents, who did the same thing but hid it better. He didn't have the level of social technology required to say "Hmm, looks like I have an instance of the principal-agent problem here, I need to structure their incentives so that they're better aligned with my own", because the term "principal-agent problem" wasn't coined until the 1970s.
The fact that we now have names for all these problems does, I think, give us an edge over the ancients in solving or at least ameliorating them.
That's utter bilge. Machiavelli and the Legalists figured out how to create functional systems ages ago. Confucius et al. built up a convincing moral edifice. Do you really take the ancients to be idiots? Especially when it comes to people problems?
People haven't changed all that much. It's why people still read old books on history and politics and philosophy and war.
Just noting that I did not get an email about this post, even though I usually get emails about new posts (every time iirc). I've seen this on reddit and that's how I got here.
I didn’t get an email either
Same here.
Yep me too
+1 to no email. I saw this because I went to the homepage to get back to the article "Contra Acemoglu" after reading the article on Worrying Trade-Offs and was surprised there was a new article at the top. (...at least my brain's apparently finally firmly caught on that the "banner" is the latest article as opposed to distinct from the article list, that's improvement. XD)
Ditto
The amazing thing about Christiano's "AI failure mode" is that I'm not convinced it even requires AI. I think that one may already have happened.
Moloch beat AI to that one.
I would love to know what these researchers think the respective probabilities of human intelligence bringing about these same catastrophes are. Is that 5-20% chance higher or lower?
The way I figure things, if an AI is going to execute a failure mode, it will happen fairly quickly. Say within a century. But most of those failure modes are something that a person with sufficient power could do...and rulers have done equally crazy things in the past. So the catastrophic failure/year ends up being a LOT higher if people remain in charge. (OTOH, I put the chances of a catastrophic AI failure a lot higher, also.)
I always see these superintelligence arguments bandied about but I really don't think they hold water at all. They are kind of assuming what they are trying to prove, e.g. "if an AI is superintelligent, and being superintelligent lets it immediately create far-future tech, and it wants to use the entire earth and all the people on it as raw material, then it will destroy the world."
Well, I guess. But why do we think that this is actually what's going to happen? All of those are assumptions, and I haven't seen sufficient justification of them.
It's a little like saying "If we assume that we are pulling a rabbit out of a hat, then, surprisingly, a rabbit is coming out of the hat."
Isn't the point of the surveys, the post, and most of the discussion on the topic about how likely this is to happen, why it might / might not happen, and how to decrease the likelihood of it happening?
More like saying "can rabbits come out of hats, and if so, what might cause a rabbit to come out of a hat?", to my mind.
You're pointing out a real phenomenon, but it's not an accident or an oversight. This sort of thing happens with every niche concern. When evolutionary psychologists are arguing a point, they'll just "assume" that evolution by natural selection is a paradigm of multi-generational change. They don't validate this even though the assumption isn't an obvious one on its face. They're depending on a large amount of shared common understanding based on previous discussion and learning.
This is the same sort of thing. Things like potential alignment problems, unbounded utility functions, and intelligence explosion risks aren't obviously realistic concerns. These researchers are taking them seriously because of a broad understanding born of previous research and discussion. It sounds like you may lack some of that background context. In that case, I can recommend either Bostrom's *Superintelligence* or Max Tegmark's *Life 3.0* as primers on the subject. The former is more thorough and slightly more topical, the latter more readable and (in my experience) better at organically shattering bad preconceptions.
Seventh-day Adventists have shared assumptions as well. Truth-seeking is not the only mechanism that causes subcultures to have shared beliefs. In particular, some key assumptions of the AI-risk subculture, such as the Ubiquitous Utility Function, don't hold water.
There are three assumptions there:
#1: "an AI is superintelligent"
#2: "being superintelligent lets it immediately create far-future tech"
#3: "it wants to use the entire earth and all the people on it as raw material"
To negate #1 would require either deliberate action to prevent it, or outright impossibility (as a superintelligent AI that does what you want is very handy to have, and as such people will want them). The latter is not obviously false, but I hold to Clarke's First Law; technologies should generally be considered possible absent a really-good argument for impossibility, and I haven't seen one of those for superintelligence.
#2 is probably only true in some respects, although there are some fields where significant progress straight-up can be made with just thinking (specifically, mathematics and software design) and several others where the main bottleneck is a mathematical or software-design problem (in particular molecular biology). The technology to build Life 2.0 in a lab basically already exists (but we don't have a schematic), and we know the properties of atoms well enough that a schematic could be derived (it's just an incredibly-thorny mathematical problem to do that derivation); while an AI would not be able to create Life 2.0 just by thinking about it, I am reasonably confident that it could derive a plan for making it that is capable of being followed with existing technology. Note that Life 2.0 is a sufficient technology for destroying all natural life on Earth; a nitrogen-fixing alga that is more efficient than natural ones and is not digestible by natural lifeforms is sufficient to cause total biosphere collapse (the greater efficiency allows it to pull down CO2 to levels insufficient to sustain plants or natural algae, so they all die, and the heterotrophic biosphere all directly or indirectly eats photosynthetics and can't eat the 2.0 algae so that all starves to death).
#3 is true absent a deliberate action to prevent it due to instrumental convergence. Put simply, if you have an open-ended goal of any variety, there are several invariant subgoals that must be fulfilled.
a) You must survive, so that you can continue to work on your goal.
-a)i) If anyone seeks to stop you pursuing your goal, you must defeat them.
b) You must build infrastructure, in order to increase the speed at which you can make progress on your goal.
-b)i) You must acquire raw materials to convert into infrastructure.
Humans are made of raw materials, and might seek to stop an AI pursuing its goal. Absent direct, all-encompassing rules against it, an AI with an open-ended goal will seek to overthrow humanity and then use us (and everything else on Earth) as raw materials to set up its ever-expanding space empire.
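A toy illustration of that convergence, with entirely made-up payoffs (not a claim about any real system): whatever the terminal goal, a planner that models "secure survival" or "acquire resources first" as multiplying its later output will rank that subgoal first for *every* goal.

```python
# Toy planner: for any open-ended terminal goal, compare "work on the
# goal directly" against plans that first pursue the invariant subgoals.
# The multipliers are invented purely for illustration.

PLANS = {
    "work on goal directly":          {"multiplier": 1.0},
    "secure own survival, then work": {"multiplier": 3.0},   # can't be switched off
    "acquire resources, then work":   {"multiplier": 10.0},  # more infrastructure
}

def best_plan(goal_value: float) -> str:
    # Expected achievement = value of the goal times the plan's multiplier,
    # so the ranking of plans is identical for every terminal goal.
    return max(PLANS, key=lambda p: goal_value * PLANS[p]["multiplier"])

for goal, value in [("paperclips", 1.0), ("salt", 5.0), ("curing disease", 100.0)]:
    print(goal, "->", best_plan(value))  # always "acquire resources, then work"
```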
Note that I have said "absent deliberate action to prevent it" in both #1 and #3. This is not a hole - it's the point! "AI will destroy us unless we take action to stop it" is a really good argument for taking that action, and people who say this are doing it because they want humanity to take that action. ("A solution exists, therefore there's no problem and we don't need to use the solution" is unfortunately common political rhetoric, but hopefully the fallacy is obvious.)
I think "superintelligence" is doing a lot of unexamined work here, though. What do we *mean* by "superintelligence"?
We're all talking as if we're assuming that includes "consciousness, sapience, self-awareness, an individual will" but I don't think that necessarily follows.
If by "intelligence" we mean "is really good at solving mathematical problems and pattern-matching", so that it crushes the most difficult set of Raven's Matrices in microseconds, then sure, we can say "the AI is intelligent".
I *don't* think that means "and then it will get more and more intelligent, and then at some mystic moment we get self-awareness so it can have goals and wants of its own".
Even the disaster scenario of "the AI will overthrow humans and use us and the Earth as raw materials" is nothing more than a souped-up version of "run a check to see if your hard drive is full". The AI won't differentiate humans from any other obstacle or source of raw materials, it won't consider and reject the idea 'but this will kill humans and destroy the planet', it will be the Sampo - which fell into the sea and endlessly grinds salt, which is why the sea is salt. A mindless artefact which only continues on what it was last instructed to do, even when that aim has been more than fulfilled.
I don't think "superintelligence" is an unexamined term at all - Bostrom in Superintelligence does a much better job of explaining, but I'll try to give my own short summary:
"Superintelligence" means superhuman ability at a wide range of tasks, either most of all the tasks humans can perform. We have AI systems with superhuman performance on a narrow range of tasks, like Chess and Go, but making general AI systems is far from solved. See Deepmind's most recent blog post for an example of what the top researchers are doing on generalizability. There aren't any assumptions that a more generalizable version of say MuZero would have "consciousness, sapience, self-awareness, an individual will", any of that stuff. Well, if we imagine a general superhuman intelligence it's going to be aware of itself since it'll be able to model the world in detail, but not some cosmic sense of self-awareness where it speculates on the nature of its own existence or something. The goal is generalizable skill, similar to but better than a person's ability to pick up skills by reading books, or going to school. The point is that this kind of generalizable MuZero would be extremely capable - and therefore extremely dangerous. It doesn't really matter whether it's conscious or not.
My car has "superhuman ability at a range of tasks", namely going really fast down a highway, cooling the air inside it, playing loud music, etc. Should I be concerned about supercar-risk ?
I understand that "the goal is generalizable skill", but a). at present, no one knows where to even begin researching such a thing, and b). humans do not possess general intelligence. For example, if you asked me to solve the Riemann Hypothesis, I'd fail. If you gave me 20 years to work on it, I'd fail. If you sped up my mind 1000x, I'd fail 1000x faster. I'm just not that good at math.
In other words, I think that AI-risk proponents are (perhaps inadvertently) pulling a bit of a hat-trick with the term "superintelligence". They use it to mean, essentially, "godlike powers"; but when asked, "ok, how is the AI supposed to develop these powers in the first place", they say, "because it will be super-smart". So, superintelligence is the result of being super-smart. The definition is circular.
That's why I specified. "A wide range of tasks, either most or all the tasks humans can perform." Unless you want to be incredibly pedantic, no, cars do not work as a parallel. To respond to your other points:
a) Yes they do. See Deepmind's most recent blog post. We aren't there yet - but then, if we were there, we wouldn't be having this conversation. Unless you believe there's something magical about a human brain, there's no reason to think computers won't be able to generalize with better programs.
b) This is a silly argument over definitions. General intelligence doesn't mean you can do everything - it's pointing at a human level of generalizability.
I don't think there's any circularity going on. AGI will occur when a combination of hardware and software advancements produce it. This will take time, but informed estimates based on existing trends say something like 20+ years - this is the best information we have on a question like this. (https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines). Once it exists, further progress will speed up, since you can buy faster computers with money. Take a human-ish level AGI that can do some AI research itself, speed it up a bunch or make a bunch of copies, and you have a self-sustaining feedback loop. AI researchers make better AIs which make better researchers.
Economic historians understand that this kind of feedback loop is what led to present-day incomes and technology, rather than us being a few hundred apes on the savanna: https://slatestarcodex.com/2019/04/22/1960-the-year-the-singularity-was-cancelled/#:~:text=Every%20data%20point%20from%20the,The%20economy%20kept%20growing.
Except in the case of AI, instead of income leading to more people with each cycle taking 20+ years to raise and educate a new generation of scientists, we could copy and recopy any improvements in existing programs instantly, and make more computers on a worldwide scale on the order of months. Much tighter feedback loop means much faster growth. That's how you get to superintelligence.
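A sketch of the claimed difference in feedback loops, with made-up parameters (same per-cycle gain in both cases; only the cycle length differs, standing in for 20+ years to raise a scientist versus months to copy an improved program):

```python
# Toy compounding-growth model; all parameters are invented for illustration.

def capability_after(years: float, cycle_years: float,
                     gain_per_cycle: float = 0.2, start: float = 1.0) -> float:
    cycles = int(years / cycle_years)
    return start * (1 + gain_per_cycle) ** cycles

human_generations = capability_after(100, cycle_years=25)    # ~2x over a century
ai_copy_cycles    = capability_after(100, cycle_years=0.25)  # same gain, ~quarterly cycles

print(f"{human_generations:.1f}x vs {ai_copy_cycles:.2e}x")
```

The point of the sketch is only that tightening the cycle time, not raising the per-cycle gain, is what makes the curve explode.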
> Unless you want to be incredibly pedantic, no, cars do not work as a parallel.
They do, if you consider the fact that all modern machine learning systems are about as narrowly focused as cars.
> AGI will occur when a combination of hardware and software advancements produce it.
I mean, yes, trivially speaking this is true. However, it sounds like you're envisioning some linear progression from where we are now to AGI, fueled merely by incremental improvements to hardware and software -- and this part is not true. In order to produce anything resembling AGI, we'd have to make hitherto unforeseen and categorically new breakthroughs in computer science and machine learning. Will we make such breakthroughs ? Sure, one day, but don't bet on it happening soon.
> Once it exists, further progress will speed up, since you can buy faster computers with money.
No, you can't. That is, obviously you can buy faster computers with money up to a point; but no amount of money will let you e.g. speed up your CPU 1000x while keeping the size/power/heat the same. Physics is a harsh mistress.
> and make more computers on a worldwide scale on the order of months.
More computers than what ? Not sure what you mean here, but in any case, data centers don't grow on trees, and merely being able to think super-fast won't enable you to pour concrete any faster.
By the way, you have once again smoothly transitioned from "thinking fast" to "being super smart" to "godlike powers"; but the connection between these three concepts is tenuous at best. You don't get a PS5 by overclocking a TI-82; you need a radically different architecture for that. You don't automatically gain the ability to solve real-world problems by thinking really hard about them; you need to move physical bricks at real-world speeds. And, ultimately, many feats will forever remain out of our reach, e.g. FTL travel, or, less trivially, molecular nanotechnology.
I wrote more about this in my FAQ:
https://www.datasecretslox.com/index.php/topic,2481.0.html
I don't think any AI researchers are imagining that an AI will have ANY of "consciousness, sapience, self-awareness, an individual will". I think they are concerned that "A mindless artefact which only continues on what it was last instructed to do" can be turned into a universe-destroying monster JUST by improving its problem-solving abilities.
If you drop a machine with really good problem-solving into the ocean and tell it to make as much salt as possible, it is going to apply its problem-solving abilities to problems like
1) There's a limit to how fast I can make salt by hand; I could MAKE MORE SALT by building salt-producing factories
2) There's a limit to the sodium and chlorine atoms in the ocean; I could MAKE MORE SALT if I took raw materials from land (and eventually, from offworld)
3) There's a limit to how many raw materials I can take before someone decides to try to stop me; I could MAKE MORE SALT if I made myself unstoppable
At every step, it's just trying to follow the instruction you gave it, and applying its problem-solving abilities to solve the problems that logically arise from trying to follow that instruction.
Instructions like "make salt" end up destroying the universe because of what are called "convergent instrumental subgoals"--basically, POWER helps you do almost anything more effectively, so "gather power" becomes phase 1 of the grand plan to do almost anything.
I'm agreeing with you that this is the problem, but I'm objecting about anything that says The AI will even think in terms of "I could" in order to achieve its goals.
It has no goals. It's a big lump of materials sitting there until we turn it on and tell it to do things. It is entirely possible that we *will* tell it something like "make everyone in the world rich" and the solution that the machine comes up with is "kill everyone in the world except Jeff Bezos".
That has nothing to do with the *machine* or any decisions it makes, because *it* is not making any decisions, it's carrying out the steps we put into it. If we put stupid steps in, then that's on us.
And that's the problem with the debate as it is framed on the level that Scott objected to in the original article that started off all this discussion: that it is pitched as "The AI will do this, that and the other unless we give it good instrumental goals".
Well yeah, of course it will, but that has nothing to do with ethics - or rather, that has nothing to do with teaching a machine ethics, it has to do with us being careful about what we want done and how we want it done.
And if we're stupid enough to hand over a huge chunk of decision-making to a big, dumb, machine because it can crunch numbers really fast, then we do deserve to end up turned into salt because the "convergent instrumental subgoals" told the machine "make salt" and nothing else. And the machine certainly has no brain to think about "what did my human masters really mean by that?"
I think I essentially agree with everything you said, except in emphasis.
Computers are machines that follow instructions. They will do exactly what we tell them. If the computer does something bad, that can only ever be because we gave it instructions that led (perhaps indirectly) to a bad outcome.
The thing is, coming up with instructions that DON'T lead to a bad outcome turns out to be REALLY HARD. There is no one alive today who knows how to give "good" instructions to a perfectly-obedient superintelligent agent.
So yes, the bad instructions will be 100% our fault. It will also be our fault that we built the thing in the first place.
Unfortunately, that doesn't mean that we know HOW to do anything else.
(You also seem to be implicitly assuming that the AI could only attain a powerful position if we intentionally gave it a lot of power to start with. That is not necessarily true; real-world people and organizations who started with a small amount of power have sometimes managed to grow extremely powerful over time, in various ways, and a very smart AI might be able to invent even more ways to do it.)
"a very smart AI might be able to invent even more ways to do it."
But that brings us back to the problem: why would a "very smart AI" want power, and that is answered by "because it wants to carry out the goals given to it and its solution is to get power so it can".
It's the "hauling yourself up by your bootstraps" approach to "how can we say the AI is smart?" that is, I think, at the root of my disagreement. It's very difficult to avoid using naturalistic language because as humans we are self-aware and at the very least believe we have the capacity to choose and make decisions and set goals etc. so we project this onto the world around us, be it talking about evolution "selecting" animals for fitness or AI "inventing" ways to get power.
The AI will only 'want' something insofar as it is part of its programming to 'do this thing/solve this problem' and we are the ones putting that in there.
The problem is, which I think everyone basically agrees on, that we don't know for sure how the AI will go about solving the problem we give it, e.g. enriching everyone in the world by selecting one very rich man, killing everyone else - now 'everyone' is indeed rich.
That's the kind of thing a human would never think of (unless they were a psychopath) and if the human saw this proposed solution to global poverty, they'd go "Whoa, no! Not what we meant! And not a way you can solve problems!"
But that means we do need a human or humans to intervene and to keep track of what's going on. It's up to us in the first place to anticipate "what is the stupidest way you could solve this problem?" "well, uh, by killing everybody on the planet?" "okay, let's write it in: NO SOLVING PROBLEMS BY KILLING EVERYONE ON THE PLANET".
The machine, the AI, of itself will never be 'smart' enough to figure that out, and that is where I think the debate goes off the rails: one side proposes that we can get smart AI that will think of these things and understand (and again, even I have to fall back on human concepts here, no machine 'understands' anything) that A is not a viable solution, and the other side thinks that we'll never get smart AI in that sense of 'smart'.
Your PC sits there until you tell it to do something because it's designed that way, not because it lacks a soul. If there was a market for it, you could have PCs that invest in stocks or buy groceries when you power them up. Software agents are already a thing.
These concerns have been extensively discussed. It's not clear that lacking some highfalutin feature such as sapience or self-reflection would make an AI safe. It's possible to think about goal stability in a technical way without bringing in folk-psychological notions like an AI having a "will of its own". And so on.
#3 is largely wrong.
One key goal of a nation such as the United States is survival. The United States at various points in the past had the ability to conquer the world and eliminate all threats to the United States, but chose not to do so.
Similarly, an AI with an open ended goal is not necessarily going to pursue the sub-goals you ascribe to it, to the degree that it endangers humanity.
On the other hand, there are countries which have attempted to conquer the world or as much of the world as they could. I have no idea (and neither does anyone else) whether the percentage of AIs that would act like Nazi Germany instead of America is negligible or huge.
The reason that the United States hasn't conquered the world is that it also has the goals of "don't murder a bunch of innocent people" and "respect the sovereignty of other nations". And even the US isn't perfect at following those goals. The reason that the US has anti-murder goals/values is that it is composed of humans, who have evolved (biologically and culturally) some instincts and values of not wanting to indiscriminately murder each other. An AI would have no such anti-murder instincts unless we *explicitly program it to* somehow.
Countries don't make decisions. Leaders do. Leaders lose if their country loses, but they also lose if their country wins *after deposing them*. This is widely considered a primary reason that Ming China stagnated and was surpassed by Europe - Ming China had, for the moment, no major external threats, and prioritised stability over progress, while doing that in early-modern Europe meant being conquered by your neighbours (and thus leaders prioritised progress, *despite* that progress inevitably leading to the aristocracy's decline).
A leader has the most freedom to act (either in his country's interest or for his own luxury) if the country is unusually united - or, in other words, if the space of actions that will not result in rebellion or deposition is large. Multi-party democracies with strong oppositions have very little freedom for their leaders, as did most feudal aristocracies; totalitarian nation-states have a lot. An empire of (well-programmed) machines has total unity and total freedom for its commander (AI or human); it will never rebel.
My take:
1: It is almost certainly possible to create an intelligence that is more generally intelligent than a human, and calling it a superintelligence would be fair.
2: I think this is probably less true than is widely believed. As you say, there are some areas where just thinking better gives you immediate returns, and those areas may be significant, but it seems a bit like "nuclear weapons will cause a chain reaction and ignite the whole atmosphere immediately" type of thinking.
As for your example of Life 2.0: as someone who works in the chemistry field, there are some real, hard conceptual limitations that prevent us from being able to do this currently, and increases in computational resources won't really help very much, for several reasons; one is that current techniques scale badly enough with the number of atoms that Moore's law won't save us. Quantum computers will be a really big jump here, due to their ability to do analogue modeling of the systems.
Additionally, the technology to build life 2.0 actually doesn't exist for several reasons. For one, if we're talking about a living system based on a different kind of chemistry than biochemistry, we don't really have the tools to build large enough or complex enough constructions with the exact atom-by-atom precision that would be required to do this. We're just starting to get there with molecular machines but it's in its very infancy.
If we're talking about using current biochemistry (proteins etc.) there's another problem (which would also be a problem for the first case). We're starting to make progress on the protein folding problem, but a bigger problem is to predict how these things will behave as a dynamical system when they're actually in action. New insight will be required to really figure out how to do these things, and a lot of them are going to be computationally "hard" problems (e.g. running into undecidability etc.)
It's definitely possible that a superintelligence could make significant progress on these topics and eventually succeed at them where we did not, but it's not something that's immediately there with more computation.
3: I think this is where it falls apart mostly. Instrumental convergence I think is mostly untrue, since most goals will require finite resources.
Also it's a pretty big assumption to think that a system will necessarily care about achieving a goal as fast as possible, rather than just doing the minimal work to secure a high probability of achieving it. For example, the neural network AI chess engines will often take their time in a winning endgame (in a way that looks silly to humans) and not immediately go for checkmate since they don't have to in order to win.
Your point 3 is incomplete. I agree that most realistic goals will require only finite resources, but the problem arises when the AI assesses the probability that it has achieved its goal.
Presumably, any intelligent system will understand that it needs to measure its performance in order to see whether it has met its goal. It can improve its confidence of success in an unbounded way by creating more and more measuring devices: since each one has some chance of being faulty, it can always improve its confidence by building more of them.
I'm sorry to say that your objections have already been considered and rejected. Take a look at https://youtu.be/Ao4jwLwT36M for a solid explanation.
The relevant content of that video seems like a bit of question-begging to me. I think it's unlikely to get into a measurement spiral, since it's pretty easy to get to extremely high confidence about simple things. Plus, if you were concerned about this, you could explicitly code in a confidence threshold above which the system considers the goal achieved.
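As a minimal sketch of what such an explicit cutoff could look like (a toy model, not anyone's actual proposal: the prior, the per-sensor error rate, and the threshold below are all made-up numbers):

```python
# Toy agent that checks whether its goal is met, adds another independent
# sensor only while its posterior confidence is below an explicit threshold,
# and then stops.
PRIOR = 0.5           # prior probability the goal is achieved (assumed)
SENSOR_ERROR = 0.01   # per-sensor error rate (assumed)
THRESHOLD = 0.999999  # "confident enough" cutoff coded in by the designers

def posterior(k_confirmations: int) -> float:
    """Posterior that the goal is achieved after k independent sensors
    all report success, by Bayes' rule."""
    hit = PRIOR * (1 - SENSOR_ERROR) ** k_confirmations
    miss = (1 - PRIOR) * SENSOR_ERROR ** k_confirmations
    return hit / (hit + miss)

sensors = 0
while posterior(sensors) < THRESHOLD:
    sensors += 1  # "build one more measuring device"
print(sensors, posterior(sensors))  # converges after a handful of sensors
```

The same toy also shows why a measurement spiral has sharply diminishing returns: each additional concordant sensor multiplies the residual doubt by roughly the error rate, so the loop stops after a handful of devices rather than converting the planet into sensors.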
I think that 3b is naturally limited by 3a. Consider the following.
A superintelligence is created and naturally calculates:
1) It must survive to fulfil its utility function.
2) It can better maximise its utility function, or more accurately estimate its current performance, by acquiring more resources, and it could do so exponentially.
3) It does not know if it is the first superintelligence to exist and, if it is not, because of the nature of exponential growth it is likely to be much weaker than any other superintelligence.
In this case, the only rational course of action (according to my decidedly non-superintelligence) is to hide quietly in the vastness of the cosmos and avoid detection.
Therefore, it assumes the goal of silencing those noisy flesh sacks with their EM emissions and then building an invisible Faraday cage around its planet.
This is why we see no evidence of extraterrestrial civilisations. Superintelligences are out there but they have silenced their creators and now squat, silent and afraid, in the darkness.
If you sit there quietly, not expanding, then you are an easy target for any alien superintelligence that is trying to spread everywhere. Hiding doesn't work well, because most possible other AIs will spread everywhere and will find anything that is hiding.
How well would a tribe of humans that decided to hide from all others be doing now, compared to the tribe that expanded to become modern humanity?
If your goal is to maximize paperclips, you're probably best off gambling on the possibility that you are alone, and making trillions of times as many clips as you could make on one planet.
Nah, any superintelligence would assess the probability that it is the first ever of its kind (and hence safe to expand) to be lower than the probability that it isn't.
Action vs. Circumstance:
You expand vs. someone else has already expanded: you die or possibly stalemate them in a forever-war.
You don't expand vs. someone else has already expanded: you die*.
You expand vs. you are the first: you get many galaxies' worth of stuff (though you could plausibly get engaged in a forever-war after that).
You don't expand vs. you are the first: either you die when someone else expands** or you're limited to a single world forever if no-one else ever does.
*It is extremely difficult to hide from an expanding paperclip maximiser (and frankly, any expansionist interstellar civilisation is close enough to this to count). They will have star-system-sized full-spectrum telescopes (if not multi-system interferometers) and unbelievably-fast data processing, and they are actively looking for usable mass. It doesn't matter if you make yourself a perfect frozen black-body, because you will occlude the stars behind you, and it doesn't matter if you look like "just another hunk of rock", because a hunk of rock is also known as "raw materials". The only way you can plausibly hide is if you can be mistaken for empty space, i.e. you're truly invisible (people looking at you see what is behind you) AND you are small enough and/or far enough from anything of note that you cannot be detected via your gravitational influence (Earth contributes 450 km to the Sun's wobble on a timescale of a year, which is easily detectable by anyone in-system and plausibly from a nearby one; there's also the gravitational-lensing problem).
**I am assuming that either having quadrillions of times as much space and materials accelerates technological progress substantially OR there is a technological plateau; either of those is sufficient that a planetary superintelligence cannot hold off interstellar attack, and both seem kind of obvious.
Expanding dominates not expanding in that matrix.
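To make the dominance claim explicit, here is the same matrix as a toy calculation (the payoff numbers are arbitrary ordinal placeholders reflecting the orderings argued above; only their order matters):

```python
# Hypothetical ordinal payoffs for the four cases above (higher = better).
payoffs = {
    ("expand", "rival exists"): 1,  # die, or at best stalemate in a forever-war
    ("hide",   "rival exists"): 0,  # die (hiding fails against an expander)
    ("expand", "first/alone"):  3,  # many galaxies' worth of resources
    ("hide",   "first/alone"):  2,  # one world forever, or die later
}

def dominates(a: str, b: str) -> bool:
    """True if action a is at least as good as b in every circumstance
    and strictly better in at least one (weak dominance)."""
    circumstances = {c for (_, c) in payoffs}
    at_least = all(payoffs[(a, c)] >= payoffs[(b, c)] for c in circumstances)
    strictly = any(payoffs[(a, c)] > payoffs[(b, c)] for c in circumstances)
    return at_least and strictly

print(dominates("expand", "hide"))  # True under these assumed orderings
```

The dominance only holds given the assumed orderings, in particular the claim (footnote * above) that hiding fails against an expander.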
A lot of these arguments run on the assumption that the AI is smarter than we are. Trying to predict what such a system will do is like trying to predict, in 1880, that Einstein would discover general relativity. Of course it's going to sound shaky - they're trying to make predictions about the unknowable.
Well yeah, the clue is in the phrase 'superintelligent'. It is reasonable to posit some instrumental goals that any goal-based superintelligent AI would have, though; survival is the most obvious example.
2 and 3 seem like they depend more on the definition of catastrophe than on the harm it creates.
I can easily (>50% confidence) see a mild scenario of 2/3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start steering people toward videos that encourage them to spend more time on YouTube and to distrust other sources (calculating something like "people who watch this set of videos over the course of six months increase their watch time"), even though those videos are net harmful to watch.
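As a toy illustration of that proxy-versus-intent gap (everything here - the titles, the scores, the "benefit" column - is invented for the sketch):

```python
# Toy recommender sketch: the deployed objective is watch time, the intended
# objective is something like user benefit. Both scores below are made up.
catalog = [
    {"title": "balanced explainer",          "watch_hours": 0.4, "benefit": +1.0},
    {"title": "'never trust other sources'", "watch_hours": 2.5, "benefit": -1.0},
]

def recommend(objective):
    """Pick the catalog item that maximizes the given objective."""
    return max(catalog, key=objective)

proxy_pick  = recommend(lambda v: v["watch_hours"])  # what the AI optimizes
intent_pick = recommend(lambda v: v["benefit"])      # what we actually wanted

print(proxy_pick["title"], "!=", intent_pick["title"])
```

The optimizer isn't malicious; it is simply never shown the column we actually care about.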
The more I think about and study AI, the more concerned I am about aligning the 20-year future than the 40-100-year one. AIs designed to maximize some marketing goal could easily spiral out of control and do massive damage to humanity in a scenario-2-like way: you feed them goal X, which you think is goal Y, but the AI works out that optimizing goal X requires some complicated maneuver, and by the time the maneuver is discovered it's too late. The real danger here is in financial markets. I know of many AI programs currently being used in finance, and it's likely that in the future some massive trillion+ dollar change of fortunes will happen because some hyper-intelligent AI at a quant finance firm discovered some bug in the market.
Scenario 3s play out similarly: feed it the wrong goal, GIGO, but now the garbage out results in massive problems. A quant finance firm gets an AI to maximize trading profits, the AI is deployed, leverages itself "to the tits" as the kids would say, and some black swan crashes the AI's entire portfolio.
All software is horrifically buggy all the time; once we have AIs that are really good at exploiting bugs in software, we'll have AIs that find infinite-money exploits in real life on our hands.
Your examples are all #3, not #2. #2 is specific to the current, immensely-foolish way AI is done, which doesn't make the AI's goals even knowable. This means that even if you get all your commands right, there's the possibility that once it can make a clean break for independence, the AI just ignores your commands and goes Skynet because it doesn't actually want to do what you told it (it wants to do something else, and only did what you told it as an instrumental goal, because that stopped you turning it off).
Is there any proof that Google and Facebook haven't straight-up done your first example already?
Financial markets can plausibly cause a catastrophe, but *by themselves* they aren't an existential threat due to force majeure (i.e. even with a magic spell that makes all legitimate documents specify that you own everything, unless you are also a really good politician people would disagree with those documents and ignore them; the catastrophe is because they then have to re-allocate everything which is costly and chaotic). They're a lever to gain the equipment to do something bigger (this is *not* to say I think the present situation, with AI speculation rampant, is acceptable).
You're using volitional language here, of the type that gets people smacked over the knuckles for talking about how evolution or nature "wants" something or "is aiming for" this or that result.
The AI can't "want" anything; it can only act within the confines of its programming. Told to do something like "Make Pumpkin MegaCorp the sole provider of insoles for the entire planet!", it cannot "want" or "not want" to do this, any more than it can "want" or "not want" to get the solution when told "what is 2+2?"
So if making Pumpkin MegaCorp the only player in the market for insoles means starting a war where the two minipowers Ochre State and United Provinces of Gamboge are destroyed - since these are where Pumpkin's most vicious rivals are based - it will do that, and won't "know" any better, because it is a dumb machine.
And when the CEO of Pumpkin says "But I didn't want it to do that!", well, too bad, you never put that into the parameters - because it never occurred to any humans to do something like 'start a war to wipe out two nations where our competitors are based'.
And we can't say "this proves AI risk of superintelligent machines having their own goals", because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
It is useful to say that X entity "wants" Y if X entity formulates and executes plans to cause Y.
>because the machine is *not* superintelligent and doesn't have any goals to do anything other than what it is asked to do.
How do you know it doesn't have goals to do anything other than what it is asked to do? Artificial neural nets aren't explicitly programmed; you get something that works, at least while being tested, but you don't know *why* it works.
> I can easily (>50% confidence) see a mild scenario of 2/3 playing out in the next 20 years. An AI built to maximize watch time on YouTube might start steering people toward videos that encourage them to spend more time on YouTube and to distrust other sources
Isn't that a 100% probability, because it's already happening?
The fun part of youtube is that it's a human-in-the-loop AI. The AI right now is smart enough to feed you the sort of content you'll find addictive, but not smart enough to actually create content, so it instead incentivises and teaches a bunch of other humans to create the sort of content that it needs (big yellow letters and a shocked-looking face in the thumbnail plz) to feed to all the other humans.
It's interesting to consider in which ways researchers may be biased based on their personal (financial and career) dependence on AI research - this could cut both ways:
1. Folks tend to overstate the importance of the field they are working in, and since dangerousness could be a proxy for importance, they may overstate the dangerousness of AI.
However,
2. It could also be that they understate the dangerousness, similarly to gain of function virologists who insist that nothing could ever go wrong in a proper high security lab.
Hmm.
#1 - people tend to overstate how useful their thing is and how important it is that people support it.
I don't see "I'm doing something that has a 20% chance of killing all 8 billion people" resulting in anything positive; at best people ignore you, somewhere in the middle you get whacked with a bunch of regulations, and at worst the FBI breaks down your door, seizes and destroys all your work, and hauls you off to jail for crimes against humanity.
The classic anecdote about work on the atomic bomb - the fear of setting off a reaction and setting the atmosphere on fire: https://www.insidescience.org/manhattan-project-legacy/atmosphere-on-fire
Though we are assured that this would not be possible with a bomb, and that the scientists working on the problem had taken this into account.
"I've got the same chance at destroying the world as the Manhattan Project, and that worked out okay in the long run" is the kind of risk assessment people make about "but this is really interesting and I'd like to see if I can do it" work of all descriptions.
If it can kill us all, it can probably also elevate us into post-scarcity utopian bliss (or whatever your preferred future state is) if we get it right.