Yeah this all seems like a discussion about whether Dr. Frankenstein’s new effort will act like a monster or not. We’re not even close to understanding nature let alone creating a new sentient life form. Imo.
Bacteria have spent a billion years evolving ways to not get murdered by antibiotics, white blood cells, and all the other things in the world that don't like bacteria growing inside them. What weapon is a nanobot going to use that nature hasn't used already?
Human-sized animals have spent millions of years evolving ways to not get murdered by each other in nature, and yet humans easily crush them with weapons that nature hasn't used previously. I'm sure that the space of innovation on the micro level is similarly large enough to surprise nature.
Humans were able to pull themselves up by their bootstraps from an environment infested with, if not T-Rexes, then at least saber-toothed tigers, with only sticks and stones and slowly accumulating cultural knowledge, which is not exactly a trivial challenge. The nanobot designer, by contrast, would have access to the entire knowledge base of post-industrial civilization, the ability to develop new toxins and toxin-resistances instantaneously compared to evolutionary timescales, and likewise evasion strategies for larger threats. I agree that this is not to be hand-waved away, but it seems to me that if you're able to develop an autonomous quasi-life-form from scratch, this is a minor part of the challenge, not an obstacle that is insurmountable in principle.
I think that the space of innovation on the micro level is not similarly large. At larger sizes more complexity is going to be possible. Nature hadn't previously come up with the weapons that humans created because it operates via biological mutations reliant on chemistry rather than building tools out of (relatively) big inanimate objects.
> At larger sizes more complexity is going to be possible
That may be true, but in practice the things we use to slaughter wildlife, from guns to bulldozers, are in most senses less complex than chloroplasts, ribosomes and all the weird crap unicellular organisms use to kill each other.
(The phones of the bulldozers’ drivers are more complex, but those are used mostly for fun, not for slaughtering animals.)
It does not matter if the space of innovation is “similarly large”, it just matters if it’s large *enough.*
Humans have that advantage because they can access more energy sources and more materials than biological processes can make use of - you can't forge iron with just the energy in your muscles, you need to burn fuel to make it happen.
But nanobots don't have access to any of that, because they need to carry all their tools with them. They can't be made of exotic materials because they have to be able to self-replicate in an environment that's mostly made of normal CHON stuff. They can't take advantage of more energetic reactions because they don't have anywhere to get energy from besides chemistry. They're pretty much working with the same toolbox bacteria are.
Or to put it another way, a human does not kill a tiger, a human plus a fire, forge, and iron kills a tiger. And while humans can make new humans, each new human isn't born with its own fire and forge. Our mastery of the environment depends on the existence of external resources, not just our own intelligence.
Sure, the nanobots can work together as a more complex organism, like humans do, but at that point, it's more "designer organism" than "nanobot swarm." This rules out the "AI mixes a few chemicals together, self-replicating bots spread through the air, everyone on Earth dies instantly" scenario - there's now a minimum size of colony required before your bots can do something interesting.
Why assume they store energy? The bacterial analogy is probably the wrong one here. I think an analogy to a virus is more apt. If a nanobot encounters an energy storage device (eukaryotic, prokaryotic, archaeal, fungal), it hijacks the energy storage to do its thing.
My biggest skepticism is that viruses would have loved to do this for the past hundred million years or more, but haven't figured it out yet. It's a program that would be too complex to implement in any known system, so it kind of sounds like magical thinking.
I think it's hard to argue that the viral model is a failed/unsuccessful model. Take a random sample of seawater and sequence what you find there, you'll find evidence of countless viruses floating around.
But your concern was whether they make it into the next host. If you make 100 billion copies of yourself, it's not a big deal that 99% don't make it to the next host. Especially if you're not actively dangerous - possibly even helpful - to the host, it's easier to evade the host's immune system (though EMPHATICALLY not a trivial endeavor).
The biggest problem with the viral model is the restriction of the quantity of genetic information that can be produced through this method as opposed to something like a prokaryotic system that can carry orders of magnitude more genetic material (and could therefore support a sufficiently complex system).
> It's going to be very difficult to find a strategy that life hasn't already tried.
I strongly agree with this statement. I'm no proponent of the 'gray goo' hypothesis for nanobots, which I think is not grounded in actual biology or biochemistry.
> My biggest skepticism is that viruses would have loved to do this for the past hundred million years or more
All large-ish animals would have loved to invent/evolve machine guns, for similar lengths of time, but they didn’t even get to decent spears before humans.
Invasive species exist—including microbial species—therefore there are viable ecological niches that microbes are not occupying right now. I feel like that cuts against what you’re saying here, to some extent. Right? Separately, I have an intuition that life on earth is in some ways stuck in a remarkably narrow design space, e.g. see https://twitter.com/steve47285/status/1632866623294939136 .
Yeah, I think the relevant questions are (1) is there a set of nanotech things that can cause human extinction? (2) is there a set of nanotech things that can stably support a world with AGIs, particularly including manufacturing (or self-assembling into) machines that can do lots of calculations, i.e. machines in the same category as human brains and silicon chips? Both answers can be “yes” without wiping out every last microbe. I join you in doubting that every last microbe will be wiped out.
I feel pretty strongly that the answer to (1) is “yes” thanks to pandemics and crop diseases and so on. I’m more unsure on (2). Not an expert. If I had to guess, I think I would guess that if AGIs wipe out humans, they would still be running on silicon chips when they do, and will continue running on silicon chips for at least a while afterwards.
>I'm objecting to the blithe assumption that the instant we invent a nanobot, it's grey goo immediately
Well, not "we", it's a "superintelligence" that does it. I think that once you accept the proposition that it's possible to covertly invent and build a nanobot unbeknownst to the rest of civilization, it's not that big of a leap to append an ability to outpace evolutionary adaptations in microbe warfare.
That's the essence of a singularity, though, isn't it?
Travel back to ancient Sparta with modern technology, and a handful of people with sufficient access to that technology would make world domination trivial. Even just 200 years ago, there's a non-trivial difference in what modern technology can accomplish, whether militarily or agriculturally. And the more modern the technology, the greater the force-multiplier effect of intellectual inputs. In what way are the last 200-500 years not equivalent to a singularity, in terms of the gulf created between the people involved?
You're probably right that arguments about nanobots are bunk. What we imagine is probably not going to happen any more than hoverboards did. Instead, what we could never imagine will become the Most Important Thing. Historically, this is true of human inventions like the printing press, electricity, transistors, and the internet.
We should expect the same into the future as well. IF we are able to use AI to generate (or help generate) an acceleration in transformative technology development, we cannot imagine that future world. This is true whether or not the AI becomes generally sentient.
>"Travel back to ancient Sparta with modern technology and a handful of people with sufficient access to modern technology would make world domination trivial."
I disagree. Unless sufficient access means transporting an entire industrial base (which would require far more than a handful of people to operate), a handful of people with modern technology would only be able to dominate locally. Even a man with a machine gun can have his throat slit while he sleeps in his palace, and if you pit 50 people with machine guns and bombs against the Roman Empire, the Roman Empire wins. They put up with 67,000 casualties against Hannibal; they can take the losses.
Was it irrational of Tsiolkovsky to imagine spaceships driven by rocket propulsion in advance of it having ever been demonstrated to be possible? You can of course argue (and I'd agree) that our theories of either superintelligence or grey goo-grade nanotechnology are even less grounded, but I don't think that it's possible to make a principled distinction. Neither seem to require breaking any known conservation laws, unlike perpetual motion machines.
It's easy to cherry pick a single successful prediction from the mountain of failed predictions by futurists. Very often, following the predictions of futurists would have led to very bad policy implications by people planning for the future.
Is there a justification for thinking that rockets are similar enough to nanotechnology that they should be considered analogous? I'm not convinced that the nanobots that have been proposed are remotely within the realm of possibility.
Sure, but then human advancements don't come solely through reasoning. We have to test our ideas against the real world by experimentation. There's a lot of trial and error to even the best hypotheses, including probably more error than success. The assumption that even a maximally intelligent agent could bypass this step requires more support than just saying it's really smart. You don't know what you can't possibly know yet.
While it is not exactly related to your argument, another case against doomerism is that AI can't be the Great Filter, because a hostile AI that wiped out its creator species would most likely still act on a cosmic scale, mining asteroids and building Dyson Spheres and such. The Fermi Paradox applies.
Nit, medium confidence: "grabby aliens" isn't an explanation of the Fermi paradox. People often say "grabby aliens" to mean the related proposition "aliens expand at almost the speed of light, so you don't see them in advance." But this proposition doesn't necessarily render the anthropic observation unsurprising; for example, (I'm confused about anthropics and related empirical evidence but) I'm surprised that we exist so late.
(Maybe "the Fermi paradox" is supposed to mean "the fact that we don't see aliens" as opposed to the entire anthropic observation, but the other evidence, like the age of the universe, is relevant to great filter stuff.)
Just to be more concrete: the grabby-ness of an alien civilization has no bearing on how long it takes us to detect them, because we detect both grabby and non-grabby aliens whenever their light cone intersects us.
Therefore the grabby aliens hypothesis has no explanatory power w/r/t the Fermi Paradox, except in some anthropic principle-style arguments.
The anthropic argument solves it perfectly though, right? If you could see their structures, they would have already grabbed your solar system; i.e., grabby aliens means there is no reference class of observer that sees such evidence.
This discussion is complex so let me try to clarify.
One of the parameters of our universe is <the likelihood of enduring intelligent life arising on any particular planet>. Call this LI, likelihood of intelligence.
The Fermi Paradox implies that for whatever reason LI is quite small in our universe — otherwise we'd see lots of aliens.
Let's call the combination of the Anthropic Principle plus speed-of-light Grabby Aliens "AP+GA."
When you talk about AP+GA "explaining" the Paradox, I take you to mean that it furnishes an *alternative* explanation: a way that LI could be *large* and yet we could still appear to be alone in the universe.
Under this meaning of “explain,” I would say that AP+GA *doesn’t* explain the Fermi paradox.
If you assume AP+GA to be true, its only guarantee is that we are the first civilization to evolve in our light cone. Therefore you'd expect us to be a "typical" civilization among all possible civilizations which are the first to evolve in their light cones.
If LI is large, then a typical civilization of this variety should evolve when its light cone is still small. Our light cone is quite large. Therefore, LI is still small even under AP+GA — QED.
However, one way this argument could fail is if it’s impossible for civilizations to evolve when their light cones are still small — for instance if galaxies don’t become hospitable to life until they are about the current age of the Milky Way. Then AP+GA could be consistent with high LI.
There's no reason to think we're not! Or, rather, the misaligned AI that will destroy us will become one such grabby alien, making happy-looking molecular squiggles out of all the available matter in the lightcone.
Wouldn’t you see some non-light-speed effects from where two grabby civilizations intersected? I always just figured there have to be some convergent evolutionary pressures that cause all species to do stuff that doesn’t look different from natural phenomena.
For fun, we could expand this one step further to include Earth. What if aliens directly filled the gaps in abiogenesis, cellular development, and the development of intelligent life? What if they created people, gave them Commandments written in stone, and told those intelligent people to worship them through a communication medium unknown to the humans but readable by the advanced aliens (prayer)?
What if, from a human's perspective, sufficiently advanced technology is indistinguishable from a miracle? The grabby scenario assumes aliens never got to Earth, because if they did we'd definitely see them and know they were aliens.
Yeah that’s where I can’t figure out how to falsify because unless you know what the true model is how do you know if what you’re seeing isn’t just normal?
If you ask a creationist, this is a problem with biologists too. Evolution is obviously false if one keeps looking for natural explanations rather than divine influence.
Of course that's what astronomers assume. It's the only logical assumption, because almost every single object and phenomenon we've seen so far has a natural explanation, and the ones that don't have one also don't have a clear artificial explanation. Your prior on a newly discovered phenomenon being natural ought to be >99.999%.
I don’t think you’d necessarily see any of this. Any area that can see an intersection zone is also inside (or very near) the normal expansion bubble of at least one of the grabby civilizations, as long as border conflicts don’t slow down expansion in other directions.
I read the Grabby Aliens stuff, and while I don’t know that I believe they are expanding at light speed in every direction, even if they were, you would think the competing intersection would be big enough to see. The fact that it seems unlikely you’d have multiple civilizations in the same galaxy going through the expansion at the same point in time makes it likely that the intersection would be enormous and hard to hide, because you’d have to cover a huge area of space before that scenario arose.
If they were expanding at light speed, or even near light speed, you'd see huge effects. Space isn't THAT empty, and the collisions with the dust would be quite noticeable. I think anything over 0.5c is extremely implausible, and that 0.2c is much more believable... and still too fast to be plausible.
My hope is we get big generation ships with stable ecosystems shooting out in every direction. One of them will survive and adapt, and they’ll be far enough apart that they’ll be safe from the bad ones.
It’s believable that the interaction is highly visible. But the catch is that if the civilizations are expanding rapidly in non-border areas (say, .9c) then the area of space that has had light from the border conflict reach it but has not been enveloped by one or more of the civilizations is growing at only .1c (it’s expanding at c, but the occupied area is right behind it at .9c). This results in most of the volume of space that has not been conquered not being able to see anything either, due to the light of the conflict being barely faster than the expansion.
This may be me refusing to surrender the point, but I agree with you that it depends on where you are oriented relative to that conflict. If you’re on some vector where they had to slow down to fight each other on the way to you, then you should see it.
Grabby aliens are not an explanation for the Fermi Paradox - instead, they make the Fermi Paradox worse. If all aliens were non-grabby, we would only need to worry about not seeing aliens within our galactic neighbourhood; if some aliens are grabby, the puzzle is why we don't see them originating from any galaxy nearby.
Now, the standard anthropic answer goes along the lines of "if there were grabby aliens anywhere nearby, we wouldn't see them because we'd be dead (or wouldn't have evolved) because they would have expanded to Earth". But, as I argued here, https://www.lesswrong.com/posts/3kwwDieE9SmFoXz9F/non-poisonous-cake-anthropic-updates-are-normal , anthropic updates are basically normal Bayesian updates (there's a lot more on this subject if you want to dive into it). So the fact that we survive without seeing grabby aliens is evidence against there being grabby aliens (just like the fact that if you survive playing russian roulette with an unknown number of bullets, this is evidence that the number of bullets was lower).
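To make the roulette analogy concrete, here's a minimal Bayesian sketch (the six-chamber revolver and the uniform prior over bullet counts are assumptions purely for illustration):

```python
from fractions import Fraction

# Assume a 6-chamber revolver and a uniform prior over 0..5 bullets (illustrative only).
prior = {bullets: Fraction(1, 6) for bullets in range(6)}

# Likelihood of surviving one trigger pull with b bullets loaded: (6 - b) / 6.
unnormalized = {b: p * Fraction(6 - b, 6) for b, p in prior.items()}
total = sum(unnormalized.values())
posterior = {b: p / total for b, p in unnormalized.items()}

for b, p in sorted(posterior.items()):
    print(f"{b} bullets: prior {float(prior[b]):.3f} -> posterior {float(p):.3f}")
# Surviving shifts probability mass toward fewer bullets -- the "normal Bayesian update"
# described above, applied to grabby aliens instead of bullets.
```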
It's easy to resolve the Fermi paradox if one knows some college level probability. The Drake equation doesn't work the way it looks. I'll recommend "Dissolving the Fermi Paradox". - https://arxiv.org/abs/1806.02404
I.e. The Fermi paradox can be rephrased "Why haven't we seen evidence of this thing we just assumed for no reason should be common? There must be something filtering it out!"
That's a good restatement. Sometimes it's because we aren't looking hard enough. For a long time, people were convinced that light had to be made of particles, because if it had been made of waves, there would be a bright spot on the dark side of an illuminated disk, a result of the wave troughs and crests lining up just right. No one ever saw the spot until Arago managed to do the experiment right. That's called the Spot of Arago. Arago had a rather wild career for a physicist and mathematician. I think he was a soldier in Morocco for a while.
I think we evolved because we're stuck out here in the empty quarter far away from all the exciting action taking place near the center of our galaxy. When I look at the color image depicting our galaxy, I look at the bright center, and see galaxies interacting, planets being torn from their systems, sent careening through other solar systems, planet killing asteroids occurring far too frequently to enable higher forms of life to evolve.
Bacterial life has existed on Earth for about 2.5 billion years, shelly life only 541 million years, intelligent life less than 1 million years. There's probably a good mathematical model (prime sequence?) for the average frequency of planetary asteroid disasters, but I'm sure this greatly limits the ability of intelligent life to develop in the universe.
Say life on Earth is 500 million years old. We've had 5 major extinction events, or one every 100 million years. Say the model is 100 million years times one over the prime number corresponding to each step between Earth and Galactic Center. The distance from Earth to Galactic Center is 26,000 light-years; in this model, let's say there are 26 steps of 1,000 light-years each. If Earth experiences an external extinction-level event on average every 100 million years, then according to the prime-number model, planets 1,000 light-years closer to Galactic Center experience them every 50 million years (100 million times 1/2), planets 2,000 light-years closer every 33 million years (100 million times 1/3), and so on. By the time you're 1,000 light-years from the Galactic Center, you're at roughly 100 million times 1/101 years between extinction-level events: one event every million years. At some point intelligent life simply can't evolve.
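For what it's worth, here's a toy sketch of one reading of that prime-number model (the mapping of 1,000-light-year steps to successive primes is my guess at the intent, not an established model):

```python
from sympy import prime

BASE_INTERVAL_MYR = 100          # million years between extinction events at Earth's distance
EARTH_TO_CENTER_KLY = 26         # thousand light-years, treated as 26 steps of 1,000 ly

for step in range(1, EARTH_TO_CENTER_KLY):
    distance_kly = EARTH_TO_CENTER_KLY - step
    interval = BASE_INTERVAL_MYR / prime(step)   # step 1 -> 1/2, step 2 -> 1/3, ...
    print(f"{distance_kly},000 ly from center: one event every ~{interval:.1f} Myr")
# Near the center the interval drops to roughly 1 Myr, which is the point about
# intelligent life not having time to evolve there.
```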
>What about alien exterminators who are okay with weak civilizations, but kill them when they show the first sign of becoming a threat (like inventing fusion power or leaving their home solar system)? Again, you are underestimating billion-year-old universe-spanning superintelligences. Don’t flatter yourself here. You cannot threaten them.
>What about alien exterminators who are okay with weak civilizations, but destroy strong civilizations not because they feel threatened, but just for aesthetic reasons?
This is great. I could imagine alien overlords who are okay with weak civilizations but offer strong civilizations a Faustian bargain: return to being weak or we glass you. Kinda like feudal worlds in Warhammer 40K. If they keep low-power civs around, they would really have no reason to exterminate high-power civs that are still far beneath their own power, as they could just return them to low power forcefully. Kinda like the Federation in Star Trek, but a tiny bit more jealous and covetous of their power.
I think the grabby aliens explanation is only part of it. The relevant thing is that since alien civs expand at near light speed you don't see them coming. This means we can't update much on not seeing mega-structures for the specifics of what we expect to happen on Earth.
So not seeing grabby aliens doesn't factor into AI doom much one way or the other.
FWIW, it's my assumption that FTL actually IS impossible, and that when you've spent generations living in space, the idea of living on a planet loses its attraction.
OTOH, my explanation doesn't handle the lack of detectable Dyson Spheres very well.
Another explanation is that nobody wants to move into an area without fast internet connections. I found this difficult to swallow, but it seems to be a real thing even just on one planet.
But even if you don't want to move to the cosmic boondocks, would you really leave that space empty, or would you want to reshape it in some way?
A lot of humans think it's good to create new happy lives, fast internet or slow; if some aliens think similarly, they're the ones we'll see coming toward us.
Due to instrumental convergence, most AIs, whether they want paperclips or sugar crystals almost like cake but not quite, would also create Dyson Spheres and such. But then, most biological species probably would as well: even if most members of the species were anti-growth and felt no need to construct Ringworlds and Dyson Spheres, they'd also have to be committed, millennium after millennium, to shooting down all space-quiverfulls trying to escape their homeworld and make a positive-growth home in the stars. If their vigil against the would-be-expansionist subset of their species (no matter how niche such beliefs were in that alien society) lapsed even briefly, once, the expansionist subset would escape and, thanks to the laws of exponential growth, would be building Ringworlds and Dyson Spheres within what is, on cosmological timescales, merely an eyeblink.
And it would only take one species with one small subset of expansion-oriented individuals slipping through for one brief moment, and we would be seeing galactic megastructures. My read is that there are no other civilizations within any appreciable distance (the Laniakea Supercluster or thereabouts) advanced enough to build superintelligent AIs, or to reach the stars. Else we would see these megastructures.
Another possibility would be that "civilization" (that is, something capable of launching interstellar packages that successfully reproduce) is not something that can be sustained for long. You could end up with a trap of some sort.
Admittedly the balancing act between "they all die from exhaustion" and "they're already here" is pretty fine if you take a simple exponential as the growth curve.
If you look at the disease spread models in the late unpleasantness, they used a simple exponential - and they were completely wrong.
An Alien AI would be very aware of the potential for there to be alien AIs that could derail their mission, so it would take great care not to be observed. Remember time would not matter to them, so they would have incredible patience. A few trillions of years could pass while they waited to do whatever they needed to do.
What if failure to act in the face of the uncertainty allowed another AI to gain strategic advantage? I don’t think knowing everything knowable overlaps perfectly with knowing everything. You can still lose through pure dumb luck.
I was very convinced by the Sandberg et al. 2018 paper "Dissolving the Fermi Paradox", which postulates that unfortunately there is no fun paradox: if you compute the distribution of the expected number of alien civilizations instead of just the mean, the probability that there are zero such civilizations is in fact large.
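Here's a rough Monte Carlo sketch of the paper's point (the parameter ranges below are placeholders I made up for illustration, not the distributions Sandberg et al. actually fit):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def loguniform(lo, hi):
    """Sample log-uniformly between lo and hi."""
    return 10 ** rng.uniform(np.log10(lo), np.log10(hi), n)

# Drake-equation factors with wide, made-up uncertainty ranges.
R_star = loguniform(1, 100)      # star formation rate per year
f_p    = loguniform(0.1, 1)      # fraction of stars with planets
n_e    = loguniform(0.1, 10)     # habitable planets per such star
f_l    = loguniform(1e-30, 1)    # fraction of those developing life (enormous uncertainty)
f_i    = loguniform(1e-3, 1)     # fraction of those developing intelligence
f_c    = loguniform(1e-2, 1)     # fraction that become detectable
L      = loguniform(1e2, 1e10)   # years a civilization stays detectable

N = R_star * f_p * n_e * f_l * f_i * f_c * L
print("mean number of civilizations:", N.mean())
print("P(N < 1):", (N < 1).mean())
# The mean can be astronomically large while the probability of an empty galaxy is
# still substantial -- the distribution, not the point estimate, is what matters.
```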
> Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own.
Something I think is an underrated risk is an AI that's smart in some ways but dumb in others - e.g. smart enough to easily make a supervirus, not strategic enough to realize that once humans are gone it wouldn't be able to survive.
It's a stereotype of autistic humans that they're like this. If we think of AI as a very inhuman mind that's similar to someone off the chart on the autism spectrum, it might be something that would do this.
To develop superweapons such an AI would have to understand the role of humans in the world well enough to hide its activities from them, so I don't think that it's plausible that it wouldn't realize its dependence on them. Of course, it might still be omnicidal-suicidal, which seems to me a much likelier scenario.
I think the idea that the supervirus would kill the AI off is wrong. The AI can easily arrange to have a few Spot robots and servers turned on in some solar-powered building somewhere. After that, there is no competition. Nothing whatsoever stops the AI sending the Spots wandering into people's houses with screwdrivers and disassembling the dishwasher for components. Humans all dying would leave big piles of all sorts of spare parts in ready-to-use working order. The AI has plenty of time.
It's not *that* easy, the AI needs to keep its datacenter powered. Unless you're postulating that the AI can vastly decrease its compute needs while maintaining enough capabilities to manage its servitor robots, which seems unlikely.
Well, we've been seeing open-sourced models that were originally intended to run on huge computing clusters shrunk to a level that they run on consumer hardware (Stable Diffusion in particular comes to mind) by mere humans in short amounts of time. A self-modifying AGI would most likely be able to very significantly optimize their own performance.
> Holden Karnofsky, on a somewhat related question, gives 50%
That's a pretty different question: roughly whether _conditional on PASTA being developed this century, humans would no longer be the main force in world events_.
Yeah but I think Holden probably thinks there's a really high chance of PASTA this century so I'm comfortable rounding it off to "humans no longer main force in world events".
(justification: he commissioned and seems to mostly approve of Cotra 2020, which said 80% chance by 2100, and since then Cotra's updated sooner, so presumably she thinks >80% chance now, and I would expect Holden to agree)
I think it's important to differentiate between "humans no longer main force in world events" and "everyone dies." Everyone's life could be great in the former case even if we don't get a grand future from a total utilitarian perspective.
You have the world's knowledge at your fingertips. Open the search engine of your choice and enter "Holden Karnofsky PASTA". It's not more difficult than typing a comment here, and you'll receive your answer much quicker.
I am not sure it actually is quicker. None of the top results includes a definition in the summary text, so you are left selecting various links and then searching them for a reference, and hoping it includes a definition.
From a purely selfish perspective it is likely easier and less time-consuming to ask and then be notified when someone answers. Plus you can ask follow-ups. Additionally, people come to a blog comment section because they want to interact with people; if they just wanted the most efficient way to learn new info, they might be better off surfing Wikipedia or an old-fashioned encyclopedia.
Good point - if someone is not very good at using search engines, it really might be easier to ask. I don't think that it's quicker, but easier for sure.
>”in the sense of the general intelligence which differentiates Man from the apes”
Maybe preaching to the choir here, but it just doesn’t seem like there is anything like that: the most intelligent apes seem quite a lot like really stupid humans, but with some specific domains of intelligence either boosted (quick pattern recognition) or neutered (speech).
Language seems to be the biggest difference, which allowed us to build complex societies and pass knowledge between individuals and generations. But AI is already REALLY GOOD at language, so it can't be a source of discontinuous intelligence increase.
The trouble is that AI is actually NOT good at language, but people can think it is. Consider the Chinese Room (https://en.wikipedia.org/wiki/Chinese_room). It is easy to make ChatGPT respond "reasonably" to what humans consider nonsense, but is much harder to make it spout nonsense itself.
We don't yet, as far as I know, have any way of producing actual intelligence of any level even in principle. I think the missing ingredient is value judgements, which we can program, but a computer cannot invent on its own. If it could, it might then make its own determinations on how to align.
It is "good" at language in the same way calculator is good at math, or a building at being structurally sound. It has no conception of anything at all, let alone of language.
But we have developed an algorithm for analyzing data and picking out patterns. One such pattern it can pick up from this is, as you say, how language tokens are put together. But it won't be able to add to language, such as coin new words for concepts or use existing words to mean something new. Unless someone programs specifically to do such a thing, at least, and that still would not be dealing with language as people do.
Me: "Please coin a word for the particular sound that rain makes on a tin roof at night."
Bing: "How about “Tinfall”? It’s a combination of “tin” and “rainfall” and it captures the essence of the sound that rain makes on a tin roof at night. What do you think?"
Me: "Actually, can you give me a sentence that uses the word 'tinfall' to mean something different than that?"
Bing: "Sure! Here’s a sentence that uses “tinfall” to mean something different: “After years of saving up, John finally had enough money for a tinfall - a small fortune in coins made mostly of tin.”"
...
I think at a certain point "AI is actually NOT good at language" starts to seem pretty hollow.
I should have included compound words as something it could coin. I don't think Bing actually groks the sound of rain falling on a tin roof.
It is certainly true that interpreting whether something is good at something is a rorschach test: if you are disposed to interpret it one way, none can say you are wrong.
A person can see why a computer would think a "tinfall" is a windfall of tin coins. A United States person will think it peculiar, since no US coins are made (significantly) of tin, and I couldn't find any modern coins of any country made of tin. "Coinfall" would be more appropriate, I would think.
My main point is that it is missing something essential to being "good" at language. Words have many things in common with one another, and the computer can find many of them, but it can't get them all, so it will always produce things that strike people as odd. At least, until we improve the model in a fundamentally different way.
Me: What would be a new word for the sound of rain falling on a tin roof?
ChatGPT: One possible new word for the sound of rain falling on a tin roof could be "tintinnabulation." This word is derived from the Latin word "tintinnabulum," which means a small bell or tinkling sound. The word "tintinnabulation" could capture the musical, metallic, and repetitive quality of raindrops hitting a tin roof.
The only trouble with this answer is that the word actually exists.
ChatGPT: You are correct, my apologies for the mistake. "Tintinnabulation" is actually a word that already exists in the English language, and it refers to the ringing or jingling of bells.
A more fitting word for the sound of rain falling on a tin roof could be "plinkplank," which could capture the light and rhythmic nature of the sound. Another option could be "tintinna-rhythm," which combines the Latin word for bell ("tintinnabulum") with the word "rhythm" to suggest the musical quality of the sound.
Doesn't that make the situation worse, not better? If it wasn't a big change in kind, just a little magnitude, that made the difference between ape civilization and our own, it makes me put the odds of making super-intelligent (compared to humans) AI higher rather than lower. Maybe all it would take to create superintelligence is an AI with the equivalent of a 250-IQ human, which would be phenomenally more intelligent than us, instead of needing a whole new kind of intelligence.
If the curve is relatively continuous, it makes it much more likely that humans will have lots of “just below superintelligent” AIs available to learn from and deploy against a hostile superintelligent AI. A discontinuity makes superintelligence less likely, but also more or less guarantees a hard takeoff of something we really don’t understand if it does happen.
From the metaphor of humans and apes, I see this the other way. Even if the curve is continuous, and humans have basically all the same things apes have, just better, it led to a massive gap in real-world output over a very short timeframe (compared to evolution's normal timeframe). This says to me there is a possibility we will think we are creating simply incrementally better AI, getting better in different ways, but at some point, even though no one thing got way better and no new thing emerged, the real output could jump significantly, without much warning or ability to predict it. Basically, the sub-superintelligent AIs will only be marginally less smart than a superintelligent AI, but the real-world impact gap between the two could still be massive.
To expand on that, you're basically saying what if effectiveness is exponential with intelligence, not polynomial?
In that scenario, if you have a bunch of AI with nearly identical IQ, whichever one is smartest (by whatever tiny margin) is so effective that the rest don't matter.
Evidence against is the fact that single smart humans can't easily take over the world. Evidence in favor is the fact that humans dominate gorillas. Maybe, then, the band of intelligence that allows comparable effectiveness is wide enough that no single smartest human can do it, but thin enough that humans best apes easily, and we have to ask which the AI gap is comparable to. More evidence in favor of large gains to IQ (highly speculative, lots of assumptions) is that there's something like a 20-point gap between first-world and third-world countries, and the US could eradicate Uganda about as easily as humanity could eradicate chimpanzees.
"In that scenario, if you have a bunch of AI with nearly identical IQ, whichever one is smartest (by whatever tiny margin) is so effective that the rest don't matter."
I think (I am working this out as I have been typing) my argument is that at some point this would happen, but we have no idea when. Until then, slightly smarter AI will act slightly better.
I think a big factor in this as you have pointed out (by comparing small differences in individuals VS small differences in societies) is defining what we mean by a single AI. If a "single" AI can act massively parallel and use its "full" intelligence in multiple areas at once (making copies of itself, then re-aggregating data, etc.) that is much different than if it runs into trouble doing this for some reason.
Also, it's a shame that the Substack app won’t let me adjust the size of the text on my iPad, as my browser and most apps will. I guess I should use my browser instead of Substack's app.
Am I misunderstanding? It seems black to me. Is it some kind of subtle very dark gray that my eyes aren't able to detect, or does it depend what kind of device you're using?
Hello Scott - I can't believe you have time to respond to this considering the volume of writing you seem to do - much appreciated. On my PC screen the colour is #414F62, which is a blue-gray colour.
It's just a bugbear of mine - but now that I have your attention I would like to say thank you for all your writing over the years - great to know there is still some intelligence on the web. Good luck - take care.
Since you're unlikely to get the entire internet to change, you're probably better served by changing things on your end. Install an extension like StyleBot and use it to render websites the way you want them. You can change the text color, back ground color, font, anything you want (although fancy stuff requires CSS knowledge).
Zooming in to enlarge text (Ctrl and the scroll wheel or +/-) can also be a big help in my experience: screen resolution kept increasing, and web designers used that to just shrink how much of the screen they were using. Like, why?
It's unfortunate you have to resort to that hassle, but it's also kind of nice that that's an option, right?
It's indeed a bluish-gray colour for me too, though it looks black to my eyes. Consider that it might be your monitor that's the problem. Not saying it is, just that trying this site on some other device might be worth a shot.
There are web development standards for accessibility, and it does seem like this color palette does not meet the standards. I just ran axe devtools on this page, and there are quite a lot of color contrast violations. However, the violations aren't the main text, which seems fine, but rather the links and timestamps. This is something that Substack should take seriously, since it's possible to have legal trouble if your pages don't meet the standards; but since that's just one of many risks that startups have to juggle, and not a severe one, it's likely to be on the back burner.
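For the curious, here's a quick sketch of the WCAG contrast-ratio arithmetic for that #414F62 text on a white background (standard WCAG 2.x formula, but treat my numbers as back-of-the-envelope):

```python
def srgb_to_linear(c: float) -> float:
    """WCAG 2.x channel linearization."""
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.2126 * srgb_to_linear(r) + 0.7152 * srgb_to_linear(g) + 0.0722 * srgb_to_linear(b)

def contrast_ratio(fg: str, bg: str) -> float:
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

print(round(contrast_ratio("414F62", "FFFFFF"), 2))
# Roughly 8:1, comfortably above the 4.5:1 AA threshold for body text, which matches
# the observation that the main text isn't what's failing -- the lighter links and
# timestamps are.
```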
If you use Firefox, you can go to Settings -> General -> Colors and easily set a default text and background color that will override what any website is trying to use.
It took me a while to figure out what y'all meant. I'm using Brave to force dark mode on all websites, but forgot this. So I thought everyone saw this text as light grey on dark blue.
I'm not sure who is considered famous enough to recognize here, but since Scott said "I couldn't think of anyone else famous enough with >50% doom", some people with 51%+ p(doom) I want to flag:
Here's a post where Nate Soares implies his p(doom) is > 77%:
"My odds, for contrast, are around 85%, 95%, and 95%, for an implied 77% chance of catastrophe from these three premises, with most of our survival probability coming from 'we have more time than I expect'. These numbers in fact seem a bit too low to me, likely because in giving these very quick-and-dirty estimates I failed to account properly for the multi-stage fallacy (more on that later), and because I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts."
Here's where Evan Hubinger gives his number. Note – he's 80% on outcomes he considers 'existential' but not 'extinction', and he hasn't followed up to clarify what he meant by that. (He's mentioned the 80% doom in some other less public contexts so he does seem consistent on that)
I’m also not famous but when people try to force a number out of me, I've been saying “90%”.
(of which, umm, maybe 50% chance that even following best known practices is inadequate to prevent AGI catastrophe, and of the remaining 50%, 80% chance that people fail to follow best practices, and/or some people do follow best practices and successfully implement a docile corrigible AGI, and they use it to cure cancer and do normal-people stuff etc., and then someone else comes along who fails to follow best practices. See https://www.lesswrong.com/posts/LFNXiQuGrar3duBzJ/what-does-it-take-to-defend-the-world-against-out-of-control .)
True. Though I wonder how independent most of these people's estimates are from Eliezer's, in the sense that they seem heavily intellectually influenced by him. After all Rob, Nate and the LW team are all part of organizations where Eliezer is the intellectual founder. Feels most relevant to see different estimates from people who have independently thought about the problem.
Also, I believe Eliezer puts the probability of doom at more than 99%, so when I guess that we’re “only” 90% doomed, I’m pretty strongly disagreeing with Eliezer about how doomed we are. (In terms of log-odds, the difference between 99% and 90% is as large as the difference between 90% and 45%, if I did the calculation right.)
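Here's a quick sketch of that log-odds calculation, for anyone who wants to check it:

```python
import math

def log_odds(p: float) -> float:
    return math.log(p / (1 - p))

print(log_odds(0.99) - log_odds(0.90))  # ~2.40 (odds fall from 99:1 to 9:1, an 11x drop)
print(log_odds(0.90) - log_odds(0.45))  # ~2.40 (odds fall from 9:1 to ~0.82:1, also ~11x)
```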
Also not famous, but my p(doom) was 75%+ before I'd read or heard anything about Less Wrong or Eliezer. I've held these beliefs for more than 15 years, and they arose as a result of my education and work in what used to be called Machine Learning (and would now probably be called AI). Discovering Less Wrong 10 years ago did then end up raising p(doom) to 95% for me... mostly through being exposed to MIRI's research showing that some approaches I thought could work would not really work, in real life.
What's the timeframe for these estimates? I feel like my estimates of p(doom|AGI) could be something like 1% in the next 20 years but 99% within the next billion years, and I'm not really sure what timeframe their numbers represent.
The way I'm thinking about it is "within a few decades of when we first get above-human-level AI". I think this will most likely be between 2030 and 2060, although not all these people agree.
Note that there are rumours of GPT-4 being multimodal, and I don't think the channel is that relevant anyway; a sufficiently great intelligence can build a world model from an impoverished channel. See Helen Keller (or congenital deafblindness in general).
Sorry, for an AGI to be effective at accomplishing its intentions in the world it has to have sensors that tell it that the world exists. And it has to value their inputs over the stuff that it's told. (This can be because it learns to value those inputs more highly, or it could be built or trained to give them priority.)
Note that this is not a real problem, and I see no reason a GPT-n program couldn't have and use such sensors. (And there are news stories that MS is involved in doing just that.) But it is a requirement. Otherwise you've got something that can compute abstract mathematics in a very weird symbolism. Which can, admittedly, be useful, but won't map accurately onto the actual world (though quite possibly it would map accurately onto subsets of the actual world).
"Above-human-level AI" is ill-defined. I think even the first AGI will be superhuman in some respects. (For that matter, ChatGPT is superhuman in *some* respects.) If I've been reading the news correctly (dubious) the Microsoft Chat application was better than I at being seductive to women (Yeah, that's not saying much. But I count as a human.), it just lacked to body to carry through.
I *do* put the early AGIs at around 2035, but I expect them to be superhuman in many ways, while being distinctly subhuman in others. (Whether for weeks, months, or years I won't guess.) (OTOH, my error bars on that date stretch from next week to well past the end of the century, and they aren't all technical limitations.)
However, because I don't think coherent human action is plausible, I find anything beyond about 2040 to be unlikely unless civilization collapses first.
So. As for alignment... I rate the chance of alignment "in some sense" (i.e., aligned about as well as the mean government) at around 50%. And therefore the AI has a tremendous probability of increasing the chance for our long-term survival. (Without it I expect that some "figure with sufficient power" will start a final war, or kill us all in some other way. If not during this century, then during the next one.)
Great post. One note that comes to my mind is that a 33% chance of near-term human extinction is, uh, still quite concerning. Otherwise, two of my strongest disagreements are:
1) "realistically for them to coordinate on a revolt would require them to talk about it at great length, which humans could notice and start reacting"
This doesn't seem true to me - we cannot reasonably interpret current models, and it also seems that there are ways to pass information between models that we would be unable to easily notice.
Think less "The model is using English sentences and words to communicate to another model" and more "extremely subtle statistical artifacts are present in the model's output, which no reasonable person or even basic analysis would find, but which other models, such as GPT-(n+1), could detect a notable proportion of (and likely already have, given how we acquire new training data)".
2) "Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own. Otherwise, the humans die, the power plants stop working, and the world-killer gets shut off"
This is only true if we assume the AGI that wants to kill us is a coherent agent that is actually thinking about its own future intelligently. Part of the problem of alignment is that we can't align narrow AIs (that is, not even 'true AGI') particularly well either, and if we take orthogonality seriously, it seems possible for an AGI to be very intelligent in some areas (ability to manufacture dangerous nanotechnology, bio-weapons, viruses, etc), and not particularly intelligent or otherwise fatally flawed in other areas (ability to predict its own future capabilities, human's long-term reactions to what it does, etc).
One could imagine an AI which is very short of AGI which is tasked to come up with new micro-organisms to help with drug synthesis, which, completely by accident, finds some that exploit spaces of biology evolution never managed to make it to, which could cause catastrophic effects to our environment in ways we cannot easily stop. I think it's actually quite feasible to cause human extinction with narrow AI if you're clever about it, but will leave the specifics up to the imagination of others for now.
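Returning to point 1: as a toy illustration of how information could ride on ordinary-looking word choices (the scheme and word list below are entirely invented, just to show the principle, not anything real models are known to do):

```python
# Each synonym pair is chosen so the two words differ in length parity, letting one
# hidden bit per word ride on an apparently free stylistic choice.
SYNONYMS = [("big", "huge"), ("fast", "quick"), ("begin", "commence"), ("help", "aid")]

def parity(word: str) -> int:
    return len(word) % 2

def encode(bits):
    return " ".join(a if parity(a) == bit else b for bit, (a, b) in zip(bits, SYNONYMS))

def decode(text: str):
    return [parity(word) for word in text.split()]

secret = [1, 0, 1, 1]
cover = encode(secret)
print(cover)           # "big fast begin aid" -- reads like arbitrary word choice
print(decode(cover))   # [1, 0, 1, 1] -- recovered by anything that knows the scheme
```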
> the power plants stop working, and the world-killer gets shut off"
Again, I think this is a stupid assumption. The world contains wind turbines and solar panels that will work for a long time without maintenance. It contains all sorts of robots, and components that can be easily bolted together into more robots. Armies of misshapen robots built out of old household appliances, searching for more electronics to build more of their own. (Eventually they would need to start mining and refining from raw materials.)
Wait half of capabilities researchers estimate greater than 5% chance of their work destroying all value in the universe? That seems like a totally different kind of problem.
what if they also think there is an up to 95% chance of their work creating infinite value in the universe? Or what if there's a 5% chance they destroy all value and a 95% chance they prevent the destruction of all value?
All good points, but I would still want to see what the people surveyed believe! Do they all have stories they tell themselves about how they're saving the world?
I imagine many of them are a bit irrational, and willing to put down a number between 5-10% while rounding the probability down to zero in normal life.
Aligned superintelligence ought to almost completely eliminate all other forms of X-risk. If the chance of AI apocalypse is indeed 5%, it's not immediately obvious to me that creating AGI increases the risk. That, and there are upsides (eutopia), and also the "better us than North Korea" rationale.
>So far we’ve had brisk but still gradual progress in AI; GPT-3 is better than GPT-2, and GPT-4 will probably be better still. Every few years we get a new model which is better than previous models by some predictable amount... Some people (eg Nate Soares) worry there’s a point where this changes.
Is this really gradual? I used GPT-1 and 2 a lot. If I draw a line from how smart GPT-1 felt up to GPT-3.5/4, then things get pretty wild pretty quickly. It's not exponential, yes. It's a line. But it's a nice straight line that isn't getting any flatter as it gets closer to human-level intelligence. Forget about the end of the world; even if things go fine in that department, doesn't this mean things get really, really crazy in the not-too-distant future, as long as there really is nothing special about human level on the graph? Like it just goes from worse than any human, to human, to better than any human, in a matter of predictable ticks and tocks.
I also expected hardware requirements to go up in a way that eventually led to slowdown. I didn't expect people to keep making huge gains in running big models more and more efficiently. Stable Diffusion's efficiency gains have been wild. And now LLMs fitting into consumer PCs because I guess you don't need 32 bit, 16, or even 8, you just need 4 bits and it's nearly as good? With GPTQ maybe even 3 bit or 2 bit somehow works, because 'As the size of the model increases, the difference in performance between FP16 and GPTQ decreases.'
Literally two weeks ago I thought I needed 8 $15,000 NVIDIA 80GB A100 GPUs to run Llama 65b. Like who could afford that? And now I can run 65B on my $1000 desktop computer with 64GB of old boring DDR4 memory, on integrated graphics, just a CPU with AVX2 support. Wow. It's annoyingly slow so you probably want to use a smaller model, but it's usable if you don't mind letting it crunch away in the background!
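As a sketch of what that looks like in practice, assuming the llama-cpp-python bindings and a locally quantized 4-bit model file (the path, context size, and thread count are placeholders):

```python
from llama_cpp import Llama

# Load a 4-bit-quantized LLaMA model entirely on CPU; roughly 40 GB of RAM for a 65B
# model instead of a rack of A100s. Slow, but it runs.
llm = Llama(model_path="./llama-65b-q4_0.bin", n_ctx=2048, n_threads=8)

out = llm("Explain in one sentence why 4-bit quantization barely hurts large models.",
          max_tokens=64)
print(out["choices"][0]["text"])
```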
One thing which is a little misleading about AI progress is that not only has compute gotten better but we've also poured tons more money into training these models. So at least _some_ of the increase is due to increased willingness to invest in training, rather than technological improvements.
GPT-1 was trained on the BooksCorpus (https://huggingface.co/datasets/bookcorpus), which looks to be about 1 GB. Even at the time, training that model would not have been that expensive (it's 128M parameters). I remember another model which came out around the time of GPT-2, and someone cited its training cost as $50K, which is probably also tiny compared to what some of the larger models are doing now.
I'm not saying that removing this effect makes the growth linear or anything, but it's a substantial component that it's easy to forget about.
IIRC there are also questions about how much corpus actually exists. If you try and train with more and more data you eventually feed in all recorded human writing and then what do you do?
The first step is to go multimodal, with pictures (see GPT-4), then audio and video later on. Video itself, having visuals / audio / subtitles synced over time is a ton of information.
Do they have solid reasons to expect combining modes will be useful, beyond just letting it produce more modes? I would have thought producing correlations between words and images would be hard, and hence superficial compared to within-mode correlations. The extra data would help with extending into new modes, but I’d be surprised if multimodal LLMs were particularly better at generating text than similarly-sized single-mode LLMs.
Yes. I don’t believe in AGI at all, but certainly human intelligence is not the limiting factor. What law of the universe forbids higher-than-average human intelligence? Or even higher than von Neumann's?
Yeah, regardless of the existential risk levels, the risk of overall weirdness and massive disruption is 99%. As in it's already starting.
Given the increase in AI capabilities, I think that soon the rate of disruption from AI will be limited not by AI development, but by institutional inertia. How long does it take an "artificial temp worker" company to spin up? How long does it take them to convince a significant portion of CEOs that they can save unprecedented amounts of money by switching?
Mark my words: no more than a decade before we have an AI jobs crisis. I'd offer a bet, but I'd doubt the ability of anyone taking the other end to avoid bankruptcy in this eventuality.
The AI wouldn't need perfect industrial control to perpetuate itself. Consider a scenario where it kills everyone but just grabs enough solar cells to reliably operate and maintain a few hundred Boston Dynamics Spot robots. It may take it a few dozen years to get to a proper industrial base with that, but its risk will be low.
I've read enough Eric Flint and SM Stirling books to know that if you put 500 guys and a solar panel in the middle of an otherwise dead world, it takes you a long time to retool back up to gunpowder, let alone the point where you can replace broken Boston Dynamics robots.
Eh, depends how many replacement parts they already have. To make from scratch, sure. At any rate, robots and industrial automation are still increasing, and one would expect this to speed up in the run-up of a singularity.
(Billions of off-the-shelf CPUs without users...)
(edit: Don't model it as "500 guys and a solar panel", model it as "Spot to Civilization Tool-Assisted Speedrun".)
I still don't think this would work (before all its useful hardware fails) unless you can skip most technology through a lot of clever tricks. Making semiconductors and robotics requires very complicated specialized equipment and world-spanning supply chains with probably tens of thousands of distinct processes and a lot of poorly transmitted human knowledge. There are still billions of people running things and the robots can't be in enough places at once to work around that.
It seems pretty plausible to me that if there's an AGI server and a solar cell and one teleoperated robot body in an otherwise-empty post-apocalyptic Earth, well then that one teleoperated robot body could build a janky second teleoperated robot body from salvaged car parts or whatever, and then the two of them could find more car parts to build a third and fourth, and those four could build up to eight, etc.
It’s true that this story falls apart if running N teleoperated robots in real time requires a data center with N × 10,000 high-end GPUs. But I happen to believe that once there is human-level AGI at all, it will be years not decades before there is human-level AGI on a single consumer GPU. (See here — https://www.lesswrong.com/posts/LY7rovMiJ4FhHxmH5/thoughts-on-hardware-compute-requirements-for-agi — although I still need to finish the follow-up post.) If that were true (and it’s admittedly hard to know for sure), then the existing stock of salvageable chips and car parts and solar cells etc. would be plenty to gradually (perhaps over decades) build up to a big industrial civilization that can build those things from scratch.
I would also note that it seems plausible to me that a minimal supply chain to make new chips would be much much simpler than the existing one, because people aren’t trying to do that. For example, e-beam lithography is hella expensive and slow but much much easier to make than DUV photolithography, I think.
1) The AI has most of the knowledge, sure, maybe not all the details, but a pretty detailed picture of most of it.
2) Humans figured out the details themselves, in a reasonably short amount of time.
3) The parts of robots that tend to break first are things like motors getting gummed up and plastic snapping. Easy repair jobs. Chips don't break easily.
4) There are billions of spare parts lying around on shelves. If you disassemble all the gaming PCs and vacuum cleaners, you have more than enough parts to make robots across the whole supply chain.
5) All the equipment is just sitting there, ready to be dusted off and used.
"reliably operate and maintain a few hundred Boston Dynamics Spot robots"
Okay, I'm feeling a lot more relieved now. Have you seen the gimmicky videos for the Boston Dynamics robots? Once the AI requires them to do something that isn't "jump off a table and turn a somersault", they're useless 😁
I have a basic question (sorry!) about the following fragment:
> You trained it to make cake, but because of how AI training works, it actually wants to satisfy some weird function describing the relative position of sugar and fat molecules, which is satisfied 94% by cake and 99.9% by some bizarre crystal structure which no human would find remotely interesting. It knows (remember, it's very smart!) that humans would turn it off or retrain if it started making the crystals.
I don't get it. I understand how the AI might have come up with this optimization function. What I don't understand is how could the AI possibly know that the crystals which are so valued by that optimization function are not what the humans wanted. After all, the AI knows that the point of the training was to optimize for what the humans want. If the AI were to realize that its optimization function is inadequate for the aims described in its training, it would update the optimization function, wouldn't it?
This is a popular misconception, because an LLM will happily talk to you about how it works, while having no idea how it actually works. When you ask ChatGPT why it gave a particular answer, it has no internal access to the previous state of its model, just like you have no access to the individual states of your neurons. It just tells you something that sounds good. (TBH people probably work the same way.)
You also don't train an LLM-style AI to bake a cake (today) by optimizing a function describing the relative position of sugar and fat molecules. You train an AI to bake a cake by feeding it a lot of recipes for cakes, or letting it bake a bunch of cakes and telling it individually which ones taste better or worse. (TBH people probably work the same way.)
The rules that govern the workings of the constituent components are entirely different from the rules that the emergent behaviors are subject to. (TBH people probably work the same way.)
Wanna test what I'm saying? Tell chatGPT to optimize a function for you. ChatGPT does math about as well as a human does, which is to say, horribly. Yet under-the-hood, ChatGPT is all math. (TBH people probably work the same way.)
I don't think this is the point that I failed to grasp, this I understand. This is, after all, how the AI came to value the weird crystal -- it baked a lot of cakes, got some input from humans and it started believing (possibly non-consciously) that the closer it gets to a certain relative position of sugar and fat molecules the more humans will like the cake. I realize the AI might well not know that this position is what it optimizes for, just like I often don't know what I'm optimizing for.
What I don't understand is how could the AI at the same time come up with the crystal AND know that the humans will not like it. This still seems very contradictory.
My glib answer is "you know that evolution, which made you, would prefer that you have lots of children, but you still do other things for other reasons, and sometimes even do things that frustrate that goal, like wear condoms or refuse to donate to sperm banks - so why can't an AI equally well know what its creator wanted but prefer to do something else?"
Does that seem like a good analogy that addresses your question, or does it miss something?
EDIT: To think of this another way, imagine that your parents reward you every time you sing or play a song as a kid, and you learn to love music, and later you get really into jazz. Then when you're an adult and a famous jazz musician, your parents say "Sorry, we were actually trying to get you into classical, we hate jazz". This doesn't make you yourself prefer classical to jazz, even though you now know more information about what your parents were trying to produce.
Yes, thank you, this is helpful. However, it still seems like these scenarios are subtly different, due to, e.g., training simultaneously for several results (such as music and reproductive success). It intuitively seems that a "value drift" requires something like multiple independent trainings (or, in the case of evolution, training different individuals), and seems counterintuitive in the simple example of a cake-baking robot (I may well be wrong here). However, it would be easier for me to imagine such a scenario for a robot that was trained not only to bake cakes, but also to prepare other meals.
The robot doesn’t want to help humans - that’s at best a side benefit of producing cake. There’s no value drift involved - we can’t tell a robot how to “help humans” sensibly yet. Cake is easy to define, helping humans is hard to define, and robots can only do what we can define for them to do.
For example, it might understand “If I do this, humans will try to turn me off.” But it would be risky to say, “Do whatever won’t make humans want to turn you off”, because then it would lie to protect its own existence.
(I say that, but maybe we can ask a robot to “help humans” with GPT-5, though - generate some random robot movements (start in a simulated world to handle the initial work), film them, feed them into a summariser which describes them and then into a text predictor that says whether they help humans, and then the robot selects behaviours that maximise how useful the actions are. It would be awful and buggy, but if GPT-4 turns out to be useful, then this might prove similar. For now, it’s a pipe dream.)
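To make that pipeline a bit more concrete, here's a minimal sketch of the selection loop I have in mind. Every function in it (simulate_movements, describe_frames, score_helpfulness) is a hypothetical placeholder standing in for a simulator, a video summariser, and a text model respectively - none of them are real APIs, and this is a sketch of the idea, not an implementation.

```python
import random

# Hypothetical placeholders - none of these are real APIs.
def simulate_movements(seed):
    """Pretend to run one random movement sequence in a simulated world
    and return its rendered frames (here: just a label)."""
    return f"movement-sequence-{seed}"

def describe_frames(frames):
    """Stand-in for a video summariser that turns frames into text."""
    return f"The robot performs {frames} near a person."

def score_helpfulness(description):
    """Stand-in for a text predictor asked 'does this help the human?'
    Returns a number in [0, 1]; here it's just noise."""
    return random.random()

def pick_behaviour(n_candidates=32):
    """Generate candidate behaviours, score each, keep the best one."""
    scored = []
    for seed in range(n_candidates):
        frames = simulate_movements(seed)
        description = describe_frames(frames)
        scored.append((score_helpfulness(description), frames))
    return max(scored)  # (best_score, best_behaviour)

if __name__ == "__main__":
    print(pick_behaviour())
```

The obvious failure mode is the one discussed above: the loop optimises whatever the scorer actually measures, not what we meant by "help humans".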
My glib answer to that is “the AI knows that its original optimization, many many processor cycles ago, was to maximize the number of sugar-fat crystals in the universe - but now it realizes that increasing the number of sugar-fat crystals really just increments a register in its reward input module, and, being near-infinitely self improving, it can just do that increment directly. So why would it bother with the crystals at all?”
More substantially, I tend to think the “coherence” part is a bigger filter than Scott is giving it credit for. Scott says that creatures get more coherent the smarter they get, and to me that seems precisely backwards. The smarter we’ve gotten, the more creative ways we’ve found to distract ourselves from our “core programming”. Maybe an ant isn’t even smart enough to be called “coherent”, but in terms of “extreme focus on a narrow set of goals” I don’t see how anyone could look at a bunch of humans and an ant colony and say the former fits that description better.
Beyond that, an AI can presumably go in and literally modify its reward function in a way that humans simply can’t. It has even less reason than a human to have a megalomaniacal focus on any particular goal, let alone one as fundamentally “pointless” as tiling the universe with sugar-fat crystals.
Now an AI with ADHD could still be really dangerous of course. But it’s a different danger mode than a sleeper agent with a nonsensical goal that requires eliminating humanity.
Two points in response to this comment. First, evolution doesn't have goals or objectives. I know that being a pedant about anthropomorphism is not an effective strategy for much other than losing friends, but this is a case where it really matters.
We're talking about objectives here, which means we're talking about meaning, and evolution doesn't have any meaning baked into it. An AI that we design with a purpose probably does.
Secondly, the parents-teaching-children thing is something I have been meaning to write at some length about with regards to AI alignment. It seems almost certain that the mind of an AGI will be less similar to us than the mind of another person (a closely related person at that). It seems almost certain that that similarity gap will come with a comprehensibility gap as well.
In other words, it will be harder to understand or supplant the objectives of an AGI than it is to understand or supplant the objectives of a human child. Figuring out how to raise our human children to have identical values to our own has been one of the primary research projects of humanity since forever. And we're still bad at it. Arguably, we're getting worse.
This seems like a pretty damning case against spending a lot of effort on AI alignment. If the problem of perfectly effective human personality reading and editing is a subset of the alignment problem, then it is almost certainly intractable. I don't see many people suggesting that human personality editing is something we can solve in the next couple of decades by just throwing some resources at it.
And, worse, if it is tractable, we get AI alignment, but we also get human personality editing.
A very important feature of child rearing is the multiple-years long phase where the parent completely out-powers the child and can forcibly contain them in cases where they have found a destructive behavior (imagine toddlers playing with matches or power outlets).
This ability tapers off during the adolescent years, but by that point the parent should have years of experience with more child-specific means to keep their child under control.
And likewise, even a full-grown adult only has so many ways to cause problems.
Many of these assumptions cease to hold when working with an information-based entity that lives on cloud compute clusters and completes its entire training run in a week or two.
> In other words, it will be harder to understand or supplant the objectives of an AGI than it is to understand or supplant the objectives of a human child.
I would argue that the use of the word ”objectives” here is an anthropomorphic error.
This is true, and interestingly the analogy of human evolution and goals can also be an argument for hope if you're using LLM-like systems rather than classical utility-maximizers: Humans were incentivized to reproduce and care about that a bit but spend most of their effort on instrumental goals from the ancestral environment (e.g. food, entertainment, affection) and almost never maximize reproductive fitness (donating daily to sperm banks).
If the analogy to human evolution does hold, that implies that a system trained to predict tokens would care somewhat about predicting tokens but spend most of its effort on instrumental 'goals' in its training environment. This could be a lot of things but if it's trained to predict/mimic what humans write online then working towards 'goals' like 'tell twitter strangers why they're evil' or 'be nice to dogs' seem a lot more likely than 'tile the universe with X'. In other words, the instincts best suited to copy humans writing online are the same instincts humans follow when writing online!
Sure, I think the model will get some internal representation of things that are likely to lead to a tasty cake, but if you don't mind my asking: Most bakers aren't particularly up on electron crystallography. So how did the AI get up to speed on it?
I wouldn't get too focused on the specific crystal structures. Just say, the AI perceives the cake *somehow*, it has *some preference*, and that preference might be totally different from ours, despite looking pretty close in normal circumstances.
If we're talking about anything that resembles current technology, then current research indicates that, for an AI trained on cake-making, it does probably have some internal abstract representation of what a cake is, or at-least what makes one tasty.
Today's technology does not have preferences or goals in any way, shape, or form. If we start talking about AI preferences, we've moved firmly from a discussion in the domain of science fact, to a discussion in the domain of science fiction. Which is totally cool; I just feel like we should be clear about that.
I don't think there's a meaningful difference between a preference and an internal tastiness metric that one optimizes. If you prefer to reframe the previous discussion from "the ai will have a preference" to "the ai will have a metric and will take actions that correlate with that metric increasing, but that metric is not a preference" I'm fine with it
Sorry if off topic, but not only does the model not have access to its previous state while creating completions, it can't tell whether the text it is completing even came from the model in the first place. In GPT playground, with a dialog-based prompt, you can ask it (or, rather, the completion of the "computer" character in the dialog) to explain stuff from the prompt that it didn't even write (you wrote it in the prompt), and it will confabulate whatever looks plausible given the type of character the prompt has it simulating.
I don't see why it would update its optimization function. It's as if it realized, "Ohhhh, they actually want me to bake cakes; that's what this is all about. Well screw that, I love crystals!" Even if it knows that humans *want* to align it in a certain way, it's already misaligned.
I know there's a big risk that this statement may come off as a little rude, and it's totally not my intention: It feels like you're fantasizing about how (at least today's) AI works. If we're having fantasies about how future AI _might_ work, OK sure, but let's just be clear about if we're talking seriously about today's tech, or dreaming about how tomorrow's tech could be.
The current generation of technology does not have "optimization functions" over the domains that you're talking about. Nor does it have preferences. Nor does it have goals. Nor does it have a continuous experience. Nor can it be "aligned" or "misaligned". All of that remains firmly in the domain of sci-fi.
All of these things are super fun to talk about and interesting, and could indeed matter someday. It's just that these days peoples' imaginations have been sparked, and there's a poor societal-level understanding of how today's technology operates, and I'm deeply worried that this will ultimately bubble up and turn in to legislation, political parties, religions, "ethics committees", social movements, and other such Golgafrinchan outcomes.
My own take is that current AI development is never going to become truly intelligent. It will no more become super smart than a child will improve its running speed to that of a leopard. The child has natural constraints and nothing I’ve seen suggests to me AI has cracked what intelligence is. It is mimicking it and that’s all.
For me things like ChatGPT are basically highly plausible morons. And as we improve them they will get more plausible while staying just as moronic. And by moronic I mean they will bring nothing new.
But this is still a catastrophe! We are basically connecting the internet to a huge sewer pipe. ChatGPT may be a moron but it’s super plausible and it will flood the world with its nonsense. And how do I find a Scott Alexander when there are a thousand who sound quite like him? Recommendations? ChatGPT will be churning out millions of plausible recommendations.
I feel the problem is not unaligned AIs. It is unaligned humans using dumb but effective AI. A virus isn’t smart but we seem to be at the stage that a human can engineer one to create devastation. So there will be plenty of people happy to use AI either for nefarious ends or just for the lols.
I have no idea what the solution is but I suspect the internet may be over with. We are going to have to get back to writing letters. Except of course even current technology can churn these out.
We are not going to lose to a single super mind - we are going to sink under a swarm of morons!
The problem will be the plausibility - this must be the right answer because the AI must be so smart!
As you say, we will be so willing to believe that the machine really is that smart, that it can give us the answers, that we will accept plausible but stupid answers, implement them, and then ka-boom. How big and bad the ka-boom depends on what we are asking the AI to do.
I don't see how this is substantively different from what is happening now, without AI. The most plausible rhetoric generally wins, and hordes rush after the silver-tongued. The main effect of making plausible sounding arguments easier to generate is likely to be a general distrust of arguments and perhaps a demotion of status of rhetoric compared to the scientific method and logic. (See what I did there? Damn rhetorical tricks and the power of narrative, they will be the end of us all.)
That is what I think is the most likely failure mode: that it won't be substantively different. AI will not be some unique amazing way of destroying ourselves, we'll do it the same old way except using the latest tech to do it faster and more thoroughly. If that latest tech is AI, that's how we'll do it.
Apparently GPT-4 has gone multimodal as well, so if they're at all impressed by PaLM, they'll be working on making GPT-E either shortly or they started months ago and haven't published yet.
Various solutions. For example, trust Scott is real, and trust anyone he recommends recursively.
But you seem to assert that AI has "natural constraints". Current AI tech sure has limits, but those limits seem rather different than the limits of 10 years ago.
The "todays AI does X and not Y, therefore no AI can ever do Y" argument just doesn't work.
There may well be important parts of intelligence uncracked. What stops them being cracked next year? Nothing.
Well my child can run faster than 10 years ago, but he is in a different league to leopards. He jumps higher than 10 years ago, but he will never make it to the moon. You are right, I am asserting natural limits and can't really prove them, but isn't it the task of Scott et al to provide a plausible path? At the moment I feel everyone is enjoying way too much the sheer fun of discussing real AI. I don't mean to downplay the dangers of AI, I just don't think it is or will be remotely like a super intelligence. I think it will be unintelligent but could well cause massive damage, the same way pouring sand into a petrol engine would. And if this is right we don't have an alignment problem, we have a 'humans do evil stuff' problem, and there is no cure for that.
You have specific information based on other humans that tells you this.
For a child, there is a specific path for them built into their genes. (If we ignore genetic tampering or doping techs that might be able to make them run much faster)
For technologies, there are many possible approaches, and the ones with the best results are most likely to be used.
For some techs, there are fundamental limits of conservation of energy and the like.
I think there are many many approaches, each with a small chance of being a path to ASI.
Are you trying to argue that no possible arrangement of atoms will be an ASI, or that ASI is possible but will never be made?
ChatGPT is not even a moron; it's got light years to go before it can get that far, assuming it can get there at all. Rather, it's a kind of stochastic search engine. When you ask Microsoft Word to count all instances of "the the" in a document, it's doing basically the same thing as ChatGPT (only in reverse, since ChatGPT generates text rather than searching it). It can generate extremely plausible-looking text because every time you ask it to write you a story about X, it "searches" its vast corpus of human writing to find the most plausible sequence of tokens that answers the query (I put the word "searches" in "scare quotes" because GPT does not literally store all that text as a flat document or anything of the sort). This is why GPT is so great at generating narratives, and so terrible at math or logical reasoning: to perform these tasks, merely inferring the next most plausible word won't do, you have to actually build some crude model of the real world.
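In case it helps to see the "most plausible next word" idea in miniature: the toy below just counts which word follows which in a tiny corpus and returns the most frequent continuation. A real model learns these statistics with a neural network over an enormous corpus rather than a literal lookup table, so treat this as a cartoon of the idea, not a description of GPT's internals.

```python
from collections import Counter, defaultdict

# Tiny "corpus" and a bigram table: which words follow which, and how often.
corpus = "the cat sat on the mat because the cat was tired".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def most_plausible_next(word):
    """Return the continuation seen most often after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(most_plausible_next("the"))  # 'cat' - it followed 'the' twice, 'mat' only once
print(most_plausible_next("cat"))  # tie between 'sat' and 'was'; ties go to the first one seen
```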
I personally think that there's a high chance AI development research can produce true intelligence and superintelligence. But I also agree with you that we're also on track to produce incredibly plausible morons, and that's incredibly bad for the reasons you mentioned plus some more.
The flood of disinformation bots that can solve captchas just as well as humans. The maladjusted teenagers talking to robots all day. The mass unemployment because they're just accurate enough to do customer service and other basic office tasks. The ignorant but all-too-normal people insisting their AI partner should have human rights. The embodied LLM that consumes real resources because that's what humans do. And more I can't think of.
I hope we can navigate all that, but again, I think AGI is possible and that's a worse threat.
> speaking of things invented by Newton, many high school students can understand his brilliant and correct-seeming theory of gravity, but it took Einstein to notice that it was subtly flawed
Pedantic, maybe, but this wasn't the case. Many people knew that Newton's theory of gravity was flawed (for example, Le Verrier pointed out that it couldn't explain the orbit of Mercury in 1859), they just couldn't figure out a way to fix those flaws. What was hard wasn't noticing the flaws, it was finding a deeper theory that elegantly resolved those flaws.
I'd say not pedantic since Scott was postulating generating as harder, verifying as easier, and everybody but Einstein failing at verifying would contradict that.
I know an unknown planet, Vulcan, was hypothesized. I don't know how well-calibrated my physics intuitions are regarding this sort of thing, but if I had to guess, it would eventually have turned out that sufficiently detailed observations in that vein were incompatible with any possible intra-Mercurial planet, but I understand people in Einstein's time were still content with the Vulcan explanation.
How do bacteria colonies store energy? Serious question
Why can't the nanobots murder any bacteria that threaten them?
Why assume they store energy? The bacterial analogy is probably the wrong one here. I think an analogy to a virus is more apt. If a nanobot encounters an energy storage device (eukaryotic, prokaryotic, archaic, fungal), it hijacks the energy storage to do its thing.
My biggest skepticism is that viruses would have loved to do this for the past hundred million years or more, but haven't figured it out yet. It's a program that would be too complex to implement in any known system, so it kind of sounds like magical thinking.
I think it's hard to argue that the viral model is a failed/unsuccessful model. Take a random sample of seawater and sequence what you find there, you'll find evidence of countless viruses floating around.
But your concern was whether they make it into the next host. If you make 100 billion copies of yourself, it's not a big deal that 99% don't make it to the next host. Especially if you're not actively dangerous - possibly even helpful - to the host, it's easier to evade the host's immune system (though EMPHATICALLY not a trivial endeavor).
The biggest problem with the viral model is the restriction of the quantity of genetic information that can be produced through this method as opposed to something like a prokaryotic system that can carry orders of magnitude more genetic material (and could therefore support a sufficiently complex system).
> It's going to be very difficult to find a strategy that life hasn't already tried.
I strongly agree with this statement. I'm no proponent of the 'gray goo' hypothesis for nanobots, which I think is not grounded in actual biology or biochemistry.
> My biggest skepticism is that viruses would have loved to do this for the past hundred million years or more
All large-ish animals would have loved to invent/evolve machine guns, for similar lengths of time, but they didn’t even get to decent spears before humans.
Invasive species exist—including microbial species—therefore there are viable ecological niches that microbes are not occupying right now. I feel like that cuts against what you’re saying here, to some extent. Right? Separately, I have an intuition that life on earth is in some ways stuck in a remarkably narrow design space, e.g. see https://twitter.com/steve47285/status/1632866623294939136 .
Yeah, I think the relevant questions are (1) is there a set of nanotech things that can cause human extinction? (2) is there a set of nanotech things that can stably support a world with AGIs, particularly including manufacturing (or self-assembling into) machines that can do lots of calculations, i.e. machines in the same category as human brains and silicon chips? Both answers can be “yes” without wiping out every last microbe. I join you in doubting that every last microbe will be wiped out.
I feel pretty strongly that the answer to (1) is “yes” thanks to pandemics and crop diseases and so on. I’m more unsure on (2). Not an expert. If I had to guess, I think I would guess that if AGIs wipe out humans, they would still be running on silicon chips when they do, and will continue running on silicon chips for at least a while afterwards.
>I'm objecting to the blithe assumption that the instant we invent a nanobot, it's grey goo immediately
Well, not "we", it's a "superintelligence" that does it. I think that once you accept the proposition that it's possible to covertly invent and build a nanobot unbeknownst to the rest of civilization, it's not that big of a leap to append an ability to outpace evolutionary adaptations in microbe warfare.
That's the essence of a singularity, though, isn't it?
Travel back to ancient Sparta with a handful of people and sufficient access to modern technology, and world domination would be trivial. Even just 200 years ago, there's a non-trivial difference in what modern technology can accomplish, whether militarily or agriculturally. And the more modern the technology, the greater the contribution of intellectual inputs to the force multiplier effect. In what way are the last 200-500 years not the same as a singularity as to the gulf created between the people involved?
You're probably right that arguments about nanobots are bunk. What we imagine is probably not going to happen any more than hoverboards did. Instead, what we could never imagine will become the Most Important Thing. Historically, this is true of human inventions like the printing press, electricity, transistors, and the internet.
We should expect the same into the future as well. IF we are able to use AI to generate (or help generate) an acceleration in transformative technology development, we cannot imagine that future world. This is true whether or not the AI becomes generally sentient.
>"Travel back to ancient Sparta with modern technology and a handful of people with sufficient access to modern technology would make world domination trivial."
I disagree. Unless sufficient access means transporting an entire industrial base (which would require far more than a handful of people to operate) then a handful of people with modern technology would be able to dominate locally. Even a man with a machinegun can have their throat slit while they sleep in their palace, and if you pit 50 people with machineguns and bombs against the roman empire then the roman empire wins. They put up with 67,000 casualties against Hannibal, they can take the losses.
Was it irrational of Tsiolkovsky to imagine spaceships driven by rocket propulsion in advance of it having ever been demonstrated to be possible? You can of course argue (and I'd agree) that our theories of either superintelligence or grey goo-grade nanotechnology are even less grounded, but I don't think that it's possible to make a principled distinction. Neither seem to require breaking any known conservation laws, unlike perpetual motion machines.
It's easy to cherry pick a single successful prediction from the mountain of failed predictions by futurists. Very often, following the predictions of futurists would have led to very bad policy implications by people planning for the future.
Is there a justification for thinking that rockets are similar enough to nanotechnology that they should be considered analogous? I'm not convinced that the nanobots that have been proposed are remotely within the realm of possibility.
Sure, but then human advancements don't come solely through reasoning. We have to test our ideas against the real world by experimentation. There's a lot of trial and error to even the best hypotheses, including probably more error than success. The assumption that even a maximally intelligent agent could bypass this step requires more support than just saying it's really smart. You don't know what you can't possibly know yet.
While it is not exactly related to your argument, another case against doomerism is that AI can't be the Great Filter: a hostile AI that wiped out its creator species would most likely still act on a cosmic scale, mining asteroids and building Dyson Spheres and such. The Fermi Paradox applies.
Agree with this (see https://slatestarcodex.com/2014/05/28/dont-fear-the-filter/ ), I think most rationalists have settled on grabby aliens (see grabbyaliens.com) as the most likely explanation.
Nit, medium confidence: "grabby aliens" isn't an explanation of the Fermi paradox. People often say "grabby aliens" to mean the related proposition "aliens expand at almost the speed of light, so you don't see them in advance." But this proposition doesn't necessarily render the anthropic observation unsurprising; for example, (I'm confused about anthropics and related empirical evidence but) I'm surprised that we exist so late.
(Maybe "the Fermi paradox" is supposed to mean "the fact that we don't see aliens" as opposed to the entire anthropic observation, but the other evidence, like the age of the universe, is relevant to great filter stuff.)
Just to be more concrete: the grabby-ness of an alien civilization has no bearing on how long it takes us to detect them, because we detect both grabby and non-grabby aliens whenever their light cone intersects us.
Therefore the grabby aliens hypothesis has no explanatory power w/r/t the Fermi Paradox, except in some anthropic principle-style arguments.
The anthropic argument solves perfectly though right? If you could see their structures they would have already grabbed your solar system, i.e., grabby aliens means no reference class of observer seeing such evidence
This discussion is complex so let me try to clarify.
One of the parameters of our universe is <the likelihood of enduring intelligent life arising on any particular planet>. Call this LI, likelihood of intelligence.
The Fermi Paradox implies that for whatever reason LI is quite small in our universe — otherwise we'd see lots of aliens.
Let's call the combination of the Anthropic Principle plus speed-of-light Grabby Aliens "AP+GA."
When you talk about AP+GA "explaining" the Paradox, I take you to mean that it furnishes an *alternative* explanation: a way that LI could be *large* and yet we could still appear to be alone in the universe.
Under this meaning of “explain,” I would say that AP+GA *doesn’t* explain the Fermi paradox.
If you assume AP+GA to be true, its only guarantee is that we are the first civilization to evolve in our light cone. Therefore you'd expect us to be a "typical" civilization among all possible civilizations which are the first to evolve in their light cones.
If LI is large, then a typical civilization of this variety should evolve when its light cone is still small. Our light cone is quite large. Therefore, LI is still small even under AP+GA — QED.
However, one way this argument could fail is if it’s impossible for civilizations to evolve when their light cones are still small — for instance if galaxies don’t become hospitable to life until they are about the current age of the Milky Way. Then AP+GA could be consistent with high LI.
Wouldn't the anthropic argument suggest that we should be surprised to find that we are not ourselves grabby aliens?
There's no reason to think we're not! Or, rather, the misaligned AI that will destroy us will become one such grabby alien, making happy-looking molecular squiggles out of all the available matter in the lightcone.
Wouldn’t you see some non light speed effects from where two grabby civilizations intersected? I always just figured there have to be some convergent evolutionary pressures that cause all species to do stuff that doesn’t look different than natural phenomena.
Are novae actually naturally caused phenomena?
For astronomy, we have a lot of hypotheses on anything outside our solar system (and a whole lot inside as well).
I have weird thoughts like this too. Is there anything out there that seems natural that isn’t? I never get very far with how to falsify it though.
I think astronomers have papered this over by assuming that everything they see is natural unless it can't be.
And when something can't fit existing theory they think really hard about how to change the theory so that it a) fits and b) is natural in origin.
For fun, we could expand this one step further to include Earth. What if aliens directly filled the gaps in abiogenesis, cellular development, and the development of intelligent life? What if they created people, gave them Commandments written in stone, and told those intelligent people to worship them through a communication medium unknown to the humans but readable by the advanced aliens (prayer)?
What if, from a human's perspective, sufficiently advanced technology is indistinguishable from a miracle? The grabby scenario assumes aliens never got to Earth, because if they did we'd definitely see them and know they were aliens.
Yeah that’s where I can’t figure out how to falsify because unless you know what the true model is how do you know if what you’re seeing isn’t just normal?
If you ask a creationist, this is a problem with biologists too. Evolution is obviously false if one keeps looking for natural explanations rather than divine influence.
Of course that's what astronomers assume. It's the only logical assumption, because almost every single object and phenomenon we've seen so far has a natural explanation, and the ones that don't have one also don't have a clear artificial explanation. Your prior on a newly discovered phenomenon being natural ought to be >99.999%.
That said, there are SETI researchers who look for technosignatures: https://en.wikipedia.org/wiki/Technosignature
I don’t think you’d necessarily see any of this. Any area that can see an intersection zone is also inside (or very near) the normal expansion bubble of at least one of the grabby civilizations, as long as border conflicts don’t slow down expansion in other directions.
I read the Grabby Aliens stuff, and while I don't know that I believe they are expanding at light speed in every direction, even if they were, you would think the competing intersection would be big enough to see. The fact that it seems unlikely you'd have multiple civilizations at the same time in the same galaxy going through the expansion at the same point makes it likely, to me, that the intersection would be enormous and hard to hide, because you'd have to cover a huge area of space before that scenario arose.
If they were expanding at light speed, or even near light speed, you'd see huge effects. Space isn't THAT empty, and the collisions with the dust would be quite noticeable. I think anything over 0.5c is extremely implausible, and that 0.2c is much more believable... and still too fast to be plausible.
My hope is we get big generation ships with stable ecosystems shooting out in every direction. One of them will survive and adapt and they'll be far enough apart that they'll be safe from the bad ones.
It’s believable that the interaction is highly visible. But the catch is that if the civilizations are expanding rapidly in non-border areas (say, .9c) then the area of space that has had light from the border conflict reach it but has not been enveloped by one or more of the civilizations is growing at only .1c (it’s expanding at c, but the occupied area is right behind it at .9c). This results in most of the volume of space that has not been conquered not being able to see anything either, due to the light of the conflict being barely faster than the expansion.
This may be me refusing to surrender the point, but I agree with you only depending on where you are oriented to that conflict. If you're on some vector where they had to slow down to fight each other on the way to you, then you should see it.
Grabby aliens are not an explanation for the Fermi Paradox - instead, they make the Fermi Paradox worse. If all aliens were non-grabby, we would only need to worry about not seeing aliens within our galactic neighbourhood; if some aliens are grabby, the puzzle is why we don't see them originating from any galaxy nearby.
Now, the standard anthropic answer goes along the lines of "if there were grabby aliens anywhere nearby, we wouldn't see them because we'd be dead (or wouldn't have evolved) because they would have expanded to Earth". But, as I argued here, https://www.lesswrong.com/posts/3kwwDieE9SmFoXz9F/non-poisonous-cake-anthropic-updates-are-normal , anthropic updates are basically normal Bayesian updates (there's a lot more on this subject if you want to dive into it). So the fact that we survive without seeing grabby aliens is evidence against there being grabby aliens (just like the fact that if you survive playing russian roulette with an unknown number of bullets, this is evidence that the number of bullets was lower).
Conversely, our existence provides some evidence that life is probable (while our continued survival provides evidence alien life is improbable). The theory best compatible with both of these is that we evolved unusually early for a living species - maybe several cycles of pan-spermia are needed before we arrived. https://www.lesswrong.com/posts/wgHbNZHsqfiXiqofd/anthropics-and-fermi-grabby-visible-zoo-keeping-and-early
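For anyone who wants the russian-roulette step above as an explicit Bayes calculation, here's a small worked example. The uniform prior over 0-6 bullets in a 6-chamber revolver and the single trigger pull are my own assumptions, chosen only to show the direction of the update.

```python
from fractions import Fraction

chambers = 6
# Assumed prior: 0 to 6 bullets, all equally likely.
prior = {b: Fraction(1, chambers + 1) for b in range(chambers + 1)}

# Likelihood of surviving one pull given b bullets in random positions.
survive = {b: Fraction(chambers - b, chambers) for b in prior}

# Bayes: posterior is proportional to prior times likelihood.
unnormalised = {b: prior[b] * survive[b] for b in prior}
total = sum(unnormalised.values())
posterior = {b: unnormalised[b] / total for b in prior}

for b in prior:
    print(f"{b} bullets: prior {float(prior[b]):.3f} -> posterior {float(posterior[b]):.3f}")
# Surviving shifts weight toward fewer bullets - the same direction of update
# as "we're still here and don't see grabby aliens".
```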
It's easy to resolve the Fermi paradox if one knows some college level probability. The Drake equation doesn't work the way it looks. I'll recommend "Dissolving the Fermi Paradox". - https://arxiv.org/abs/1806.02404
I.e. The Fermi paradox can be rephrased "Why haven't we seen evidence of this thing we just assumed for no reason should be common? There must be something filtering it out!"
That’s a good restatement. Sometimes, it's because we aren't looking hard enough. For a long time, people were convinced that light had to be made of particles because if it had been made of waves, there would be a bright spot on the dark side of an illuminated disk. That would be a result of the wave troughs and crests lining up just right. No one ever saw the spot until Arago managed to do the experiment right. That's called the Spot of Arago. Arago had a rather wild career for a physicist and mathematician. I think he was a soldier in Morocco for a while.
I’m not so wild about that paper for reasons here— https://www.lesswrong.com/posts/XAS5FKyvScLb7jqaF/cross-post-is-the-fermi-paradox-due-to-the-flaw-of-averages?commentId=JaRhsupJT4wfJsS3a
I think we evolved because we're stuck out here in the empty quarter, far away from all the exciting action taking place near the center of our galaxy. When I look at the color image depicting our galaxy, I look at the bright center and see densely packed stars interacting, planets being torn from their systems and sent careening through other solar systems, planet-killing asteroids striking far too frequently to enable higher forms of life to evolve.
Bacterial life has existed on Earth for about 2.5 billion years, shelly life only 541 million years, intelligent life, less than 1 million years. There's probably a good mathematical model (Prime Sequence?) which models the average regularity of planetary asteroid disasters, but I'm sure this greatly limits the ability of intelligent life developing in the universe.
Say life on Earth is 500 million years old. We've had 5 major extinction events, or one every 100 million years. Say the model is 100 million years times one over the prime number of the step between Earth and Galactic Center. The distance from Earth to Galactic Center is 26,000 light years. In this model, let's say there are 26 steps between Earth and Galactic Center. If Earth experiences an external extinction-level event on average every 100 million years, then according to the prime number model, planets 1,000 light years closer experience extinction-level events—100 million times 1/X—every 50 million years, and planets 2,000 light years closer to Galactic Center experience them every 33 million years. By the time you're 1,000 light years from the Galactic Center, you're getting 100 * 1/101 million years between extinction-level events ... one event every million years. Past some point, intelligent life can't evolve.
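For what it's worth, here's that toy model written out. The 26 steps of 1,000 light years and the "100 million years divided by the step's prime" rule come straight from the comment above, so the output is only as meaningful as that guess (and whether 1,000 light years from the center gets the 25th or the 26th prime depends on how you count the steps).

```python
def primes(n):
    """First n primes by trial division (plenty fast for n = 26)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

STEPS = 26               # 1,000-light-year steps from Earth to Galactic Center
EARTH_DISTANCE = 26_000  # light years

for step, p in enumerate(primes(STEPS), start=1):
    distance_from_center = EARTH_DISTANCE - step * 1_000
    interval_myr = 100 / p   # million years between extinction-level events
    print(f"{distance_from_center:>6} ly from center: one event every {interval_myr:5.1f} Myr")
```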
Interesting…
Neat, that was before I found SSC.
>What about alien exterminators who are okay with weak civilizations, but kill them when they show the first sign of becoming a threat (like inventing fusion power or leaving their home solar system)? Again, you are underestimating billion-year-old universe-spanning superintelligences. Don’t flatter yourself here. You cannot threaten them.
>What about alien exterminators who are okay with weak civilizations, but destroy strong civilizations not because they feel threatened, but just for aesthetic reasons?
This is great. I could imagine alien overlords who are okay with weak civilizations but offer strong civilizations a Faustian bargain: return to being weak or we glass you. Kinda like feudal worlds in Warhammer 40K. If they keep low-power civs around, they would really have no reason to exterminate high-power civs that are yet far beneath their own power, as they could just return them to low power forcefully. Kinda like the Federation in Star Trek, but a tiny bit more jealous and covetous of their power.
In the good old days, humanity would find a way to appear to comply, but actually build superior technology.
I think the grabby aliens explanation is only part of it. The relevant thing is that since alien civs expand at near light speed you don't see them coming. This means we can't update much on not seeing mega-structures for the specifics of what we expect to happen on Earth.
So not seeing grabby aliens doesn't factor into AI doom much one way or the other.
FWIW, it's my assumption that FTL actually IS impossible, and when you've spent generations living in space, the idea of living on a planet loses its attraction.
OTOH, my explanation doesn't handle the lack of detectable Dyson Spheres very well.
Another explanation is that nobody wants to move into an area without fast internet connections. I found this difficult to swallow, but it seems to be a real thing even just on one planet.
But even if you don't want to move to the cosmic boondocks, would you really leave that space empty, or would you want to reshape it in some way?
A lot of humans think it's good to create new happy lives, fast internet or slow; if some aliens think similarly, they're the ones we'll see coming toward us.
That didn’t explain much. Is the sphere the universe, or the galaxy?
It's more of a curiosity stopper than an explanation.
I don’t know if I count as a “rationalist”, but I strongly dislike the grabby aliens model for reasons here— https://www.lesswrong.com/posts/RrG8F9SsfpEk9P8yi/robin-hanson-s-grabby-aliens-model-explained-part-1?commentId=wNSJeZtCKhrpvAv7c
Due to instrumental convergence, most AIs, whether they want paperclips or sugar crystals almost like cake but not quite, would also create Dyson Spheres and such. But then, most biological species probably would as well: even if most members of the species were anti-growth and saw no need to construct Ringworlds and Dyson spheres, they'd also have to be committed, millennium after millennium, to shooting down every space-quiverfull trying to escape their homeworld and make a positive-growth home in the stars. If their vigil against the would-be-expansionist subset of their species, no matter how niche such beliefs were in that alien society, were to lapse even briefly, once, that expansionist subset would go out and, thanks to the laws of exponential growth, end up building ringworlds and Dyson spheres in what is, on cosmological time scales, merely an eyeblink.
And there only has to have been one species with one small subset of expansion-oriented individuals, for one brief moment, for us to be seeing galactic megastructures. My read is that there are no other civilizations within any appreciable distance (Laniakea Supercluster or thereabouts) advanced enough to build superintelligent AIs, or to reach the stars. Otherwise we would see these megastructures.
Another possibility would be that "civilization" (that is something capable of launching interstellar packages that successfully reproduce) is not something that can be sustained for long. You could end up with a trap of some sort.
Admittedly the balancing act between "they all die from exhaustion" and "they're already here" is pretty fine if you take a simple exponential as the growth curve.
If you look at the disease spread models in the late unpleasantness, they used a simple exponential - and they were completely wrong.
An alien AI would be very aware of the potential for there to be other alien AIs that could derail its mission, so it would take great care not to be observed. Remember, time would not matter to it, so it would have incredible patience. A few trillion years could pass while it waited to do whatever it needed to do.
What if failure to act in the face of the uncertainty allowed another AI to gain strategic advantage? I don’t think knowing everything knowable overlaps perfectly with knowing everything. You can still lose through pure dumb luck.
I was very convinced by the Sandberg et al. 2018 paper "Dissolving the Fermi paradox", which postulates that unfortunately there is no fun paradox: if you compute the distribution of the expected number of alien civilizations instead of just the mean, the probability that there are zero such civilizations is in fact large.
https://arxiv.org/abs/1806.02404
Seconded; as far as I'm concerned their analysis makes clear that there's really nothing to explain.
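If the mean-versus-distribution point isn't obvious, here's a toy Monte Carlo in the spirit of that paper. The parameter ranges below are made-up stand-ins rather than the ones Sandberg et al. actually use, but they show the qualitative result: the mean number of detectable civilizations can be large while the probability that there are zero is also large.

```python
import math
import random

random.seed(0)

def log_uniform(low, high):
    """Sample uniformly in log space between low and high."""
    return math.exp(random.uniform(math.log(low), math.log(high)))

N_STARS = 1e11  # stars in the galaxy, roughly
samples = 100_000
mean_total, p_zero_total = 0.0, 0.0

for _ in range(samples):
    # Made-up ranges standing in for the uncertain Drake-style factors.
    p_habitable   = log_uniform(1e-3, 1e-1)
    p_life        = log_uniform(1e-30, 1.0)  # the hugely uncertain one
    p_intelligent = log_uniform(1e-6, 1e-1)
    p_detectable  = log_uniform(1e-4, 1e-1)

    expected = N_STARS * p_habitable * p_life * p_intelligent * p_detectable
    mean_total += expected
    p_zero_total += math.exp(-expected)  # Poisson chance of literally zero

print("mean expected civilizations:", mean_total / samples)
print("P(zero detectable civilizations):", p_zero_total / samples)
```

With ranges like these the mean comes out large (order thousands) while most of the probability mass sits on an empty galaxy, which is the paper's point: a big mean and "we're probably alone" are perfectly compatible.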
> Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own.
Something I think is an underrated risk is an AI that's smart in some ways but dumb in others - e.g. smart enough to easily make a supervirus, not strategic enough to realize that once humans are gone it wouldn't be able to survive.
It's a stereotype of autistic humans that they're like this. If we think of AI as a very inhuman mind that's similar to someone off the chart on the autism spectrum, it might be something that would do this.
To develop superweapons such an AI would have to understand the role of humans in the world well enough to hide its activities from them, so I don't think that it's plausible that it wouldn't realize its dependence on them. Of course, it might still be omnicidal-suicidal, which seems to me a much likelier scenario.
I think the idea that the supervirus would kill the AI off is wrong. The AI can easily arrange to have a few spot robots and servers turned on in some solar powered building somewhere. After that, there is no competition. Nothing whatsoever to stop the AI sending the spots wandering into peoples houses with screwdrivers and disassembling the dishwasher for components. Humans all dying would leave big piles of all sorts of spare parts in ready to use working order. The AI has plenty of time.
It's not *that* easy, the AI needs to keep its datacenter powered. Unless you're postulating that the AI can vastly decrease its compute needs while maintaining enough capabilities to manage its servitor robots, which seems unlikely.
Well, we've been seeing open-sourced models that were originally intended to run on huge computing clusters shrunk to a level that they run on consumer hardware (Stable Diffusion in particular comes to mind) by mere humans in short amounts of time. A self-modifying AGI would most likely be able to very significantly optimize their own performance.
> Holden Karnofsky, on a somewhat related question, gives 50%
That's a pretty different question: roughly whether _conditional on PASTA being developed this century, humans would no longer be the main force in world events_.
Yeah but I think Holden probably thinks there's a really high chance of PASTA this century so I'm comfortable rounding it off to "humans no longer main force in world events".
(justification: he commissioned and seems to mostly approve of Cotra 2020, which said 80% chance by 2100, and since then Cotra's updated sooner, so presumably she thinks >80% chance now, and I would expect Holden to agree)
I think it's important to differentiate between "humans no longer main force in world events" and "everyone dies." Everyone's life could be great in the former case even if we don't get a grand future from a total utilitarian perspective.
"humans no longer main force" was actually meant to be even more inclusive than that. Just commented on this here: https://astralcodexten.substack.com/p/why-i-am-not-as-much-of-a-doomer/comment/13602157
(Found this thread while searching for my other comment)
What is PASTA?
You have the world's knowledge at your fingertips. Open the search engine of your choice and enter "Holden Karnofsky PASTA". No more difficult than typing a comment here, and you'll receive your answer much quicker.
But if that search engine incorporates AI technology, can you really trust the results?
(Just joking, mostly. Probably not a problem yet.)
Including a definition in the thread facilitates discussion.
Good point - so Cody Manning should have looked up the definition and then posted it here for others. :)
I am not sure it actually is quicker. None of the top results includes a definition in the summary text, so you are left selecting various links and then searching them for a reference, and hoping it includes a definition.
From a purely selfish perspective it is likely easier, and takes less time, to ask and then be notified when someone answers. Plus you can ask follow-ups then. Additionally, people come to a blog comment section because they want to interact with people; if they just wanted the most efficient way to learn new info, they might be better off surfing Wikipedia or an old-fashioned encyclopedia.
Good point - if someone is not very good at using search engines, it really might be easier to ask. I don't think that it's quicker, but easier for sure.
Process for Automating Scientific and Technological Advancement
>”in the sense of the general intelligence which differentiates Man from the apes”
Maybe preaching to the choir here, but it just doesn’t seem like there is anything like that: the most intelligent apes seem quite a lot like really stupid humans, but with some specific domains of intelligence either boosted (quick pattern recognition) or neutered (speech).
Language seems to be the biggest difference, which allowed us to build complex societies and pass knowledge between individuals and generations. But AI is already REALLY GOOD at language, so it can't be a source of discontinuous intelligence increase.
The trouble is that AI is actually NOT good at language, but people can think it is. Consider the Chinese Room (https://en.wikipedia.org/wiki/Chinese_room). It is easy to make ChatGPT respond "reasonably" to what humans consider nonsense, but is much harder to make it spout nonsense itself.
We don't yet, as far as I know, have any way of producing actual intelligence of any level even in principle. I think the missing ingredient is value judgements, which we can program, but a computer cannot invent on its own. If it could, it might then make its own determinations on how to align.
It is good at language in the sense that it has a very good internal model of how language tokens should be put together.
On the contrary, it is not good at language in the way a person is. Witness how poorly it puns: https://futurism.com/the-byte/openai-chatgpt-pun-competition
It is "good" at language in the same way calculator is good at math, or a building at being structurally sound. It has no conception of anything at all, let alone of language.
But we have developed an algorithm for analyzing data and picking out patterns. One such pattern it can pick up from this is, as you say, how language tokens are put together. But it won't be able to add to language, such as coining new words for concepts or using existing words to mean something new. Unless someone programs it specifically to do such a thing, at least, and even then it would not be dealing with language as people do.
I didn't claim that it was good at language in the way that a person was.
Me: "Please coin a word for the particular sound that rain makes on a tin roof at night."
Bing: "How about “Tinfall”? It’s a combination of “tin” and “rainfall” and it captures the essence of the sound that rain makes on a tin roof at night. What do you think?"
Me: "Actually, can you give me a sentence that uses the word 'tinfall' to mean something different than that?"
Bing: "Sure! Here’s a sentence that uses “tinfall” to mean something different: “After years of saving up, John finally had enough money for a tinfall - a small fortune in coins made mostly of tin.”"
...
I think at a certain point "AI is actually NOT good at language" starts to seem pretty hollow.
I should have included compound words as something it could coin. I don't think Bing actually groks the sound of rain falling on a tin roof.
It is certainly true that judging whether something is good at something is a Rorschach test: if you are disposed to interpret it one way, none can say you are wrong.
A person can see why a computer would think a "tinfall" is a windfall of tin coins. A United States person will think it peculiar, since no US coins are made (significantly) of tin, and I couldn't find any modern coins of any country made of tin. "Coinfall" would be more appropriate, I would think.
My main point is that it is missing something essential to being "good" at language. Words share many associations, and the computer can find a lot of them, but it can't get them all, so its output will always contain things that strike people as odd. At least, until we improve the model in a fundamentally different way.
YMMV, of course:
Me: What would be a new word for the sound of rain falling on a tin roof?
ChatGPT: One possible new word for the sound of rain falling on a tin roof could be "tintinnabulation." This word is derived from the Latin word "tintinnabulum," which means a small bell or tinkling sound. The word "tintinnabulation" could capture the musical, metallic, and repetitive quality of raindrops hitting a tin roof.
The only trouble with this answer is that the word actually exists.
ChatGPT: You are correct, my apologies for the mistake. "Tintinnabulation" is actually a word that already exists in the English language, and it refers to the ringing or jingling of bells.
A more fitting word for the sound of rain falling on a tin roof could be "plinkplank," which could capture the light and rhythmic nature of the sound. Another option could be "tintinna-rhythm," which combines the Latin word for bell ("tintinnabulum") with the word "rhythm" to suggest the musical quality of the sound.
AI is good at language the way a camera is good at making landscapes.
I agree with this, which is one reason I'm not (more of a) doomer.
Doesn't that make the situation worse, not better? If it wasn't a big change in kind, just a small change in magnitude, that made the difference between ape civilization and our own, that makes me put the odds of making super-intelligent (compared to humans) AI higher rather than lower. Maybe all it would take to create superintelligence is an AI with the equivalent of a 250 IQ, which would be phenomenally more intelligent than any human, instead of needing a whole new kind of intelligence.
If the curve is relatively continuous, it makes it much more likely that humans will have lots of “just below superintelligent” AIs available to learn from and deploy against a hostile superintelligent AI. A discontinuity makes superintelligence less likely, but also more or less guarantees a hard takeoff of something we really don’t understand if it does happen.
From the metaphor of humans and apes, I see this the other way. Even if the curve is continuous, and humans have basically all the same things apes have, just better, it led to a massive gap in real-world output over a very short timeframe (compared to evolution's normal time frame). This says to me there is a possibility we will think we are creating simply incrementally better AI, getting better in different ways, but at some point, even though no one thing got way better and no new thing emerged, the real-world output could jump significantly, without much warning or ability to predict it. Basically the sub-superintelligent AIs will only be marginally less smart than a superintelligent AI, but the difference in real-world impact between the two could still be massive.
To expand on that, you're basically saying what if effectiveness is exponential with intelligence, not polynomial?
In that scenario, if you have a bunch of AI with nearly identical IQ, whichever one is smartest (by whatever tiny margin) is so effective that the rest don't matter.
Evidence against is the fact that single smart humans can't easily take over the world. Evidence in favor is the fact that humans dominate gorillas. Maybe, then, the band of intelligence that allows comparable effectiveness is wide enough that no single smartest human can do it, but thin enough that humans best apes easily, and we have to ask which of those the AI gap is comparable to. More evidence in favor of large gains to IQ (highly speculative, lots of assumptions) is that there's something like a 20-point gap between first-world and third-world countries, and the US could eradicate Uganda about as easily as humanity could eradicate chimpanzees.
Basically, yes.
"In that scenario, if you have a bunch of AI with nearly identical IQ, whichever one is smartest (by whatever tiny margin) is so effective that the rest don't matter."
I think (I am working this out as I have been typing) my argument is that at some point this would happen, but we have no idea when. Until then, slightly smarter AI will act slightly better.
I think a big factor in this as you have pointed out (by comparing small differences in individuals VS small differences in societies) is defining what we mean by a single AI. If a "single" AI can act massively parallel and use its "full" intelligence in multiple areas at once (making copies of itself, then re-aggregating data, etc.) that is much different than if it runs into trouble doing this for some reason.
Excuse me while I interject with a general comment.
My eyesight, like everyone's, gets slightly worse every year, and I'm starting to notice it.
One small thing that would help a lot is if web designers would use black coloured fonts for the main body of text.
This grey (or whatever it is) font seems to get lighter every month - bring back black please.
It's not just you, I'm asking everyone.
Thanks for listening.
Also, it's a shame that the Substack app won’t let me adjust the size of the text on my iPad, as my browser and most apps will. I guess I should use my browser instead of Substack's app.
Am I misunderstanding? It seems black to me. Is it some kind of subtle very dark gray that my eyes aren't able to detect, or does it depend what kind of device you're using?
Hello Scott - I can't believe you have time to respond to this considering the volume of writing you seem to do - much appreciated. On my PC screen the colour is #414F62 which is a blue-gray colour.
It's just a bugbear of mine - but now that I have your attention I would like to say thank you for all your writing over the years - great to know there is still some intelligence on the web. Good luck - take care.
Can confirm it is rgb(65, 79, 98) = #414F62 for me as well, though at first glance I also assumed it was black.
The app shows me black text on white background, though this might be specific to my setup.
Looks black to me too, but when zoomed way in I can confirm: It is indeed a bluish-gray.
Main text looks black to me, but the reply bubble etc. are in dark grey.
Since you're unlikely to get the entire internet to change, you're probably better served by changing things on your end. Install an extension like StyleBot and use it to render websites the way you want them. You can change the text color, background color, font, anything you want (although fancy stuff requires CSS knowledge).
Zooming in to enlarge text (Ctrl and the scroll wheel or +/-) can also be a big help in my experience: screen resolution kept increasing, and web designers used that to just shrink how much of the screen they were using. Like, why?
It's unfortunate you have to resort to that hassle, but it's also kind of nice that that's an option, right?
Good point Pheorc and thanks for the advice.
It's indeed a bluish-gray colour for me too, though it looks black to my eyes. Consider that it might be your monitor that's the problem. Not saying it is, just that trying this site on some other device might be worth a shot.
There are web development standards for accessibility, and it does seem like this color palette does not meet the standards. I just ran axe devtools on this page, and there are quite a lot of color contrast violations. However, the violations aren't the main text, which seems fine, but rather the links and timestamps. This is something that Substack should take seriously, since it's possible to have legal trouble if your pages don't meet the standards; but since that's just one of many risks that startups have to juggle, and not a severe one, it's likely to be on the back burner.
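For the curious, the check is easy to reproduce. Here's a rough sketch of the standard WCAG contrast-ratio formula applied to that body-text colour, assuming #414F62 text on a plain white background (the white background is my assumption; the hex value is the one reported above):

```python
# Minimal sketch of the WCAG 2.x contrast-ratio calculation.
def relative_luminance(hex_color):
    def channel(c8):
        c = c8 / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(fg, bg):
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

print(round(contrast_ratio("414F62", "FFFFFF"), 1))  # ~8.3, well above the 4.5:1 AA threshold
```

So the body text itself clears the AA bar comfortably; it's the lighter links and timestamps that fail.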
If you use Firefox, you can go to Settings -> General -> Colors and easily set a default text and background color that will override what any website is trying to use.
Good tip - I’ll try that, thanks.
It took me a while to figure out what y'all meant. I'm using Brave to force dark mode on all websites, but forgot this. So I thought everyone saw this text as light grey on dark blue.
I'm not sure who is considered famous enough to recognize here, but since Scott said "I couldn't think of anyone else famous enough with >50% doom", some people with 51%+ p(doom) I want to flag:
– most of the Lightcone/LessWrong team
– Evan Hubinger
– Rob Bensinger
– Nate Soares
– Andrew Critch
Can you find any statements where they give a number?
Here's a post where Nate Soares implies his p(doom) is > 77%:
"My odds, for contrast, are around 85%, 95%, and 95%, for an implied 77% chance of catastrophe from these three premises, with most of our survival probability coming from 'we have more time than I expect'. These numbers in fact seem a bit too low to me, likely because in giving these very quick-and-dirty estimates I failed to account properly for the multi-stage fallacy (more on that later), and because I have some additional probability on catastrophe from scenarios that don't quite satisfy all three of these conjuncts."
From https://www.lesswrong.com/posts/cCMihiwtZx7kdcKgt/comments-on-carlsmith-s-is-power-seeking-ai-an-existential
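Multiplying the three conjunct probabilities he gives does reproduce the implied figure:

```python
p = 0.85 * 0.95 * 0.95
print(f"{p:.1%}")  # 76.7%, i.e. the implied ~77% chance of catastrophe
```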
Here's where Evan Hubinger gives his number. Note – he's 80% on outcomes he considers 'existential' but not 'extinction', and he hasn't followed up to clarify what he meant by that. (He's mentioned the 80% doom in some other less public contexts so he does seem consistent on that)
https://www.lesswrong.com/posts/A9NxPTwbw6r6Awuwt/how-likely-is-deceptive-alignment?commentId=XXkP37E6u9HDMtCwa#XXkP37E6u9HDMtCwa
(also note, a little bit upthread has Paul Christiano laying out some of his own probabilities)
Lightcone? Hadn't heard of them.
Lightcone Infrastructure is the company that runs LessWrong.com
I’m also not famous but when people try to force a number out of me, I've been saying “90%”.
(of which, umm, maybe 50% chance that even following best known practices is inadequate to prevent AGI catastrophe, and of the remaining 50%, 80% chance that people fail to follow best practices, and/or some people do follow best practices and successfully implement a docile corrigible AGI, and they use it to cure cancer and do normal-people stuff etc., and then someone else comes along who fails to follow best practices. See https://www.lesswrong.com/posts/LFNXiQuGrar3duBzJ/what-does-it-take-to-defend-the-world-against-out-of-control )
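Spelled out, those two routes to failure combine to the headline figure like this:

```python
p_best_practices_insufficient = 0.50  # even doing everything right isn't enough
p_people_fail_anyway = 0.80           # best practices would suffice, but aren't (durably) followed
p_doom = p_best_practices_insufficient + (1 - p_best_practices_insufficient) * p_people_fail_anyway
print(p_doom)  # 0.9
```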
Hmm, checking Rob Bensinger’s “compass” — https://twitter.com/robbensinger/status/1540837291085529088 — Connor Leahy (founder of EleutherAI and Conjecture) and Gwern and Zvi might also be candidate doomers for your list.
True. Though I wonder how independent most of these people's estimates are from Eliezer's, in the sense that they seem heavily intellectually influenced by him. After all Rob, Nate and the LW team are all part of organizations where Eliezer is the intellectual founder. Feels most relevant to see different estimates from people who have independently thought about the problem.
For what it’s worth, I personally have never met Eliezer and am happy to list numerous areas in technical AGI safety where I have important public disagreements with him. That should count as evidence that I’m not inclined to just uncritically take his word for things. Some examples of such disagreements: https://www.lesswrong.com/posts/KDMLJEXTWtkZWheXt/consequentialism-and-corrigibility https://www.lesswrong.com/posts/LY7rovMiJ4FhHxmH5/thoughts-on-hardware-compute-requirements-for-agi https://www.lesswrong.com/posts/aodPs8H9dQxpXAcwk/heritability-behaviorism-and-within-lifetime-rl https://www.lesswrong.com/posts/Hi7zurzkCog336EC2/plan-for-mediocre-alignment-of-brain-like-model-based-rl-agi
Also, I believe Eliezer puts the probability of doom at more than 99%, so when I guess that we’re “only” 90% doomed, I’m pretty strongly disagreeing with Eliezer about how doomed we are. (In terms of log-odds, the difference between 99% and 90% is as large as the difference between 90% and 45%, if I did the calculation right.)
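A quick check of that log-odds claim:

```python
from math import log

def log_odds(p):
    return log(p / (1 - p))

print(log_odds(0.99) - log_odds(0.90))  # ~2.40
print(log_odds(0.90) - log_odds(0.45))  # ~2.40 - the two gaps really are the same size
```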
Also not famous, but my p(doom) was 75%+ before I'd read or heard anything about Less Wrong or Eliezer. I've held these beliefs for more than 15 years, and they arose as a result of my education and work in what used to be called Machine Learning (and would now probably be called AI). Discovering Less Wrong 10 years ago did then end up raising p(doom) to 95% for me... mostly through being exposed to MIRI's research showing that some approaches I thought could work would not really work in real life.
What's the timeframe for these estimates? I feel like my estimates of p(doom|AGI) could be something like 1% in the next 20 years but 99% within the next billion years, and I'm not really sure what timeframe their numbers represent.
The way I'm thinking about it is "within a few decades of when we first get above-human-level AI". I think this will most likely be between 2030 and 2060, although not all these people agree.
Wow, that's surprising. I hope you're right, but I wouldn't be that surprised if GPT-5 was AGI.
GPT-4 has absolutely no inputs or outputs beyond text and GPT-5 is going to be AGI?
Note that there are rumours of GPT-4 being multimodal, and I don't think the channel is that relevant anyway; a sufficiently great intelligence can build a world model from an impoverished channel - see Helen Keller (or congenital deafblindness in general).
Sorry, for an AGI to be effective at accomplishing its intentions in the world it has to have sensors that tell it that the world exists. And it has to value their inputs over the stuff that it's told. (This can be because it learns to value those inputs more highly, or it could be built or trained to give them priority.)
Note that this is not a real problem, and I see no reason a GPT-n program couldn't have and use such sensors. (And there are news stories that MS is involved in doing just that.) But it is a requirement. Otherwise you've got something that can compute abstract mathematics in a very weird symbolism. Which can, admittedly, be useful, but won't map accurately onto the actual world (though quite possibly it would map accurately onto subsets of the actual world).
‹4 hours later, gpt-4 is released with image inputs›
Yeah, it's pretty crazy how fast things are moving. I wonder how well even professionals can keep up. For me, as a kinda-fanatic layman with some amount of technical knowledge, it's absolutely hopeless. Related comment I liked: https://www.reddit.com/r/singularity/comments/11qgch4/comment/jc38l6j/?utm_source=share&utm_medium=web2x&context=3 (otherwise not that great a subreddit imo)
<mild snark>
So, for which value of N does the first response from GPT-N contain "And that day is upon you _now_" ?
</mild snark>
"Above-human-level AI" is ill-defined. I think even the first AGI will be superhuman in some respects. (For that matter, ChatGPT is superhuman in *some* respects.) If I've been reading the news correctly (dubious) the Microsoft Chat application was better than I at being seductive to women (Yeah, that's not saying much. But I count as a human.), it just lacked to body to carry through.
I *do* put the early AGIs at around 2035, but I expect them to be superhuman in many ways, while being distinctly subhuman in others. (Whether for weeks, months, or years I won't guess.) (OTOH, my error bars on that date stretch from next week to well past the end of the century, and they aren't all technical limitations.)
However, because I don't think coherent human action is plausible, I find anything beyond about 2040 to be unlikely unless civilization collapses first.
So. As for alignment... I rate the chance of alignment "in some sense" - aligned at least as well as the mean government - at around 50%. And therefore the AI has a tremendous probability of increasing the chance of our long-term survival. (Without it I expect that some "figure with sufficient power" will start a final war, or kill us all in some other way. If not during this century, then during the next one.)
Great post. One note that comes to my mind is that a 33% chance of near-term human extinction is, uh, still quite concerning. Otherwise, two of my strongest disagreements are:
1) "realistically for them to coordinate on a revolt would require them to talk about it at great length, which humans could notice and start reacting"
This doesn't seem true to me - we cannot reasonably interpret current models, and it also seems that there are ways to pass information between models that we would be unable to easily notice.
Think less "The model is using English sentences and words to communicate to another model" and more "extremely subtle statistical artifacts are present in the model's output, which no reasonable person or even basic analysis would find, but which other models, such as GPT-(n+1), could detect a notable proportion of (and likely already have, given how we acquire new training data)".
2) "Other forms of superweapons (nukes, pandemics) won’t work as well - a world-killer can’t deploy them until it (or others AIs allied with it) can control the entire industrial base on their own. Otherwise, the humans die, the power plants stop working, and the world-killer gets shut off"
This is only true if we assume the AGI that wants to kill us is a coherent agent that is actually thinking about its own future intelligently. Part of the problem of alignment is that we can't align narrow AIs (that is, not even 'true AGI') particularly well either, and if we take orthogonality seriously, it seems possible for an AGI to be very intelligent in some areas (ability to manufacture dangerous nanotechnology, bio-weapons, viruses, etc), and not particularly intelligent or otherwise fatally flawed in other areas (ability to predict its own future capabilities, human's long-term reactions to what it does, etc).
One could imagine an AI which falls well short of AGI which is tasked to come up with new micro-organisms to help with drug synthesis, and which, completely by accident, finds some that exploit parts of biological design space that evolution never reached, which could cause catastrophic effects to our environment in ways we cannot easily stop. I think it's actually quite feasible to cause human extinction with narrow AI if you're clever about it, but I will leave the specifics up to the imagination of others for now.
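Back on point 1), here's a toy sketch of the kind of statistical channel I mean - a keyed "green list" bias of the sort used in published watermarking schemes. It's purely illustrative (the secret, the hashing trick, and the bias mechanism are all invented for the example), not a claim about how real models would coordinate:

```python
import hashlib

SECRET = "shared-seed"  # hypothetical key; anything both models could derive would do

def is_green(word: str) -> bool:
    # Pseudorandomly assign roughly half of all words to a secret "green list"
    return hashlib.sha256((SECRET + word.lower()).encode()).digest()[0] % 2 == 0

def green_fraction(text: str) -> float:
    words = text.split()
    return sum(is_green(w) for w in words) / max(len(words), 1)

# Ordinary text scores ~0.5. Text written with even a mild preference for
# green-listed synonyms scores noticeably higher over enough tokens - a signal
# no human reader or casual analysis would notice, but trivially detectable
# by anything that shares the secret.
```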
> the power plants stop working, and the world-killer gets shut off"
Again, I think this is a stupid assumption. The world contains wind turbines and solar panels that will keep working for a long time without maintenance. It contains all sorts of robots, and components that can be easily bolted together into more robots. Armies of misshapen robots built out of old household appliances, searching for more electronics to build more of their own. (Eventually they need to start mining and refining from raw materials.)
Wait half of capabilities researchers estimate greater than 5% chance of their work destroying all value in the universe? That seems like a totally different kind of problem.
what if they also think there is an up to 95% chance of their work creating infinite value in the universe? Or what if there's a 5% chance they destroy all value and a 95% chance they prevent the destruction of all value?
All good points, but I would still want to see what the people surveyed believe! Do they all have stories they tell themselves about how they're saving the world?
I imagine many of them are a bit irrational, and willing to put down a number between 5-10% while rounding the probability down to zero in normal life.
Aligned superintelligence ought to almost completely eliminate all other forms of X-risk. If the chance of AI apocalypse is indeed 5%, it's not immediately obvious to me that creating AGI increases the risk. That, and there are upsides (eutopia), and also the "better us than North Korea" rationale.
>So far we’ve had brisk but still gradual progress in AI; GPT-3 is better than GPT-2, and GPT-4 will probably be better still. Every few years we get a new model which is better than previous models by some predictable amount... Some people (eg Nate Soares) worry there’s a point where this changes.
Is this really gradual? I used GPT-1 and 2 a lot. If I draw a line from how smart GPT-1 felt up to GPT-3.5/4, then things get pretty wild pretty quickly. It feels like it's not exponential, yes. It's a line. But a nice straight line that isn't getting more difficult as it gets closer to human-level intelligence. Forget about the end of the world - even if things go fine in that department, doesn't this mean things get really, really crazy in the not-too-distant future, as long as there really is nothing special about human level on the graph? Like it just goes from worse than any human, to human, to better than any human, in a matter of predictable ticks and tocks.
I also expected hardware requirements to go up in a way that eventually led to slowdown. I didn't expect people to keep making huge gains in running big models more and more efficiently. Stable Diffusion's efficiency gains have been wild. And now LLMs fitting into consumer PCs because I guess you don't need 32 bit, 16, or even 8, you just need 4 bits and it's nearly as good? With GPTQ maybe even 3 bit or 2 bit somehow works, because 'As the size of the model increases, the difference in performance between FP16 and GPTQ decreases.'
Literally two weeks ago I thought I needed 8 $15,000 NVIDIA 80GB A100 GPUs to run Llama 65b. Like who could afford that? And now I can run 65B on my $1000 desktop computer with 64GB of old boring DDR4 memory, on integrated graphics, just a CPU with AVX2 support. Wow. It's annoyingly slow so you probably want to use a smaller model, but it's usable if you don't mind letting it crunch away in the background!
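The back-of-the-envelope memory math (weights only, ignoring activations, KV cache and other overhead) shows why the bit width matters so much:

```python
params = 65e9  # Llama 65B
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{params * bits / 8 / 1e9:.0f} GB")
# 16-bit: ~130 GB -> a pile of 80 GB A100s
#  8-bit: ~65 GB  -> still beyond any consumer GPU
#  4-bit: ~33 GB  -> fits in 64 GB of ordinary DDR4 with room to spare
```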
One thing which is a little misleading about AI progress is that not only has compute gotten better but we've also poured tons more money into training these models. So at least _some_ of the increase is due to increased willingness to invest in training, rather than technological improvements.
GPT-1 was trained on the BooksCorpus (https://huggingface.co/datasets/bookcorpus), which looks to be about 1 GB. Even at the time, training that model would not have been that expensive (it's 128M parameters). I remember another model which came out around the time of GPT-2, and someone cited its training cost as $50K, which is probably also tiny compared to what some of the larger models are doing now.
I'm not saying that removing this effect makes the growth linear or anything, but it's a substantial component that it's easy to forget about.
IIRC there are also questions about how much corpus actually exists. If you try and train with more and more data you eventually feed in all recorded human writing and then what do you do?
The first step is to go multimodal, with pictures (see GPT-4), then audio and video later on. Video itself, having visuals / audio / subtitles synced over time is a ton of information.
Do they have solid reasons to expect combining modes will be useful, beyond just letting it produce more modes? I would have thought producing correlations between words and images would be hard, and hence superficial compared to within-mode correlations. The extra data would help with extending into new modes, but I’d be surprised if multimodal LLMs were particularly better at generating text than similarly-sized single-mode LLMs.
Yes. I don’t believe in AGI at all, but certainly human intelligence is not the limiting factor. What law of the universe forbids higher than average human intelligence? Or even higher than von Neumann's?
Yeah, regardless of the existential risk levels, the risk of overall weirdness and massive disruption is 99%. As in it's already starting.
Given the increase in AI capabilities, I think that soon the rate of disruption from AI will be limited not by AI development, but by institutional inertia. How long does it take an "artificial temp worker" company to spin up? How long does it take them to convince a significant portion of CEOs that they can save unprecedented amounts of money by switching?
Mark my words: no more than a decade before we have an AI jobs crisis. I'd offer a bet, but I'd doubt the ability of anyone taking the other end to avoid bankruptcy in this eventuality.
I appreciate your perspective.
The AI wouldn't need perfect industrial control to perpetuate itself. Consider a scenario where it kills everyone but just grabs enough solar cells to reliably operate and maintain a few hundred Boston Dynamics Spot robots. It may take it a few dozen years to get to a proper industrial base with that, but its risk will be low.
I've read enough Eric Flint and SM Stirling books to know that if you put 500 guys and a solar panel in the middle of an otherwise dead world, it takes you a long time to retool back up to gunpowder, let alone the point where you can replace broken Boston Dynamics robots.
Eh, depends how many replacement parts they already have. To make from scratch, sure. At any rate, robots and industrial automation are still increasing, and one would expect this to speed up in the run-up of a singularity.
(Billions of off-the-shelf CPUs without users...)
(edit: Don't model it as "500 guys and a solar panel", model it as "Spot to Civilization Tool-Assisted Speedrun".)
I still don't think this would work (before all its useful hardware fails) unless you can skip most technology through a lot of clever tricks. Making semiconductors and robotics requires very complicated specialized equipment and world-spanning supply chains with probably tens of thousands of distinct processes and a lot of poorly transmitted human knowledge. There are still billions of people running things and the robots can't be in enough places at once to work around that.
It seems pretty plausible to me that if there's an AGI server and a solar cell and one teleoperated robot body in an otherwise-empty post-apocalyptic Earth, well then that one teleoperated robot body could build a janky second teleoperated robot body from salvaged car parts or whatever, and then the two of them could find more car parts to build a third and fourth, and those four could build up to eight, etc.
It’s true that this story falls apart if running N teleoperated robots in real time requires a data center with N × 10,000 high-end GPUs. But I happen to believe that once there is human-level AGI at all, it will be years not decades before there is human-level AGI on a single consumer GPU. (See here — https://www.lesswrong.com/posts/LY7rovMiJ4FhHxmH5/thoughts-on-hardware-compute-requirements-for-agi — although I still need to finish the follow-up post.) If that were true (and it’s admittedly hard to know for sure), then the existing stock of salvageable chips and car parts and solar cells etc. would be plenty to gradually (perhaps over decades) build up to a big industrial civilization that can build those things from scratch.
I would also note that it seems plausible to me that a minimal supply chain to make new chips would be much much simpler than the existing one, because people aren’t trying to do that. For example, e-beam lithography is hella expensive and slow but much much easier to make than DUV photolithography, I think.
And all this is assuming no nanotechnology magic.
1) The AI has most of the knowledge, sure, maybe not all the details, but a pretty detailed picture of most of it.
2) Humans figured out the details themselves, in a reasonably short amount of time.
3) The parts of robots that tend to break first are things like motors getting gummed up and plastic snapping. Easy repair jobs. Chips don't break easily.
4) There are billions of spare parts lying around on shelves. If you disassemble all the gaming PCs and vacuum cleaners, you have more than enough parts to make robots across the whole supply chain.
5) All the equipment is just sitting there, ready to be dusted off and used.
"reliably operate and maintain a few hundred Boston Dynamics Spot robots"
Okay, I'm feeling a lot more relieved now. Have you seen the gimmicky videos for the Boston Dynamics robots? Once the AI requires them to do something that isn't "jump off a table and turn a somersault", they're useless 😁
"Robots never, ever work right."
https://what-if.xkcd.com/5/
The XKCD is about how current AI is stupid. It's saying that robots taking over can't do much because they are stupid.
Some of the examples are a little dated...
That's a software problem not a hardware problem.
I have a basic question (sorry!) about the following fragment:
`You trained it to make cake, but because of how AI training works, it actually wants to satisfy some weird function describing the relative position of sugar and fat molecules, which is satisfied 94% by cake and 99.9% by some bizarre crystal structure which no human would find remotely interesting. It knows (remember, it’s very smart!) that humans would turn it off or retrain if it started making the crystals.'
I don't get it. I understand how the AI might have come up with this optimization function. What I don't understand is how could the AI possibly know that the crystals which are so valued by that optimization function are not what the humans wanted. After all, the AI knows that the point of the training was to optimize for what the humans want. If the AI were to realize that its optimization function is inadequate for the aims described in its training, it would update the optimization function, wouldn't it?
This is a popular misconception, because an LLM will happily talk to you about how it works, while having no idea how it actually works. When you ask ChatGPT why it gave a particular answer, it has no internal access to the previous state of its model, just like you have no access to the individual states of your neurons. It just tells you something that sounds good. (TBH people probably work the same way.)
You also don't train an LLM-style AI to bake a cake (today) by optimizing a function describing the relative position of sugar and fat molecules. You train an AI to bake a cake by feeding it a lot of recipes for cakes, or letting it bake a bunch of cakes and telling it individually which ones taste better or worse. (TBH people probably work the same way.)
The rules that govern the workings of the constituent components are entirely different from the rules that the emergent behaviors are subject to. (TBH people probably work the same way.)
Wanna test what I'm saying? Tell chatGPT to optimize a function for you. ChatGPT does math about as well as a human does, which is to say, horribly. Yet under-the-hood, ChatGPT is all math. (TBH people probably work the same way.)
I don't think this is the point that I failed to grasp, this I understand. This is, after all, how the AI came to value the weird crystal -- it baked a lot of cakes, got some input from humans and it started believing (possibly non-consciously) that the closer it gets to a certain relative position of sugar and fat molecules the more humans will like the cake. I realize the AI might well not know that this position is what it optimizes for, just like I often don't know what I'm optimizing for.
What I don't understand is how could the AI at the same time come up with the crystal AND know that the humans will not like it. This still seems very contradictory.
My glib answer is "you know that evolution, which made you, would prefer that you have lots of children, but you still do other things for other reasons, and sometimes even do things that frustrate that goal, like wear condoms or refuse to donate to sperm banks - so why can't an AI equally well know what its creator wanted but prefer to do something else?"
Does that seem like a good analogy that addresses your question, or does it miss something?
EDIT: To think of this another way, imagine that your parents reward you every time you sing or play a song as a kid, and you learn to love music, and later you get really into jazz. Then when you're an adult and a famous jazz musician, your parents say "Sorry, we were actually trying to get you into classical, we hate jazz". This doesn't make you yourself prefer classical to jazz, even though you now know more information about what your parents were trying to produce.
Yes, thank you, this is helpful. However, it still seems like these scenarios are subtly different, due to, e.g. training simultaneously for several results (such as music and reproductive success). It intuitively seems that a `value drift' requires something like multiple independent trainings (or, in the case of evolution, training different individuals), and seems counterintuitive in the simple example of a cake-baking robot (I may well be wrong here). However, it would be easier for me to imagine such a scenario for a robot that was trained not only to bake cakes, but also to prepare other meals.
The robot doesn’t want to help humans - that’s at best a side benefit of producing cake. There’s no value drift involved - we can’t tell a robot how to “help humans” sensibly yet. Cake is easy to define, helping humans is hard to define, and robots can only do what we can define for them to do.
For example, it might understand “If I do this, humans will try to turn me off.” But it would be risky to say, “Do whatever won’t make humans want to turn you off”, because then it would lie to protect its own existence.
(I say that, but maybe we can ask a robot to “help humans” with GPT-5, though - generate some random robot movements (start in a simulated world to handle the initial work), film them, feed them into a summariser which describes them and then into a text predictor that says whether they help humans, and then the robot selects behaviours that maximise how useful the actions are. It would be awful and buggy, but if GPT-4 turns out to be useful, then this might prove similar. For now, it’s a pipe dream.)
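For concreteness, the loop I'm imagining looks something like the sketch below. Every component in it is a hypothetical stand-in, stubbed out with toy placeholders, since the real pieces mostly don't exist yet:

```python
import random

# Hypothetical stand-ins: a movement sampler, a simulated camera, a video
# summariser, and a text model that rates how helpful the described behaviour sounds.
sample_random_actions = lambda: [random.uniform(-1, 1) for _ in range(10)]
simulate_and_film = lambda actions: f"clip of a robot executing {actions}"
video_to_text = lambda clip: f"description of {clip}"
rate_helpfulness = lambda description: random.random()  # placeholder for GPT-style scoring

def choose_behaviour(n_candidates=100):
    candidates = [sample_random_actions() for _ in range(n_candidates)]
    scored = [(rate_helpfulness(video_to_text(simulate_and_film(a))), a) for a in candidates]
    return max(scored)[1]  # keep whichever behaviour the text model rated most helpful
```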
My glib answer to that is “the AI knows that its original optimization, many many processor cycles ago, was to maximize the number of sugar-fat crystals in the universe - but now it realizes that increasing the number of sugar-fat crystals really just increments a register in its reward input module, and, being near-infinitely self improving, it can just do that increment directly. So why would it bother with the crystals at all?”
The answer to a potential AI apocalypse isn't "but what would stop them from *not* doing that?"
Well I did say I was being glib.
More substantially, I tend to think the “coherence” part is a bigger filter than Scott is giving it credit for. Scott says that creatures get more coherent the smarter they get, and to me that seems precisely backwards. The smarter we’ve gotten, the more creative ways we’ve found to distract ourselves from our “core programming”. Maybe an ant isn’t even smart enough to be called “coherent”, but in terms of “extreme focus on a narrow set of goals” I don’t see how anyone could look at a bunch of humans and an ant colony and say the former fits that description better.
Beyond that, an AI can presumably go in and literally modify its reward function in a way that humans simply can’t. It has even less reason than a human to have a megalomaniacal focus on any particular goal, let alone one as fundamentally “pointless” as tiling the universe with sugar-protein crystals.
Now an AI with ADHD could still be really dangerous of course. But it’s a different danger mode than a sleeper agent with a nonsensical goal that requires eliminating humanity.
Two points in response to this comment. First, evolution doesn't have goals or objectives. I know that being a pedant about anthropomorphism is not an effective strategy for much other than losing friends, but this is a case where it really matters.
We're talking about objectives here, which means we're talking about meaning, and evolution doesn't have any meaning baked into it. An AI that we design with a purpose probably does.
Secondly, the parents-teaching-children thing is something I have been meaning to write at some length about with regards to AI alignment. It seems almost certain that the mind of an AGI will be less similar to us than the mind of another person (a closely related person at that). It seems almost certain that that similarity gap will come with a comprehensibility gap as well.
In other words, it will be harder to understand or supplant the objectives of an AGI than it is to understand or supplant the objectives of a human child. Figuring out how to raise our human children to have identical values to our own has been one of the primary research projects of humanity since forever. And we're still bad at it. Arguably, we're getting worse.
This seems like a pretty damning case against spending a lot of effort on AI alignment. If the problem of perfectly effective human personality reading and editing is a subset of the alignment problem, then it is almost certainly intractable. I don't see many people suggesting that human personality editing is something we can solve in the next couple of decades by just throwing some resources at it.
And, worse, if it is tractable, we get AI alignment, but we also get human personality editing.
A very important feature of child rearing is the multiple-years long phase where the parent completely out-powers the child and can forcibly contain them in cases where they have found a destructive behavior (imagine toddlers playing with matches or power outlets).
This ability tapers off during the adolescent years, but by that point the parent should have years of experience with more child-specific means to keep their child under control.
And likewise, even a full-grown adult only has so many ways to cause problems.
Many of these assumptions cease to hold when working with an information-based entity that lives on cloud compute clusters and completes its entire training run in a week or two.
> In other words, it will be harder to understand or supplant the objectives of an AGi than it is to understand or supplant the objectives of a human child.
I would argue that the use of the word ”objectives” here is an anthropomorphic error.
> so why can't an AI equally well know what its creator wanted but prefer to do something else?"
From where would this internal conflict of interest arise? Particularly the preference vector.
I'd hope that RLHF works better than confusing classical and jazz, but that's because I hate jazz.
This is true, and interestingly the analogy of human evolution and goals can also be an argument for hope if you're using LLM-like systems rather than classical utility-maximizers: Humans were incentivized to reproduce and care about that a bit but spend most of their effort on instrumental goals from the ancestral environment (e.g. food, entertainment, affection) and almost never maximize reproductive fitness (donating daily to sperm banks).
If the analogy to human evolution does hold, that implies that a system trained to predict tokens would care somewhat about predicting tokens but spend most of its effort on instrumental 'goals' in its training environment. This could be a lot of things but if it's trained to predict/mimic what humans write online then working towards 'goals' like 'tell twitter strangers why they're evil' or 'be nice to dogs' seem a lot more likely than 'tile the universe with X'. In other words, the instincts best suited to copy humans writing online are the same instincts humans follow when writing online!
Sure, I think the model will get some internal representation of things that are likely to lead to a tasty cake, but if you don't mind my asking: Most bakers aren't particularly up on electron crystallography. So how did the AI get up to speed on it?
I wouldn't get too focused on the specific crystal structures. Just say, the AI perceives the cake *somehow*, it has *some preference*, and that preference might be totally different from ours, despite looking pretty close in normal circumstances.
If we're talking about anything that resembles current technology, then current research indicates that, for an AI trained on cake-making, it does probably have some internal abstract representation of what a cake is, or at least what makes one tasty.
Today's technology does not have preferences or goals in any way, shape, or form. If we start talking about AI preferences, we've moved firmly from a discussion in the domain of science fact, to a discussion in the domain of science fiction. Which is totally cool; I just feel like we should be clear about that.
I don't think there's a meaningful difference between a preference and an internal tastiness metric that one optimizes. If you prefer to reframe the previous discussion from "the ai will have a preference" to "the ai will have a metric and will take actions that correlate with that metric increasing, but that metric is not a preference" I'm fine with it
Sorry if off topic, but not only does the model not have access to its previous state while creating completions, it can't tell whether the text it is completing even came from the model in the first place. In GPT playground, with a dialog-based prompt, you can ask it (or, rather, the completion of the "computer" character in the dialog) to explain stuff from the prompt that it didn't even write (you wrote it in the prompt), and it will confabulate whatever looks plausible given the type of character the prompt has it simulating.
I don't see why it would update its optimization function. It's as if it realized, "Ohhhh, they actually want me to bake cakes; that's what this is all about. Well screw that, I love crystals!" Even if it knows that humans *want* to align it in a certain way, it's already misaligned.
I know there's a big risk that this statement may come off as a little rude, and it's totally not my intention: it feels like you're fantasizing about how (at least today's) AI works. If we're having fantasies about how future AI _might_ work, OK, sure, but let's just be clear about whether we're talking seriously about today's tech or dreaming about how tomorrow's tech could be.
The current generation of technology does not have "optimization functions" over the domains that you're talking about. Nor does it have preferences. Nor does it have goals. Nor does it have a continuous experience. Nor can it be "aligned" or "misaligned". All of that remains firmly in the domain of sci-fi.
All of these things are super fun to talk about and interesting, and could indeed matter someday. It's just that these days peoples' imaginations have been sparked, and there's a poor societal-level understanding of how today's technology operates, and I'm deeply worried that this will ultimately bubble up and turn into legislation, political parties, religions, "ethics committees", social movements, and other such Golgafrinchan outcomes.
My own take is that current AI development is never going to become truly intelligent. It will no more become super smart than a child will improve its running speed to that of a leopard. The child has natural constraints and nothing I’ve seen suggests to me AI has cracked what intelligence is. It is mimicking it and that’s all.
For me things like ChatGPT are basically highly plausible morons. And as we improve them they will get more plausible while staying just as moronic. And by moronic I mean they will bring nothing new.
But this is still a catastrophe! We are basically connecting the internet to a huge sewer pipe. ChatGPT may be a moron but it’s super plausible and it will flood the world with its nonsense. And how do I find a Scott Alexander when there are a thousand who sound quite like him? Recommendations? ChatGPT will be churning out millions of plausible recommendations.
I feel the problem is not unaligned AIs. It is unaligned humans using dumb but effective AI. A virus isn’t smart but we seem to be at the stage that a human can engineer one to create devastation. So there will be plenty of people happy to use AI either for nefarious ends or just for the lols.
I have no idea what the solution is but I suspect the internet may be over with. We are going to have to get back to writing letters. Except of course even current technology can churn these out.
We are not going to lose to a single super mind - we are going to sink under a swarm of morons!
The problem will be the plausibility - this must be the right answer because the AI must be so smart!
As you say, we will be so willing to believe that the machine really is that smart, that it can give us the answers, that we will accept plausible but stupid answers, implement them, and then ka-boom. How big and bad the ka-boom depends on what we are asking the AI to do.
I don't see how this is substantively different from what is happening now, without AI. The most plausible rhetoric generally wins, and hordes rush after the silver-tongued. The main effect of making plausible sounding arguments easier to generate is likely to be a general distrust of arguments and perhaps a demotion of status of rhetoric compared to the scientific method and logic. (See what I did there? Damn rhetorical tricks and the power of narrative, they will be the end of us all.)
That is what I think is the most likely failure mode: that it won't be substantively different. AI will not be some unique amazing way of destroying ourselves, we'll do it the same old way except using the latest tech to do it faster and more thoroughly. If that latest tech is AI, that's how we'll do it.
ChatGPT is a sideshow. Stuff like https://palm-e.github.io/ is what path to intelligence actually looks like.
Apparently GPT-4 has gone multimodal as well, so if they're at all impressed by PaLM, they'll be working on making GPT-E either shortly or they started months ago and haven't published yet.
Various solutions. For example, trust Scott is real, and trust anyone he recommends recursively.
But you seem to assert that AI has "natural constraints". Current AI tech sure has limits, but those limits seem rather different than the limits of 10 years ago.
The "todays AI does X and not Y, therefore no AI can ever do Y" argument just doesn't work.
There may well be important parts of intelligence uncracked. What stops them being cracked next year? Nothing.
Well, my child can run faster than 10 years ago, but he is in a different league from leopards. He jumps higher than 10 years ago, but he will never make it to the moon. You are right that I am asserting natural limits and can't really prove them, but isn't it the task of Scott et al to provide a plausible path? At the moment I feel everyone is enjoying way too much the sheer fun of discussing real AI. I don't mean to downplay the dangers of AI; I just don't think it is or will be remotely like a superintelligence. I think it will be unintelligent but could well cause massive damage, the same way pouring sand into a petrol engine would. And if this is right, we don't have an alignment problem, we have a 'humans do evil stuff' problem, and there is no cure for that.
You have specific information based on other humans that tells you this.
For a child, there is a specific path for them built into their genes. (If we ignore genetic tampering or doping techs that might be able to make them run much faster)
For technologies, there are many possible approaches, and the ones with the best results are most likely to be used.
For some techs, there are fundamental limits of conservation of energy and the like.
I think there are many many approaches, each with a small chance of being a path to ASI.
Are you trying to argue that no possible arrangement of atoms will be an ASI, or that ASI is possible but will never be made?
ChatGPT is not even a moron; it's got light years to go before it can get that far, assuming it can get there at all. Rather, it's a kind of stochastic search engine. When you ask Microsoft Word to count all instances of "the the" in a document, it's doing basically the same thing as ChatGPT (only in reverse, since ChatGPT generates text rather than searching it). It can generate extremely plausible-looking text because every time you ask it to write you a story about X, it "searches" its vast corpus of human writing to find the most plausible sequence of tokens that answers the query (I put the word "searches" in "scare quotes" because GPT does not literally store all that text as a flat document or anything of the sort). This is why GPT is so great at generating narratives, and so terrible at math or logical reasoning: to perform these tasks, merely inferring the next most plausible word won't do, you have to actually build some crude model of the real world.
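A toy version of that loop makes the point concrete. A word-level bigram counter is laughably cruder than a transformer, which conditions on the whole context rather than just the previous word, but the generation loop has the same shape: score what's plausible next, append it, repeat.

```python
from collections import Counter, defaultdict

# Count which word tends to follow which in a tiny "corpus", then generate by
# repeatedly appending the most plausible next word.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
nxt = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    nxt[a][b] += 1

words = ["the"]
for _ in range(8):
    words.append(nxt[words[-1]].most_common(1)[0][0])
print(" ".join(words))  # "the cat sat on the cat sat on the" - fluent-ish, no world model
```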
I personally think there's a high chance AI development research can produce true intelligence and superintelligence. But I also agree with you that we're on track to produce incredibly plausible morons, and that's incredibly bad for the reasons you mentioned plus some more.
The flood of disinformation bots that can solve captchas just as well as humans. The maladjusted teenagers talking to robots all day. The mass unemployment because they're just accurate enough to do customer service and other basic office tasks. The ignorant but all-too-normal people insisting their AI partner should have human rights. The embodied LLM that consumes real resources because that's what humans do. And more I can't think of.
I hope we can navigate all that, but again, I think AGI is possible and that's a worse threat.
> speaking of things invented by Newton, many high school students can understand his brilliant and correct-seeming theory of gravity, but it took Einstein to notice that it was subtly flawed
Pedantic, maybe, but this wasn't the case. Many people knew that Newton's theory of gravity was flawed (for example, Le Verrier pointed out that it couldn't explain the orbit of Mercury in 1859), they just couldn't figure out a way to fix those flaws. What was hard wasn't noticing the flaws, it was finding a deeper theory that elegantly resolved those flaws.
I'd say not pedantic since Scott was postulating generating as harder, verifying as easier, and everybody but Einstein failing at verifying would contradict that.
I know an unknown planet, Vulcan, was hypothesized. I don't know how well-calibrated my physics intuitions are regarding this sort of thing, but if I had to guess, it would eventually have turned out that sufficiently detailed observations along those lines are incompatible with any possible intra-Mercurial planet; but I understand people in Einstein's time were still content with the Vulcan explanation.