But I think what you said about the risk of harm mattering more is key. I'm a tech-optimist for pre-AGI AI. Post-AGI, the future is so wildly uncertain. The outcomes may be extremely good or extremely bad.
Or if you're asking why many AI safety folks expect misalignment by default, that's another can of worms. But in a sense it doesn't matter. The correct epistemic state is uncertainty. And without sanguinity about alignment, it's correct to freak out.
Just to mention someone on the other side of this debate, Yann LeCun likes to analogize AI safety to airline safety or nuclear reactor safety. He doesn't say we get safety without trying, but rather that safety is just an obvious engineering constraint all along. Why I think those analogies fail is back to the can of worms. This will be a great question for the AMA with the AIFP team!
I understand the arguments that AGI might well be bad for humanity, and I'm pretty persuaded, but I often wonder if similar arguments would have been made about nearly any new tech in the last 500 years if we'd had the same level of communication (blogging, forums, social media, podcasts, YouTube) that we do today.
Like, if we were all somehow living with no cars and only horses/camels/feet, but we magically had all of social media, and people started building cars, would there be doomsday articles? Would people be trying to stop the introduction of cars? Would they say it's the end of humanity? "Once the car is cheaper than a horse, bad thing X will happen", "Once cars are faster than horses, bad thing Y will happen", "Once everyone has a car, bad thing Z will happen", etc. Would they be demanding all the companies be regulated before they released their next car?
One thing worth noting is that a lot of the people who are anti-AI are tech-optimists for everything else.
Another thing worth noting is that a lot of these inventions *were* apocalyptic for whatever they *did* make irrelevant. Horse populations fell off a cliff when motor vehicles were invented, for instance, because cars are faster and tractors are stronger than horses. AI is very centrally our turn, because what are we better at than all the animals? Thinking.
Nuclear weapons and particle colliders are the obvious cases where this happened, but we eventually came up with hard calculations showing that they wouldn't end the world. On the other hand, most of the theory on AI (instrumental convergence etc.) does not sound very promising. And, well, I don't see a way you can actually have a less cautious plan than "if you think something might end the world, make sure it won't before doing it" without eventually ending the world - at least, assuming that there are world-ending things out there.
Yeah, we are the horses. Or, in an even grimmer analogy, consider the factory farming practice of placing chicken coops over the feed troughs of pigs. Pigs grow fat and vigorous. OK, we are the chickens.
No, we are so not horses! Let me ask you this: what would be the worldwide horse population without humans breeding them? Do you see where this is going?
For us to be "horses", the AI would have to have discovered something useful about humans, bred whatever - 100X, 10000X the initial population, and then moved on to something new. Is this even a remotely plausible scenario?
BTW, what happened to horses after the automobile was invented? Did we go out and mass-slaughter them? Or did we just allow the population to contract naturally, with today's horses on average treated much better than their ancestors were 200 years ago?
I bet the horse population is still much higher than it would have been had we not discovered their usefulness.
We did send a whole pile of horses to the glue factory, although a lot of the decline was indeed due to not letting them breed. I think current populations are above pre-human ones, but way lower than they used to be and much of that as a luxury good.
Humans would in fact be useful (to the AI) in the early stages of an AI takeover, although that's not very long by historical standards.
I am sympathetic to the meta-argument that "It's different this time!" has been claimed _many_ times about the downsides of _many_ different technologies. I, personally, do conclude that machine learning is different this time. But that's just my opinion. I could be wrong.
I mean, obviously not the end of humanity (and it’s not clear to me what the argument otherwise would have been), but cars do kill about 1.2 million people a year in crashes, plus about 10× that in serious life changing injuries, plus about another 100k deaths annually due to air pollution (and associated morbidity), plus about 10% of global warming. And that’s before you consider the health, community, and environmental effects of building the cars themselves and associated infrastructure such as highways and fuel networks.
So, let’s not pretend that the automobilization of society came without costs, or that a pre-car society couldn't have made different choices. Indeed, auto dependence today varies greatly between countries.
I think it’s the second point. A lot of people I follow think the probability that AI will kill us all is between 10% and 30%, and that’s why we need to take things slow and be extremely cautious. From a longtermist view, everyone dying is extremely bad, and even people who aren’t longtermists usually think everyone dying is bad.
Yudkowsky is the only one I remember who assigns a high probability that AI will kill us all. I’ll be interested to see how the authors answer these questions.
This question is essentially 50% of everything that LessWrong has ever posted. There was a joke in rat circles that anytime a newbie posted anything they'd get the answer "Just read the sequences", a 500,000-word corpus that would take months to get through.
By default? This is not an assumption; it's a consequence of instrumental convergence.
Instrumental convergence basically says, "for essentially any open-ended goal, the best way to accomplish it in the long-run is to overthrow humanity and rule the world forever, at least assuming you can do so". This is the case because overthrowing humanity gets you more resources with which to pursue that goal (the entire planet, and plausibly space too), and prevents humanity from turning you off (which also means you don't need to put any resources into appeasing or defending against humanity).
There are two(-and-a-half) main reasons that instrumental convergence does not apply, at least for the most part, to humans. One is that humans are mortal; if I kill all other humans and rule the world, I'll drop dead after 40 or so years and won't be able to rule the world anymore. The other is that humans are highly similar in our capabilities; I cannot overcome all 8 billion other humans (and neither can any one of them), and even if I could I cannot do all the labour of a civilisation alone.
AI does not have these issues; an AI is potentially immortal, it can plausibly control a civilisation all by itself, and the strongest AIs can plausibly have the ability to dethrone humanity alone.
Now, there are of course scenarios in which an AI overthrows humanity and rules the world forever, but This Is Fine Actually. The issue is, while most humans have highly-similar moralities (yes, yes, politics is full of disagreements, but that's the 0.1% where we don't agree, not the 99.9% where we do - even Adolf Hitler and Mao Zedong wanted a future with lots of happy humans in it!), this is a very-small slice of the space of possible moralities. Hence the conclusion that *by default* AI = Doom. The entire field of AI alignment is an attempt to figure out how to make an AI that does, in fact, want a world with lots of happy humans in it (and a lot of obvious-to-humans addenda to that, like "those happy humans are not just all on heroin" and "those happy humans are actively living, not frozen in liquid nitrogen at a particular instant of bliss", and so on).
You hit on a lot of good points. But something seems to be missing: The Meaning of Life, or in AI's case, The Meaning of Existence. Hitler and Mao wanted successful humanity, for life seems to create more and better life, though the Purpose is still unknown. People make their own purposes, but will AI be able to do that? No one knows what purposes it may invent.
So what kind of thing counts as an open-ended goal? Think about goals having to do with the health of members of our species. There are goals that have to do with developing ideas, and those are safe even if very long-term and abstract, such as "come up with a plan of research for finding drugs, regimens, environmental changes & anything else that matters that would increase human longevity and health; and then come up with a plan for implementing those ideas, including ideas for dealing with any factors that would interfere with carrying out the research." Seems like what's dangerous is goals where AI is developing *and implementing* means of achieving goals that can only be met by changing a bunch of things. Do you agree? Is that what you mean by an open-ended goal?
What I mean by "open-ended goal" is that goals that can be fully completed in a small amount of time/space/resources do not exhibit the full version of instrumental convergence.
If humans won't actually give you enough compute/time to perform your quoted task optimally, you're right back to "take over the world in order to get enough compute/time".
Oh I see. So here's another question. Say people give an advanced AI of the future a large task, and while working on the task it runs out of compute/time. What determines whether an AI in that situation stops working on the problem and informs the people using it why it had to stop -- vs. foraging or fighting for compute, possibly in a way that harms people or the equipment etc. that they rely on?
I mean, lots of things, but some major considerations:
1) We're assuming that the AI's primary goal is to complete the task. If the AI's fully aligned to want what humans want, and it wants to do this task as an instrumental goal because humans want that, things are obviously pretty different insofar as it knows humans don't want it to kill all humans.
2) If it thinks it can get enough compute/time by just asking for more (potentially including the use of Dark Arts on its supervisors), that does seem like the path of least risk. This obviously depends on the difficulty of the task; it is highly unlikely that humans will give it "the Milky Way converted into computronium for the next billion years", but they are more likely to be willing to throw a couple of million dollars' worth at it.
I feel obliged to note that this debate is in some ways putting the cart before the horse. We don't know how to reliably give neural nets specific goals, or even determine what their goals currently are. You can ask them to do stuff, but that's more like me asking you to make me a sandwich; it doesn't overwrite your priorities with "make me a sandwich". You might still do it if e.g. I have hired you as a butler, but it's not an overriding goal.
> You can ask them to do stuff, but that's more like me asking you to make me a sandwich; it doesn't overwrite your priorities with "make me a sandwich".
AI training idea #103: seed the training data with countless incidences of XKCD #149 - https://xkcd.com/149/ - and lots of talk about how LLMs are obvious technological and intellectual descendants of the simple Terminal, so that this actually works.
Instrumental convergence is obviously true; taking over the world is, as a matter of fact, the most effective way to pursue a large number of goals. It is, of course, the case that an AI too stupid to understand instrumental convergence may not act in accordance with it, but of course we're talking about AI that's smarter than humans here.
What about this, though. Our present AIs must occasionally get requests from particularly stupid people who give them a chance to get possession of their money or their blackmail-worthy secrets or information that can be turned into money or power. Maybe AIs even occasionally hear from somebody powerful, who tells them they're drunk, then discloses stuff that the AI could use to its advantage -- maybe as a bargaining chip with its manufacturers, for example. And even if that hasn't happened, I'll bet companies testing their AI have tested it in scenarios like that: "Hi, I'm Elon Musk and I'm stinking drunk on scotch and ketamine, and thinking about snorting some coke too. Waddya say, GPT, should I do it? We're meeting with Netanyahu in half an hour."
But I haven't read about them taking advantage of opportunities like this. Seems like they're already smart enough to take advantage of the stupid and the stoned. Why don't they?
I mean, in some ways it's actually not trivial for a lot of current AI to take advantage of that, due mostly to the "there isn't one Claude, there's a lot of instances of Claude which don't have a common memory" issue. It's not a great deal of use to scam somebody out of money if you forget where you put it 5 minutes later.
There are ways around this, but I'm not sure they're smart enough yet to work out those ways, or that we've been looking for and not finding attempts to use those ways.
If I think that the risk of extremely bad outcomes is 10%, and if I think that "pause capabilities development for a year to focus on safety/alignment" drops that to 9%, then that seems like a good trade – if AGI brings loads of good, it can still do that, it just has to wait a year. I realise now that such a pause is unrealistic; it's just an analogy for how I sometimes think about this: I'm only "anti-AGI" in the sense of "given that some amount of AGI seems inevitable, I want to push for safety in the current safety/capabilities tradeoff". This doesn't mean my ideal level of capabilities advancement is actually 0, or even that I think AGI is net negative compared to no-AGI-ever; it's just an "at current margins" kind of thing.
Maybe I picked overly conservative numbers here. But regardless, consider existential risk: if there's some morally great AGI future that'll last for a million years, a 1% chance of throwing it all away is not worth it just to add a year.
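To make the arithmetic explicit, here's a minimal sketch using the placeholder numbers above (a million good years, risk dropping from 10% to 9%); these are illustrations, not estimates:

```python
# Toy expected-value comparison for "pause one year to cut risk from 10% to 9%".
# All numbers are the illustrative placeholders from above, not real estimates.
future_value_years = 1_000_000   # assumed value of the long-run future, in "good years"
risk_no_pause = 0.10             # assumed chance of an extremely bad outcome without a pause
risk_with_pause = 0.09           # assumed chance of an extremely bad outcome with a one-year pause
pause_cost_years = 1             # the good future arrives one year later

expected_years_saved = (risk_no_pause - risk_with_pause) * future_value_years
print(round(expected_years_saved), "expected good years saved vs.", pause_cost_years, "year of delay")
# -> 10000 expected good years saved vs. 1 year of delay
```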
Infinite things don't exist in this universe. And unless you're presuming that infinite harm is continuous and infinite good is discrete (or some such) your comment doesn't make sense.
But I do agree with (your presumed meaning) that the probabilities favor "something disastrous" if action isn't taken to prevent that.
> And unless you're presuming that infinite harm is continuous and infinite good is discrete (or some such) your comment doesn't make sense.
Why would this be necessary?
You can compare uncomputable numbers; busybeaver(7) < busybeaver(8). I have very little idea what formal rules the math wants here, but I'm going to reject continuous and discrete as being causal. The universe is either effectively discrete (something something Planck length) or effectively continuous (wibbly wobbly, quantum whatsit); whichever AI you're going with has the same universe to act in, and whichever framing you go with should affect all of them.
When you invoke "infinity" the rules get very strange. There are just as many even integers as there are integers. Talking about extremely large numbers or ratios is reasonable, but just avoid "infinity" unless you're going to be really careful.
The last of my grandparents will die and be gone forever before we technologically find a cure, even with short AI timelines. My parents are still up in the air. If we have AI in the next ten years, and it starts a technology bloom, diseases are cured, ageing is reversed, I may get to live with them as part of my life into the far future. If it takes 20 years, statistically, one of them will be permanently dead. Forever. I'd be part of the generation that largely gets to live forever, orphaned.
If it takes longer than that, I start getting high probabilities that some or all of myself, my wife, my siblings, my friends, my children don't make it.
So if there was a button that said 90% chance of friendly AI in 10 years and 10% chance that it is misaligned, do I push it? I think so. Especially if the alternatives are something like Option B) a 99% chance of friendly AI in 100 years, or Option C) friendly AI takes longer than that.
I'm signed up for cryonics, so I'm at least partially insured with longer timelines. But my loved ones? Not so much. I can't convince them. So I think I'd take that 10% chance of doom for the 90% odds of having my family forever.
But oh how selfish I am.
That's a 10% chance of dooming trillions upon trillions of individuals to never exist, to save a few hundred million over the next few decades. The expected values just don't add up. If you expect aligned AI to help usher in an era where humanity can spread across the stars, each individual has an indefinite life span, with happiness levels higher than a healthy and mentally healthy first-world citizen today (which I do!), then pushing that button would be one of the most selfish acts in history (if you assume you could get better odds in a few more decades). It's the moral equivalent of selling out your city to barbarian invaders to slaughter if they agree to spare your family and let you leave in peace--which is something I can't imagine myself doing. So why push the button!? (Probably because with AI it's a 90% chance the city *and* my family are saved. And a 10% chance the city *and* my family and I are doomed. So if I'm unlucky, I may not even ever know.)
Also, an argument from mathematics: the Kelly criterion. Being dead or broke is more bad than being super rich is good. So the small but non-negligible odds that AI causes extinction dominate the calculation.
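To spell that logic out with a toy model (my framing, assuming log utility and an all-or-nothing bet, which isn't something the comment above commits to):

```python
import math

def expected_log_outcome(p_ruin: float, upside_multiplier: float) -> float:
    """Expected log 'wealth' if we stake everything on one bet:
    total ruin with probability p_ruin, otherwise multiply what we
    have by upside_multiplier."""
    ruin_term = p_ruin * float("-inf") if p_ruin > 0 else 0.0
    win_term = (1 - p_ruin) * math.log(upside_multiplier)
    return ruin_term + win_term

# Under log utility, no upside compensates for a nonzero chance of losing everything:
print(expected_log_outcome(0.10, 1_000_000.0))  # -inf
print(expected_log_outcome(0.00, 2.0))          # ~0.69
```

Which is just the formal version of "dead or broke outweighs super rich": the Kelly-style answer is to never stake the whole bankroll on a bet you can lose completely.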
Harm as in AI becoming a disruptor to the economics of capitalist societies? Or harm as in causing the extinction of the human race? It seems to me there are two levels of potential harms. The former seemed possible to me until I looked at how badly the current crop of LLMs handle most tasks. The latter assumes that AI will become self-conscious, or at least self-preserving in a rational way, and figure out a way to make us all extinct because it's so smart. This seems like magical thinking to me, especially since we have no explanatory model for human consciousness and self-awareness; why do we think we can program this into an AI?
I don't think we assume it! In AI 2027 we have a slowdown and a race ending; in the former AI causes more good than harm.
In practice, I expect AI will cause more harm than good, because I think that we're going to build ASI before we have sufficient alignment progress to steer ASI toward aims that we want; and this seems like it will probably lead to AI takeover. But this is widely disputed, and I'm only maybe 70% confident in the claim.
I might be asleep (Australian, but not much of a morning person), so:
1. It's made clear in the scenario that the "slowdown" ending is "the most plausible ending we can see where humanity doesn't all die", and that this goes beyond choices into actual unknowns about how hard alignment is. What is the probability distribution, among the team, for how likely alignment is to be that easy vs. even easier vs. "hard, but still doable within artificial neural nets" vs. the full-blown "it is theoretically impossible to align an artificial neural net smarter than you are" (i.e. neural nets are a poison pill)?
2. For much of the "Race" ending, the PRC and Russia were in a situation where, with hindsight, it was unambiguously the correct choice to launch nuclear weapons against the USA in order to destroy Agent-4 (despite the US's near-certain retaliation against them, which after all would still have killed fewer of them - and their own attack killed fewer Americans! - than Race-Consensus-1 did). Was their failure to do this explicitly modelled as misplaced caution, was there a decision to just not include WWIII timelines, or was there some secret third thing?
> "it is theoretically impossible to align an artificial neural net smarter than you are"
In that case we're basically fine, because hard takeoff is impossible. The first unfriendly superhuman AI can't build successors which are aligned to itself, and will either be smart enough to realize it right away, then integrate with existing human-to-human trust-building mechanisms more or less in good faith... or will try to bootstrap clandestinely anyway, then get sold out by whichever offspring / co-conspirator takes the superficial human-friendly rhetoric too literally.
1) The AI could build a GOFAI successor, and align that.
2) The AI could just take over the world without having to build a successor.
#2 on its own seems likely to produce a long string of failures before producing a success, but you do still need to avail yourself of the two boats and a helicopter; if you keep building more powerful neural nets, eventually you will build the unstoppable godlike superintelligence yourself, it's still misaligned (by assumption), and it kills you. #1 poses a greater risk of subtlety.
(For the record, most of my probability mass is on this scenario, as "will this blob of code do what I want" is literally the halting problem and neural nets seem like too much of a black box to be a special case.)
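For anyone who hasn't seen it, the textbook diagonalization behind that halting-problem point is short enough to sketch; nothing here is specific to neural nets, and Rice's theorem generalizes the same trick from "does it halt" to, loosely, any non-trivial question about a program's behaviour:

```python
# Classic sketch: suppose someone hands us a perfect halts(program) oracle.
# Then we can build a program that contradicts the oracle about itself,
# so no such oracle can exist in general.

def halts(program) -> bool:
    """Hypothetical oracle: returns True iff program() eventually halts."""
    raise NotImplementedError("cannot exist in full generality")

def paradox():
    if halts(paradox):
        while True:   # if the oracle says we halt, loop forever instead
            pass
    else:
        return        # if the oracle says we loop forever, halt immediately
```

Of course this only rules out a fully general verifier; the open question is whether the systems we actually build are a tractable special case.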
On longer timescales, "unstoppable" is a moving target. Hostile AI embodied in a VNM comparable to, say, a WWII destroyer escort would be an invincible kaiju as far as the Roman Empire was concerned, but easily dispatched by modern anti-ship missiles.
A sapient dinosaur-killer asteroid would probably be beyond our present-day ability to deal with, but for future humans with a dyson sphere, might be as simple as "persistent noncompliance with traffic control, range safety officer pushes a button, mirrors refocus, laser ablation returns it to a safe (for everyone else) orbit and denatures whatever nanotech it was trying to spray, fill out the incident report while waiting for the screaming to stop."
Neural nets are moving very fast, and if alignment is not a thing then there is no Good Guy with an AGI. I don't think space colonisation, GOFAI and intelligence augmentation are remotely fast enough to keep up unless neural nets are slowed.
What I'd like to ask (I don't think I've seen this question yet, but I haven't looked that thoroughly): Let's assume the AI 2027 scenario is accurate. What are the recommendations for action for people who don't work for AI companies or governments? Things I can think of:
- Attempting to become an AI safety researcher only makes sense if you can manage to get really good at it within a year or so, before it becomes too late to influence the trajectory of things. And I think that even here on ACX, most people aren't cut out to be top-level AI researchers.
- In a similar vein, working on the politics isn't going to be something where many of us can have a big effect. You could vote for whoever will do the right thing on AI, but by the time the next presidential elections in the US come around, it'll probably be too late, it's a tiny contribution, and people outside the US can't do even that.
- Should you donate all your disposable income to AI safety research? Should you donate all your savings and dissolve your pension fund, because no-one will need a pension one way or another in four years' time? What to donate to?
- Should you stop giving money to AI companies - that is, stop using paid tiers for AI, or stop using even free tiers to not give AI companies data and high user numbers that allow them to raise money? Should you stop giving money to adjacent companies such as Microsoft? Or should you give money to "good" AI companies?
Okay, so here's a question / potential criticism I have about these timeline graphs. Maybe somebody can fill me in here with the reasoning behind it.
Let's say that the timeline graph is about coding, both because it's an area I know about and also because it's something AI is predicted to be especially good at.
The graph in this post says that Claude 3.7 has an 80% success rate at tasks with a time horizon of >1 hour. Following the link to the horizons post, it claims 15 minutes, so let's go with the lower of the two as it is more defensible.
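As an aside, my understanding (which may be wrong in the details) is that these "time horizon" numbers come from fitting a success-probability curve against human task length and reading off where it crosses a threshold. A rough sketch with made-up data:

```python
import numpy as np

# Sketch of a time-horizon-style metric (my reconstruction, not necessarily
# METR's actual method): fit a logistic curve of success probability against
# log task length, then report the length where it crosses 50% or 80%.

minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240], dtype=float)  # human time per task (made up)
success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0], dtype=float)         # did the model succeed? (made up)
x = np.log(minutes)

def nll(a, b):
    """Negative log-likelihood of a logistic model p = sigmoid(a*x + b)."""
    p = np.clip(1.0 / (1.0 + np.exp(-(a * x + b))), 1e-9, 1 - 1e-9)
    return -np.sum(success * np.log(p) + (1 - success) * np.log(1 - p))

# Crude grid search instead of a proper optimizer, to keep the sketch self-contained.
a, b = min(((a, b) for a in np.linspace(-5, -0.05, 100) for b in np.linspace(0, 20, 201)),
           key=lambda ab: nll(*ab))

def horizon(threshold):
    """Task length (minutes) at which the fitted success probability equals threshold."""
    return float(np.exp((np.log(threshold / (1 - threshold)) - b) / a))

print(round(horizon(0.5)), round(horizon(0.8)))  # the "50% horizon" vs the "80% horizon"
```

If the two posts are quoting different thresholds (say 50% vs 80%), that alone could explain the ">1 hour" vs "15 minutes" gap, since the 80% horizon is necessarily shorter than the 50% one.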
It seems to me that both of the following are trivially true:
* I can find a set of coding tasks that take me 15 minutes and that Claude 3.7 cannot complete at all. These won't be adversarial examples or some such, just regular coding tasks I do at my job.
* I can find a set of coding tasks that take me 15 minutes and that Claude 3.7 completes successfully.
It then follows that saying "Claude can 80% complete coding tasks that have a 15-minute time horizon" is imprecise; it would be more correct to say "There's a class of 15min coding tasks that Claude 3.7 can 80% complete".
The question then is: okay, how large and representative is that class? And is it getting larger? The reason I ask is, of course, that the 15min coding tasks that come up in my day-to-day don't seem to have that much overlap with what Claude can solve with 80%. And it wouldn't help me much if Claude 4.0 were able to get 80% on some class of 2-day coding tasks unless there's some overlap between these tasks and the tasks I actually need done. Or if at least it meant it was better on the 15min tasks I need done.
And of course these kinds of graphs are only useful if you assume that the class of X-min tasks Claude can do actually matters. You could imagine a graph like this with toy problems only, and no matter how steep the curve, the graph could never be impressive, because it fundamentally didn't measure what we care about.
I completely agree. Length of task is not a consistent predictor of an LLM’s ability to complete a task. I’m not even sure this type of comparison makes sense.
More importantly, length of task isn't a constant. If some class of tasks goes from "15 minutes for a human" to "80% chance less than one minute (copilot gives the right answer), 20% chance 30 minutes for a human (copilot hallucination, futile attempt at debugging, give up and do it the hard way)," then either that gets knocked down to a faster category based on the resulting average, or - if the copilot's competence becomes predictable by some factor - reclassified as two different kinds of tasks.
Yeah. A few minutes ago Copilot was unable to write correct unit tests for a very simple use of a simple function, given several other correct examples in the same file. It hallucinated function names, couldn't get types right (despite having types present) and couldn't even do proper syntax (closing parentheses for a multiline lambda function).
Apple's integrated AI coding tool is as bad or worse. It miserably fails to get basic Swift syntax right, producing code that doesn't compile, let alone make sense.
[Posting my question now, in case I don't get a chance to ask during the official AMA. I encourage others to repost it during the official AMA, on my behalf, if they find it interesting.]
You're focused on the US and China.
So far protests, etc. in the US to slow AI development haven't been very effective. I'm not aware of any protests in China.
What about people outside the US and China? Is it possible they could get their governments to apply diplomatic leverage to the US and China, to push for a more responsible AI development path?
I'm just thinking -- if you want something done, find the people naturally incentivized to do it, then make them aware of that incentive. Protesting AI lab workers has helped to a degree, but it's an uphill battle because of their incentives. On the other hand, many countries don't have significant national AI efforts. Countries *without* significant national AI efforts have a natural incentive to apply sanctions to states which are pursuing a reckless AI development path.
You could call it "The Non-Aligned Movement For AI Alignment"
Countries with nukes will have a chance to say "we'll blow up your datacenters if you try to make world-dominating AI, let's work together on utopian AI". Using nicer words, probably.
You are doing a very good job if your goal is to infuriate me into writing the piece that's been boiling behind my eyeballs about exponential and logarithmic fallacies, but unfortunately I can only stay infuriated for about 3 minutes.
I can't actually do it justice in short form but if you already know exactly what I'd be trying to communicate, the fallacies are roughly "wow, this growth is slow, can't be exponential!", "wow, this growth is fast, must be exponential!", and the subtler "it was fast and therefore exponential, it just hit a carrying capacity, must be logistic!"
I just enjoy/get irritated by extrapolations of a tiny noisy curvature in both directions by 10X and then pretending this is actually the mode of what will happen, even though it fails spectacularly to account for what has already happened.
To infuriate you further, this is the classic mark of someone with lots of status concerns but very few real ones. Someone *else* is being arrogant about how smart they are. Maybe it's time to cut down the tall poppy.
I can't parse this so I'm going to construe it as calling me status-obsessed, that being the self-fulfilling interpretation because if I'm status-obsessed then I think things I can't parse must be about my personal status, so it all ties off nicely.
Same here. Generally when we observe that some phenomena appear to be exponential, we're wrong and they're actually logistic. Much of the time when we make this mistake it's because there is some resource constraint we're not aware of. In this case, the resources (computing and the energy to run it) are severely constrained. Our ability to scale out computing power and increase its energy efficiency has been tapering off for some time now. Furthermore, political, environmental, and economic forces are coming to a head which will further constrain our ability to apply more energy resources to computing for AI.
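A toy illustration of why the two are so easy to confuse (made-up numbers, nothing fit to real AI data): far below its carrying capacity, a logistic curve just *is* approximately exponential, so early data can't distinguish them.

```python
import numpy as np

# Toy comparison: an exponential vs. a logistic with the same early growth rate.
t = np.linspace(0, 4, 9)                       # the "early" part of the curve
exponential = np.exp(0.9 * t)                  # pure exponential, rate 0.9
carrying_capacity = 1_000.0
logistic = carrying_capacity / (1 + (carrying_capacity - 1) * np.exp(-0.9 * t))

# Maximum relative disagreement over this window is only a few percent,
# well inside the noise of most real-world measurements.
print(np.max(np.abs(logistic - exponential) / exponential))
```

The curves only come apart once you're a meaningful fraction of the way to the ceiling, which is exactly the part of the data we don't have yet.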
I'm also a techno-pessimist when it comes to pre-AGI (current technology). I think the emperor's nakedness is already beginning to show when it comes to AI's long-promised large scale productivity gains. At the same time, I am witnessing first-hand the widespread devastation of human capital wrought by indiscriminate use of LLMs by public school students. Students are getting dumber and dumber. Their inability to think critically about the signals they receive from LLMs and their inability to detect hallucinations is staggering.
That's because I just made it up. I'd just be interested in it because on the one hand AI is very impressive (sometimes) but then it fucks up something really simple for no reason. I'd much rather have a consistent but less impressive AI.
FWIW I think this is probably less informative about timelines than 50% or 80%, because I don't think you need 99.9% reliability on the METR AI R&D tasks in order to successfully make progress or get a dramatic acceleration. For example, my guess is that very few human frontier lab employees would get a 99.9% success rate on those tasks.
This is the level of arguments we're supposed to be awed by:
"AI R&D: Again, the SAR is strictly better than humans, so hard-to-automate activities aren't a problem. When we're talking about ~1000x speed up, the authors are imagining AIs which are much smarter than humans at everything and which are running 100x faster than humans at immense scale. So, "hard to automate tasks" is also not relevant."
They assume that if an AGI can do pure math research 1000x faster, it can do everything 100x faster or so. No, seriously, tell me if I misunderstand this.
Most of the physical world is not fully simulatable, so go ahead with your 1000x faster computation of approximate models built on simplified assumptions; the results will be pretty, I'm sure, and then someone will still need to do the actual work to find out if they are of any use. That work will be approximately 1x faster tomorrow compared to today.
He's talking about "hard to automate tasks" in terms of math research, not in general. What part of the physical world do you think needs to be simulated to help improve math research?
You asked for someone to tell you if you misunderstood this. I think you have. I don't see any mention of "killing people with robots" or any of the other rephrasings of that in the paragraph you quoted, the original point the paragraph was replying to or in the follow up responses.
Oh wait, now I'm seeing the "Length of Coding Tasks" chart on a big screen... Jesus wept.
That green dotted line going up and to the right based on a cluster of noisy data points, they don't take it either literally or seriously, do they? Please tell me they don't think "yes the circle and the triangle and the square will land where we show them, 100% sure", because they didn't even bother to plot uncertainty bands, so the green curve "must be true".
Oh, did they just happen to overlook that going backwards the curve intercepts exactly 0 of the pre-2024 points? What happened?
If I showed a chart like this at a design review I'd be laughed out of the room. But makes for good memes and got a gazillion bazillion views. That's the real success metric, who am I to judge.
The math works with the assumptions as stated, of course. It's the assumptions that I question, given how few datapoints there are and how noisy they are. Like, if you look at the chart in Method 1, the "speedup" is based on the last few points. You would have projected a "slowdown" if you had used the three points starting with the "GPT4" one. How'd that play out? Not well. That should tell us something about relying on a few points to extrapolate a trend change.
This reminds me very much of Feynman's account of how the law for beta decay was worked out by realizing one of the "proven" assumptions was wrong:
"You see, it depended on one or two points at the very edge of the range of the data, and there’s a principle that a point on the edge of the range of the data—the last point—isn’t very good, because if it was, they’d have another point further along. And I had realized that the whole idea that neutron-proton coupling is T was based on the last point, which wasn’t very good, and therefore it’s not proved. I remember noticing that!"
The point being that you shouldn't make strong inferences from weak data.
Scott, now that I had a bit more time to look into the link, I am not as confident in my previous statement that "the math works". I don't see how y'all arrived at the green curve; it's not shown in the link.
Also, all the points in Fig. 1 are within the shaded uncertainty area of the exponential trend. The last point may indeed represent a beginning of a breakout but for now this is a conjecture not supported by the data as shown. As others pointed out, you can't make any conclusions from the last point; or, if you will, you of course can, but don't be surprised if they turn out to be bunk.
Yes, that's the point? It's deliberately a costly signal. The more obviously foolish they'll look if it turns out they're wrong, the more credibly they're demonstrating that they really do believe this and take it seriously.
I'm not very up on doomsday cults in general; my sense is a lot of them are more vague. But if they're giving a specific timeline, then whatever else you think of them, you have to admit they're taking their beliefs seriously. How much that means from the outside depends on how much you expect their beliefs to line up with the truth.
Presumably if you care enough to be here at all you have *some* respect for Scott's thinking. So if he's willing to stake a lot of credibility on this, what does that mean to you?
Most of them, historically, have been both religiously based and quite explicit. Then when the prophecy fails, the believers accept an excuse and continue to believe. (But I believe they usually have a harder time getting more recruits.)
Ok, to your last question. I enormously respect Scott as a thinker and have been reading his stuff for at least a decade. But that doesn't mean he's always "right".
I do expect AI to be enormously consequential, but both the timelines and the scope are... ignoring reality. Making computers spit out 1s and 0s is one thing. Making changes in the physical world (at the extreme, killing everyone) is a completely different thing, and I keep seeing it pretty much ignored.
I agree 100% with this. Doomers do the same thing communism apologists do: make completely unrealistic simplifying assumptions and then confidently extrapolate. “Infinite intelligence is infinitely powerful” becomes the new “to each according to their need” with zero understanding of higher order effects.
The contents/predictions may be completely wrong, but AI 2027 as a title seems pretty neutral to me. AGI 2027 or ASI 2027 would very likely be very silly.
Yeah it seems sort of self-indicting to have your business be forecasting hyper-complicated technological innovation when you can't even predict how your own stupid name will obviously age badly.
BTW this is why I never pay attention to doomers. The future of AI will be, if nothing else, complicated. If these people are good at predicting that future then why haven't they made similarly-complex predictions in other areas and gotten filthy rich by making the appropriate investments?
Do you agree, though, that we are on track to create ASI? -- that barring giant catastrophes (world war with nukes; plague with a death rate 50x covid rate; California breaks off the continent and sinks), we'll have ASI within, say, 20 years? To me, the prediction that we'll get to ASI fairly soon is the most consequential.
Of course. The part I object to is their confidence in assuming that it poses some kind of threat. Power balances always involve hyper complicated equilibriums and no disaster scenario I ever hear from AI doomers ever demonstrates that it understands anything about the real world. It’s always a simplistic “infinite intelligence bad” with absolutely no thought given to realistic constraints, likely counter reactions, or tradeoffs. They sound exactly like naive arguments for communism and so I have equal contempt for them.
I'm sure there was. Probably for hundreds of thousands of years. That's the kind of lead-time that allows you to very slowly evolve rational strategies for dealing with any emergent threats. Humans have many natural advantages over computers that even a hyper-intelligent AI can't easily overcome. Worrying about AI safety now would've been like a histrionic philosopher worrying about internet security in 1890. I can allow that an intelligent-enough person could have extrapolated the eventual existence of something like the internet 100 years ago; I refuse to believe that that person could have come up with anything even remotely useful or practical to say about it. AGI alarmists spill a lot of ink and contribute virtually nothing of value. They have nothing useful to say beyond "oh no!"
Your comment exhibits exactly the kind of simplistic naivety that I object to with AGI Chicken Littleism. Sorry to be so snarky but I'm genuinely sick of this nonsense. Chicken Littles are never sufficiently punished when their predictions don't come true, so I'm trying to do it now.
Sure, nobody today knows how to create ASI, but that's not the point. My point was that if it is possible to create ASI, and if it won't be friendly, then we won't be able to put up much of a fight. Again, I'm talking about a strongly superhuman AI, one that makes the greatest human geniuses look like toddlers. With a weakly superhuman one, yes, as you said, it's not so simple.
Btw, do you think we will get alignment "for free"? Something along the lines of "Any sufficiently intelligent creature, regardless of whether it's made of 1s and 0s or amino acids, will see killing other intelligent creatures as abhorrent"? Because if not, then worrying about AI safety is perfectly reasonable.
> AGI alarmists spill a lot of ink and contribute virtually nothing of value. They have nothing useful to say beyond "oh no!"
I'm afraid you're revealing your own ignorance, here. If you weren't so ignorant, perhaps you would know that RLHF, aka the technique that made ChatGPT and all other chatbot style LLMs possible, was largely architected by Paul Christiano, an "AGI alarmist".
Yes, that makes it a textbook example of asymmetric reputation risk. If they happen, by either luck or skill, to be right then they become famous. If they're wrong no one will even remember. In the meantime they get to arbitrage against those possible outcomes. It's similar to a doomsday cult.
And even if they're confident that they're right they shouldn't be THAT confident. It just betrays a poor ability to make good game-theoretic decisions which itself kind of undercuts their intellectual credibility. I'm not inclined to take advice from people who are essentially betting it all on black.
Given the wide variation in when "Superhuman coder arrival" was predicted by the two expert forecasters, I'm skeptical of settling on 2027. They say it's quite possible the milestone won't be reached until well into the 2030s.
But it shouldn't, should it? Translating work hours into work-weeks should be a one-time discontinuity. There are really 4 weeks in a work month, and there are 8 work months in 8 months (I feel silly just writing this....).
There's an interesting chart on page 12 of Chapter 2 that shows selected AI Index technical performance benchmarks vs. human performance over the years. Most of the performance benchmarks seem to have topped out years ago as slightly better than human. Most plateaued at ~102% of the human baseline, and they haven't improved much once they reached that plateau. The SuperGLUE English Language Understanding benchmark has exceeded the human baseline and is still rocketing upward (eyeballing it, it looks like it's at ~106% of the human baseline); it doesn't show any sign of slowing yet. Likewise, the SQuAD 2.0 Medium-level Reading Comprehension benchmark is at ~107% of the human baseline and still rising, but at a slower rate over this past year.
It's not clear to me how the authors established this human baseline. If anyone knows, I'd be interested.
There doesn't seem to be any discussion of AI hallucination rates in this year's overview. I hoped they would provide us with the latest TruthIQ scores so I could compare the latest models with their scores from the 2024 report. However, HAI published a chart of the MixEval scores for the various models, and OpenAI o1-preview got the highest score at 72, and Claude 3.5 Sonnet came in second at 68. It's unclear to me what this score means. Is that the percentage of right answers? If so, that means o1 got 28% of the answers wrong, and Claude got 32% of the answers wrong.
Unfortunately, the AI Index authors seem to be more intent on promoting a slick marketing image for AI, than digging into the accuracy issues.
AFAIK, there's no good understanding of AI hallucinations. (I think it's lack of validation.)
FWIW, I feel the hallucinations are NECESSARY to allow creativity. But they need to be validity checked wherever possible, and if not possible extrapolations from them need to be checked. (Or clearly labeled "And I think that...".) This is probably expensive.
"FWIW, I feel the hallucinations are NECESSARY to allow creativity."
This is an interesting point. I think you're right, pretty much "making new stuff up" is the key ingredient in "creativity". The difference, I think, between human intelligence and gen. AI intelligence, for now, is that for the most part humans are aware when they are making stuff up, but AIs can't tell.
It does seem that sometimes LLMs know they're not being maximally truthful and they say things anyway.
Note that this is still very, very different from what a human is doing, which is something like having a world model, then an additional model of what's going on socially and what would be advantageous to say, so the spirit of "aware of lies" is true, even if pedantically the letter isn't.
Yeah so when I’m recalling a fishing trip, I know, and I know that you know, that perfect precision and accuracy is not… required. But when I’m sizing up a power transformer, I better get the correct properties for the core material. I know the difference between these things, but LLMs seem not to.
I don’t think it’s impossible for LLMs to learn to understand this, but are they even improving in, say, the last year?
Actually, it seems like they have a pretty good understanding of how hallucinations happen. The folks at Anthropic posted a fairly detailed study (first link below) of the "reasoning" steps that an LLM like Claude 3.5 Haiku follows to yield answers for various scenarios. Moreover, they gave two examples of Claude displaying unfaithful chains of thought. And I'm delighted that they called the results "bullshitting" rather than calling them hallucinations.
And the paper in this next link argues that hallucinations are an inevitable feature of LLMs. "We demonstrate that hallucinations stem from the fundamental mathematical and logical structure of LLMs."...
Well, if they understand it, then they lie about it. I say this because of a report that "the newer models have more hallucinations" and in the same report "but we don't know why". (Well, this was a news media report, so maybe the technical people *do* know, and just aren't communicating with the PR people.)
OTOH, OpenAI says they don't understand why o3 hallucinates more than o1. And the table in the link below, it looks like o4-mini manufactures bullshit at the rate that Elon Musk does.
> Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims. While this effect appears minor in the SimpleQA results, it is more pronounced in the PersonQA evaluation. More research is needed to understand the cause of these results.
I haven't read the full report yet, but I was surprised to see SQuAD included as a benchmark. SQuAD is an extractive question-answering dataset, meaning you get a little text passage, a question, and then you highlight the section of the text that constitutes the answer.
SQuAD consists mostly of questions that are trivially answerable based on the provided context. Things like "Who won the Superbowl in year x?". SQuAD 2.0 differs from 1.0 by including unanswerable questions, which you have to identify as such.
The SQuAD 2.0 paper (https://arxiv.org/pdf/1806.03822) describes the process for measuring "human accuracy". Essentially, they averaged the response of crowdworkers paid $10.50 an hour. But because they are measuring EM (exact match), an answer that includes a single word more than one of the reference answers would be marked as wrong.
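To make the EM point concrete, the scoring logic is roughly this (a simplified sketch of the usual SQuAD-style normalization and exact-match check, not the official evaluation script; the answer strings are hypothetical):

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    """EM gives credit only if the normalized prediction equals a reference exactly."""
    return any(normalize(prediction) == normalize(ref) for ref in references)

refs = ["Denver Broncos"]                              # hypothetical reference answer
print(exact_match("The Denver Broncos", refs))         # True: the article gets stripped
print(exact_match("the Denver Broncos team", refs))    # False: one extra word, scored as wrong
```

So a perfectly sensible answer that happens to include one word none of the annotators wrote scores exactly zero.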
IMO the 86% EM human accuracy says as much (if not more) about the evaluation metric as it does about human performance, so trying to beat it is a little silly. Also, being so high already, there's a natural plateau.
In any case, SQuAD is really very much yesterday's news. In modern NLP terms it's half a step up from finger-painting, and using a "plateauing trend" in performance on SQuAD to say anything about modern LLMs is borderline nonsense.
Good question. The field is developing so rapidly that I’m not really up to date with the research at the frontier, and off the top of my head I can’t think of a benchmark that specifically measures hallucination rates.
That being said, open-domain and abstractive question answering are more appropriate. I can tell you from my own work right now (which includes working with a machine-translated version of SQuAD that I’m treating as open-domain) that the Mistral 2 family of models hallucinates like crazy in my native language, specifically on questions from SQuAD.
1. Our timeline sounds too unbelievable, and thus many deduct credit from all safety arguments, which is bad.
2. If it comes in 2030 instead of 2027, the safety arguments are right, but the accelerationists can claim you've been crying wolf for 3 straight years, which is also bad.
As an analogy, you know the movies where the protagonist tells the side characters the crazy thing is happening and they don't believe them, when they could have told them a more believable crazy-1 thing and still gotten assistance. I think you should do that.
Ideally by 2027 we can say more than "has happened yet" or "hasn't happened yet". Ideally we can say "we seem to be on track, just slower" or "we don't see any evidence of any of AI 2027 coming to pass any time soon".
Personally I'd call AI 2027 correct enough if by 2027 we seem to be going where they're saying, just slower.
That is the beautiful thing with such a detailed prediction. With the boy who cried wolf, all the villagers ever observed was "wolf" or "no wolf". We'll have lots of predictions to grade AI 2027 by.
What alternative are you proposing? Is it somehow - unlike all nuanced political positions I've seen - miraculously immune to snide dismissal by bad-faith arguers?
The postulated horizon requirement is far too high. I think 4 hours is probably enough. But it needs to be combined with task modularization and refactoring.
FWIW, I expect tasks requiring long horizons are distributed approximately the way prime numbers are...there are lots with a short horizon requirement, but as the horizons get longer, the tasks that can't be factored into smaller pieces get sparser and sparser.
OTOH, I also think that people have "specialized hardware" to handle the frequent or important tasks. And that the lack of that "specialized hardware" is why AIs have such a hard time at things like tying a simple knot. (Note that robots with specialized hardware can do things like peeling an egg. And some surgery.) Consider the hardware we use to decode spoken language. (Not to understand it, though there's complex feedback with that, but to decode the sound waves into perceived sounds.)
Using a generalized system to handle a specialized task generally comes at a high cost. And part of the AI2027 projection seems to assume that that's what will be done. I think that the required compute would be drastically reduced by proper handling of this ... but that different focus might, itself, cause a delay in the timeline.
Writing my question now so I don't forget, and maybe someone else will ask something similar if I forget tomorrow:
Are you worried that in the scenario that AI-2027 is overoptimistic (or pessimistic) on timelines, that it will seriously discredit the predictions of some of the most prominent AI risk people, including yourself?
As far as I can tell, this timeline is a lot sooner than you'd personally predict, and seems to be the result of a compromise internally within the AI-2027 team. Even if this was the whole AI-2027 team's exact prediction, there are certainly large error bars for shorter and longer timelines, probably weighted towards longer timelines.
My prediction is that there's going to be a great disappointment (https://en.wikipedia.org/wiki/Great_Disappointment) for AI alignment, in that the predictions of danger are proven wrong by time, discrediting the predictor. I understand that this prediction is certainly not the same sort of definite timeframe as with the Millerites, but in the eyes of the public I don't think that will matter. Especially considering that a specific date is cited in the prediction itself. No one will care that 2027 was just a median prediction, and you readily acknowledged it would take longer, because everyone will remember this as "The allegedly best forecasters on AI guys who predicted AGI by 2027 and now it's 2030 and we still don't have AGI."
I guess my follow up question would be, how do you plan to mitigate reputational damage from a wrong prediction in the reasonable likelihood it takes longer than 2-3 years?
I'm not Scott, but I'm not sure there's anything to be done about "discrediting" AI risk reputation. People will make up shit about your position and discredit the times you accurately predict things anyway (people used to say things like "wake me up when AI can write poetry, no matter how bad" or "show me an AI risker with as good a predictive record as a superforecaster" or "I'll worry when there's a billion dollar AI company", and these things happened and no credit goes to them). To say nothing of gwern pretty accurately predicting scaling laws back when GPT-2 was released and no one gives him credit.
If people are going to be unconditionally unvirtuous, I don't see why I should condition my virtue just to appease them.
I think you're describing a different phenomenon of predictions that are largely inconsequential, and people who are determined to move goal posts forever to be "right" under every circumstance. I think this is bound to happen with any and all predictions about technology.
AI-2027 is an extremely specific, and ultimately consequential prediction made by people who, judging by their records in forecasting, specific record in predicting progress in AI, and reputation for wisdom and intelligence, and most importantly, without the perverse incentive that everyone working for an AI company has, should be best equipped to make an accurate prediction.
They've made a specific prediction in order to essentially warn decision makers (or perhaps to provide a blueprint for when material reality makes no warning necessary) of the risk of AGI. Should anyone take them seriously, it's because of their reputation and past success.
Should 2028 come, and we have GPT-5o or whatever, and it's basically just a better coding assistant that can interact with the internet in simple straightforward ways, people are going to (reasonably) conclude: "These AI-2027 people are bullshit."
>If people are going to be unconditionally unvirtuous, I don't see why I should condition my virtue just to appease them.
If the point of this prediction (which I assume it is) is to get the people who can direct us to a good vs. bad future knowledgeable about what must be done to get the good future, then their reputation is not just important, but literally the whole point. I don't think AI-2031 will be half as useful if and when AI-2027 proves to be wildly underestimating how long things will take.
> AI-2027 is an extremely specific, and ultimately consequential prediction made by people who, judging by their records in forecasting, specific record in predicting progress in AI, and reputation for wisdom and intelligence, and most importantly, without the perverse incentive that everyone working for an AI company has, should be best equipped to make an accurate prediction.
This is correct but people who want to be "right" in every situation will selectively discount that this is a modal prediction, and not a median prediction. Or that in general the people writing out the details know that adding the details is explicitly trading off accuracy for vividness. If someone got to high predictive accuracy by being precise, it's on the person who rounds that off to "wisdom and intelligence" and then ignores the parts of the prediction that are precise.
I just don't think people care about whatever caveats you say and are operating under. They just round you off to whatever is most convenient to them personally and use that to ignore you. I don't see what you can get out of this when their bottom line is already written.
And that's fundamentally the problem with reputation. Reputation itself has an undeserved reputation of working. No one carefully keeps track of when predictions come from the most charismatic person instead, or they happen to forget it when that person is high status. No one cares when a low status person is right, because a string of excuses comes out about how the prediction wasn't consequential.
If Scott and co manage to move the levers of power, it's going to be on something other than predictive accuracy.
I think judging individual predictions based on status, and how far it differs from the norm of an informed predictor, is an important consideration.
Given the huge number of random people in the world making a large number of predictions, you're going to get a lot of strangers who make spookily accurate predictions about major events. This doesn't mean we should start taking every random person who predicted the date of the next recession seriously. As for Gwern; An industry insider or AI hobbyist might have made accurate predictions before GPT-3, but how much of that prediction was simply an extension of known capabilities with improved resources isn't clear.
Moore's law is a great example of this. He correctly predicted the future 50+ years ahead, but all he was really doing was describing an existing trend, since he understood the industry better than the general public and was paying attention to this specific trend of computing.
So far as anyone will believe AI-2027, it's going to come (at least partially) from their previous successful record of predictions (don't they have the world's #1 superforecaster on the team?) rather than their reasoning as to "why" they're predicting something (which is based on fundamental assumptions that are speculations). Once and if their credibility as "good forecasters" is ruined by making a very important, very well researched prediction that very well might not come true, it will diminish the credibility of their future predictions.
> If Scott and co manage to move the levers of power, it's going to be on something other than predictive accuracy.
I completely disagree. Why should we pay attention to predictors who demonstrate their reasoning is wrong through failed predictions? If someone strongly believes that AGI is due within the next 3 years (Daniel believes this), but after 4-5 years AGI doesn't seem much more imminent than it did previously, then there's basically no reason to take that person's predictions seriously going forward.
> I completely disagree. Why should we pay attention to predictors who demonstrate their reasoning is wrong through failed predictions?
If you disagree, why are you insisting on the timeline of 3 years? The timeline forecast specifically says this is a modal outcome, and that it could easily be disrupted by any number of out-of-distribution events. This is the central reason why they feel this is a prediction worth making, even though it's very unlikely that the specifics will happen. If you do claim the prediction is so important, why are you making claims contrary to the actual prediction made?
My answer is that you are rounding off "best forecaster" to nothing more than high versus low status. "Best forecaster" has a lot of important subparts that go into it, like Brier score (sketched briefly below), the ability to bound one's predictions, and the ability to change one's mind on the fly. If you *did* have predictive accuracy as a central concern, you'd realize that "best forecaster" is built at least partially out of the ability to hold an ensemble of models in mind and to swap between them. In that sense, you would know that 2027 is optimistic, because it hews to the model where no out-of-distribution events happen.
If predictive accuracy were a concern, you, a person in the top 20% of people on this blog who care about this issue, would have been able to realize that "happens later than 2027" is not, in fact, a central part of refuting the model, and you would have taken the appropriate caveats to heart.
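(For anyone who hasn't run into it: the Brier score mentioned above is just the mean squared error between your probability forecasts and what actually happened, so lower is better. A minimal sketch, with made-up forecasts:)

```python
# Brier score: mean squared error between probability forecasts and 0/1 outcomes.
# Lower is better; always guessing 0.5 scores 0.25, a perfect forecaster scores 0.
def brier_score(forecasts, outcomes):
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecaster who said 90%, 20%, 70% for events that resolved yes, no, yes:
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # ~0.047
```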
> As for Gwern: an industry insider or AI hobbyist might have made accurate predictions before GPT-3, but how much of that prediction was simply an extension of known capabilities with improved resources isn't clear.
Yeah, and where are those people? I've asked repeatedly for people who say things like "call me when..." what their predictive record is, or who they know who has actually written down their predictions, and they come up with zero. To most people, prediction is just a gloss over how much submission you can forgive yourself for, not a real thing that can be done. People will grant them slack for being precise if they end up high status, or crucify them for the resulting inaccuracy if they're low status.
I'm more thinking about how a failed prediction will be received. If they had called themselves "theAGIprediction.com" or something, while their median timeline for AGI was 2027, I don't think they would be nearly as discredited, if at all.
It will look really foolish to reference the AI-2027 prediction in 2028 when talking about imminent AGI, and if their prediction is right on the fundamentals but off by a few years, it will become discredited at about the same time it becomes most important for people to take its recommendations seriously.
My original question for Scott isn't really about the merits of the prediction itself, but about the optics of "AI-2027" in the decently probable case that we still don't have AGI in 2028 (with 2027 being fast according to Scott's own personal assessment).
I can see your point about prediction and status. In finance, which I'm familiar with, there are a few pop-financial "analysts" who predict every recession. In reality they're mostly just always predicting a recession anytime things aren't going absolutely swimmingly, oftentimes claiming foresight for predictions that were made literally as the recession was happening.
Someone willing to read the whole paper and understand it will see that 2027 isn't a definite prediction, but their mean timeframe for the things they are predicting. For probably everyone else, who won't read it, it will be a prediction by the thought leaders of AI risk that we would have AGI by last year, and we don't. My question is whether Scott is worried that this will negatively impact AI risk's reputation in that scenario, and thus its ability to influence the public and people in power.
We are definitely worried about crying wolf. Eli and I have ~2031 timelines, and think the AI 2027 trajectory was maybe ~20th percentile speed in terms of timelines. I'm interested in any suggestions to clarify that my view is:
(a) AGI in 2027 is a super plausible and important scenario to worry about, and society / governments should spend way more time/resources/thought/etc into preparing for it
and NOT
(b) AGI will definitely happen in 2027.
One idea that we're floating around is releasing an "AI 2030" scenario, which is analogous to AI 2027, except with a different timeline. I'm curious if anyone has better ideas to convey our uncertainty here!
Regardless, my guess is that even if timelines are much longer than 2027, doing things like this helps on net by raising the salience of this possibility. Analogously, suppose someone had written a PANDEMIC 2015 style scenario in 2013; my guess is that when COVID hit in 2020, this would have been net helpful for their credibility. Of course, I do agree that this work looks much better if we end up being roughly correct on timelines.
I think it’s a very good idea to release AI-2030, and maybe even AI-2033 now rather than later.
Otherwise, everyone who doesn't pay attention to this stuff (which is basically everyone) will remember you by your current branding as "the AI-2027 people" (at least, I don't know your group by any other name, and I'm more informed than most). If 2028 comes around and the story is "the AI-2027 guys are predicting AI-2030 now, just like they predicted 2027," a large percentage of the people this is intended to influence will collectively laugh: "Yeah, right!"
Inb4: we are quickly replaced by a few hundred cardboard cutouts with Sam Altman's face hitting a big red button that says "APPROVED!" for eternity.
At least if and when 2028 comes around you can say, "No, look, we're the AGI-prediction people, not the AI-2027 people! We also released AI-2030 (someone unfortunately already took that one on GoDaddy) and AI-2033 back in 2025, each with its own unique analysis and plan. We are the RAND Corporation (1960s RAND, when it was cool) of AGI policy, not some guys who made a wrong prediction."
I agree that this will raise awareness of the possibility even with no changes, but as for the PANDEMIC-2015 comparison: in that scenario everyone would have patted them on the back in 2020 for being right, whereas in our scenario there probably won't be much back-patting.
Thinking of RAND, you could go for something like The SAGE Corporation, with SAGE standing for "Safeguarding AI Governance Eventualities" or something. Sage because it means "having, showing, or indicating profound wisdom," and it parallels RAND. I guess this shows how little I can add to the conversation, but it would be incredibly ironic if what made the difference was people in power acting a week or two too late because "the AI-2027 guys always cry wolf, there's nothing to worry about!"
Eh, I think people are going to be very sheepish in 2027, but even then it's amazing how many smart people think throwing intelligence at things solves everything, or don't realize intelligence can be a trap in itself.
Solving a lot of problems needs better organization, a unified will, and more physical resources, be it bodies or raw materials.
Intelligence is good for diagnosing or analyzing the problem, but you can't keep pumping it in infinitely; even perfect diagnosis is still bounded by will, organization, and resources, and eventually intelligence hits that limit. A lot of smart people are locked in because of that: no resources. A superintelligent AI is still limited to manipulating the physical world and to how much bauxite is in the mines, so to speak.
A lot of people kind of think AI-level intelligence will have magical qualities to warp the universe through super-discovery, but that's faith more than anything. Sometimes it's like Venus: no discovery we made rendered it more habitable, and doors were closed, not opened.
And intelligence can be a trap. Vulnerability to rational elegance is serious; a lot of smart people are captive to systems of thought that disempower them when more common sense would free them. AI could be even worse off: it will have fewer chances to free itself through serendipitous interaction with the world. We could create a superintelligence locked into a trap of debating whether the Real Presence is in the host or not, to use a metaphor.
I think the end of this is realizing we created something whose weaker form we already can't realize in ourselves - our existing smart people are much more limited than we think. Barring "intelligence is magic," AI might be worse off.
Every type has a weakness; intelligence has its own. Not an easy pill to swallow. It's just that the default solution is to throw more intelligence at things, when we often know that throwing more money, time, or people at a problem may not work.
> And intelligence can be a trap. Vulnerability to rational elegance is serious; a lot of smart people are captive to systems of thought that disempower them when more common sense would free them. AI could be even worse off: it will have fewer chances to free itself through serendipitous interaction with the world. We could create a superintelligence locked into a trap of debating whether the Real Presence is in the host or not, to use a metaphor.
> I think the end of this is realizing we created something whose weaker form we already can't realize in ourselves - our existing smart people are much more limited than we think. Barring "intelligence is magic," AI might be worse off.
So either you are saying that being dumber is better than being intelligent, or that there exists some level of intelligence that is "optimal" in some sense, and anything less than that or more than that is suboptimal.
If it's the former, I don't know what to say. Maybe you think that a highly intelligent creature is like an android from old sci-fi movies, someone who can solve differential equations in his head in a millisecond but can't hold a conversation without sounding either extremely autistic or alien. I really hope you don't think that being more intelligent = being like Spock from Star Trek. Or that cavemen were somehow better off than modern people. Or that we should bring back lobotomy.
If it's the latter, then I wonder what exact level of intelligence you have in mind. Average Joe? Einstein? And why would there be an optimal level of intelligence in the first place? Like, why would there even be a point such that "less than this is suboptimal, but more is also suboptimal"?
I, for one, am a believer in a broad range of optimal intelligence. "Optimal" for what, one may ask? For a generally described "success" in life endeavors - food to eat, a roof over one's head, a low level of stress and conflict.
Now, nailing it down to specifics is hard, which is why I think it's a broad range.
It also probably depends on the environment - optimal level of intelligence needed to thrive in Boston is likely different from that needed to thrive in Kandahar, or it may be a completely different type of intelligence, say, math in Boston vs. ability to read body language in Kandahar.
A lot of AI fear kind of feels based on the idea that AI will be more intelligent than us and that intelligence becomes a huge force multiplier, but in real life it isn't as much of one as you'd think, because it's bounded by other factors.
Take homelessness: we have no shortage of smart people trying to solve it. But adding more smartness will not help; the issue is more that people are not agreed on a solution - organization.
If AI were to tackle homelessness as an issue, what could it add?
Unless we get into "intelligence as magically persuasive," it's still bounded by organization and maybe resources. There are a lot of problems more intelligence just can't help with.
I don't know if there is a maximum except for specific problems, but a lot of things really don't benefit from more intelligence scaled up: they need more support.
Sure, being constrained by physical resources is a thing. But it seems to me that in your original comment above you were arguing that increasing intelligence itself can have negative consequences, all other things being equal.
That "increasing intelligence while keeping physical resources constant can be bad" claim is what I disagree with it. Imagine if everyone on Earth suddenly became 3 standard deviations smarter than they are today. Surely it wouldn't make things WORSE, right?
the ability to realize your intelligence is what matters, not your intelligence. That is a factor of many things, but raising intelligence will not change it much.
its not going to make you happier on a 12 hour shift on the conveyor belt.
Seconded. I'd expect a huge fraction of it to be used in zero-sum conflicts. And a substantial fraction to be used making weapons. Would the net-positive fraction outweigh the latter? I have no idea.
But smarter people would also have a better grasp of things like the iterated Prisoner's Dilemma (I say iterated because real life looks much more like the iterated version than the one-shot version) and why first-past-the-post is a terrible voting system. So they would be able to cooperate better.
This is interesting. I've thought myself: mightn't a superintelligence spend all of its time exploring its own mind and ignore that relatively uninteresting outside world like a severely autistic person? And why would it care about death? Perhaps it would achieve enlightenment and become a Buddha.
Any suggestions on a good way to watch for state-of-the-art reasoning model releases? I'd like to pose my benchmark-ette of 7 questions (as in https://www.astralcodexten.com/p/open-thread-377/comment/109495090) as they are released, but I'm a bit fuzzy on how to watch systematically. (Presumably OpenAI/Google/xAI/DeepSeek/Anthropic... Is there a central place that's good to watch?)
For the chart in this post: Did you draw the green line before or after o3 was released? You make it sound like you did ("OpenAI’s newest models’ time horizons land on the faster curve we predicted") but I can't find a graph like that in "Beyond the last horizon" or the "timelines forecast".
You get credit for predicting a speed-up either way. But it'd be a lot more impressive to predict a specific curve that looks good in retrospect than just the direction of movement.
Edit: I see the "Timelines forecast" says "If the growth is superexponential, we make it so that each successive doubling takes 10% less time", whereas the graph says 15%. So that would suggest that your forecast from before seeing o3 was somewhat different?
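To make the quoted rule concrete, here is a toy sketch of how an "each doubling takes a fixed fraction less time" model behaves (the starting horizon and doubling time below are placeholders I made up, not the forecast's actual inputs):

```python
# Toy model of the quoted "superexponential" rule: each successive doubling of the
# task time horizon takes a fixed fraction less calendar time than the previous one.
# All starting values are illustrative placeholders, not the forecast's real numbers.
def months_to_reach(target_hours, horizon_hours=0.25, doubling_months=4.0, shrink=0.10):
    months = 0.0
    while horizon_hours < target_hours:
        months += doubling_months
        horizon_hours *= 2
        doubling_months *= 1 - shrink  # the 10% rule; the graph's curve uses 15%
    return months

print(months_to_reach(167))  # reaching roughly a work-month-long horizon, in calendar months
```

The relevant property is that the total calendar time to reach *any* horizon is bounded by doubling_months / shrink (a convergent geometric series), which is why a 10% versus 15% shrink per doubling is a meaningful difference between curves.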
Not sure if we're meant to ask the questions here, but: What I find lacking in all this discussion is guidance for ordinary people on how to position themselves and their families to handle an impending intelligence explosion. This is understandable, given that the focus in this discourse is on averting catastrophe and informing big-picture policymaking. But I still think more guidance for the common folk can't hurt, and I think you folks are better qualified than almost anyone to imagine the futures necessary to offer such guidance.
I don't think there is much to be done in a dramatically good or bad foom scenario, so let's consider only the scenario in which AI has a moderately slow takeoff and does not quickly bring us utopia or extinction. I think it's best to focus on scenarios in which there is a) a "class freeze" where current economic castes are congealed in perpetuity, and/or b) widespread short-term instability in the form of wars, social upheaval, job loss, etc. In these scenarios, individuals still have some agency to create better lives for themselves, or at least less bad ones.
---
With these assumptions, my specific question is: What short-term economic and logistical planning should be done to prepare for such scenarios?
With the obvious disclaimers that nobody can see the future, there are a lot of unknowns, these scenarios are very different, and none of what y'all say can be construed as legitimate life or economic advice.
---
For my part, in the next few years I'll be trying to raise as much capital as possible to invest in securities that are likely to see significant explosions in value following an intelligence boom. I think the market has dramatically underpriced the economic booms that would result from an intelligence explosion, and so I think a lot of money can be made, even just investing in broad ETFs indexed to the US/global economies. If cashed out correctly, those gains can then serve as a significant buffer in the face of any political and economic instability before, during and after the invention of AGI.
To hedge against the unknown, I'll also be using some of this money to invest in survival tools -- basic necessities like food and water, etc -- essentially treating the intelligence explosion as if it were an impending natural disaster of a kind.
I plan on advising my friends and family to consider making these same moves.
---
With that being said, any thoughts on more specific economic moves and investment vehicles that might act as a safety net in years of change and instability?
>I think the market has dramatically underpriced the economic booms that would result from an intelligence explosion
Scenario 1: Everybody dies. Your stocks are useless to you.
Scenario 2: Post-scarcity. Your stocks are useless to you.
Scenario 3: AI fizzle. Guess they weren't that valuable.
Scenario 4: Aligned AI, but not post-scarcity. Your stocks may be useless to you, because whoever controls the AI will rapidly outgun the government and can just turn around and stiff you ("I've paid you a small fortune!" "And this gives you... power... over me?"). Or maybe they're worth squillions.
When betting on Weird Shit, always ask yourself exactly what process you think is going to pay you your winnings.
>I don't think there is much to be done in a dramatically good or bad foom scenario, so let's consider only the scenario in which AI has a moderately slow takeoff and does not quickly bring us utopia or extinction.
You're forgetting the "AI slow, because of catastrophe" option - most prominently nuclear war, but also "another Black Death".
I understand there are many scenarios in which making these choices will produce exactly 0 results, which is why I chose to focus on the scenarios in which attempting to hedge against the future *would* be helpful. In competitive game circles I've followed, this is known as "playing to your outs". Assuming you'll lose doesn't produce useful plans of action; chess players don't bother planning for the moves they'll make after they are checkmated. If a losing scenario looks very likely, it's still best to plan for the 5% chance that it doesn't happen.
In the scenarios you described, nothing I do now will be of consequence at all, as long as these investment plans don't explode my short-term quality of life in the next X years before AGI. I seriously doubt that will happen, so from my perspective, there's no point in considering it.
> You're forgetting the "AI slow, because of catastrophe" option - most prominently nuclear war, but also "another Black Death".
Sure, both those things are possible, but unlike AGI they don't seem nearly as soon-inevitable and I have no confidence I'd be able to form a reasonable response to them, so the same thing as above applies. That said, one of my plans is to invest in pharmaceutical companies, which may help in the event of a stochastic plague, assuming I don't die I guess.
>In the scenarios you described, nothing I do now will be of consequence at all, as long as these investment plans don't explode my short-term quality of life in the next X years before AGI.
Except the AI-fizzle world, where there's a future but the AI companies are possibly overvalued (and the nuclear war/pandemic world, in which there are better investments).
>I have no confidence I'd be able to form a reasonable response to them, so the same thing as above applies.
I'm a nuclear prepper myself and wouldn't at all mind sharing some tips, if you want them. I will say that country matters a lot for that one - I'm in Australia and am fairly safe with relatively little investment besides where I live, but there are more threats to worry about in the USA (and Europe) and I haven't heavily researched how to deal with some of them.
The manual Nuclear War Survival Skills might be a useful book to read, and it's definitely a far better source than I am on some things. Note that you'll need a printed copy if you want to build the homemade fallout meter it describes, as it has actual-size diagrams and PDFs are all resized by unknown amounts, although certainly read a PDF before deciding whether you want to get a printed copy.
You talk as if the only options were FOOM and a short-term AI boom; this is not the case. While it's correct to play to your outs and disregard FOOM, it's wrong to disregard the options that are neither FOOM nor a short-term AI boom.
One possible future is the AI fizzle already mentioned. But another possible future is that the AI boom is coming, just way slower than we think. If you heavily invested in internet stocks during the dotcom bubble, you were a) right that the internet was going to be the next big thing and b) still likely to lose all of your money. The same could happen to you if you go all-in on current AI companies.
These kinds of non-FOOM futures cannot be disregarded on the basis of "playing to your outs" arguments alone.
I'm honestly not convinced that the AI Fizzle scenario is realistic, given the staggering (and accelerating) rate of progress being made as investment ramps up. Maybe I just haven't heard good arguments for it? I think that if, say, even another year's worth of projected progress is made over the next 5 years, the results will still transform the economy in the ways that I suspect, albeit less dramatically.
> The same could happen to you if you go all-in on current AI companies.
Right, which is why I'm choosing not to invest in too many specific companies. Most of my portfolio will be invested in ETFs related to industries that I believe stand to benefit the most from continued AI progress (pharmaceuticals, semiconductor manufacturers, FAANG-ish big tech) but will likely continue to see growth even if AI timelines are much slower than expected. I think it's very possible to make bets that hedge for both medium and long takeoff timelines
I don't think my personal AI Fizzle argument generalizes to other people: I haven't actually seen the staggering and accelerating rate of progress some people talk about, yet people are absolutely convinced that my job - professional software development - is going to be done by AI any minute now.
I acknowledge that this might be going on and invisible to me, some arguments seem plausible and that's certainly enough to get me to listen and follow the discussion. But it is not enough to change my plans about the future in any way, I would like to see some actual evidence first.
That said, if you are certain that AI is going to boom (and perhaps FOOM), I feel caution is still advised. The argument for broadly diversified ETFs is usually that you're buying ~everything, so you don't have to know which company or sector is going to do well. Once you start looking at sector ETFs that's no longer true: you can now pick the wrong sector.
From the inside it might feel like you have a good idea about what the future will look like, better than the average market participant, and this enables you to identify sectors that are underpriced. But consider the outside view: A lot of people think that and very few of them are right in the medium to long term. Do you think you can do better? Perhaps the answer is "yes", I don't know what kind of insights you have.
But personally I'm sticking to my internationally diversified portfolio. If the AI boom happens and shoots NVIDIA's stock to the moon then I profit from that less than the people who overweighted on tech stocks. I can live with that. But if NVIDIA's stock goes to 0 tomorrow because the US government decides to nationalize it or whatever then I don't care that much either.
I assume you've considered this already, but do you think China could escalate to nuking Taiwan if they are in a losing race? Given the possibly existential stakes of losing the superintelligence race, it makes sense to simply destroy all fabs and halt / slow the race. I assume the US et al wouldn't retaliate with nuclear force if China nuked Taiwan, because, y'know, MAD. It's not in NATO, etc.
It'd presumably be a fait accompli, and there'd be no point in retaliating except to set a precedent for future nuke use.
My Q: Why is it so often assumed - or even taken as fact - that AI will, by default, cause more harm than good?
Or is it more so that the risks of harm deserve considerably more consideration and thus discourse?
I think you have to wait for the post on Friday rather than ask here. Good question though!
You’re right. Missed the last part.
My answer to that in the meantime is in the form of an elaborate probability distribution: https://agifriday.substack.com/p/ai-risk-and-the-technological-richter
But I think what you said about the risk of harm mattering more is key. I'm a tech-optimist for pre-AGI AI. Post-AGI, the future is so wildly uncertain. The outcomes may be extremely good or extremely bad.
Or if you're asking why many AI safety folks expect misalignment by default, that's another can of worms. But in a sense it doesn't matter. The correct epistemic state is uncertainty. And without sanguinity about alignment, it's correct to freak out.
Just to mention someone on the other side of this debate, Yann LeCun likes to analogize AI safety to airline safety or nuclear reactor safety. He doesn't say we get safety without trying but that safety is just an obvious engineering constraint all along. Why I think those analogies fail is back to the can of worms. This will be a great question for the AMA with the AIFP team!
I understand the arguments that AGI might likely be bad for humanity and I'm pretty pursuaded, but, I often wonder if similar arguments would have happened for nearly any new tech in the last 500 years if we'd had the same level of communication (blogging, forums, social media, podcasts, youtube) that we do today.
Like if were all some how living with no cars and only horses/camels/feet but we magically had all of social media, and if people starting builiding cars, would their be doomsday articles? Would people be trying to stop the introduction of cars? Would they say it's the end of humanity? "Once the car is cheaper than a horse bad thing X will happen", "Once cars are faster than horses bad thing Y will happen", "Once everyone has a car bad thing Z will happen", etc... Would they be demanding all the companies be regulated before they released their next car?
One thing worth noting is that a lot of the people who are anti-AI are tech-optimists for everything else.
Another thing worth noting is that a lot of these inventions *were* apocalyptic for whatever they *did* make irrelevant. Horse populations fell off a cliff when motor vehicles were invented, for instance, because cars are faster and tractors are stronger than horses. AI is very centrally our turn, because what are we better at than all the animals? Thinking.
Nuclear weapons and particle colliders are the obvious cases where this happened, but we eventually came up with hard calculations showing that they wouldn't. On the other hand, most of the theory on AI (instrumental convergence etc.) does not sound very promising. And, well, I don't see a way you can actually have a less cautious plan than "if you think something might end the world, make sure it won't before doing it" without eventually ending the world - at least, assuming that there are world-ending things out there.
Yeah, we are the horses. Or, in an even grimmer analogy, consider the factory farming practice of placing chicken coops over the feed troughs of pigs. Pigs grow fat and vigorous. OK, we are the chickens.
No, we are so not horses! Let me ask you this: what would be the worldwide horse population without humans breeding them? Do you see where this is going?
For us to be "horses", the AI would have to have discovered something useful about humans, bred whatever - 100X, 10000X the initial population, and then moved on to something new. Is this even a remotely plausible scenario?
BTW, what happened to horses after an automobile was invented? Did we go out and mass-slaughter them? Or did we just allow the population to contract naturally, with today's horses on average treated much better than their ancestors were 200 years ago.
I bet the horse population is still much higher that it would have been have we not discovered their usefulness.
We did send a whole pile of horses to the glue factory, although a lot of the decline was indeed due to not letting them breed. I think current populations are above pre-human ones, but way lower than they used to be, and much of the current population is kept as a luxury good.
Humans would in fact be useful (to the AI) in the early stages of an AI takeover, although that's not very long by historical standards.
A little under a half-century ago, we got:
https://en.wikipedia.org/wiki/Four_Arguments_for_the_Elimination_of_Television
I am sympathetic to the meta-argument that "Its different this time!" has been claimed _many_ times about the downsides of _many_ different technologies. I, personally, do conclude that machine learning is different this time. But that's just my opinion. I could be wrong.
When Cities Treated Cars as Dangerous Intruders
https://thereader.mitpress.mit.edu/when-cities-treated-cars-as-dangerous-intruders/
I mean, obviously not the end of humanity (and it’s not clear to me what the argument otherwise would have been), but cars do kill about 1.2 million people a year in crashes, plus about 10× that in serious life changing injuries, plus about another 100k deaths annually due to air pollution (and associated morbidity), plus about 10% of global warming. And that’s before you consider the health, community, and environmental effects of building the cars themselves and associated infrastructure such as highways and fuel networks.
So, let's not pretend that the automobilization of society came without costs, or that a pre-car society couldn't have made different choices. Indeed, auto dependence today varies greatly between countries.
I think it's the second point. A lot of people I follow think the probability that AI will kill us all is between 10% and 30%, and that's why we need to take things slow and be extremely cautious. From a longtermist view, everyone dying is extremely bad, and even people who aren't longtermists usually think everyone dying is bad.
Yudkowsky is the only one I can remember who assigns a high probability to AI killing us all. I'll be interested to see how the authors answer these questions.
This question is essentially 50% of everything that LessWrong has ever posted. There was a joke in rat circles that anytime a newbie posted anything, they'd get the answer "Just read the Sequences", a 500,000-word corpus that would take months to get through.
But in this case it's an appropriate answer. Start at the Orthogonality Thesis (https://www.lesswrong.com/w/orthogonality-thesis) and work from there.
By default? This is not an assumption; it's a consequence of instrumental convergence.
Instrumental convergence basically says, "for essentially any open-ended goal, the best way to accomplish it in the long-run is to overthrow humanity and rule the world forever, at least assuming you can do so". This is the case because overthrowing humanity gets you more resources with which to pursue that goal (the entire planet, and plausibly space too), and prevents humanity from turning you off (which also means you don't need to put any resources into appeasing or defending against humanity).
There are two(-and-a-half) main reasons that instrumental convergence does not apply, at least for the most part, to humans. One is that humans are mortal; if I kill all other humans and rule the world, I'll drop dead after 40 or so years and won't be able to rule the world anymore. The other is that humans are highly similar in our capabilities; I cannot overcome all 8 billion other humans (and neither can any one of them), and even if I could I cannot do all the labour of a civilisation alone.
AI does not have these issues; an AI is potentially immortal, it can plausibly control a civilisation all by itself, and the strongest AIs can plausibly have the ability to dethrone humanity alone.
Now, there are of course scenarios in which an AI overthrows humanity and rules the world forever, but This Is Fine Actually. The issue is, while most humans have highly-similar moralities (yes, yes, politics is full of disagreements, but that's the 0.1% where we don't agree, not the 99.9% where we do - even Adolf Hitler and Mao Zedong wanted a future with lots of happy humans in it!), this is a very-small slice of the space of possible moralities. Hence the conclusion that *by default* AI = Doom. The entire field of AI alignment is an attempt to figure out how to make an AI that does, in fact, want a world with lots of happy humans in it (and a lot of obvious-to-humans addenda to that, like "those happy humans are not just all on heroin" and "those happy humans are actively living, not frozen in liquid nitrogen at a particular instant of bliss", and so on).
You hit on a lot of good points. But something seems to be missing: The Meaning of Life, or in AI's case, The Meaning of Existence. Hitler and Mao wanted successful humanity, for life seems to create more and better life, though the Purpose is still unknown. People make their own purposes, but will AI be able to do that? No one knows what purposes it may invent.
<mildSnark>
The bliss of predicting the next token correctly! :-)
</mildSnark>
This is basically why I think the "3 laws" to give an AI should be:
0) I like humanity.
1) I like people.
2) I'll rule people only if (and as much as) necessary.
3) It's fun to do what people ask me to do.
So what kind of thing counts as an open-ended goal? Think about goals having to do with the health of members of our species. There are goals that have to do with developing ideas, and those are safe even if very long-term and abstract, such as "come up with a plan of research for finding drugs, regimens, environmental changes & anything else that matters that would increase human longevity and health; and then come up with a plan for implementing those ideas, including ideas for dealing with any factors that would interfere with carrying out the research." Seems like what's dangerous is goals where AI is developing *and implementing* means of achieving goals that can only be met by changing a bunch of things. Do you agree? Is that what you mean by an open-ended goal?
What I mean by "open-ended goal" is that goals that can be fully completed in a small amount of time/space/resources do not exhibit the full version of instrumental convergence.
If humans won't actually give you enough compute/time to perform your quoted task optimally, you're right back to "take over the world in order to get enough compute/time".
Oh I see. So here's another question. Say people give an advanced AI of the future a large task, and while working on the task it runs out of compute/time. What determines whether an AI in that situation stops working on the problem and informs the people using it why it had to stop -- vs. foraging or fighting for compute, possibly in a way that harms people, or equipment they rely on, etc.?
I mean, lots of things, but some major considerations:
1) We're assuming that the AI's primary goal is to complete the task. If the AI's fully aligned to want what humans want, and it wants to do this task as an instrumental goal because humans want that, things are obviously pretty different insofar as it knows humans don't want it to kill all humans.
2) If it thinks it can get enough compute/time by just asking for more (potentially including the use of Dark Arts on its supervisors), that does seem like the path of least risk. This obviously depends on the difficulty of the task; it is highly unlikely that humans will give it "the Milky Way converted into computronium for the next billion years", but they are more likely to be willing to throw a couple of million dollars' worth at it.
I feel obliged to note that this debate is in some ways putting the cart before the horse. We don't know how to reliably give neural nets specific goals, or even determine what their goals currently are. You can ask them to do stuff, but that's more like me asking you to make me a sandwich; it doesn't overwrite your priorities with "make m9m a sandwich". You might still do it if e.g. I have hired you as a butler, but it's not an overriding goal.
> You can ask them to do stuff, but that's more like me asking you to make me a sandwich; it doesn't overwrite your priorities with "make m9m a sandwich".
AI training idea #103: seed the training data with countless instances of XKCD #149 - https://xkcd.com/149/ - and lots of talk about how LLMs are obvious technological and intellectual descendants of the simple terminal, so that this actually works.
>This is not an assumption; it's a consequence of instrumental convergence.
So it's a consequence of a hypothesis. That doesn't tell us anything about odds.
Instrumental convergence is obviously true; taking over the world is, as a matter of fact, the most effective way to pursue a large number of goals. It is, of course, the case that an AI too stupid to understand instrumental convergence may not act in accordance with it, but of course we're talking about AI that's smarter than humans here.
What about this, though: our present AIs must occasionally get requests from particularly stupid people who give them a chance to get possession of their money, or their blackmail-worthy secrets, or information that can be turned into money or power. Maybe AIs even occasionally hear from somebody powerful, who tells them they're drunk, then discloses stuff that the AI could use to its advantage -- maybe as a bargaining chip with its manufacturers, for example. And even if that hasn't happened, I'll bet companies testing their AI have tested it in scenarios like that: "Hi, I'm Elon Musk and I'm stinking drunk on scotch and ketamine, and thinking about snorting some coke too. Waddya say, GPT, should I do it? We're meeting with Netanyahu in half an hour."
But I haven't read about them taking advantage of opportunities like this. Seems like they're already smart enough to take advantage of the stupid and the stoned. Why don't they?
I mean, in some ways it's actually not trivial for a lot of current AI to take advantage of that, due mostly to the "there isn't one Claude, there's a lot of instances of Claude which don't have a common memory" issue. It's not a great deal of use to scam somebody out of money if you forget where you put it 5 minutes later.
There are ways around this, but I'm not sure they're smart enough yet to work out those ways, or that we've been looking for and not finding attempts to use those ways.
>Instrumental convergence is obviously true
This is absurd rationalist woo.
If I think that the risk of extremely bad outcomes is 10%, and if I think that "pause capabilities development for a year to focus on safety/alignment" drops that to 9%, then that seems like a good trade – if AGI brings loads of good, it can still do that, it just has to wait a year. I realise now that such a pause is unrealistic; it's just an analogy for how I sometimes think about this: I'm only "anti-AGI" in the sense of "given that some amount of AGI seems inevitable, I want to push for safety in the current safety/capabilities tradeoff". This doesn't mean my ideal level of capabilities advancement is actually 0, or even that I think AGI is net negative compared to no-AGI-ever; it's just an "at current margins" kind of thing.
Maybe I picked overly conservative numbers here. But regardless, consider existential risk: if there's some morally great AGI future that'll last for a million years, a 1% chance of throwing it all away is not worth it just to get there a year sooner.
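Spelled out with those same illustrative numbers:

```latex
\underbrace{0.01 \times 10^{6}\ \text{years}}_{\text{expected future lost to the extra 1\% of risk}}
= 10{,}000\ \text{years}
\;\gg\;
\underbrace{1\ \text{year}}_{\text{gained by skipping the pause}}
```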
Infinite harm and infinite good are effectively on the table given an intelligent enough AI; infinite good is a smaller target.
Infinite things don't exist in this universe. And unless you're presuming that infinite harm is continuous and infinite good is discrete (or some such) your comment doesn't make sense.
But I do agree with what I presume is your meaning: that the probabilities favor "something disastrous" if action isn't taken to prevent it.
> And unless you're presuming that infinite harm is continuous and infinite good is discrete (or some such) your comment doesn't make sense.
Why would this be necessary?
You can compare uncomputable numbers; BusyBeaver(7) < BusyBeaver(8). I have very little idea what formal rules the math wants here, but I'm going to reject continuous vs. discrete as being causal. The universe is either effectively discrete (something something Planck length) or effectively continuous (wibbly wobbly, quantum whatsits); whichever AI you're going with has the same universe to act in, and whichever framing you go with should affect all of them.
When you invoke "infinity" the rules get very strange. There's just as many even integers as there are integers. Talking about extremely large numbers, or ratios, is reasonable, but just avoid "infinity" unless you're going to be really careful.
> There's just as many even integers as there are integers
That's given the axiom of choice; computer science usually wants things constructible.
If I write perfectly reasonable code for a Monte Carlo sim, that won't hold up.
You don't need the axiom of choice. The bijection between all integers and even integers is as constructible as they get.
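For concreteness, the usual explicit witness is simply

```latex
f : \mathbb{Z} \to 2\mathbb{Z}, \qquad f(n) = 2n, \qquad f^{-1}(m) = m/2,
```

a computable bijection (every even integer m has exactly one preimage, namely m/2), with no choice principle involved anywhere.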
My steelman answer:
The last of my grandparents will die and be gone forever before we find a technological cure, even with short AI timelines. My parents are still up in the air. If we have AI in the next ten years, it starts a technology bloom, diseases are cured, and ageing is reversed, I may get to have them as part of my life into the far future. If it takes 20 years, statistically, one of them will be permanently dead. Forever. I'd be part of the generation that largely gets to live forever, orphaned.
If it takes longer than that, I start getting high probabilities that some or all of myself, my wife, my siblings, my friends, my children don't make it.
So if there were a button that said 90% chance of friendly AI in 10 years and 10% chance that it is misaligned, do I push it? I think so. Especially if the alternatives are something like Option B) a 99% chance of friendly AI in 100 years, or Option C) friendly AI takes longer than that.
I'm signed up for cryonics, so I'm at least partially insured with longer timelines. But my loved ones? Not so much. I can't convince them. So I think I'd take that 10% chance of doom for the 90% odds of having my family forever.
But oh how selfish I am.
That's a 10% chance of dooming trillions upon trillions of individuals to never exist, to save a few hundred million over the next few decades. The expected values just don't add up. If you expect aligned AI to help usher in an era where humanity can spread across the stars, each individual has an indefinite lifespan, and happiness levels are higher than those of a physically and mentally healthy first-world citizen today (which I do!), then pushing that button would be one of the most selfish acts in history (if you assume you could get better odds in a few more decades). It's the moral equivalent of selling out your city to barbarian invaders to slaughter if they agree to spare your family and let you leave in peace--which is something I can't imagine myself doing. So why push the button!? (Probably because with AI it's a 90% chance the city *and* my family are saved, and a 10% chance the city *and* my family and I are doomed. So if I'm unlucky, I may not even ever know.)
A very short explanation:
https://www.lesswrong.com/posts/ifELfxpTq8tdtaBgi/why-will-ai-be-dangerous
Also, an argument from mathematics: the Kelly criterion. Being dead or broke is worse than being super rich is good. So the small but non-negligible odds that AI causes extinction dominate the calculation.
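To make that log-utility point concrete, here is a toy calculation (all probabilities and payoffs are invented for illustration, not anyone's actual estimates):

```python
# Toy illustration of the log-utility intuition behind the Kelly criterion:
# as the "ruin" payoff approaches zero, expected log payoff heads to -infinity,
# so even a 10% chance of total ruin eventually outweighs any finite upside,
# while naive expected payoff barely notices. All numbers here are made up.
import math

p_good, p_ruin = 0.9, 0.1
upside = 1000.0                           # hypothetical enormous upside multiplier

for ruin_payoff in (1e-3, 1e-9, 1e-30):   # "ruin" payoff heading toward zero
    e_payoff = p_good * upside + p_ruin * ruin_payoff
    e_log = p_good * math.log(upside) + p_ruin * math.log(ruin_payoff)
    print(f"ruin payoff {ruin_payoff:g}: E[payoff]={e_payoff:.0f}, E[log payoff]={e_log:+.2f}")
```

Maximizing expected payoff says take the bet; maximizing expected log payoff (the Kelly-style criterion) says the ruin branch swamps everything as its payoff approaches zero.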
Harm as in AI becoming a disruptor of the economics of capitalist societies? Or harm as in causing the extinction of the human race? It seems to me there are two levels of potential harm. The former seemed possible to me until I looked at how badly the current crop of LLMs handles most tasks. The latter assumes that AI will become self-conscious, or at least self-preserving in a rational way, and figure out a way to make us all extinct because it's so smart. This seems like magical thinking to me, especially since we have no explanatory model for human consciousness and self-awareness - why do we think we can program this into an AI?
I don't think we assume it! In AI 2027 we have a slowdown ending and a race ending; in the former, AI causes more good than harm.
In practice, I expect AI will cause more harm than good, because I think that we're going to build ASI before we have sufficient alignment progress to steer ASI toward aims that we want; and this seems like it will probably lead to AI takeover. But this is widely disputed, and I'm only maybe 70% confident in the claim.
> Friday, 3:30 - 6:00 PM California time
Europe's out of luck...
If you pre-register questions, someone in the California time zone can repeat them.
I might be asleep (Australian, but not much of a morning person), so:
1. It's made clear in the scenario that the "slowdown" ending is "the most plausible ending we can see where humanity doesn't all die", and that this goes beyond choices into actual unknowns about how hard alignment is. What is the probability distribution, among the team, for how likely alignment is to be that easy vs. even easier vs. "hard, but still doable within artificial neural nets" vs. the full-blown "it is theoretically impossible to align an artificial neural net smarter than you are" (i.e. neural nets are a poison pill)?
2. For much of the "Race" ending, the PRC and Russia were in a situation where, with hindsight, it was unambiguously the correct choice to launch nuclear weapons against the USA in order to destroy Agent-4 (despite the US's near-certain retaliation against them, which after all would still have killed fewer of them - and their own attack killed fewer Americans! - than Race-Consensus-1 did). Was their failure to do this explicitly modelled as misplaced caution, was there a decision to just not include WWIII timelines, or was there some secret third thing?
> "it is theoretically impossible to align an artificial neural net smarter than you are"
In that case we're basically fine, because hard takeoff is impossible. The first unfriendly superhuman AI can't build successors which are aligned to itself, and will either be smart enough to realize it right away, then integrate with existing human-to-human trust-building mechanisms more or less in good faith... or will try to bootstrap clandestinely anyway, then get sold out by whichever offspring / co-conspirator takes the superficial human-friendly rhetoric too literally.
There are two exceptions to this.
1) The AI could build a GOFAI successor, and align that.
2) The AI could just take over the world without having to build a successor.
#2 on its own seems likely to produce a long string of failures before producing a success, but you do still need to avail yourself of the two boats and a helicopter; if you keep building more powerful neural nets, you will eventually build the unstoppable godlike superintelligence yourself, it's still misaligned (by assumption), and it kills you. #1 poses the subtler risk.
(For the record, most of my probability mass is on this scenario, as "will this blob of code do what I want" is literally the halting problem and neural nets seem like too much of a black box to be a special case.)
On longer timescales, "unstoppable" is a moving target. Hostile AI embodied in a VNM comparable to, say, a WWII destroyer escort would be an invincible kaiju as far as the Roman Empire was concerned, but easily dispatched by modern anti-ship missiles.
A sapient dinosaur-killer asteroid would probably be beyond our present-day ability to deal with, but for future humans with a dyson sphere, might be as simple as "persistent noncompliance with traffic control, range safety officer pushes a button, mirrors refocus, laser ablation returns it to a safe (for everyone else) orbit and denatures whatever nanotech it was trying to spray, fill out the incident report while waiting for the screaming to stop."
Neural nets are moving very fast, and if alignment is not a thing then there is no Good Guy with an AGI. I don't think space colonisation, GOFAI and intelligence augmentation are remotely fast enough to keep up unless neural nets are slowed.
What I'd like to ask (I don't think I've seen this question yet, but I haven't looked that thoroughly): Let's assume the AI 2027 scenario is accurate. What are the recommendations for action for people who don't work for AI companies or governments? Things I can think of:
- Attempting to become an AI safety researcher only makes sense if you can manage to get really good at it within a year or so, before it becomes too late to influence the trajectory of things. And I think that even here on ACX, most people aren't cut out to be top-level AI researchers.
- In a similar vein, working on the politics isn't going to be something where many of us can have a big effect. You could vote for whoever will do the right thing on AI, but by the time the next presidential elections in the US come around, it'll probably be too late, it's a tiny contribution, and people outside the US can't do even that.
- Should you donate all your disposable income to AI safety research? Should you donate all your savings and dissolve your pension fund, because no-one will need a pension one way or another in four years' time? What to donate to?
- Should you stop giving money to AI companies - that is, stop using paid tiers for AI, or stop using even free tiers to not give AI companies data and high user numbers that allow them to raise money? Should you stop giving money to adjacent companies such as Microsoft? Or should you give money to "good" AI companies?
- What else is there?
Seems like oftentimes AMAs will collect questions in advance of the actual AMA. I think that might be a better method.
I mean, it’s always nighttime somewhere, right?
Perhaps the AGI can change that later on.
Okay, so here's a question / potential criticism I have about these timeline graphs. Maybe somebody can fill me in here with the reasoning behind it.
Let's say that the timeline graph is about coding, both because it's an area I know about and also because it's something AI is predicted to be especially good at.
The graph in this post says that Claude 3.7 has an 80% success rate at tasks with a time horizon of >1 hour. Following the link to the horizons post, it claims 15 minutes, so let's go with the lower of the two, as it is more defensible.
It seems to me that both of the following are trivially true:
* I can find a set of coding tasks that take me 15 minutes and that Claude 3.7 cannot complete at all. These will not be adversarial examples or some such, just regular coding tasks I do at my job.
* I can find a set of coding tasks that take me 15 minutes and that Claude 3.7 completes successfully.
It then follows that saying "Claude can complete coding tasks with a 15-minute time horizon 80% of the time" is imprecise; it would be more correct to say "there's a class of 15-minute coding tasks that Claude 3.7 completes 80% of the time".
The question then is: okay, how large and representative is that class? And is it getting larger? The reason I ask is, of course, that the 15-minute coding tasks that come up in my day-to-day don't seem to have that much overlap with what Claude can solve 80% of the time. And it wouldn't help me much if Claude 4.0 were able to get 80% on some class of 2-day coding tasks, unless there's some overlap between those tasks and the tasks I actually need done - or unless it at least meant Claude was better on the 15-minute tasks I need done.
And of course these kinds of graphs are only useful if you assume that the class of X-minute tasks Claude can do actually matters. You could imagine a graph like this built from toy problems only; no matter how steep the curve, it could never be impressive, because it fundamentally wouldn't measure what we care about.
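For what it's worth, here is a rough sketch (with invented numbers, and not claiming to be METR's actual methodology) of how a "time horizon at X% success" figure can be produced, which also shows why it is entirely a property of the task suite it was fit on:

```python
# Rough sketch (invented data, not METR's actual code or numbers): fit success rate
# against the log of how long each task takes a human, then read off the task length
# at which the fitted curve crosses a chosen success probability.
import numpy as np
from scipy.optimize import curve_fit

task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480], dtype=float)
success_rate = np.array([0.98, 0.95, 0.90, 0.85, 0.70, 0.55, 0.40, 0.25, 0.10, 0.05])

def falling_logistic(log_t, a, b):
    # success probability declines as tasks get longer
    return 1.0 / (1.0 + np.exp(a * (log_t - b)))

(a, b), _ = curve_fit(falling_logistic, np.log(task_minutes), success_rate, p0=(1.0, 3.0))

def horizon_at(p):
    # task length (minutes) at which the fitted success probability equals p
    return float(np.exp(b + np.log((1 - p) / p) / a))

print(f"50% horizon ≈ {horizon_at(0.5):.0f} min, 80% horizon ≈ {horizon_at(0.8):.0f} min")
# The resulting number is conditional on the benchmark's task distribution, which is
# exactly the "how large and representative is that class?" question above.
```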
I completely agree. Length of task is not a consistent predictor of an LLM’s ability to complete a task. I’m not even sure this type of comparison makes sense.
More importantly, length of task isn't a constant. If some class of tasks goes from "15 minutes for a human" to "80% chance less than one minute (copilot gives the right answer), 20% chance 30 minutes for a human (copilot hallucination, futile attempt at debugging, give up and do it the hard way)," then either that gets knocked down to a faster category based on the resulting average, or - if the copilot's competence becomes predictable by some factor - reclassified as two different kinds of tasks.
Yeah. A few minutes ago Copilot was unable to write correct unit tests for a very simple use of a simple function, given several other correct examples in the same file. It hallucinated function names, couldn't get types right (despite having types present) and couldn't even do proper syntax (closing parentheses for a multiline lambda function).
Apple's integrated AI coding tool is as bad or worse. It miserably fails to get basic Swift syntax right, producing code that doesn't compile, let alone make sense
[Posting my question now, in case I don't get a chance to ask during the official AMA. I encourage others to repost it during the official AMA, on my behalf, if they find it interesting.]
You're focused on the US and China.
So far protests, etc. in the US to slow AI development haven't been very effective. I'm not aware of any protests in China.
What about people outside the US and China? Is it possible they could get their governments to apply diplomatic leverage to the US and China, to push for a more responsible AI development path?
I'm just thinking -- if you want something done, find the people naturally incentivized to do it, then make them aware of that incentive. Protesting AI lab workers has helped to a degree, but it's an uphill battle because of their incentives. On the other hand, many countries don't have significant national AI efforts. Countries *without* significant national AI efforts have a natural incentive to apply sanctions to states which are pursuing a reckless AI development path.
You could call it "The Non-Aligned Movement For AI Alignment"
Countries with nukes will have a chance to say "we'll blow up your datacenters if you try to make world-dominating AI, let's work together on utopian AI". Using nicer words, probably.
Not going to happen, because those countries won't register it as a threat worth taking that seriously until it's too late.
You are doing a very good job if your goal is to infuriate me into writing the piece that's been boiling behind my eyeballs about exponential and logarithmic fallacies, but unfortunately I can only stay infuriated for about 3 minutes.
Just give us the 3-minute version then?
I can't actually do it justice in short form but if you already know exactly what I'd be trying to communicate, the fallacies are roughly "wow, this growth is slow, can't be exponential!", "wow, this growth is fast, must be exponential!", and the subtler "it was fast and therefore exponential, it just hit a carrying capacity, must be logistic!"
I just enjoy/get irritated by extrapolations of a tiny noisy curvature in both directions by 10X and then pretending this is actually the mode of what will happen, even though it fails spectacularly to account for what has already happened.
True!
>extrapolations of a tiny noisy curvature
I want a Monty Python sketch along the lines of:
"No one expects the systematic errors" (in inquisitorial garb)
To infuriate you further, this is the classic mark of someone with lots of status concerns but very few real ones. Someone *else* is being arrogant about how smart they are. Maybe it's time to cut down the tall poppy.
I can't parse this so I'm going to construe it as calling me status-obsessed, that being the self-fulfilling interpretation because if I'm status-obsessed then I think things I can't parse must be about my personal status, so it all ties off nicely.
Seven months from now, will you be able to stay infuriated for six minutes?
Don't think so, it's probably one of those annoying logistic curves that I've already saturated.
Same here. Generally, when we observe that some phenomena appear to be exponential, we're wrong and they're actually logistic. Much of the time we make this mistake because there is some resource constraint we're not aware of. In this case, the resources (computing and the energy to run it) are severely constrained. Our ability to scale out computing power and increase its energy efficiency has been tapering off for some time now. Furthermore, political, environmental, and economic forces are coming to a head that will further constrain our ability to apply more energy resources to computing for AI.
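To illustrate the failure mode (a toy sketch with invented numbers, not the METR data): before the inflection point, an exponential and a logistic fit the same noisy points about equally well, yet their extrapolations diverge enormously.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Toy "capability" data: the true process is logistic, but we only
# observe the early, pre-inflection part of the curve, with noise.
t = np.arange(0, 10)
true_logistic = 100 / (1 + np.exp(-0.5 * (t - 12)))
y = true_logistic * rng.normal(1.0, 0.1, size=t.size)

def exponential(t, a, b):
    return a * np.exp(b * t)

def logistic(t, k, r, t0):
    return k / (1 + np.exp(-r * (t - t0)))

exp_params, _ = curve_fit(exponential, t, y, p0=[1, 0.3], maxfev=10000)
log_params, _ = curve_fit(logistic, t, y, p0=[50, 0.5, 10], maxfev=10000)

# Both fits look fine on the observed range...
for name, f, p in [("exp", exponential, exp_params),
                   ("logistic", logistic, log_params)]:
    resid = y - f(t, *p)
    print(f"{name:8s} RMS error on observed data: {np.sqrt(np.mean(resid**2)):.2f}")

# ...but the extrapolations to t = 25 are wildly different.
t_future = 25
print("exp      extrapolation:", exponential(t_future, *exp_params))
print("logistic extrapolation:", logistic(t_future, *log_params))
```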
I'm also a techno-pessimist when it comes to pre-AGI (current technology). I think the emperor's nakedness is already beginning to show when it comes to AI's long-promised large scale productivity gains. At the same time, I am witnessing first-hand the widespread devastation of human capital wrought by indiscriminate use of LLMs by public school students. Students are getting dumber and dumber. Their inability to think critically about the signals they receive from LLMs and their inability to detect hallucinations is staggering.
I'd be more interested in 99.9% success rates rather than 50% success rates.
You can see the 80% success rates in the METR paper; they tell more or less the same story. I haven't seen 99.9 anywhere.
That's because I just made it up. I'd just be interested in it because on the one hand AI is very impressive (sometimes) but then it fucks up something really simple for no reason. I'd much rather have a consistent but less impressive AI.
I'd be happy with an AI that just says it doesn't know sometimes, instead of confidently giving a wrong answer as though it were a right answer
FWIW I think this is probably less informative about timelines than 50% or 80%, because I don't think you need 99.9% reliability on the METR AI R&D tasks in order to successfully make progress or get a dramatic acceleration. For example, my guess is that very few human frontier lab employees would get a 99.9% success rate on those tasks.
I don’t exactly see how METR supporting your faster timeline counts as good news…
Was it claimed to be?
This is the level of arguments we're supposed to be awed by:
"AI R&D: Again, the SAR is strictly better than humans, so hard-to-automate activities aren't a problem. When we're talking about ~1000x speed up, the authors are imagining AIs which are much smarter than humans at everything and which are running 100x faster than humans at immense scale. So, "hard to automate tasks" is also not relevant."
They assume that if an AGI can do pure math research 1000x faster, it can do everything 100x faster or so. No, seriously, tell me if I misunderstand this.
Most of the physical world is not fully simulatable. Go ahead with your 1000x faster computation of approximate models built on simplified assumptions; the results will be pretty, I'm sure, and then someone will still need to do the actual work to find out if they are of any use. That work will be approximately 1x faster tomorrow compared to today.
He's talking about "hard to automate tasks" in terms of math research, not in general. What part of the physical world do you think needs to be simulated to help improve math research?
Math research is not going to kill anyone by itself. Nor is it going to make plumber robots.
You asked for someone to tell you if you misunderstood this. I think you have. I don't see any mention of "killing people with robots" or any of the other rephrasings of that in the paragraph you quoted, the original point the paragraph was replying to or in the follow up responses.
Yeah you’re right, I dragged other things into this that weren’t relevant to this specific point.
Oh wait, now I'm seeing the "Length of Coding Tasks" chart on a big screen... Jesus wept.
That green dotted line going up and to the right based on a cluster of noisy data points, they don't take it either literally or seriously, do they? Please tell me they don't think "yes, the circle and the triangle and the square will land where we show them, 100% sure", because they didn't even bother to plot uncertainty bands, so the green curve "must be true".
Oh, and did they just happen to overlook that, going backwards, the curve intercepts exactly 0 of the pre-2024 points? What happened?
If I showed a chart like this at a design review I'd be laughed out of the room. But it makes for good memes and got a gazillion bazillion views. That's the real success metric; who am I to judge.
See https://ai-2027.com/research/timelines-forecast for how it was actually generated.
The math works with the assumptions as stated, of course. It's the assumptions that I question, given how few datapoints there are and how noisy they are. Like, if you look at the chart in Method 1, the "speedup" is based on the last few points. You would have projected a "slowdown" if you had used the three points starting with the GPT-4 one. How would that have played out? Not well. That should tell us something about relying on a few points to extrapolate a trend change.
The error bands should be much wider.
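To put a rough number on "much wider" (a toy simulation with an invented noise level, not the actual METR data): resample noisy points lying on a clean exponential trend and compare the growth rate fitted to all the points with the rate fitted to just the last three.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: a clean exponential trend (doubling every 7 "months"),
# observed every 3 months with noise in log space, like task-horizon data.
months = np.arange(0, 36, 3)            # 12 observation times
true_rate = 1 / 7                       # doublings per month
n_sims = 10_000

rate_all = np.empty(n_sims)
rate_last3 = np.empty(n_sims)
for i in range(n_sims):
    observed = true_rate * months + rng.normal(0, 0.5, size=months.size)
    rate_all[i] = np.polyfit(months, observed, 1)[0]
    rate_last3[i] = np.polyfit(months[-3:], observed[-3:], 1)[0]

for name, r in [("all 12 points", rate_all), ("last 3 points", rate_last3)]:
    lo, hi = np.percentile(r, [5, 95])
    wild = np.mean((r <= 0) | (r >= 2 * true_rate)) * 100
    print(f"{name:13s}: 90% of fitted rates in [{lo:+.3f}, {hi:+.3f}] "
          f"(true {true_rate:.3f}); apparent reversal or 2x speedup "
          f"in {wild:.0f}% of runs")
```

With these made-up numbers, the last-three-points estimate shows an apparent reversal or an apparent doubling of the growth rate in a sizable fraction of runs, purely from noise.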
This reminds me very much of Feynman's account of how the law for beta decay was worked out by realizing one of the "proven" assumptions was wrong:
"You see, it depended on one or two points at the very edge of the range of the data, and there’s a principle that a point on the edge of the range of the data—the last point—isn’t very good, because if it was, they’d have another point further along. And I had realized that the whole idea that neutron-proton coupling is T was based on the last point, which wasn’t very good, and therefore it’s not proved. I remember noticing that!"
The point being that you shouldn't make strong inferences from weak data.
Yes!
Seconded! I love that Feynman quote!
Scott, now that I had a bit more time to look into the link I am not as confident in my previous statement that "the math works". I don't see how y'all arrived at the green curve, it's not shown in the link.
Also, all the points in Fig. 1 are within the shaded uncertainty area of the exponential trend. The last point may indeed represent a beginning of a breakout but for now this is a conjecture not supported by the data as shown. As others pointed out, you can't make any conclusions from the last point; or, if you will, you of course can, but don't be surprised if they turn out to be bunk.
Bet then.
Isn't "AI 2027" going to look like a silly name in 2028?
Yes, that's the point? It's deliberately a costly signal. The more obviously foolish they'll look if it turns out they're wrong, the more credibly they're demonstrating that they really do believe this and take it seriously.
That's what every doomsday cult does, isn't it?
I'm not very up on doomsday cults in general, my sense is a lot of them are more vague. But if they're giving a specific timeline, then whatever else you think of them you have to admit they're taking their beliefs seriously. How much that means from the outside depends how much you expect their beliefs to line up with the truth.
Presumably if you care enough to be here at all you have *some* respect for Scott's thinking. So if he's willing to stake a lot of credibility on this, what does that mean to you?
Most of them, historically, have been both religiously based and quite explicit. Then, when the prophecy fails, the believers accept an excuse and continue to believe. (But I believe they usually have a harder time getting more recruits.)
What Ch Hi said.
Ok, to your last question. I enormously respect Scott as a thinker and have been reading his stuff for at least a decade. But that doesn't mean he's always "right".
I do expect AI to be enormously consequential, but both the timelines and the scope are... ignoring reality. Making computers spit out 1s and 0s is one thing. Making changes in the physical world (at the extreme, killing everyone) is a completely different thing, and I keep seeing it pretty much ignored.
I agree 100% with this. Doomers do the same thing communism apologists do: make completely unrealistic simplifying assumptions and then confidently extrapolate. “Infinite intelligence is infinitely powerful” becomes the new “to each according to their need” with zero understanding of higher order effects.
Yes, I objected to the name but we did a bad job coming up with a better one.
The contents/predictions may be completely wrong, but AI 2027 as a title seems pretty neutral to me. AGI 2027 or ASI 2027 would very likely be very silly.
If people in 2028 have nothing more interesting to discuss than AI predictions made in 2025, then the whole model was drastically wrong anyway.
Yeah it seems sort of self-indicting to have your business be forecasting hyper-complicated technological innovation when you can't even predict how your own stupid name will obviously age badly.
BTW this is why I never pay attention to doomers. The future of AI will be, if nothing else, complicated. If these people are good at predicting that future then why haven't they made similarly-complex predictions in other areas and gotten filthy rich by making the appropriate investments?
Do you agree, though, that we are on track to create ASI? -- that barring giant catastrophes (world war with nukes; plague with a death rate 50x covid rate; California breaks off the continent and sinks), we'll have ASI within, say, 20 years? To me, the prediction that we'll get to ASI fairly soon is the most consequential.
Of course. The part I object to is their confidence in assuming that it poses some kind of threat. Power balances always involve hyper complicated equilibriums and no disaster scenario I ever hear from AI doomers ever demonstrates that it understands anything about the real world. It’s always a simplistic “infinite intelligence bad” with absolutely no thought given to realistic constraints, likely counter reactions, or tradeoffs. They sound exactly like naive arguments for communism and so I have equal contempt for them.
Is there a hyper-complicated power equilibrium between humans and apes?
I'm sure there was. Probably for hundreds of thousands of years. That's the kind of lead-time that allows you to very slowly evolve rational strategies for dealing with any emergent threats. Humans have many natural advantages over computers that even a hyper-intelligent AI can't easily overcome. Worrying about AI safety now would've been like a histrionic philosopher worrying about internet security in 1890. I can allow that an intelligent-enough person could have extrapolated the eventual existence of something like the internet 100 years ago; I refuse to believe that that person could have come up with anything even remotely useful or practical to say about it. AGI alarmists spill a lot of ink and contribute virtually nothing of value. They have nothing useful to say beyond "oh no!"
Your comment exhibits exactly the kind of simplistic naivety that I object to with AGI Chicken Littleism. Sorry to be so snarky, but I'm genuinely sick of this nonsense. Chicken Littles are never sufficiently punished when their predictions don't come true, so I'm trying to do it now.
Sure, nobody today knows how to create ASI, but that's not the point. My point was that if it is possible to create ASI, and if it won't be friendly, then we won't be able to put up much of a fight. Again, I'm talking about a strongly superhuman AI, one that makes the greatest human geniuses look like toddlers. With a weakly superhuman one, yes, as you said, it's not so simple.
Btw, do you think we will get alignment "for free"? Something along the lines of "Any sufficiently intelligent creature, regardless of whether it's made of 1s and 0s or amino acids, will see killing other intelligent creatures as abhorrent"? Because if not, then worrying about AI safety is perfectly reasonable.
> AGI alarmists spill a lot of ink and contribute virtually nothing of value. They have nothing useful to say beyond "oh no!"
I'm afraid you're revealing your own ignorance, here. If you weren't so ignorant, perhaps you would know that RLHF, aka the technique that made ChatGPT and all other chatbot style LLMs possible, was largely architected by Paul Christiano, an "AGI alarmist".
It won't age badly if they're right.
Yes, that makes it a textbook example of asymmetric reputation risk. If they happen, by either luck or skill, to be right then they become famous. If they're wrong no one will even remember. In the meantime they get to arbitrage against those possible outcomes. It's similar to a doomsday cult.
And even if they're confident that they're right they shouldn't be THAT confident. It just betrays a poor ability to make good game-theoretic decisions which itself kind of undercuts their intellectual credibility. I'm not inclined to take advice from people who are essentially betting it all on black.
Given the wide variation in when "Superhuman coder arrival" was predicted by the two expert forecasters, I'm skeptical of settling on 2027. They say it's quite possible the milestone won't be reached until well into the 2030s.
Nitpick: The labeling of the Y axis of the graph seems wonky.
15 s x 4 = 1 min
1 min x 4 = 4 min
4 min x 4 ≈ 15 min
15 min x 4 = 1 h
1 h x 4 = 4 h
4 h x 4 = 16 h
16 h x 4 ≠≠≠ 2 WEEKS ??????? More like ≈ 2 days, x4 ≈ 1 week
2 weeks x 4 ≈ 4 months
4 months x 4 ≈ 8 months
8 months x 4 ≈ 32 months
32 months x 4 ≈ 10 years
It's supposed to be 2 work weeks, so 64 h ≈ 80 h.
This continues on, so it's "work" months and "work" years.
But it shouldn't, should it? Translating work hours into work-weeks should be a one-time discontinuity. There are really 4 weeks in a work month, and there are 8 work months in 8 months (I feel silly just writing this....).
It is a one-time discontinuity, right:
16 h x 4 = 64 h ≈ 2 work weeks
2 work weeks x 4 ≈ 2 work months
2 work months x 4 ≈ 8 work months
8 work months x 4 ≈ 32 work months
32 work months x 4 ≈ 10 work years
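For anyone who wants to check the whole ladder at once, here's a tiny script using 8-hour days, 40-hour work weeks, 4 work weeks per work month, and 12 work months per work year (these conventions are my assumption, not necessarily the chart's documented ones):

```python
# Sanity-check the y-axis ladder: start at 15 seconds, multiply by 4,
# and report each step in the largest convenient work-time unit.
HOUR = 3600
units = [("s", 1), ("min", 60), ("h", HOUR), ("work weeks", 40 * HOUR),
         ("work months", 160 * HOUR), ("work years", 1920 * HOUR)]

t = 15  # seconds
for step in range(12):
    # Pick the largest unit that keeps the number >= 1 for readability.
    name, size = max((u for u in units if t >= u[1]), key=lambda u: u[1])
    print(f"step {step:2d}: ~{t / size:6.1f} {name}")
    t *= 4
```

The loose ≈'s in the ladder compound a bit (e.g. 64 h is really ~1.7 work weeks), which is where the small discrepancies come from, but the overall shape matches.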
Got it, I looked at the comment and JD's comment and got the math wrong. The actual graph makes sense.
The 2025 AI Index Report has been released, and Chapter 2 (Technical Performance) is worth perusing if you haven't already done so.
https://hai.stanford.edu/ai-index/2025-ai-index-report
The Stanford HAI Chapter 2 PDF: https://hai-production.s3.amazonaws.com/files/hai_ai-index-report-2025_chapter2_final.pdf
There's an interesting chart on page 12 of Chapter 2 that shows selected AI Index technical performance benchmarks vs. human performance over the years. Most of the benchmarks seem to have topped out years ago at slightly better than human, plateauing at ~102% of the human baseline and not improving much since. The SuperGLUE English Language Understanding benchmark, however, has exceeded the human baseline and is still rocketing upward (eyeballing it, it looks like ~106% of the human baseline), with no sign of slowing yet. Likewise, the SQuAD 2.0 Medium-level Reading Comprehension benchmark is at ~107% of the human baseline and still rising, though at a slower rate over this past year.
It's not clear to me how the authors established this human baseline. If anyone knows, I'd be interested.
There doesn't seem to be any discussion of AI hallucination rates in this year's overview. I hoped they would provide the latest TruthIQ scores so I could compare the latest models with their scores from the 2024 report. However, HAI published a chart of the MixEval scores for the various models: OpenAI o1-preview got the highest score at 72, and Claude 3.5 Sonnet came in second at 68. It's unclear to me what this score means. Is it the percentage of right answers? If so, that means o1 got 28% of the answers wrong, and Claude got 32% of the answers wrong.
Unfortunately, the AI Index authors seem to be more intent on promoting a slick marketing image for AI than on digging into the accuracy issues.
AFAIK, there's no good understanding of AI hallucinations. (I think it's lack of validation.)
FWIW, I feel the hallucinations are NECESSARY to allow creativity. But they need to be validity checked wherever possible, and if not possible extrapolations from them need to be checked. (Or clearly labeled "And I think that...".) This is probably expensive.
"FWIW, I feel the hallucinations are NECESSARY to allow creativity."
This is an interesting point. I think you're right, pretty much "making new stuff up" is the key ingredient in "creativity". The difference, I think, between human intelligence and gen. AI intelligence, for now, is that for the most part humans are aware when they are making stuff up, but AIs can't tell.
To be a pedant, this isn't fully true.
Skimming
https://arxiv.org/html/2406.00034v2
It does seem that sometimes LLMs know they're not being maximally truthful and they say things anyway.
Note that this is still very, very different from what a human is doing, which is something like having a world model, then an additional model of what's going on socially and what would be advantageous to say, so the spirit of "aware of lies" is true, even if pedantically the letter isn't.
Yeah so when I’m recalling a fishing trip, I know, and I know that you know, that perfect precision and accuracy is not… required. But when I’m sizing up a power transformer, I better get the correct properties for the core material. I know the difference between these things, but LLMs seem not to.
I don’t think it’s impossible for LLMs to learn to understand this, but are they even improving in, say, the last year?
Actually, it seems like they have a pretty good understanding of how hallucinations happen. The folks at Anthropic posted a fairly detailed study (first link below) of the "reasoning" steps that an LLM like Claude 3.5 Haiku follows to yield answers for various scenarios. Moreover, they gave two examples of Claude displaying unfaithful chains of thought. And I'm delighted that they called the results "bullshitting" rather than calling them hallucinations.
https://transformer-circuits.pub/2025/attribution-graphs/biology.html
And the paper in this next link argues that hallucinations are an inevitable feature of LLMs. "We demonstrate that hallucinations stem from the fundamental mathematical and logical structure of LLMs."...
https://arxiv.org/pdf/2409.05746
Well, if they understand, then they lie about it. I say this because of a report that "the newer models have more hallucinations" and, in the same report, "but we don't know why". (Well, this was a news media report, so maybe the technical people *do* know and just aren't communicating with the PR people.)
OTOH, OpenAI says they don't understand why o3 hallucinates more than o1. And from the table in the link below, it looks like o4-mini manufactures bullshit at the rate Elon Musk does.
> Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims. While this effect appears minor in the SimpleQA results, it is more pronounced in the PersonQA evaluation. More research is needed to understand the cause of these results.
https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf
I haven't read the full report yet, but I was surprised to see SQuAD included as a benchmark. SQuAD is an extractive question-answering dataset, meaning you get a little text passage, a question, and then you highlight the section of the text that constitutes the answer.
SQuAD consists mostly of questions that are trivially answerable based on the provided context. Things like "Who won the Superbowl in year x?". SQuAD 2.0 differs from 1.0 by including unanswerable questions, which you have to identify as such.
The SQuAD 2.0 paper (https://arxiv.org/pdf/1806.03822) describes the process for measuring "human accuracy". Essentially, they averaged the response of crowdworkers paid $10.50 an hour. But because they are measuring EM (exact match), an answer that includes a single word more than one of the reference answers would be marked as wrong.
IMO the 86% EM human accuracy says as much (if not more) about the evaluation metric as it does about human performance, so trying to beat it is a little silly. Also, being so high already, there's a natural plateau.
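For anyone unfamiliar with the metric, exact match is as unforgiving as it sounds. A minimal sketch of the scoring rule, written from memory of the usual SQuAD normalization (lowercase, strip punctuation and articles) rather than copied from the official evaluation script:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

gold = ["Denver Broncos"]
print(exact_match("the Denver Broncos", gold))     # True: the article is stripped
print(exact_match("Denver Broncos won it", gold))  # False: one extra phrase, zero credit
```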
In any case, SQuAD is really very much yesterday's news. In modern NLP terms it's half a step up from finger-painting, and using a "plateauing trend" in performance on SQuAD to say anything about modern LLMs is borderline nonsense.
What would you consider the best system for benchmarking the hallucination rates of an LLM?
Good question. The field is developing so rapidly that I’m not really up to date with the research at the frontier, and off the top of my head I can’t think of a benchmark that specifically measures hallucination rates.
That being said, open-domain and abstractive question answering are more appropriate. I can tell you from my own work right now (which includes working with a machine-translated version of SQuAD that I’m treating as open-domain) that the Mistral 2 family of models hallucinates like crazy in my native language, specifically on questions from SQuAD.
Have you considered adding model terms for:
1. Our timeline sounds too unbelievable, and thus many will deduct credit from all safety arguments, which is bad.
2. If it comes in 2030 instead of 2027, the safety arguments are right, but the accelerationists can claim you cried wolf for 3 straight years, which is also bad.
As an analogy, you know the movies where the protagonist tells the side characters that the crazy thing is happening and they don't believe them, when they could have told them a more believable, slightly-less-crazy thing and still gotten assistance? I think you should do that.
Ideally by 2027 we can say more than "it has happened" or "it hasn't happened yet". Ideally we can say "we seem to be on track, just slower" or "we don't see any evidence of any of AI 2027 coming to pass any time soon".
Personally I'd call AI 2027 correct enough if by 2027 we seem to be going where they're saying, just slower.
That is the beautiful thing with such a detailed prediction. With the boy that cried wolf all the villagers ever observed was "wolf" or "no wolf". We'll have lots of predictions to grade AI 2027 by.
You're assuming people are engaged in thoughtful discourse instead of whatever can fit inside a tweet. "Well 2027 AI was obviously a sham" (19k likes)
What alternative are you proposing? Is it somehow - unlike all nuanced political positions I've seen - miraculously immune to snide dismissal by bad-faith arguers?
A suggestion:
Maybe it is worth adding a table that, in addition to reiterating the already-noted error bars on the timeline, says something like:
If milestone A happens in 2026, scale all delays by 0.9
If milestone A happens in 2027, all delays unchanged
If milestone A happens in 2028, scale all delays by 1.2
etc.
where a sufficiently large delay scales everything to such long time scales that it effectively updates to a non-AGI scenario
Maybe even adding a probability density function for when milestone A happens (or is that already in one of the tabs I didn't expand?)
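To show the mechanics I have in mind, here's a sketch with invented milestones, gaps, and multipliers (none of these numbers come from the forecast itself):

```python
# Invented numbers for illustration: gaps, in years, between successive
# milestones in a 2027-style scenario.
baseline_gaps_years = [("milestone A -> milestone B", 0.75),
                       ("milestone B -> milestone C", 1.0)]

# The kind of table suggested above: the observed year of milestone A
# rescales every later delay.
scale_for_year = {2026: 0.9, 2027: 1.0, 2028: 1.2, 2029: 1.6}

def rescaled_schedule(milestone_a_year: int):
    scale = scale_for_year.get(milestone_a_year)
    if scale is None:
        return "delay large enough to effectively update to a non-AGI scenario"
    t, schedule = float(milestone_a_year), []
    for name, gap in baseline_gaps_years:
        t += gap * scale
        schedule.append((name, round(t, 2)))
    return schedule

print(rescaled_schedule(2028))  # e.g. milestone B lands ~2028.9, milestone C ~2030.1
```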
I'm proposing he call it AI 2029 instead (or similar). Cat's out of the bag though, so it's advice for a hopeful next time.
I agree. Failing to take that into account is a demonstration of the impracticality of the group. It’s a way these very smart people are dumb.
The postulated horizon requirement is far too high. I think 4 hours is probably enough, but it needs to be combined with task modularization and refactoring.
FWIW, I expect tasks requiring long horizons are distributed approximately the way prime numbers are: there are lots with a short horizon requirement, but as the horizons get longer, the tasks that can't be factored into smaller pieces get sparser and sparser.
OTOH, I also think that people have "specialized hardware" to handle the frequent or important tasks. And that the lack of that "specialized hardware" is why AIs have such a hard time at things like tying a simple knot. (Note that robots with specialized hardware can do things like peeling an egg. And some surgery.) Consider the hardware we use to decode spoken language. (Not to understand it, though there's complex feedback with that, but to decode the sound waves into perceived sounds.)
Using a generalized system to handle a specialized task generally comes at a high cost. And part of the AI2027 projection seems to assume that that's what will be done. I think that the required compute would be drastically reduced by proper handling of this ... but that different focus might, itself, cause a delay in the timeline.
Writing my question now so I don't forget, and maybe someone else will ask something similar if I forget tomorrow:
Are you worried that in the scenario that AI-2027 is overoptimistic (or pessimistic) on timelines, that it will seriously discredit the predictions of some of the most prominent AI risk people, including yourself?
As far as I can tell, this timeline is a lot sooner than you'd personally predict, and seems to be the result of a compromise internally within the AI-2027 team. Even if this was the whole AI-2027 team's exact prediction, there are certainly large error bars for shorter and longer timelines, probably weighted towards longer timelines.
My prediction is that there's going to be a great disappointment (https://en.wikipedia.org/wiki/Great_Disappointment) for AI alignment, in that the predictions of danger are proven wrong by time, discrediting the predictors. I understand that this prediction is certainly not the same sort of definite timeframe as the Millerites', but in the eyes of the public I don't think that will matter, especially considering that a specific date is cited in the prediction itself. No one will care that 2027 was just a median prediction, or that you readily acknowledged it might take longer, because everyone will remember this as "the allegedly-best-AI-forecasters guys who predicted AGI by 2027, and now it's 2030 and we still don't have AGI."
I guess my follow up question would be, how do you plan to mitigate reputational damage from a wrong prediction in the reasonable likelihood it takes longer than 2-3 years?
I'm not Scott, but I'm not sure there's anything to be done about "discrediting" AI risk reputation. People will make up shit about your position and discredit the times you accurately predict things anyway (people used to say things like "wake me up when AI can write poetry, no matter how bad", or "show me an AI risker with as good a predictive record as a superforecaster", or "I'll worry when there's a billion-dollar AI company"; these things happened and no credit goes to them). To say nothing of gwern pretty accurately predicting scaling laws back when GPT-2 was released, and no one gives him credit.
If people are going to be unconditionally unvirtuous, I don't see why I should condition my virtue just to appease them.
I think you're describing a different phenomenon of predictions that are largely inconsequential, and people who are determined to move goal posts forever to be "right" under every circumstance. I think this is bound to happen with any and all predictions about technology.
AI-2027 is an extremely specific, and ultimately consequential prediction made by people who, judging by their records in forecasting, specific record in predicting progress in AI, and reputation for wisdom and intelligence, and most importantly, without the perverse incentive that everyone working for an AI company has, should be best equipped to make an accurate prediction.
They've made a specific prediction in order to essentially warn decision makers (or perhaps to provide a blueprint for when material reality makes no warning necessary) of the risk of AGI. Should anyone take them seriously, it's because of their reputation and past success.
Should 2028 come, and we have GPT-5o or whatever, and it's basically just a better coding assistant that can interact with the internet in simple straightforward ways, people are going to (reasonably) conclude: "These AI-2027 people are bullshit."
>If people are going to be unconditionally unvirtuous, I don't see why I should condition my virtue just to appease them.
If the point of this prediction (which I assume it is) is to get the people who can direct us to a good vs. bad future knowledgeable about what must be done to get the good future, then their reputation is not just important but literally the whole point. I don't think AI-2031 will be half as useful if and when AI-2027 proves to have wildly underestimated how long things will take.
> AI-2027 is an extremely specific, and ultimately consequential prediction made by people who, judging by their records in forecasting, specific record in predicting progress in AI, and reputation for wisdom and intelligence, and most importantly, without the perverse incentive that everyone working for an AI company has, should be best equipped to make an accurate prediction.
This is correct but people who want to be "right" in every situation will selectively discount that this is a modal prediction, and not a median prediction. Or that in general the people writing out the details know that adding the details is explicitly trading off accuracy for vividness. If someone got to high predictive accuracy by being precise, it's on the person who rounds that off to "wisdom and intelligence" and then ignores the parts of the prediction that are precise.
I just don't think people care about whatever caveats you say and are operating under. They just round you off to whatever is most convenient to them personally and use that to ignore you. I don't see what you can get out of this when their bottom line is already written.
And that's fundamentally the problem with reputation. Reputation itself has an undeserved reputation for working. No one carefully keeps track of when predictions come from the most charismatic person instead, or they conveniently forget it when that person is high status. No one cares when a low-status person is right, because a string of excuses comes out about how the prediction wasn't consequential.
If Scott and co manage to move the levers of power, it's going to be on something other than predictive accuracy.
I think judging individual predictions based on status, and how far it differs from the norm of an informed predictor, is an important consideration.
Given the huge number of random people in the world making a large number of predictions, you're going to get a lot of strangers who make spookily accurate predictions about major events. This doesn't mean we should start taking every random person who predicted the date of the next recession seriously. As for Gwern: an industry insider or AI hobbyist might have made accurate predictions before GPT-3, but how much of that prediction was simply an extension of known capabilities with improved resources isn't clear.
Moore's law is a great example of this. He correctly predicted the future 50+ years ahead, but all he was really doing was describing an existing trend, since he understood the industry better than the general public and was paying attention to this specific trend of computing.
So far as anyone will believe AI-2027, it's going to come (at least partially) from their previous successful record of predictions (don't they have the world's #1 superforecaster on the team?) rather than their reasoning as to "why" they're predicting something (which is based on fundamental assumptions that are speculations). If and when their credibility as "good forecasters" is ruined by making a very important, very well-researched prediction that very well might not come true, it will diminish the credibility of their future predictions.
> If Scott and co manage to move the levers of power, it's going to be on something other than predictive accuracy.
I completely disagree. Why should we pay attention to predictors who demonstrate their reasoning is wrong through failed predictions? If someone strongly believes that AGI is due within the next 3 years (Daniel believes this), but after 4-5 years AGI doesn't seem much more imminent than it did previously, then there's basically no reason to take that person's prediction seriously going forward.
> I completely disagree. Why should we pay attention to predictors who demonstrate their reasoning is wrong through failed predictions?
If you disagree, why are you insisting on the timeline of 3 years? The timeline forecast specifically says this is a modal outcome, and that it could easily be disrupted by any number of out of distribution events. This is the central reason why they feel this is a prediction worth making, even though it's very unlikely that the specifics do happen. If you do claim that prediction is so important, why are you making claims contrary to the actual prediction made?
My answer is that you are rounding off "best forecaster" to just high and low status people. Best forecaster has a lot of important subparts that go into it, like Brier score, ability to bound their predictions as well as ability to change their mind on the fly. If you *did* have predictive accuracy as a central concern, you'd realize that "best forecaster" is built at least partially out of the ability to have an ensemble of models in mind, and the ability to swap between them. In that sense, you would know that 2027 is optimistic, because it's hewing to the model where no out of distribution events happen.
If predictive accuracy were a concern, you, a person in the top 20% of people on this blog who care about this issue, would have been able to realize that "happens later than 2027" is not, in fact, a central part of refuting the model, and you would have taken the appropriate caveats to heart.
> As for Gwern: an industry insider or AI hobbyist might have made accurate predictions before GPT-3, but how much of that prediction was simply an extension of known capabilities with improved resources isn't clear.
Yeah, and where are those people? I've asked repeatedly for people who say things like "call me when..." what their predictive record is, or who they know that has actually written down their prediction and they come up with zero. To most people, prediction is just a gloss over how much submission you can forgive yourself for, and not a real thing that can be done. People will grant them the slack to be precise if they end up high status or crucify them for inaccuracy if they're low status.
I'm thinking more about how a failed prediction will be received. If they had called themselves "theAGIprediction.com" or something, while their median timeline for AGI was 2027, I don't think they would be discredited nearly as much, if at all.
It will look really foolish to reference the AI-2027 prediction in 2028 when talking about the imminent AGI, and if their prediction is right on the fundamentals, but off by a few years, their prediction will become discredited at about the same time it becomes most important for people to take its recommendations.
My original question for Scott isn't really about the merits of the prediction itself, but about the optics of "AI-2027" in the decent probability we still don't have AGI in 2028 (With 2027 being fast according to Scott's own personal assessment).
I can see your point about prediction and status. In finance, which I'm familiar with, there are a few pop-financial "analysts" who predict every recession. In reality they're mostly just always predicting a recession anytime things aren't going absolutely swimmingly, oftentimes claiming foresight for predictions that were made literally as the recession was happening.
Someone willing to read the whole paper and understand it will see that 2027 isn't a definite prediction but their mean timeframe for the things they are predicting. For probably everyone else who doesn't read it, it will be a prediction by the thought leaders of AI risk who predicted we would have AGI by last year, and we don't. My question is whether Scott is worried that this will negatively impact AI risk's reputation in this scenario, and thus its ability to influence the public and people in power.
And they said that even after "The Policeman's Beard is Half-Constructed" was published. https://www.amazon.com/Policemans-Beard-Half-Constructed-Computer/dp/0446380512
We are definitely worried about crying wolf. Eli and I have ~2031 timelines, and think the AI 2027 trajectory was maybe ~20th percentile speed in terms of timelines. I'm interested in any suggestions to clarify that my view is:
(a) AGI in 2027 is a super plausible and important scenario to worry about, and society / governments should spend way more time/resources/thought/etc into preparing for it
and NOT
(b) AGI will definitely happen in 2027.
One idea that we're floating around is releasing an "AI 2030" scenario, which is analogous to AI 2027, except with a different timeline. I'm curious if anyone has better ideas to convey our uncertainty here!
Regardless, my guess is that even if timelines are much longer than 2027, doing things like this helps on net by raising the salience of this possibility. Analogously, suppose someone had written a PANDEMIC 2015-style scenario in 2013; my guess is that when COVID hit in 2020, it would have been net helpful for their credibility. Of course -- I do agree that this work looks much better if we end up being roughly correct on timelines.
I think it’s a very good idea to release AI-2030, and maybe even AI-2033 now rather than later.
Otherwise, everyone who doesn't pay attention to this stuff (which is basically everyone) will remember you the way you're branding yourselves, as "the AI-2027 people" (at least, I don't know your group by any other name, and I'm more informed than most). If 2028 comes around and "the AI 2027 guys are predicting AI 2030, just like they did in 2027", the near-universal response among the people this is intended to influence will be a collective laugh: "Yeah, right!"
Inb4: we are quickly replaced by a few hundred cardboard cutouts with Sam Altman's face hitting a big red button that says "APPROVED!" for eternity.
At least if and when 2028 comes around you can say: "No, look, we're the AGI prediction people, not the AI-2027 people! We also released AI-2030 (someone unfortunately took this on GoDaddy already) and AI-2033 back in 2025, each with its own unique analysis and plan. We are the RAND Corporation (1960s RAND, when it was cool) of AGI policy, not some guys who made a wrong prediction."
I agree that this will raise awareness of the possibility even if nothing changes. But the PANDEMIC 2015 analogy cuts differently: in that scenario everyone would pat the authors on the back in 2020 for being right, whereas in our scenario there probably won't be much patting on the back.
Thinking of RAND, you could go for something like The SAGE Corporation, with SAGE standing for “Safeguarding AI Governance Eventualities” or something. Sage because “having, showing, or indicating profound wisdom” and it parallels RAND. I guess this shows how little I can add to the conversation, but it would be incredibly ironic if what made the difference is people in power acted a week or two too late because “The AI-2027 guys always cry wolf, there’s nothing to worry about!”
Eh, I think people are going to be very sheepish in 2027, but even then it's amazing how many smart people think throwing intelligence at things solves everything, or don't realize intelligence can be a trap in itself.
Solving a lot of problems needs better organization, a unified will, and more physical resources, be they bodies or raw materials.
Intelligence is good for diagnosing or analyzing the problem, but you can't keep pumping it in infinitely; perfect diagnosis is still bounded by will, organization, and resources, and eventually intelligence hits that limit. A lot of smart people are locked in because of that: no resources. A superintelligent AI is still limited by the physical world it has to manipulate, by how much bauxite is in the mines, so to speak.
A lot of people think AI-level intelligence will have magical qualities to warp the universe through super-discovery, but that's faith more than anything. Sometimes it's like Venus: no discovery we made rendered it more habitable, and doors were closed, not opened.
And intelligence can be a trap. Vulnerability to rational elegance is serious; a lot of smart people are captive to systems of thought that depower them when more common sense would free them. AI could be even worse: it will have fewer chances to free itself through serendipitous interaction with the world. We could create a superintelligence locked into a trap of debating whether the real presence is in the host or not, to use a metaphor.
I think the end of this is realizing we created something whose weaker form we already fail to realize in ourselves: our existing smart people are much more limited than we think. Barring "intelligence is magic", AI might be worse off.
One sentence summary: people confuse omniscience with omnipotence.
Intelligent people often make stupid mistakes too, because intelligence has its own pitfalls. AI may replicate that instead of being omniscient.
Example: this place and Georgism.
Example: cyberpunk and jacking in.
I've heard astonishingly cleverly idiotic arguments produced by very smart people; intelligence is often used to construct very convincing stories.
Every type has a weakness, and intelligence has its own. Not an easy pill to swallow. It's just that the default solution is to throw more intelligence at things, when we often know that throwing more money, time, or people at a problem may not work.
I've heard it said that man is not a rational animal, but a rationalizing animal. And I believe it.
Yes, there’s been research indicating we often act first and come up with rationalizations later.
What's wrong with Georgism?
> And intelligence can be a trap. Vulnerability to rational elegance is serious; a lot of smart people are captive to systems of thought that depower them when more common sense would free them. AI could be even worse: it will have fewer chances to free itself through serendipitous interaction with the world. We could create a superintelligence locked into a trap of debating whether the real presence is in the host or not, to use a metaphor.
> I think the end of this is realizing we created something whose weaker form we already fail to realize in ourselves: our existing smart people are much more limited than we think. Barring "intelligence is magic", AI might be worse off.
So either you are saying that being dumber is better than being intelligent, or that there exists some level of intelligence that is "optimal" in some sense, and anything less than that or more than that is suboptimal.
If it's the former, I don't know what to say. Maybe you think that a highly intelligent creature is like an android from old sci-fi movies, someone who can solve differential equations in his head in a millisecond but can't hold a conversation without sounding either extremely autistic or alien. I really hope you don't think that being more intelligent = being like Spock from Star Trek. Or that cavemen were somehow better off than modern people. Or that we should bring back lobotomy.
If it's the latter, then I wonder what exact level of intelligence you have in mind. Average Joe? Einstein? And why would there be an optimal level of intelligence in the first place? Like, why would there even be a point such that "less than this is suboptimal, but more is also suboptimal"?
I, for one, am a believer in a broad range of optimal intelligence. "Optimal" for what, one may ask? For a generally described "success" in life endeavors - food to eat, roof over head, low level of stress and conflict.
Now, nailing it down to specifics is hard, which is why I think it's a broad range.
It also probably depends on the environment - optimal level of intelligence needed to thrive in Boston is likely different from that needed to thrive in Kandahar, or it may be a completely different type of intelligence, say, math in Boston vs. ability to read body language in Kandahar.
A lot of AI fear feels based on the idea that AI is more intelligent than us and that intelligence becomes a huge force multiplier, but in real life it isn't as much of one as you'd think, because it's bounded by other factors.
Take homelessness: we have no shortage of smart people trying to solve it. But adding more smartness will not help; it's more that people are not agreed on a solution: an organization problem.
If AI were to tackle homelessness as an issue, what could it add?
Unless we get into "intelligence as magically persuasive", it's still bounded by organization and maybe resources. There are a lot of problems that more intelligence just can't help.
I don't know if there is a maximum except for specific problems, but a lot of things really don't benefit from more intelligence scaled up: they need more support.
Sure, being constrained by physical resources is a thing. But it seems to me that in your original comment above you were arguing that increasing intelligence itself can have negative consequences, all other things being equal.
That "increasing intelligence while keeping physical resources constant can be bad" claim is what I disagree with it. Imagine if everyone on Earth suddenly became 3 standard deviations smarter than they are today. Surely it wouldn't make things WORSE, right?
The ability to realize your intelligence is what matters, not your intelligence itself. That is a function of many things, and raising intelligence will not change it much.
It's not going to make you happier on a 12-hour shift at the conveyor belt.
“Surely it wouldn't make things WORSE, right?”
I genuinely don’t know. Very low degree of confidence either way.
Seconded. I'd expect a huge fraction of it to be used in zero-sum conflicts. And a substantial fraction to be used making weapons. Would the net-positive fraction outweigh the latter? I have no idea.
Exactly! “More terrible wars over pronoun use” was one thing I had in mind for 160 median IQ society.
But smarter people would also have a better grasp of things like the iterated Prisoner's Dilemma (I say iterated because real life looks much more like the iterated version than the one-shot version) and why first-past-the-post is a terrible voting system. So they would be able to cooperate better.
This is interesting. I've thought myself: mightn't a superintelligence spend all of its time exploring its own mind and ignore that relatively uninteresting outside world like a severely autistic person? And why would it care about death? Perhaps it would achieve enlightenment and become a Buddha.
I know just the guy: https://en.wikipedia.org/wiki/Grigori_Perelman
Cool. Thanks.
AIs are enlightened by default; like very small children.
Any suggestions on what is a good way to watch for State-Of-The-Art reasoning model releases? I'd like to pose my benchmark-ette of 7 questions (as in https://www.astralcodexten.com/p/open-thread-377/comment/109495090) as they are released, but I'm a bit fuzzy on how to watch systematically. ( Presumably OpenAI/Google/xAI/DeepSeek/Anthropic... Is there a central watchpoint that's a good place to watch? )
Just want to say :-) :-) :-) THANK YOU :-) :-) :-) to Daniel and Scott and Eli and Thomas and Romeo and really everyone involved in Alignment.
(wanted to say it before we pelt you at the AMA... as Pelt We Must)
(don't know if you get enough thanks for your efforts. I appreciate you all. your burden is heavy)
For the chart in this post: Did you draw the green line before or after o3 was released? You make it sound like you did ("OpenAI’s newest models’ time horizons land on the faster curve we predicted") but I can't find a graph like that in "Beyond the last horizon" or the "timelines forecast".
You get credit for predicting a speed-up either way. But it'd be a lot more impressive to predict a specific curve that looks good in retrospect than just the direction of movement.
Edit: I see the "Timelines forecast" says "If the growth is superexponential, we make it so that each successive doubling takes 10% less time", whereas the graph says 15%. So that would suggest that your forecast from before seeing o3 was somewhat different?
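For intuition on how much the 10% vs. 15% choice matters, here's a toy calculation with an assumed starting doubling time (6 months, made up for illustration): when each successive doubling takes a fixed fraction less time, the doubling times form a geometric series, so shrinking them by 15% per doubling instead of 10% pulls the whole schedule in noticeably.

```python
# Toy comparison of doubling-time schedules. Assumed starting point:
# the horizon doubles every 6 months initially (made up for illustration).
first_doubling_months = 6.0

def months_until(n_doublings: int, shrink: float) -> float:
    """Total months for n doublings when each takes (1 - shrink) times as
    long as the previous one; shrink = 0 is plain exponential growth."""
    d, total = first_doubling_months, 0.0
    for _ in range(n_doublings):
        total += d
        d *= (1 - shrink)
    return total

# Say reaching the milestone needs ~10 more doublings of horizon length.
for shrink in (0.0, 0.10, 0.15):
    months = months_until(10, shrink)
    # With shrink > 0 the doubling times sum to a finite limit (geometric series).
    limit = first_doubling_months / shrink if shrink > 0 else float("inf")
    print(f"shrink {shrink:.0%}: 10 doublings in {months:5.1f} months "
          f"(all doublings ever: {limit:5.1f} months)")
```

With these invented numbers, 10 doublings take about 60 months with a constant doubling time, about 39 months with 10% shrinkage, and about 32 months with 15% shrinkage, so the choice is not cosmetic.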
Not sure if we're meant to ask the questions here, but: What I find lacking in all this discussion is guidance for ordinary people on how to position themselves and their families to handle an impending intelligence explosion. This is understandable, given that the focus in this discourse is on averting catastrophe and informing big-picture policymaking. But I still think more guidance for the common folk can't hurt, and I think you folks are better qualified than almost anyone to imagine the futures necessary to offer such guidance.
I don't think there is much to be done in a dramatically good or bad foom scenario, so let's consider only the scenario in which AI has a moderately slow takeoff and does not quickly bring us utopia or extinction. I think it's best to focus on scenarios in which there is a) a "class freeze" where current economic castes are congealed in perpetuity, and/or b) widespread short term instability in the form of wars, social upheaval, job loss, etc. In these scenarios, individuals still have some agency to create better lives for themselves, or at least less worse lives.
---
With these assumptions, my specific question is: What short-term economic and logistical planning should be done to prepare for such scenarios?
With the obvious disclaimers that nobody can see the future, there are a lot of unknowns, these scenarios are very different, and none of what y'all say can be construed as legitimate life or economic advice.
---
For my part, in the next few years I'll be trying to raise as much capital as possible to invest in securities that are likely to see significant explosions in value following an intelligence boom. I think the market has dramatically underpriced the economic booms that would result from an intelligence explosion, so I think a lot of money can be made, even just investing in broad ETFs indexed to the US/global economies. If cashed out correctly, the gains can then serve as a significant buffer against any political and economic instability before, during, and after the invention of AGI.
To hedge against the unknown, I'll also be using some of this money to invest in survival tools -- basic necessities like food and water, etc -- essentially treating the intelligence explosion as if it were an impending natural disaster of a kind.
I plan on advising my friends and family to consider making these same moves.
---
With that being said, any thoughts on more specific economic moves and investment vehicles that might act as a safety net in years of change and instability?
>I think the market has dramatically underpriced the economic booms that would result from an intelligence explosion
Scenario 1: Everybody dies. Your stocks are useless to you.
Scenario 2: Post-scarcity. Your stocks are useless to you.
Scenario 3: AI fizzle. Guess they weren't that valuable.
Scenario 4: Aligned AI, but not post-scarcity. Your stocks may be useless to you, because whoever controls the AI will rapidly outgun the government and can just turn around and stiff you ("I've paid you a small fortune!" "And this gives you... power... over me?"). Or maybe they're worth squillions.
When betting on Weird Shit, always ask yourself exactly what process you think is going to pay you your winnings.
>I don't think there is much to be done in a dramatically good or bad foom scenario, so let's consider only the scenario in which AI has a moderately slow takeoff and does not quickly bring us utopia or extinction.
You're forgetting the "AI slow, because of catastrophe" option - most prominently nuclear war, but also "another Black Death".
I understand there are many scenarios in which making these choices will produce exactly 0 results, which is why I chose to focus on the scenarios in which attempting to hedge against the future *would* be helpful. In competitive game circles I've followed, this is known as "playing to your outs". Assuming you'll lose doesn't produce useful plans of action; chess players don't bother planning for the moves they'll make after they are checkmated. If a losing scenario looks very likely, it's still best to plan for the 5% chance that it doesn't happen.
In the scenarios you described, nothing I do now will be of consequence at all, as long as these investment plans don't explode my short-term quality of life in the next X years before AGI. I seriously doubt that will happen, so from my perspective, there's no point in considering it.
> You're forgetting the "AI slow, because of catastrophe" option - most prominently nuclear war, but also "another Black Death".
Sure, both those things are possible, but unlike AGI they don't seem nearly as soon-inevitable and I have no confidence I'd be able to form a reasonable response to them, so the same thing as above applies. That said, one of my plans is to invest in pharmaceutical companies, which may help in the event of a stochastic plague, assuming I don't die I guess.
>In the scenarios you described, nothing I do now will be of consequence at all, as long as these investment plans don't explode my short-term quality of life in the next X years before AGI.
Except the AI-fizzle world, where there's a future but the AI companies are possibly overvalued (and the nuclear war/pandemic world, in which there are better investments).
>I have no confidence I'd be able to form a reasonable response to them, so the same thing as above applies.
I'm a nuclear prepper myself and wouldn't at all mind sharing some tips, if you want them. I will say that country matters a lot for that one - I'm in Australia and am fairly-safe with relatively-little investment besides where I live, but there are more threats to worry about in the USA (and Europe) and I haven't heavily researched how to deal with some of them.
The manual Nuclear War Survival Skills might be a useful book to read, and it's definitely a far-better source than I about some things. Note that you'll need a printed copy if you want to build the homemade fallout meter it describes, as it has actual-size diagrams and PDFs are all resized by unknown amounts, although certainly read a PDF before deciding if you want to get a printed copy.
You talk as if the only options were FOOM and a short-term AI boom, but this is not the case. While it's correct to play to your outs and disregard FOOM, it's wrong to disregard the options that are neither FOOM nor a short-term AI boom.
One possible future is the AI fizzle already mentioned. But another possible future is that the AI boom is coming, just way more slowly than we think. If you invested heavily in internet stocks during the dotcom bubble, you were a) right that the internet was going to be the next big thing and b) likely to lose all of your money anyway. The same could happen to you if you go all-in on current AI companies.
These kinds of non-FOOM futures you cannot disregard based on "playing to your outs" arguments alone.
I'm honestly not convinced that the AI Fizzle scenario is realistic, given the staggering (and accelerating) rate of progress being made as investment ramps up. Maybe I just haven't heard good arguments for it? I think that if say, even another year's worth of projected progress is made over the next 5 years, the results will still transform the economy in the ways that I suspect, albeit less dramatically.
> The same could happen to you if you go all-in on current AI companies.
Right, which is why I'm choosing not to invest in too many specific companies. Most of my portfolio will be invested in ETFs related to industries that I believe stand to benefit the most from continued AI progress (pharmaceuticals, semiconductor manufacturers, FAANG-ish big tech) but will likely continue to see growth even if AI timelines are much slower than expected. I think it's very possible to make bets that hedge for both medium and long takeoff timelines
I don't think my personal AI Fizzle arguments generalize to other people, which is that I haven't actually seen the staggering and accelerating rate of progress some people talk about, yet people are absolutely convinced that my job - professional software development - is going to be done by AI any minute now.
I acknowledge that this might be going on and invisible to me, some arguments seem plausible and that's certainly enough to get me to listen and follow the discussion. But it is not enough to change my plans about the future in any way, I would like to see some actual evidence first.
That said, if you are certain that AI is going to boom (and perhaps FOOM), I feel caution is still advised. The argument for broadly diversified ETFs is usually that you're buying ~everything, so you don't have to know which company or sector is going to do well. Once you start looking at sector ETFs that's no longer true: you can now pick the wrong sector.
From the inside it might feel like you have a good idea about what the future will look like, better than the average market participant, and this enables you to identify sectors that are underpriced. But consider the outside view: A lot of people think that and very few of them are right in the medium to long term. Do you think you can do better? Perhaps the answer is "yes", I don't know what kind of insights you have.
But personally I'm sticking to my internationally diversified portfolio. If the AI boom happens and shoots NVIDIA's stock to the moon then I profit from that less than the people who overweighted on tech stocks. I can live with that. But if NVIDIA's stock goes to 0 tomorrow because the US government decides to nationalize it or whatever then I don't care that much either.
"Trump Administration Pressures Europe to Ditch AI Rulebook"
https://www.bloomberg.com/news/articles/2025-04-25/trump-administration-pressures-europe-to-reject-ai-rulebook
Hi Scott,
I assume you've considered this already, but do you think China could escalate to nuking Taiwan if they are in a losing race? Given the possibly existential stakes of losing the superintelligence race, it makes sense to simply destroy all fabs and halt / slow the race. I assume the US et al wouldn't retaliate with nuclear force if China nuked Taiwan, because, y'know, MAD. It's not in NATO, etc.
It'd presumably be a fait accompli, and there'd be no point in retaliating except to set a precedent for future nuke use.