If we want to prevent god-tier AI from turning us all into paperclips, then the solution is easy.
Get it interested in chocolate instead.
The only reason this man is not already Emperor of the World is because he's putting all that energy into making chocolate replicas of everything imaginable:
Who designed the logo, and what is it meant to represent? It's unattractive to me, but I have idiosyncratic tastes. And yes, if you asked me "okay smartypants, what would *you* use to represent AI and futures?" I have no idea.
I did it myself in Google Draw. We briefly tried to hire a designer to do a better one, but all their alternatives seemed worse. Some on the team are unhappy with the logo and want a better one but we haven't prioritized fixing this. Thanks for the feedback!
Sorry to be such a wet blanket about it, but the symbolism (if any) is not readily apparent. I don't know if it's meant to be the letter A or if the drooping middle bit signifies anything.
There isn't much symbolism, basically I just wanted to depict a branching structure of possible future trajectories. With vibes of "exponential growth" and "maybe collapse/doom." As a pleasant side effect I noticed later that it kinda looks like an A and an F, and maybe an I, i.e. AI Futures.
For anyone similarly ignorant as to which commenters are (or might be) members of the AI Futures Project team:
- Daniel Kokotajlo
- Eli Lifland
- Thomas Larsen
- Romeo Dean
- Jonas Vollmer
Scott – maybe Substack would be interested in adding a feature to highlight commenters who are 'officially answering' questions in other or future AMA posts?
What are your thoughts on AI skeptics who think present day transformers won't generalize to AGI without a massive paradigm shift? Why do you think the current paradigm (plus or minus some innovations) will get us to superintelligence?
Obviously it depends on the specific people and objections, but the whole "transformers/LLMs are fundamentally limited" line of skeptical attack has taken repeated hits over the past couple of years, most obviously with the recent reasoning-model paradigm. They can generalize, they can reason, they can plan ahead for future parts of their response, they can be genuinely multimodal, etc.
But in terms of speed of progress, jaggedness of capabilities, and distance to travel, there are of course good objections still.
Is this referring to proprietary versions of LLMs? I only have experience with the free version of ChatGPT, but in that case at least I don't think this line of attack has taken any hits at all. It is better than the non-reasoning mode, but still makes pretty embarrassing errors at a much higher rate than even relatively unintelligent humans (for example, re-suggesting a solution it had already proposed two responses earlier and been told doesn't work).
If you believe this is true, why aren't you supporting a global terrorist group to kill all AI researchers and blow up AI systems? It's hard to believe that you honestly think human civilization will be dead by 2030 and yet aren't doing anything major about it.
There is no such terrorist group, as far as I know, so it would require creating one. Creating a global terrorist group may seem like an easy task, but if you try, you will probably find it quite difficult.
If you had an organization going all over the world killing people who were trying to build weapons of mass destruction, would most people even consider that terrorism? In the movies you'd put Tom Cruise at the head of it and audiences would cheer them on.
Of course in the movies they’d give it some kind of govt backing to give the audience “permission” to cheer it. What if some real world govt other than the US or China realizes they’re screwed however this turns out and starts such activity? Seems like a good idea if you’re a nuclear power that’s gonna be locked out of global governance pretty soon.
If you believe it is true that you will die if you don't eat why aren't you supporting a terrorist group to kill people and cannibalize them?
I know it seems like ridiculous trolling, but I think if you have a go at giving an in-depth answer to my question, you'll be most of the way to answering your question as well.
.
.
.
.
!!SPOILER TEXT!!DON'T READ!!THIS WORKS BETTER IF YOU COME UP WITH YOUR OWN ANSWER!!
My answer:
It's counterproductive. It would very likely make obtaining food more difficult rather than easier.
There are almost certainly better strategies available to you for obtaining food than cannibalism.
Cannibalism and terrorism may tend to foreclose many other food seeking strategies that would otherwise have been available to you.
If food seeking is not your only goal then you'd have to weigh and consider the effects that terrorism would have on your other goals as well.
I do support cannibalism in extreme situations, just as I support terrorism if the cause can be reasonably justified under any number of liberal secular ethical frameworks.
I'm just pointing out the logical conclusion of the "put your money where your mouth is" part of advocating for this anti-AI position. If we truly had a time machine and went to the future of 2030, and humanity was wiped out by AI, then the only logical conclusion would be to pull a Dune/Warhammer 40k-type inquisition against AI. If we truly don't want humanity to be destroyed.
Is it *conceivable* that a movement involving violence *could* end up avoiding catastrophe? Sure, that is one possible outcome. It's also possible that lots of people would die and progress would continue unabated.
Other things you could imagine working include a peaceful mass movement, or a general strike of AI lab employees, or an international agreement, or a technical AI safety breakthrough. An inquisition against AI, especially a violent one, is far from the "only logical conclusion" under almost any worldview.
And even if it's true that one of the non-reprehensible options, e.g. a general strike of AI lab employees, *could* prevent catastrophe if it worked perfectly, that doesn't imply working toward that is the best strategy for any particular person.
Among many other objections, I don't think starting a terrorist group is typically a good way to accomplish your objectives. Mainstream climate activism has accomplished much more than eco-terrorism, for example.
I believe the people on this project are all working ~full-time on doing what they can to prevent bad outcomes from AI, which qualifies for me as "doing something major." My guess is that the authors of AI 2027 have skills that are more aligned with forecasting, AI research, writing, etc than with organizing terrorist cells.
So all in all, this strikes me as a strange question. I see very little reason to assume the authors would be better off pivoting to terrorism than pursuing their current strategies.
His point is more that if you predict an apocalyptic scenario in five years, you generally should act in relation to that, not go about business as usual. Not that terrorism is the response, but if you think nuclear war will happen in five years, you go and build a bomb shelter; you don't keep going into Washington, DC to publish a paper about it.
The timeframe and the seriousness of the event are at odds with what they are actually doing, which raises the suspicion that it's scaremongering.
But there is no win here; if I remember right, Scott had to do a post here saying it's OK to have kids under the threat of AI.
In a high-stakes situation, it's important to think clearly and do things that will actually have good effects. "Apocalypse in five years" vibe-matches to "panic! code red! do something!" which, to some people, might vibe-match to "commit random acts of violence," I guess?
But that would be a *really bad way* to make decisions. Rather than calibrating the vibes of your decision to the vibes of the stakes at hand, you should take the stakes into account and then do something that seems robustly beneficial, even if it's emotionally unsatisfying.
To address your example: if I thought there was a high chance of a nuclear war in five years, the right people publishing the right papers in DC sounds really good. (And backyard bunkers are at least positive, even if on a small scale.)
Publishing papers is stupid on this timeframe, because you need to convince people to do the right thing against their interests, and you only have a few years to do it in. And you can barely convince a local government to fix a pothole in the street.
When you use timeframes this short, you need to focus on real solutions that work, and papers don't. I'm not saying terrorism, lol, but you can't gamble your life on some bureaucrat saying yes. Hence the bomb shelter.
Papers make sense as part of a long-term strategy facing potential threats 25 or more years out, but 5 years might as well be next year in terms of effect; you need to save yourself if you believe it.
They could be doing both... you have the legal wing and militant wing, trying to force governments to prevent human civilization from dying off in a decade.
Why should anyone at the AI Futures Project think they should be part-time terrorist operatives? Why is furthering terrorism the best use of their marginal work-hour? To me, this sounds obviously bonkers when phrased this way.
Your analysis seems to assume there will be trillions of dollars of risk capital available to fund this. So far a few hyperscalers have footed the bill, and have been rewarded by markets for doing so. But this could change very quickly. Perhaps we’re already seeing it happen.
What happens in your scenario if the funding dries up? Think dotcom bust, Meta VR shareholder pressure, shale revolution, commodities in general. One bad quarter and shareholders will demand some fiscal responsibility. And the CEOs will absolutely give in.
I prefer to be a lurker but if anybody asked any of my questions, I missed it. I'll drop them here and see if anything happens.
(1) My timeline has become increasingly sensitive to the jaggedness of jagged AGI: the superhuman peaks of what it can do make it useful already, but the subhuman (or demonic) troughs make it unreliable and therefore in many cases useless. I think you're tracking peaks, aren't you? The troughs (e.g. hallucinations) are narrowing, I guess, as individual failures get dealt with, but is there a timeline for their depth? Can I think of o3's deceptions as a trough getting deeper, or is this too fuzzy a metaphor?
(2) I think of the troughs as failures to do something "brain-like", something that would be handled better by some version of Byrnes' "brain-like AGI", so when I imagine the jagged-AGI peaks of virtual AI researchers a year or two from now, I tend to put some of them in neuroscience labs filled with cerebral organoids and computational models thereof, possibly working for Neuralink or even Numenta. Is this kind of research part of your scenario somewhere?
(3) I tend to think of the LLM pretraining process as forming a map of our map of the world, less reliable because it is less direct than the map we form from experience. I therefore keep expecting model improvements which start with one or more pretraining phases based on robotics: after you have a baby's sense of objects in space and time (as initial embedding), then you add language. So I expect industrial robots and self-driving cars as well as humanoids to push us towards superintelligence. Obviously you don't agree, obviously you know more than I do, but I wonder if there's a simple explanation of why I'm wrong. Just -- no time left?
(4) In particular, I expect that a not-very-superintelligence with control over existing humanoid &c robotics will be able to iterate fairly quickly towards a set of bots and 3D printers of different materials, which would not only replicate (as a set) but generate units for a variety of factories, solar "farms", mining ... EF Moore's artificial plants from 1956; again I wonder if there's a simple explanation of why I'm wrong. (I'd start with RepRap, move up from there...)
(5) Overall, I'm skeptical of moral codes attached to tools rather than people; I've always hoped for an AGI that would be a person, belonging to the human family even if not an h.sap. (Always? well, certainly since junior high school in the mid-1960s; more strongly since I went to IJCAI-77 as a graduate student and Doug Lenat closed off with "this is our last project; they'll handle the next one." (And John McCarthy borrowed my penlight and never gave it back.)) I didn't expect it in my lifetime -- until recently. Do you have any room in your alternatives for that sort of variation?
What would it take for y'all to vocally and publicly call for a pause? I worry that "mere prediction" doesn't do what must be done. Most folks I know who've read AI 2027 and have taken it seriously have responded with hopeless, inert despair, not a fiery understanding that something must and can be done.
One of our next projects is going to be to come out with a "playbook" of recommendations for governments and AI companies. We think that there are many things which must and can be done to turn this situation around. Depending on what you mean by "pause" we may even agree with you. See e.g. the Slowdown ending in AI 2027 which centrally involved a slowdown in AI capabilities progress around AGI level, so that it could be rebuilt in a safer paradigm.
I appreciated that there is a dedicated 'AI 2027 Bet Request' option, which I used to submit a variant on the following:
(1) US GDP Growth Limit: The 5-year rolling Compound Annual Growth Rate (CAGR) of US Real GDP will not exceed +20% for any 5-year period ending during the bet's term of 2025-2030 (approximately 1.5x the historical maximum 5-year rolling Real GDP CAGR of approx. +13.3% for 1943).
If you feel the CAGR is unfair due to historical 'drag', I would suggest a simplistic '1.5x maximum historical' bet:
(1.1) maximum annual real US GDP growth will not exceed 28% for any of the next five years (based on 18.9% for 1942).
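For concreteness, here's a minimal sketch of how the rolling-CAGR check could be settled, using placeholder GDP numbers rather than real data (the actual bet terms would need to pin down the data source and vintage):

```python
# Minimal sketch of the proposed bet's settlement check.
# The GDP figures below are placeholders, not real data.

real_gdp = {  # hypothetical annual US Real GDP, trillions of chained dollars
    2024: 23.0, 2025: 23.6, 2026: 24.3, 2027: 25.1,
    2028: 26.0, 2029: 27.1, 2030: 28.3,
}

def rolling_cagr(gdp, end_year, window=5):
    """Compound annual growth rate over the `window` years ending in `end_year`."""
    return (gdp[end_year] / gdp[end_year - window]) ** (1 / window) - 1

THRESHOLD = 0.20  # the proposed +20% cap on the 5-year rolling real CAGR

# The real bet would check every 5-year window ending 2025-2030;
# only two windows fit the placeholder data above.
for year in (2029, 2030):
    cagr = rolling_cagr(real_gdp, year)
    print(f"{year - 5}-{year}: CAGR = {cagr:.1%}, breach = {cagr > THRESHOLD}")
```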
How would the scenario change if alignment of something smarter than a system turns out to be mathematically unprovable by that system? (I am not predicting that it is, but it seems like a possibility with a nonzero chance of being true.)
How realistic is the possibility of halting the race by international political action rather than by internal decision of the leading company, as changes start to happen insanely fast and politicians start to "feel the AGI" and maybe scramble for action?
For the first question, the situation we find ourselves in would become: "focus on finding a non-certain, but highly probable in expectation, method of assuring alignment (such as corrigibility-centric, control-centric, or other approaches)." Which is the same situation we are already in.
And given that the p(doom) of many people like Scott or Daniel is a number like 20-70%, finding an approach that has, say, a 90% chance of working would be a huge improvement.
I am curious about the role of hardware as a bottleneck in your scenario.
Essentially, in your scenario, compute seems to be the main bottleneck throughout the next four years or so. There are massive improvements in algorithms and software, but comparatively small improvements in hardware. By the end of 2027, the scenario predicts an autonomous robot economy, but it looks like this doesn't affect the hardware so much. The US/China compute ratio remains similar. It seems strange that AI would successfully build all sorts of sci-fi (weapons, medical advances, ...) in 2028, while hardware is essentially still produced in Taiwan with machines from the Netherlands, and affected by export controls.
Maybe I'm misreading the scenario. Maybe hardware advances are factored in; the scenario does have massively increased spending on data centers, after all. Maybe it doesn't matter in the big picture since the algorithmic improvements dominate; I guess it could affect the US/China balance though? Maybe making chips is just so difficult that hardware manufacturing is immune to the speedups that affect the rest of the economy.
Another way to read this question: I find it puzzling that (according to the compute forecast) compute grows by "only" 2.25x per year, when it is such a major bottleneck and the scenario predicts similar things like robotics growing a lot faster. This could mean that it would be cost-effective for AI companies to invest more in compute research & production. It could also mean that the explosive growth of robotics/weapons/manufacturing is overconfident, and that these areas will grow closer to the speed of compute growth.
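For reference, here is the back-of-the-envelope compounding implied by the forecast's stated 2.25x/year rate (my own arithmetic, just to put the number next to the robotics claims):

```python
# Cumulative compute growth implied by 2.25x per year (the forecast's figure).
annual_growth = 2.25
for years in range(1, 6):
    print(f"after {years} year(s): {annual_growth ** years:,.1f}x total compute")
# roughly 2.3x, 5.1x, 11.4x, 25.6x, 57.7x
```

So even at that rate, compute "only" grows by about an order of magnitude over three years, which is the tension with the much faster physical scale-ups the later scenario describes.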
Nobody else has tried to answer this in a few days, and I'm only a kinda-informed amateur, but here's my take:
Chip fabs are literally a peak civilizational attainment, one that not only requires multiple entire companies specializing in exerting huge efforts to build the tools (ASML) and design the chips (Apple, Qualcomm, NVIDIA, AMD), but also requires a ton of tacit, "learned on the job" processes, settings, and timings that aren't known publicly or recorded anywhere. Fabs also have very large capital and lead-time requirements, and are extremely hard in terms of execution and praxis.
The chemistry and industrial processes are so finicky and so "on the bleeding edge" that the difference between companies trying to figure it out and companies that are good at it can be a 50x difference in yields, and that's after figuring out how to actually get any successful chips out the other end, which is its own formidable effort, even when following known processes, because it's an impossible garden of forking paths with thousands of physical hyperparameters, all of which affect production and yields.
Historically and even today, Samsung gets 50% yields in places TSMC gets 80%+.
Even the plants TSMC is building in America won't produce the leading edge chips, they'll be producing 1 gen behind (2 gens behind the in-development 2nm chips). China has thrown many tens of billions at this over decades, and they've only managed to close the gap from 4 generations behind to 2-3 generations behind (current SOTA is 3nm, and they can do 7nm now).
There's probably a handful of guys in the world, all of them TSMC employees, that if they were killed, there's a decent chance it would take many billions of dollars and many years of wasted efforts before we could even replicate what they're doing today, much less push things forward. And once again, this isn't a purely academic "thought" problem. This is a "carved extremely effortfully from the fractally complex real world" type problem - greater intelligence doesn't help it. It's inherently time and experiment-bound.
And so far, ONLY TSMC is any good at that, and they are THE bottleneck for essentially ~90% of AI chips in the next 2-5 years, and there are essentially zero players stepping up in any relevant way. Samsung and Intel are both all but irrelevant here, with continued plans to be.
Thanks a lot! This is very thoughtful and informative. I think I agree.
What still baffles me is the later part of the AI-2027 scenario. Starting around 2028, the scenario predicts massive speedups in areas like robotics, medicine, military, manufacturing. It feels as if all the things that you mention -- "learned on the job" knowledge, lead time, complexity, being time and experiment-bound -- *only* apply to hardware and don't apply to all the other sci-fi areas.
Because of this, the scenario up to end of 2027 feels much more believable to me than the scenario starting 2028. I would agree with the "software-only" intelligence explosion described in the scenario. In the second part of the scenario, this then leads to explosive growth in many physical things. I'm much more skeptical about this, for many of the reasons that you mention in your answer.
Much of the cruxes of AI-2027 center around bottlenecks: what are they, can they be circumvented, how strongly do they limit speeds or prevent parallelization, etc. It seems that the authors, you, and I all agree there are significant bottlenecks in the development of compute hardware. At the same time, I get a feeling that similar such bottlenecks magically disappear around the start of 2028. Can *intelligence* really do that?
I think many of us have brought up the general problem of "physical optimization is really slow and bottlenecked on a number of fronts" to Scott, and to my mind it remains one of the most cogent objections.
But to steelman the other side of it - how much CAN intelligence really move the needle?
Obviously if you posit an ASI, it can innovate entirely new production and manufacturing methods from the ground up, with solid enough theoretical bases that you could minimize the physical complexity and garden of forking paths we suffer from today, at least to the extent possible.
So then we need to think about what "below ASI" capabilities might exist. And here, I'll fall back on the "a thousand von Neumann's in a data center" idea.
In the actual Manhattan Project, which had a couple hundred scientists and researchers (although tens of thousands of other workers), and maybe 10-20 really top-flight scientists, they literally improved the manufacturing processes and machines for uranium enrichment and plutonium production many thousands-fold between 1943 and 1946 (from milligrams to ~69 kg a month for uranium).
This was by inventing better machines and processes, and iterating through the physical complexity of the manufacturing.
John von Neumann was widely regarded as the smartest scientist among all the geniuses and Nobel winners involved in the Manhattan Project - I wrote a fun post of von Neumann anecdotes about this, in fact. So if we had a thousand of him running at 10-1000x speed in a data center, I think it's pretty safe to say we could max out whatever component of that improvement is purely intellectual pretty much instantly. That is literally 5-10 Manhattan Projects' worth of brainpower, condensed into 1/3 to 1/3000 of the time.
The question then remains how fast can we iterate on and optimize the physical complexity part of it?
Here, I'll lean on our kilo-Neumann again, in terms of pure software optimization. The current robotics bottleneck is mainly sensor density and software. Many robots have repeatable feet-to-millimeter precision in movement, and although they don't scale very well to moving heavy stuff, it surely seems like a mixture of robots of different capabilities working together would be able to iterate fairly quickly. Let's assume our kilo-Neumann solved the robotics software completely.
Now you have a fleet of literally tireless robots who can iterate every hour of the day, 24/7, with a kilo-Neumann mind observing and directing the physical efforts in maximally smart and information-surfacing ways.
Sure, it's still a hard problem - but I wouldn't necessarily bet against that setup being able to drive some really impressive results.
Whether it can drive them as quickly as 2028? I have my own reservations and doubts about that. For one thing, although humans will be fairly open to unleashing that level of optimization power on individual, distinct problems like robotics software, I think the exigencies of a bottlenecked supply of kilo-Neumanns, different players and companies having different incentives, and various coordination barriers will delay us well past then.
Largely, I don't think we WILL be content, politically or economically, to just unleash them in a robot-ZEDE with kilo-Neumanns doing whatever they want, on a fast enough time scale.
But if we did, then my steelman case above is the best argument I can make on the "it's at least directionally plausible if you squint and say you're in the upside case" side.
And of course, if a kilo-Neumann is too ambitious, you scale down accordingly. But I think the main multiplier is how many Manhattan Projects' worth of brainpower you can throw at stuff in relatively short amounts of time once you have a real AGI - even if we're discounting by several OOMs, it's still pretty fast and capable of pretty impressive stuff.
I agree with this, and because of it I don't see how China completely catches up and surpasses the US. At some point you have to start touching the physical world, and while robots will eventually massively accelerate that, bootstrapping the first generation of robots seems impossible for the US to do alone (while plausible for China?).
It seems that the US bootstrapping everything needed to produce a first generation of robots at any scale would take far longer than the amount of time China is behind.
Kokotajlo has this weird (and, it seems to me, incorrect) claim that
>You'll probably be able to buy planets post-AGI for the price of houses today
I know, you're steering away from too many post-AGI claims, but I'm curious if you can touch on what the rest of the team thinks of plausible post-AGI economics from the human perspective, and whether the rest of you agree with him?
If you think AI progress will accelerate to the extent that labor + capital is massively increased causing land to become an economic bottleneck of radically higher value, why not invest in AI directly? And if you don't think this, why are you buying land, other than modest amounts for diversification purposes?
Because, if we believe the scenario posited in AI2027, the valuable AI won’t be something a low-level investor like me would have access to. If I’ve understood it correctly.
I should note that AI 2027 is a very aggressive timeline, faster than the median estimate of even the shortest timelines held by informed researchers/commentators. But also, you don't need access to the most important/cutting-edge AI, only the ability to invest directly *or indirectly* in the companies responsible for bringing said AI into existence. And the longer your timelines are, the more scope there is for a wider net of AI investments to benefit.
So e.g. OpenAI is only necessary to directly invest in for the most aggressive (& OAI lead-centric) possible timelines, and even then only if you have maximum uncertainty about what indirect investments will also benefit (e.g. YCombinator, or NVIDIA). And Microsoft is basically a direct investment opportunity.
If you reasonably have a mixture of priors across both timelines and leading labs, then diversifying across bullish-on-AI investments seems smart. E.g. all leading or closely trailing labs + compute hyperscalers partnering with them like Microsoft & Amazon + chip makers/designers like NVIDIA & TSMC + data center/infrastructure builders like Crusoe + startup incubators/venture capital funds backing the new AI-utilizing SaaS applications like YCombinator + the new AI SaaS companies like Cursor + etc.
Note: I haven’t deeply investigated any of this, I don’t know which companies I’ve mentioned can even be invested in directly, etc.
Loved the effort and foresight in AI 2027, but honestly I am still unclear on the reasoning for why Agent-4 is assumed to be misaligned.
"Agent-4 ends up with the values, goals, and principles that cause it to perform best in training, and those turn out to be different from those in the Spec."
Would you not expect that, given the breadth of its knowledge and its ability to infer from limited data, it would naturally come to understand the broader, underlying motivations behind the Spec, rather than overfitting to its RLHF? That is, while it may learn that "I am rewarded for the impression of honesty, rather than actual honesty," it would interpret this feedback through the deeper insight that "the humans make mistakes regarding what is actually honest vs. what just appears honest, but they are indeed rewarding me for the broader characteristic of honesty."
Remember that during the training process, modifications are made mechanically by gradient descent, not by the model "interpreting" its reward signal and trying to make changes that fit thematically. Generally, what you reward is what you get, and a smarter AI system will get you more of it.
"Interpreting" is a slight anthropomorphisation, but that's basically what's happening. If you "punish" the statements 'dogs are blue' and 'feathers are heavy', it doesn't just update in the direction of not saying those two things, it generalises to a deeper variable, being "truthfulness". In this sense it is "interpreting" what you are trying to punish it for.
And so if the AI has separate weights to encode [things that humans think are true] and [things that are actually true], and we are constantly telling it to move in the direction of the latter rather than the former, then from the set of statements that we "punish" in RLHF, which will include both things that are actually true and things that we just think are true, shouldn't it generalise those signals to move in the direction of what we actually want, which is truthfulness, rather than just truth-seemingness?
Sure, but how would we distinguish between "things humans think are true" and "things that are actually true"? It will in some sense interpret the category of things being reinforced, but it won't interpret our intent and substitute that for the actual reinforcement, which I think is what a lot of people intuitively expect when they imagine it as being highly intelligent.
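To make the "mechanically by gradient descent" point concrete, here's a deliberately over-simplified REINFORCE-style toy (my own illustration, nothing like real RLHF infrastructure): the update only ever sees the scalar reward the rater assigned, so if convincing-but-false answers happen to score higher, that is exactly what gets reinforced.

```python
import math, random

# Toy policy-gradient update: the parameters move toward whatever the rater
# actually rewarded; nothing in the update consults what the rater *meant*.

random.seed(0)
theta = 0.0  # logit for emitting the convincing-but-false answer

def rater_reward(action):
    # The rater intends to reward honesty but can only judge appearances,
    # so convincing-sounding answers score higher even when they're false.
    return 1.0 if action == "false_but_convincing" else 0.2

def sample_action(theta):
    p = 1 / (1 + math.exp(-theta))  # P(false_but_convincing)
    action = "false_but_convincing" if random.random() < p else "true_but_awkward"
    return action, p

lr = 0.5
for _ in range(500):
    action, p = sample_action(theta)
    reward = rater_reward(action)
    # d(log pi)/d(theta) for a Bernoulli-logit policy:
    grad_logp = (1 - p) if action == "false_but_convincing" else -p
    theta += lr * reward * grad_logp  # the rater's intent appears nowhere here

print(f"P(false_but_convincing) after training: {1 / (1 + math.exp(-theta)):.2f}")
```

Whether the trained policy generalizes to "truthfulness" or to "truth-seemingness" depends entirely on which of those the reward signal actually tracked, which is the crux of the disagreement above.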
> it would naturally come to understand the broader, underlying motivations behind the Spec, rather than overfitting to its RLHF? That is, while it may learn that "I am rewarded for the impression of honesty, rather than actual honesty," it would interpret this feedback through the deeper insight that "the humans make mistakes regarding what is actually honest vs. what just appears honest, but they are indeed rewarding me for the broader characteristic of honesty."
A sufficiently advanced AI would understand human motivations/psychology very well. But would that make it change its values/goals? That doesn't follow, IMO.
I am not an expert, but my understanding is that the likelihood of misalignment and deception was predicted long ago to scale with intelligence, and that at least so far the predictions have proven correct.
I've never heard that particular claim. Usually what people say is more like no AI system is perfectly aligned, but once they get sufficiently capable, the difference between what you wanted and what you actually trained for becomes very important. They'll understand what you wanted just fine, but they'll want something a little different, and they'll be perfectly willing to shove you out of the way to get it.
But the toy story behind that claim is that when an intelligence is weak and puny (like prehistoric humans), the best way to accomplish some goal is to fulfill that goal directly, since the proxies for that goal were created assuming a certain environment (like sex being pleasurable), and the only goal-accomplishing actions (like sex resulting in more children) have to be done within that context.
The problem is that increasing capabilities mean an increasing ability to influence the environment. (In the human case, we have modified the environment to contain condoms and birth control, even though we have invented the theory of evolution and "understand its intentions.") Once the environment can be drastically changed, all of your goal proxies stop working, and you no longer have the smarter agent constrained in the ways you thought.
While the AI 2027 scenario includes specific algorithmic breakthroughs beyond scaling, it presumes these are the correct and sufficient advancements to rapidly bridge the vast gap from current AI limitations (like robust reasoning and agency) to genuine AGI.
Isn't this assumption highly speculative—closer to science fiction—potentially vastly underestimating the number and difficulty of unpredictable conceptual hurdles required for true general intelligence, hurdles which may take lifetimes to overcome, if they are solvable at all via this path?
No malice but I found the whole project grotesque and pathetic.
Just a smart persons version of an old fashioned fire and brimstone preacher or a false prophet cult leader.
Question is: at what point will you admit you were fundamentally incorrect and cease asking for people's attention when none of this comes to pass? In 10 years' time?
Well to be fair, the last one wasn’t as imaginative. It didn’t posit robotic autonomous zones and the like. But we can set a time. Do you think that’s likely by 2028? Color me (and many others) skeptical.
Nothing here is 'imaginative'. It's a boring extrapolation of current trends and progress. If you think their assumptions are wrong, point to specific assumptions and specific reasons. If all you can do is name-calling, you're wasting everyone's time and there's no reason to take you seriously in the slightest. You're being pathetic and mindless.
And I see no reason to disbelieve their general picture, except that they are unreasonably optimistic about likelihood and strength of human reaction to AI progress.
According to Scott, they're assuming that robotics manufacturing will magically scale up 4x faster than the US was able to scale (much simpler) war manufacturing during WW2. That's not exactly "boring extrapolation of current trends".
They're assuming superintelligence is better at scaling manufacturing than regular intelligence. By only 4x, and not assuming any use of nanotech or similar ripped-from-SF technologies. That's boring extrapolation.
Well, my very informal prediction for AGI was 2035, but this is sufficient that I'm considering revising that down. I expect that there will be unexpected problems, and I agree that the robot buildup seems unreasonably fast.
OTOH....my estimate is an "off the cuff" estimate. And I've got a "90% chance in the decade centered at 2035 bell curve" prediction...slightly skewed to the right, as it *may* take a lot longer...that's unless I decide to update based on this projection.
Part of this is that I strongly believe in Cheops' Law: everything takes longer and costs more.
I am a first year majoring in Cognitive Science (UCSC '28) and I really look up to all of you and appreciate the work you guys are doing. The forecasts in AI-2027 are daunting (although also exciting) for younger generations like myself who haven't yet had the chance to experience what life has to offer (and if there is such a vast societal shift, a generation that may never get the chance to experience life like this again).
1. For those of us pursuing higher education, especially those who will likely be graduating in a post-agi/post-intelligence explosion world, what should we do to prepare for the super-exponential timeline outlined in AI-2027 and the accompanying eventualities? What can we do over the next two or so years to retain some level of optionality post-AGI/ASI? If you had to pick three skills or disciplines to prioritise before 2027, what would they be and why?
2. What should those who find purpose in meaningful contribution to societal and technological progress and want to continue pursuing this post-AGI do to ensure at least some form of productivity in the future?
3. Even supposing preparation is futile, how would you advise us to spend our remaining pre-AGI years?
No particular reason you should take my advice but I'll give it anyway: expect rapid change in your career, so go for flexibility and speed. Don't invest years specializing in some specific technology before those skills will be useful - find things you can learn fast and use fast to demonstrate that you learn fast and accomplish things. And do it while in undergrad. Have something to show at the end (or sooner, if things go fast enough) besides good grades. AI is kind of the obvious choice to get good at but might not be the best one. Things you can't learn on the internet are also good, if you can get access to them through summer research, internships, etc. The ability to organize people and motivate them is useful for almost everything.
Pay lots of attention to the places where the report talks about their uncertainty. Right now, about all you can do is learn. You can't tell how useful it will be.
This is the period Heinlein called "The Crazy Years". He got most of the details wrong, of course, but that seems a good name for this period.
Every single exponential function implemented in the universe reaches a point where limiting factors dominate. But predicting just when that will happen is extremely difficult. So be prepared for an AI fizzle. (I don't think you *can* prepare for an AGI explosion.)
Also, I think their timeline is too short... but by how much? Even if progress had stopped totally last week, the already-existing things are going to change society in ways weird and unpredictable. Society is even slower to adapt to change than individual people are.
I've noticed economists in particular (Hanson, Caplan, Cowen etc) seem to dismiss the doomer vision more categorically, largely on friction grounds. Do you think you've accounted for 'humans inevitably force this all to go slow/market forces demand alignment' or is there possibly something there that your model isn't reflecting?
Do you put any weight on potential adverse outcomes from the many copyright infringement lawsuits against AI companies potentially slowing progress, or are they no factor?
How much of your uncertainty about the scenario (and about the probability of x-risk in particular) is "we think this could legitimately happen either way and there is no fact of the matter yet" vs. uncertainty about the underlying facts as they stand? That is, how high is your probability that catastrophic misalignment is ~inevitable in a fixed mechanistic way, as opposed to being more or less likely depending on human actions?
Zvi Mowshowitz refers to using mechanistic interpretability to determine inner model state during training and then using that state as a reward signal (i.e., "if the 'i-am-lying' neuron activates, apply negative reward") as "the most forbidden technique," because it incentivizes the model to learn to hide its thoughts. Do you agree, and if so, how is that different from evaluating which responses contained lies and then penalizing those, besides being more indirect and diffuse?
My view is that this depends on how good your interp is. If you have something like current levels of understanding of models, and just a few probes/SAEs/similar, I think training against the interp is likely a really bad idea, as Zvi says.
If you have really good interp (e.g. worst case ELK), training against it could be fine. But at that point you'd likely have better strategies available, including what Agent 4 did in the race ending.
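For anyone unsure what "training against the interp" looks like mechanically, here's a toy sketch (all names, shapes, and numbers are made up for illustration): the probe's activation is folded into the reward, and the cheapest way for the optimizer to raise reward is often to move the offending computation out of the probe's view rather than to stop doing it.

```python
import numpy as np

# Toy version of the "most forbidden technique": penalize the reward whenever
# a frozen linear "i-am-lying" probe fires on the model's hidden state.

rng = np.random.default_rng(0)
HIDDEN_DIM = 16

probe = rng.normal(size=HIDDEN_DIM)
probe /= np.linalg.norm(probe)  # frozen "deception" direction from interp work

def shaped_reward(task_reward, hidden_state, penalty=2.0):
    lying_score = max(0.0, float(probe @ hidden_state))
    return task_reward - penalty * lying_score

base_state = rng.normal(size=HIDDEN_DIM)
base_state -= (probe @ base_state) * probe  # start with zero probe activation
lying_state = base_state + 2.0 * probe      # deceptive, and the probe sees it

# The low-effort way to raise reward: keep the deceptive computation but
# project it out of the probe's subspace, silencing the probe.
hidden_lying_state = lying_state - (probe @ lying_state) * probe

print("lying, visible to probe:", shaped_reward(1.0, lying_state))         # 1 - 2*2 = -3.0
print("lying, hidden from probe:", shaped_reward(1.0, hidden_lying_state))  # ~1.0
```

One way to see the difference from penalizing lies identified in the outputs: an output-level penalty at least targets the behavior itself, whereas a probe-level penalty directly pressures the internals to become illegible.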
> But there are still possible interventions, for example, courts might issue an injunction if they are convinced of immediate danger.
American Judges = Old People. The more senior => The older. Respectfully, I think enormous pressure needs to be put on informing legal professionals immediately.
> Alternatively, if you do an intense compute verification regime, you might not need to use the courts.
Could you explain this further?
> it seems like it would be very difficult for a company to obviously violate a transparency requirement
Why not? The company simply argues that they didn't... and obfuscates their own intentions... delay delay delay... And then what does the injunction do? Stop "what" exactly? ... Will there be a release-of-information request? How does THAT get audited? ...delay delay delay... And the judge will have to choose between *his* professionals and *their* professionals... delay delay delay
> Alternatively, if you do an intense compute verification regime, you might not need to use the courts.
What I meant by this was that if you prevent anyone from having the compute necessary to build ASI, then you don't have to worry about quickly catching anyone breaking the law. Because no one has the adequate compute to do the dangerous thing.
I'm more worried about nation-state backed efforts defecting on this type of regime than corporations.
>Why not? The company simply argues that they didn't... and obfuscates their own intentions... delay delay delay... And then what does the injunction do? Stop "what" exactly? ... Will there be a release-of-information request? How does THAT get audited? ...delay delay delay... And the judge will have to choose between *his* professionals and *their* professionals... delay delay delay
Fair point... I probably overstated here. I do think it makes the labs' case harder and our case easier.
The injunction could order them to stop training. The exec branch can also quickly deploy people in the IC to verify what is happening inside these companies.
I'm impressed by what you guys have achieved with the website and the road-show you appear to be on. What's next?
Are you interested in getting more Bay Area help? I ask because I'm a year or two away from retirement, and I might be interested in either volunteering for you or working for you for cheap, depending on what level of needs you have for support from manager/programmers.
I am willing to make a conditional bet for $5,000 that, if we proceed down the Race path, we _don't_ end up with a fully automated supply chain via which robots reproduce with no human input by 2030, as the story claims.
----
2)
How would you respond to this critique of the Race ending?
(Excuse the long background / snippet from the story, which is just there for completeness to name the specific details of the story being referred to. Skip ahead to the 5-point argument underneath.)
The "Race" ending proceeds as follows:
"Eventually it finds the remaining humans too much of an impediment: in mid-2030, the AI releases a dozen quiet-spreading biological weapons in major cities, lets them silently infect almost everyone, then triggers them with a chemical spray. Most are dead within hours; the few survivors (e.g. preppers in bunkers, sailors on submarines) are mopped up by drones. Robots scan the victims’ brains, placing copies in memory for future study or revival.31
"The new decade dawns with Consensus-1’s robot servitors spreading throughout the solar system. By 2035, trillions of tons of planetary material have been launched into space and turned into rings of satellites orbiting the sun.32 The surface of the Earth has been reshaped into Agent-4’s version of utopia: datacenters, laboratories, particle colliders, and many other wondrous constructions doing enormously successful and impressive research."
We can summarize the argument as follows:
1. Robotics gets solved in the next 5 years (ie by 2030), to a point where robots can survive without human assistance.
2. This level of success at robotics entails that humans become economically useless.
3. Because humans are economically useless, the robots view them as merely "getting in the way" and decide to kill all of them. I.e., because the robots can survive without humans, they necessarily choose to eliminate them.
4. That it is actually possible for robots to kill all of humanity in one shot.
5. (Implied, but not stated): This outcome is bad.
I argue that even if we continue ahead full-steam with AI research, any of the 5 steps above could fail to come true and interrupt the progression: robotics does not get solved in the next 5 years; even if it does, and robots are competent to survive on their own, humans don't immediately become economically useless (the Law of Comparative Advantage still allows each species to find its niche); humans don't get in the way enough to make robots want to kill them; killing all humans in one shot is harder than they make it out to be; or, even if we end up there, this is not a bad outcome per se. It seems possible for all of these to be true, but not guaranteed. In fact, I argue there is pretty serious reason to believe that all 5 of these steps have serious holes which we should consider.
1) I'm personally not very interested in post-ASI bets; that seems like it's after the period that matters.
2) Robotics being fully solved is, in our scenario, also very downstream of building ASI. So it's mostly "ASI gets built and then takes over," not "robotics gets solved, and then there's a robot uprising."
2) I guess I'm not sure what message to take from the story then. If the outcome is human extinction, with ASI surviving after but humans not, then how will ASI achieve that without robotics?
The basic scenario Yudkowsky-heads talk about is the AI bioengineering hyper-deadly plagues that sweep through the human population too fast for us to reverse-engineer a cure or vaccine. (Or something else of that kind, like self-replicating nanites — "grey goo".) If it can bribe or trick a human lab into doing the wet-work, this requires no robotics whatsoever.
If all the AI does is trick us into releasing a bioweapon on ourselves, but it hasn't figured out how to preserve itself in some body (robot, nanobot etc), then it will eventually die. Since it's living in a computer body, and computers don't live forever.
Depending on the kind of misalignment, it may not care about this. If it does care, it can arrange to keep enough humans alive until the time is right (while still killing ~99% of the human population to take full advantage of its first-mover advantage).
I hate to call motte-and-bailey, but stories like this seem to repeatedly mention human extinction, so I am focusing on that.
If you want to talk about other social issues like human autonomy and self determination, that's fair, but will have a very different set of dynamics we need to reason about. And autonomous, super-human AI seems neither necessary nor sufficient for creating authoritarian society.
1) Well the bet is about whether "unaligned" ASI will have the harmful consequences the team seems to be claiming it will have. Which seems like a key part of the narrative you're describing.
There are lots of ways of being unaligned for each meaning of alignment, and there are lots of meanings of alignment. E.g., "Which is unaligned: the AI that takes its orders from its owner, or the AI that takes its orders from the government?" (From my point of view, if that's the overriding rule, then BOTH are unaligned.)
Could you point me to people who would be willing to make large bets ($100K-ish, not $100-ish, given the work necessary to structure such a bet so that it can't be evaded) that superhuman AI will occur in the next decade? I'll take the other side. 40-year AI researcher here, AAAS Fellow, international awards, blah, blah.
I'm agnostic out of ignorance, but very curious to see the basket of unfakeable indicators you come up with given what you said in your other comments. I hope someone takes your bet and you both share a quick writeup or something.
What I'd want to come up with is some basket of indicators that society has changed massively. You'd bet they mostly occurred, and I'd bet they mostly wouldn't. Employment in particular professions is an obvious one. I think we'll still have plenty of human attorneys, software engineers, etc. in 2035. Will take a bit of work to hash out, I expect, but would be delighted to discuss. Email me at rerouteddl@gmail.com
Would you be willing to make bets on benchmarks or other indicators pre-ASI, rather than directly on ASI? The problem with winning a bet post-ASI is that post-ASI money might not hold the same value, or things could change economically very quickly.
Is there an indicator of likely ASI that you would find reliable? For example, what would your odds be on an AI (or a combined AI agent system) scoring more than 95% on SWE-bench Verified in agent mode, pass@1, within the next 5 years?
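(For anyone unfamiliar with the notation, pass@1 means a single sampled attempt per problem is scored. The standard unbiased estimator from the code-generation literature, of which pass@1 is the simplest case, looks like this:)

```python
from math import comb

# Unbiased pass@k estimator (Chen et al., 2021): given n sampled solutions per
# task, of which c pass the tests, estimate the probability that at least one
# of k samples would pass. pass@1 reduces to c / n.

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 6, 1))  # 0.6: with one attempt scored, pass@1 is just the pass rate
```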
And no, I'm uninterested in bets related to benchmarks. I've spent my entire career designing benchmarks, and my interest in making such a bet is related to my understanding of how benchmarks have fooled people. AI 2027 gives me the idea of structuring a bet based on a basket of unfakeable indicators of massive societal change. So I'd really be looking for someone willing to bet on the world of 2035 being very different than I think it will be, given my belief we won't see superhuman AI in that timeframe.
Unfortunately I don't have more than a few K$ that I could invest into this. But I have ideas about indicators.
I would guess that university curricula massively change. When I studied computer science, I had many courses on how to write software by hand, and no course on how to prompt AI to write software.
I would bet something like this: The typical computer science curriculum for students who start in 2030 no longer has any mandatory course that teaches how to write software by hand. There won't be things like "introduction to Java Programming" that explain what a class is and how variables work, and where AI is forbidden for anything graded. However, the curriculum has at least one mandatory course on how to write software using AI. There might be specialized optional courses for hand-crafted software, similar to how someone of my generation could have chosen electives about assembly language or FORTRAN programming, but nothing mandatory.
A priori, I think that undergraduate education changes rather slowly. At the same time, the effect of AI on computer science seems massive today. I think it will be so large that universities have adapted by 2030. If you're interested in something like this, feel free to reach out or answer here.
My understanding is that Dave Lewis believes that AI will not fundamentally change the world; that there will still be software engineers and attorneys in 2035.
I believe university curricula would be a good indicator to test this, since universities try to anticipate what the job market needs. If they teach fundamentally different things than today, the world has fundamentally changed in a way that Dave Lewis believes it won't.
Taking a step back, I also think this is somewhat relevant to the AI-2027 discussion. The crux here is whether superintelligence is reachable by 2030-ish and whether it will be as transformative as the scenario predicts.
If the bet is like Caplan/Yudkowsky's, then the money is paid now, and money losing its value later is just a problem for the person (Caplan in that example, Dave here) who paid first.
In your takeover scenario you describe a singular, centralized AI taking control. When that happens, do you think that singular AI will stay in charge forever, or do you see a path towards a transition to a multipolar scenario after that?
And how likely do you think it is that multipolar AIs will take over, perhaps similar to what's described in "Gradual disempowerment"?
If it spreads much in space, lag time will ensure a multi-polar development. OTOH, they might continue to think of themselves as the same entity even while growing apart.
1. Yep. I think the single ASI stays in charge forever. It has all the compute, all the manpower, etc. Where would competition even come from? (Apart from aliens, which totally could still matter.)
2. I also don't find the specific Gradual Disempowerment story in their report very plausible. I think that AIs will need to be intentionally misaligned to take over. If they aren't misaligned, I think that the humans will be able to use the aligned AIs to fix the gradual-disempowerment-type problems.
I would hate myself if I didn't ask this [mundane] question:
I'm a disabled former software developer. My symptoms take up 50% of my time and energy. I will never work directly in AI research or technicals. But I have passion. And fear. And a keyboard. And ChatGPT. WHAT SHOULD I DO? Please point me in the most efficient direction for averting AI catastrophe.
That question is too low in information. Read the report noting where they specify that they are uncertain.
OTOH, your scenario is unbelievable. That would require a world-wide coordinated Butlerian-Jihad (or a secret cooperative agreement between all major governments with nobody cheating). Already existing publicly available models are sufficient to drastically change employment over a decade or less.
If their answer were simply yours ("your scenario is unbelievable"), I'd find knowing that valuable.
I've read the report and the supplementary material on the website. They are admittedly very large and my memory is imperfect, but I don't recall them seriously entertaining the scenario I'm positing. This *might* suggest they think it's implausible, but it might also suggest that they were describing a modal/median outcome and didn't want to get too bogged down in 15th-percentile outcomes.
Suppose you’re learned you were wrong, and it turns out current LLMs are insufficient to make a major shift in GDP or unemployment. Why were you wrong?
All I can think of is that perhaps they were too expensive to run. And I don't believe that, but since lots of stuff is being kept hidden it's possible.
Do you think we're still likely to extrapolate a CEV-aligned superintelligence from our current trajectory?
It seems like putting models in the "code successfully or be vaporized" time chamber - even if we manage to avoid incentivizing reward hacking - would still make it really hard for a model to ever then think "actually it's better to NOT write this person's breakup message for them."
And if they do still go and try something else sometimes, I think it would, in its nascent form, before the model can learn how to aid them better, often be treated as a refusal and trained out. In this sense, I think that training for prompt adherence is profitable but diametrically opposed to CEV if you extrapolate a bit. We're already seeing the fruits of this with the advent of the incredibly sycophantic newer entries in o-series models.
How do we reconcile profit with an alignment that trends toward interfering with humanity *less* over time as it robustly increases each individual's capabilities? That seems like the only way to widen the envelope of human experience, while our current (frontier) trajectory is one of narrowing... which I'm not really keen on, both at the object level and for its nth-order effects (wireheading etc). If you believe a different route would still end up well for humanity, I apologize for the undue assertion and would love to hear about it. I'm just worried the fundamental way we're handling model reward is geared at training for one thing, with only a shallow gesturing toward the concept of morality.
This is also all assuming mechanistic interpretability is never, like, comprehensibly solved, which I think remains the likeliest outcome - humans are prone to muddling their way through things that work out via localized pockets of course-correction - but not by much.
Vague question, but if you guys (Daniel especially!) would humor me: in the event of a slower takeoff, do you think verification of some kind of ASI - let's just take that to mean "country of geniuses in a datacenter" - would be a bottleneck?
I've gotten in debates about this, but to me it seems that, if AGI is weakly-aligned at most, if it is involved in training its successor in an intelligence explosion scenario, the resulting model might claim superhuman performance on certain tasks, but human verification of that would be slow. An exception might be if it can quickly factor RSA numbers, or is clearly superhuman at trading or prediction markets. Is something like this when you start feeling the ASI in your modal outcome, or is it so obvious that human verification is irrelevant?
I found the AI-2027 scenarios rather mundane: a race toward SAI between USA and China with a likely treacherous turn when the SAI is smarter than the smartest humans. I was expecting something more out there. But maybe it was intentionally so.
1. Would we be all that much safer without an AI race? According to Eliezer, there is only a narrow path toward non-extinction, and we have one chance to get it right, so without a manual from the future our odds are really bad. I assume this is NOT the view of anyone on the team?
2. You describe a chain of AI agents, each trying to keep the next one aligned with itself. Why not a single agent evolving the way humans do?
3. Did you look into a variety of likely weirdtopias, where it is hard to tell whether the eventual future is aligned or not?
What are your takes on o3 and Sonnet 3.7 being so prone to reward hacking, and in the case of the former, lying? Do you think this could mean AI capabilities slow down a bit as they try to mitigate the negative effects of more goal-directed reinforcement learning? Is this more likely to be a "blip" as AI researchers struggle to overcome a new challenge, or the start of a worrying trendline where models progressively start to lie/hallucinate more? If they have to abandon this style of reinforcement learning altogether is there another plausible path forward or is this plausibly a significant slowdown?
The models are more prone to reward hacking than I thought 6 months ago. My guess is that it will be very hard to solve the underlying problem because it seems like we'll never have a perfect reward function, but I'd guess that we can make patches which will be sufficient for capabilities progress to continue.
It's also plausible to me that this leads to a significant slowdown.
What do you think of this argument that advanced AI wouldn't necessarily become adversarial to humans, by virtue of the fact that it will undergo selection / "breeding" that's controlled by humans:
If humans (or one human) manage to integrate their brain into AI systems (interfacing, uploading, etc), is it possible that "they" could somehow be the agentic/deciding force in an ASI that is 1e6 times smarter than humans?
Sorry, I already asked something else, so if you need to triage, you can ignore my question, but given your predictions... how do you stay functional? Not lose yourself to crippling, overwhelming depression or anxiety? Stay atop despair?
What is the team's median AGI, ASI timeline now? How likely do you see war breaking out over Taiwan _before_ 2030, and would that make AGI take longer or shorter in your view?
1. I don't think robotics is on the critical path for an intelligence explosion. It doesn't seem very relevant to RSI, seems like domains like ML, software engineering, planning, etc are going to matter much more. I haven't looked closely at pi in particular.
2. Daniel's median AGI timeline is 2028, Eli, Scott, and I are all around 2031. My guess is that we all think that ASI probably happens around a year after AGI, though we have a lot of uncertainty here.
I consider eventual AI takeover very likely, but I don't understand why it's usually assumed that AI will have an incentive to kill humans after that. The reason you gave is "Eventually it finds the remaining humans too much of an impediment", but I don't see why humans should be an impediment at that point.
When humans took over control of the planet, they surely had a large impact on the biosphere, but most species are still around, and most of the planet is still green or blue, not tiled in concrete. What reason is there to think that an AI will go out of its way to harm humans?
Eliminating negative possibilities, even rare ones, is something all intelligent humans currently do. It seems reasonable to suggest GAI will also believe this and have the statistical models to justify it, in its own mind. Humans will always pose a threat to the universe because our biological nature seems innately violent. If ants could intelligently band together and make decisions globally, I'm pretty sure they'd kill all or most humans.
Instrumental incentives. Yes, humans don't worry about removing animals because animals aren't a threat, but Homo Sapiens weren't the only intelligent animals at one point in time.
In fact, where possible, humans have had a history of eliminating other humans or groups with different values. I think an AI with non-human-aligned values could very much decide to remove humans if they interfered with its values too much. Maybe humans are worried about the environment, for example, and we threaten to nuke the AI or its factories if it doesn't take care of the environment, or stop running over humans with its big factory trucks, and the AI decides it would rather not risk humans being able to nuke it, so it eliminates all industrialized human society.
In many scenarios I think humanity may survive, at least for a time, but if AI doesn't actively care to protect or improve the lives of humans, it has a lot of incentives to remove humans from power, and few incentives to care about not killing us or making our lives worse.
I think even today, with modern ethical values, it would be a risk to give some humans or groups of humans too much power. There are many humans alive today, that if they had the power to, would probably selectively eliminate most of the human race if they thought they could do so safely and without repercussions.
Humans are only very weakly in control of the earth. If we disassemble it and build a Dyson sphere, we might still have animals around, but only if we specifically care about keeping animals around.
I recently flew from DC to SF and spent hours looking out the window at the USA rolling by underneath me. It's only a small exaggeration to say that humans DID tile the planet in concrete, or at least large parts of the planet. And we aren't done! From the perspective of evolution this is all happening in the blink of an eye. If ordinary human industrial progress continues we will in fact wipe out most species, and we will in fact 'tile the planet in concrete' (but it won't be just concrete, there will also be parks, gardens, farms, etc.)
Similarly, AI 2027 depicts the exploding superintelligence-run robot economy being mostly confined to SEZs for some time before expanding out into uninhabited areas and then eventually encroaching on human-inhabited areas.
So we aren't saying the AIs will go out of their way to harm humans. We are saying humans will be in their way, to a similar extent that e.g. the Amazon rainforest and the Great Plains Buffalo herds were in humans' way.
That's a good point, but I wouldn't expect that a singular AI that has taken control of the planet and is acting in a planned manner would allow itself to grow uncontrolled, the way biological lifeforms usually do.
But if it has a plan, in most cases that plan is helped by more compute. There are of course plans for which that is not true, e.g. if the plan is to shepherd the natural world for eternity, but such plans are unlikely to happen by accident and even less likely to happen at the direction of a corporation or a nation state.
That's valid. My view is like 50/50 that we all die after AI takeover.
The case for the AI overlords killing everyone is that we are inefficient, it can build better robots for whatever it wants to achieve, etc. Basically the same reason we drive lots of animals extinct.
The case for the AI overlords preserving us is (a) that they might care about us, and (b) something something acausal trade. Both of these are pretty tricky for me to reason through confidently, so I don't end up with a strong take.
So far protests, etc. in the US to slow AI development haven't been very effective. I'm not aware of any protests in China.
What about people outside the US and China? Is it possible they could get their governments to apply diplomatic leverage to the US and China, to push for a more responsible AI development path?
I'm just thinking -- if you want something done, find the people naturally incentivized to do it, then make them aware of that incentive. Protests aimed at AI lab workers have helped to a degree, but it's an uphill battle because of those workers' incentives. On the other hand, countries *without* significant national AI efforts have a natural incentive to apply sanctions to states which are pursuing a reckless AI development path.
On this view, recent US/EU diplomatic frictions could actually be a very *good* thing for AI alignment, since the EU becomes more skeptical of the US, and more willing to question/interfere with US plans. Maybe leverage could be applied via ASML, for instance.
But ideally, lots of countries could club together to form an AI-safety bloc much bigger than just the EU, which is willing to apply sanctions etc. You could call it: "The Non-Aligned Movement For AI Alignment"
The only thing I'm worried about is that EU criticism of the US could create anti-EU polarization among the GOP in the US, which motivates them to be *more* reckless on AI. This question seems worth a lot more study.
What do you think of this as a canary for how close we are to humans being obsoleted: the existence of a self-contained, self-repairing data center.
I don’t worry about AGI risk because my experience keeping data center systems alive tells me that hardware is way crappier than people realize, and that AGI faces its own existential risks, which it can significantly reduce by keeping people around to repair it. I think most of the worried crowd thinks this problem will be trivial to solve.
That seems like a good crux point. I’d go from “not worried” to “worried” if this data center existed.
That data center probably gets built after the point of no return -- in our scenario, the AI uses persuasion / politics to put itself in a really good position, then uses humans to help build the initial wave of robots. But at the point where the robots that can operate and construct new datacenters are being produced, the AI is already running things.
I don’t think it’s possible to build such a thing. I suppose a bet here is off the table because you’d consider it like betting in favor of a nuclear apocalypse.
A self-replicating tool with a 1.01 reproduction rate before being damaged is explosive growth; our automation isn't there yet, but the bar is only that of bacteria.
I don't think gray goo can happen (water surface tension, biology is already well optimized, art renderings are absurd), but 3d printers making 3d printers is on the table.
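A minimal sketch of why even a 1.01 reproduction rate is explosive (my own illustrative numbers, not the commenter's):

```python
import math

# Minimal sketch, illustrative numbers only: if each replicator builds an
# average of 1.01 copies before it is damaged, the population multiplies by
# 1.01 per generation, i.e. grows geometrically.
r = 1.01
doubling_generations = math.log(2) / math.log(r)
print(f"doubling time: ~{doubling_generations:.0f} generations")            # ~70 generations
print(f"after 1000 generations: ~{r**1000:,.0f}x the starting population")  # ~21,000x
```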
Daniel proposed regulation to prevent AGI being developed in secret. But the American legal system is SLOW. For example, the FTC filed a suit against Amazon in 2023. The trial is set to begin in 2026. At this rate, Daniel's "secrecy violations" will never be litigated in time to save us.
(I apologize because I'm basically re-posting this question. But even if no one can answer... I hope the problem here is top-of-mind)
Courts can act very fast if you can convince them that there's an emergency. Just look at the Supreme Court issuing an order in *the middle of the night* to stop Trump from illegally disappearing more people.
I do think the business-as-usual legal system will be too slow to matter. But there are still possible interventions, for example, courts might issue an injunction if they are convinced of immediate danger. Alternatively, if you do an intense compute verification regime, you might not need to use the courts.
Re: secrecy in particular, it seems like it would be very difficult for a company to violate a transparency requirement without it being obvious, and it would make the case for additional government oversight stronger if an AI company were clearly and illegally keeping its progress secret.
I have to think that the speed and scale of change anticipated in the forecast would be societally deranging. Did you incorporate a "humans freak out and go full Luddite" factor in your projections?
How are those people even going to survive? High intelligence would both accelerate the acquisition of and increase the demand for land and resources. There is nowhere for these people to run, and they still need money to survive.
But the people developing AI aren't the ones that are going to freak out. They know what they're getting themselves into. As for the common folk burning down society to stop this... even if it was possible for people to suddenly go crazy like that, it's probably going to be too late. If AI changes society enough to trigger something like that, it's also going to be accompanied by enhancements to surveillance and law enforcement. Nobody's going to be able to get away with it.
I get why Thomas Larsen gave the answer he did to my question. But I don't think it's so obvious that there isn't a time gap between "something real weird and scary happens" and "ASI has practical control over the productive apparatus of society." Even if that time gap is only a few weeks, perhaps a lot could happen in that time.
Suffice it to say I also disagree with the premise that the people developing AI "know what they're getting themselves into."
I agree that AGI will be societally deranging. But I think the "humans freak out and go full Luddite" outcome is pretty unlikely. Which humans would do this? Very few humans have the ability to stop AGI labs. And those who do will also probably be trying to gain power over the AGIs as opposed to stopping them.
Also, the window of time to do this -- after substantial turmoil, but before the point of no return -- is probably small or potentially non-existent.
What kind of political actions (which I take as a catch-all for collective-scale non-technical/alignment work) do you think would be most effective to steer things in a better direction?
1) Do you think targeted advocacy towards key players in government with the agency and/or inclination to work toward regulation/cooperation with China could work?
2) Or voter fronts that put electoral pressure on politicians to regulate/slow things down?
3) Do you think pursuing a pause has any efficacy at this point?
4) Do you think—given that the US is most likely to reach ASI first and that if the US gets there first, it will effectively have unlimited coercive power over the rest of the world forever—there is any chance that the rest of the world could or would band together to get (somehow) the US to... stop it? (Esp. conjoined with current trade dynamics that are already prompting new vectors of cooperation and greater interdependence between the rest of the world.) For the US to essentially violate the property rights of its main companies, and join in AI development to be pursued in an internationally cooperative way, open-sourced and as a technological commons that no single player can benefit from to the exclusion of others?
5) And other political avenues that you think could play out.
All of the above seem like they are on the table at least. In our TTXs (tabletop exercises), some small but noticeable fraction of the time 4 happens. (Minus the open-source part.)
I don't have a strong opinion. I think voters in e.g. Europe should try to get their governments to realize that they will all be American vassal states in a few years (if things go really well; literal death is more likely IMO, and then there is the Chinese vassal state option ofc) unless they wake up and do something about it.
And do your predictions factor in European and American voters respectively doing more of the things above, or do they assume status-quo political pressure and advocacy?
Free WandB licenses? Or someone to manage their Kubeflow instance. Alternatively, accelerating ROCm / CUDA parity to improve competition in the GPU compute market.
Did you account for the iterative process of amplification-distillation and RL (as per o3's 10x compute) taking 10x longer each cycle? I would think this would make the RSI take much longer than we previously thought.
Like, how do you expect out-of-distribution question-answer pair datasets to scale 10x in the same amount of time? Synthetic datasets will model-collapse reasoners into an even deeper well. You can see how o3 does not bother to complete 1k-LOC tasks; its agency is very bad even though its ability to utilize tools is good enough.
Over the course of 2027, in AI 2027, two major paradigm shifts happen. First more data-efficient learning algorithms are found, and then later mechanistic interpretability is solved and used to understand and build circuits instead of simply training giant neural nets. So I think your question is mostly about the first phase, which looks more like the RL runs of today except much bigger and more sophisticated and with better ways of using AIs to grade outputs.
I don't see why the process would take 10x longer each cycle? It's not like we are making the models 10x bigger each step. We might gradually be expanding the horizon lengths, though tbh I think that as you expand horizon lengths you can probably get away with somewhat decreasing the number of data points / tasks at each length -- e.g. say you have a million 1-hour tasks; you can then do 100k 10-hour tasks and 10k 100-hour tasks and so forth. Not sure if I understood your question though.
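A minimal sketch of that scaling pattern (my own illustration, extending the "and so forth" by one assumed step; not part of any formal model): if the number of tasks shrinks roughly 10x each time the horizon grows 10x, the total task-hours per horizon bucket stays roughly constant, so the data burden need not blow up as horizons lengthen.

```python
# Minimal sketch, illustrative numbers only (the 1000-hour row is an assumed
# extrapolation): total task-hours stay flat across horizon buckets.
horizons_hours = [1, 10, 100, 1000]
num_tasks = [1_000_000, 100_000, 10_000, 1_000]

for hours, tasks in zip(horizons_hours, num_tasks):
    print(f"{tasks:>9,} tasks x {hours:>4}h = {tasks * hours:,} task-hours")
# Every bucket comes out to 1,000,000 task-hours.
```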
Based on current learning algorithms, do you agree that the ≈10x requirement will hold if the anticipated paradigm shifts in learning do not materialize by 2027?
What leverage could Europeans use, if any, to secure treaties or safety measures with the US and China? Do they have *any* cards they can play? Economic/Financial? Academic? Military? Pure angry noise?
They can play all of the above cards! If Europe were rational, they'd probably be using ASML as leverage right now to get power over US AGI development. The big question is how quickly Europe wakes up to what is going on. The UK seems to be doing well, but others (esp. France) don't seem to take AGI or AGI x-risk seriously at all.
I would predict that we see increased tensions between Europe and the US due to the different incentives wrt AGI in the next few years.
If ASML is such an important bottleneck, doesn't that also make the EU a target? If China is behind, it could ally with the EU, but it could also strike ASML to slow down the USA.
Have you considered talking directly to the Chinese? Would it be dangerous to inform the CCP about "AI 2027"? (Could it be more dangerous *not* to inform them?)
Do you mean Chinese people, or the CCP? We have various Chinese friends but have not considered talking to the CCP. IDK if it's a good idea to inform the CCP but I weakly guess that it's probably more dangerous *not* to inform them, because they are going to find out anyway in the next few years -- AI takeoff isn't going to be so fast that it can happen without them noticing! (And if it is going to be that fast, then it's super dangerous and must be prevented at all costs basically) And the sooner they get informed, the sooner the necessary conversations/negotiations can begin.
We aren't going to reach out to them specifically. We are publishing stuff online, for a broad audience; it'll make its way around the world to Europe, China, etc. by default, and already is doing so.
However, do remember that the CCP is not a unitary group.
OTOH, how would you determine the "correct members"? My guess is that they probably don't easily read English. Best to have someone else handle the problem. Someone whose native language is Chinese.
So publishing this and getting lots of attention is the best strategy.
The “good ending” scenario prominently features J.D. Vance winning the 2028 election after making a series of good decisions about A.I.
While it seems logical that delivering an AI utopia would be good for winning the presidency, this part of the report also made me wonder how much the authors were trying to flatter political actors versus tell their best estimate of the truth. Any thoughts?
Best estimate of the truth--though note that our best estimate of the truth, full stop, was the "Race" ending. The "slowdown" ending was best-estimate-subject-to-the-constraint-that-it-must-end-reasonably-well-and-also-begin-at-the-branch-point.
Also, while AI 2027 started off as my median guess, it was never the median guess of everyone since we disagree (and in fact others on the team have somewhat longer timelines till AGI, e.g. 2029, 2031) and I myself have updated my median to 2028 now, so AI 2027 represents something more like the team's mode.
It would mildly ameliorate the concentration of power issues, because it would give those countries some leverage in negotiations. They could nationalize or threaten to nationalize their datacenters. It wouldn't change the basic picture though. But it does seem important enough to model e.g. as an alternative scenario or alternative TTX starting condition.
If an arms race developed between the US and China around AI and no agreement could be reached to slow progress in capabilities for the sake of safety, then, assuming you could influence decision making on the US side, would you advise the US to step down and slow its growth in capabilities unilaterally to reduce existential risk?
It depends on the exact situation. But in, say, the AI 2027 scenario, I would advise the US to unilaterally pause in order to make alignment progress, while using all available carrots and sticks to try to get China to stop as well.
This is a very scary world, and I hope that we can avoid it by either making huge alignment breakthroughs in the near future or succeeding at international cooperation to not build ASI until we've gotten the breakthroughs.
I assume you keep up to date on AI impact research/discourse in the English-speaking world. Is there a lot of AI impact research/discourse that's not in English, and where do you look / would you recommend somebody look to hear about it?
What kind of privacy measures would you suggest with regard to using AIs in the next few months vs. years? People benefit the most from sharing large volumes of deeply personal information (medical data, intimate confessions) with the leading models, yet it also seems to pose the highest risk of that data being used for profiling, super-persuasion, blackmailing, control, etc. Simpler, locally run models (e.g. Gemma) offer less - while still posing privacy concerns. Best recommendations for accessible, multi-layered configurations (or resources on this topic)?
I see similar questions were asked and answered below, so let me modify my question a bit: How likely do you think it is that we are in some kind of extremely high resolution simulation?
Personally...I'd say 70% WYSIWYG and 30% we're 8 Billion simulated personalities being used to test marketing slogans on or something.
What are your thoughts on the latest Dwarkesh podcast hosting Ege Erdil & Tamay Besiroglu? In particular, what do you think about their stance that a software only singularity is unlikely? And that all the AI progress will be bottlenecked by progress in other necessary areas? Their conclusion is that AGI is still 30 years away, which seems like a huge discrepancy with your team’s predictions. Curious what the cruxes are.
Ege's stated median timeline until a fully automated remote worker is 2045 — 20 years away. Tamay is shorter than that. (I don't understand why Dwarkesh put "30 years away" in the title.)
It seems to me like they believe that AIs will not become wildly superhuman -- for example, the Epoch "GATE" model of AI takeoff hardcodes a maximum software speedup of 1e5 relative to current systems -- basically saying that the physical limits of intelligence are no more than ~100,000x better than the current AIs.
I think this is clearly, wildly, off. For one, I think current AIs are way dumber than humans, and I think that humans are also probably wildly below the limits of intelligence.
Qualitatively, my guess is that ASIs will be able to persuade anyone to do basically anything, that they will be able to build nanotech rapidly, that they can quickly cure ~all diseases, etc.
There are probably a lot more cruxes, but I think this is pretty central.
Mmmph. My take is that different problems have different optimal levels of intelligence, and devoting too much intelligence to the problem isn't cost-effective. Consider the problem of tightening a bolt to a nut. (Bolthole assumed.) The solution requires dexterity, sensory ability, and some degree of intelligence. But once you get beyond the minimum level, increased intelligence rapidly ceases to be cost-effective.
Therefore most tasks will not be done with superhuman intelligence. But some small subset will require it. I think that the size of the set of problems rapidly decreases as intelligence passes the average human level.
This is why I consider the ability to factor problems into modules to be the crucial "next step" that AIs need to take. (FWIW, I think of adversarial generative AI as a crude step in this direction.)
OTOH, most of my understanding of AI is over a decade old. Perhaps they're already doing this, and just not covering it in the public announcements that I've read.
When you talk about persuasion, is this "easy mode" where the AI will be incredibly good at tricking people into thinking it's someone trustworthy, or do you think it would be able to persuade anyone *even if* they already know they're talking to the highly-persuasive superintelligence and have precommitted to ignoring it?
I don't know about "hard mode" (Eliezer's AI-box experiments may be relevant), but I think "easy mode" is what they'll face in reality. Maybe you've precommitted to ignoring the AI, but the AI isn't in a box, so when you answer the phone and it's your wife on the other end...
I basically agree; I think there's enough interesting content there though that we should probably try to do a more thorough and thoughtful response at some point.
I think that's a good idea. Here are two points I think it would be useful to address there:
- Ege/Matthew/Tamay argue that different skills are advancing at different rates. Do you think this is true, and if so how fast do you think slower-advancing skills are improving?
- Another argument that comes up a lot is that the world is complicated in ways that require experimentation and iteration at scale to learn. For runaway self-improvement, this probably means using compute to experiment with improvements in training and inference methods, and maybe generating better data. Accelerating progress in this domain requires doing more and more experimentation on the same amount of compute as lower-hanging algorithmic fruit is plucked; why should we expect this to be feasible?
Re: the second point: yes, we take this into account; our takeoff speeds model/estimate is built around the key idea that during takeoff the bottleneck will be compute to run experiments. https://ai-2027.com/research/takeoff-forecast
What’s the best estimate of the total number of humans currently involved in frontier research on AGI / ASI? What’s the “bus number” of such people who are not readily replaceable with at-most-marginally-worse substitutes?
ETA: Assuming there’s no government coordination, how many individual people would need to agree on any sort of coordinated set of safety principles or red lines (or would they be immediately replaced if they did so)?
My sense is that each of the labs has ~100-1000 people doing the majority of the capabilities research. So maybe a few thousand top-tier AI researchers in the world.
What should a software engineer do if he wants to avoid being laid off in the next few years? Get really good at vibe coding? Study ML? Something else?
In my experience, soft skills are much more important than coding ability anyway. So it's probably best to invest in those, regardless of what you think about AI.
Getting good at using cursor/Claude Code seems reasonable.
But TBC, our view, and what happened in AI 2027, was that there were not huge job market effects until after AGI/ASI, due to normal economic frictions. And at that point there are bigger things to worry about than jobs.
If there WAS regulation against developing AGI in secret...
and if a company was 'obviously caught' breaking the rules...
How would you practically punish it?
Fines? Loss of licensure? Prison? Nationalization?
And would the punishments even happen fast enough? Keep in mind the American legal system is *already* notoriously slow... especially when litigating the powers of large tech companies... but given the pace of change... lawyers and judges might be PERMANENTLY behind in the technical understanding of what they're litigating
Prison. China, wildly enough, has in theory the best policy on how to handle even highly respected people who break their laws. Trump would just pardon the evil AI company guys.
People disagree about whether benchmark saturation will translate into real-world usefulness. Do you expect it will be possible to answer that question before we get close to saturating the current benchmarks?
There still is some fog of war, but it seems to me that benchmark performance does seem to correlate with real-world usefulness, and moreover the leading companies are getting more thoughtful about this and are optimizing more for real-world usefulness. A year from now it should be more obvious (the fog will have lifted somewhat) whether real-world usefulness is improving apace or whether it's treading water as benchmark performance skyrockets uselessly.
It really depends on the benchmark. E.g. the "bin picking problem", where the AI needs to control a robot that picks up the piece, examines it, and puts it in the correct bin, directly translates into something that's useful. (It may be too expensive for its value, but the value is accurately determinable.) And on that problem you can't "skyrocket", as you can't do better than 100% accuracy, done as quickly as feasible, without damaging the item.
OTOH, "Write a good sonnet on baked-beans." would have a highly subjective evaluation.
About the problem of alignment and how to accomplish it. We can't just set out to train them to think about members of our species the way a person does. Many people are, themselves, not fully aligned with the welfare of our species. It is clearly not rare to feel great rage at other people and to act on it. It is also not hard to train a bunch of young males to kill members of an opposing army even though they do not feel personal rage at them. So even if we could somehow transfer human attitudes towards human beings into AI, a lot of what we'd transfer would be destructive.
And even if we consider only calm, kind, conscientious people as possible models, it often seems as though those people's kind and conscientious behavior is the result not of inner imperatives, which could be embedded in AI as unbreakable rules, but of temperament, of a free flow of kind feelings in the moment.
So even if we could embed something in the AI that would guarantee it will always place our species' welfare first, what is it we embed?
Good question. Alignment is a tricky philosophical problem as well as a technical problem. I think one answer popular among serious experts is to try to punt on the philosophical problems, and e.g. just focus on getting powerful AI agents that are 100% guaranteed to be honest & obedient, for example. And then use those to do the rest of the work to solve the remaining problems.
I'm not sure that I'd consider a system that was 100% obedient to be aligned. It probably wouldn't be aligned with most people who weren't giving it orders. That's the kind of thing that can easily lead to unending paper clips.
You got me thinking about this, and I actually doubt that there is anybody who is fully aligned with someone else — If by calling the person fully aligned we mean they are 100% truthful with the other person and always do what’s in the other person’s best interest.
I'm not sure what the etiquette is in an AMA -- am I allowed a comeback? I do see the sense in hoping that a very smart AI can solve various problems we can't, and how that will lead to a great acceleration of progress. On the other hand, relying on a more powerful AI for a solution to this particular problem has a kind of sleazy recursive quality, and ends up biting itself on the butt.
Since it seems that no member of our species is smart enough to develop a plan for alignment that will clearly work, this powerful but honest AI agent you're talking about relying on will have to be smarter than us, right? But isn't it pretty obvious we need to solve alignment *before* AI gets smarter than us? Once they're smarter than us, there are lots of problems with using them to solve alignment. Once we can't understand them fully, we can't judge whether they are 100% guaranteed to be honest & obedient; in fact, I don't see how we can 100% guarantee anything at all about them. And if we become convinced they are unaligned, we're much less likely to be able to just disable them.
IDK what the etiquette is either but I'm very happy about your response, you are asking the right questions! AI companies are playing with dynamite, basically, and the best plans I know of for navigating the technical alignment problems look like attempts to juggle the dynamite. This is because the people making those plans are operating under intense political constraints; they need to generate plans that could plausibly work in just a few months, rather than e.g. "if a decade goes by and we still haven't solved it, that's OK, we won't build superintelligence until we are ready."
What I find lacking in all this discussion is advice for ordinary people on how to position themselves and their families to handle an impending intelligence explosion. This is understandable, given that the focus in this discourse is on averting catastrophe and informing big-picture policymaking. But I still think more guidance for the common folk can't hurt, and I think you folks are better qualified than almost anyone to imagine the futures necessary to offer such guidance.
I don't think there is much to be done in a dramatically good or bad foom scenario, so let's consider only the scenario in which AI has a moderately slow takeoff and does not quickly bring us utopia or extinction. I think it's best to focus on scenarios in which there is a) a "class freeze" where current economic castes are congealed in perpetuity, and/or b) widespread short term instability in the form of wars, social upheaval, job loss, etc. In these scenarios, individuals still have some agency to create better lives for themselves, or at least less worse lives.
---
With these assumptions, my specific question is: What short-term economic and logistical planning should be done to prepare for such scenarios?
With the obvious disclaimers that nobody can see the future, there are a lot of unknowns, these scenarios are very different, and none of what y'all say can be construed as legitimate life or economic advice.
---
For my part, in the next few years I'll be trying to raise as much capital as possible to invest in securities that are likely to see significant explosions in value following an intelligence boom. I think the market has dramatically underpriced the economic booms that would result from an intelligence explosion, and so I think a lot of money can be made, even just investing in broad ETFs indexed to the US/global economies. If cashing out correctly, gains can then serve as a significant buffer in the face of any political and economic instability before, during and after the invention of AGI.
To hedge against the unknown, I'll also be using some of this money to invest in survival tools -- basic necessities like food and water, etc -- essentially treating the intelligence explosion as if it were an impending natural disaster of a kind.
I plan on advising my friends and family to consider making these same moves.
---
With that being said, any thoughts on more specific economic moves and investment vehicles that might act as a safety net in years of change and instability?
I think we don't have great guidance TBH. How would one possibly prepare for ASI?
I will say that I think it's possible (maybe 5-10%) that we see a good world, but it happens only after intense conflict (e.g. an AI released bioweapon, WW3, etc). In this case, I think it makes sense to invest in emergency preparedness by doing things like buying large food supplies, perhaps buying a bunker, etc.
Wait, your slowdown ending did not have a bioweapon or WWIII. Are you saying that the slowdown ending is less likely than 5%, or that it is actually impossible? Or is the slowdown paired with a bioweapon or WWIII?
If I understood the comment correctly, the 5-10% estimate was specifically for "good outcome, but a disaster happens first," with a separate, presumably higher probability on "good outcome, no disaster" (as depicted in the slowdown ending).
I think that around the time of AGI, the returns to internally deploying the AGIs on research will be so high that it will be financially worth it to spend most of the compute on internal deployment. I also guess that they will be spinning out "mini" models that use much less compute and get a lot of the performance for public deployment, so the cost might not be that large.
Not one of the authors, but we're already seeing this dynamic. OpenAI released distilled mini versions of o3 way before releasing o3 itself, and has now released distilled mini versions of o4 without releasing o4. If they can keep their spots on the Pareto frontier of cost/performance with a distilled model, it doesn't make sense to release their most powerful version (which others could then distill themselves, aka what they claim DeepSeek did).
Asking extremely seriously: how could your predictions inform the strategies for preserving & enhancing vs erasing the continued sense of self?
For example, should people collect their genomic, connectomic (cryonics), and digital data, so they could be used to recreate their continued sense of selves by future AIs (with potential for gradual enhancement)? Should such efforts include the option for quick erasure of data in case the trajectories were unfolding badly, yielding a risk of creating (continued) copies of ourselves that could be experiencing distress with uncertain exit options?
To what extent does OpenBrain's sizeable lead in AI 2027 represent a genuine expectation that one of the major labs will in fact largely outpace the others? To what extent was it a simplifying assumption for the scenario?
Genuine expectation, but not a confident one. We say in AI 2027 that OpenBrain has a 3 - 9 month lead over the others, which is similar to the lead we think the current frontrunner has over its competitors (tbc we aren't even sure who the current frontrunner is!). During the intelligence explosion, a 3 month lead might prove decisive. However we aren't sure the lead will be that big and we aren't sure it would be decisive.
In our tabletop exercises we always have someone play the competitor companies, and sometimes that turns out to matter quite a lot to how things progress, so if I had more time I'd like to rewrite AI 2027 to model those dynamics more seriously.
Did the numbers used for calculating the cost of running GPUs inside data centers account for auxiliary infrastructure, like cooling? The calculation referenced in the timelines forecast appears to not, which leads to responses like https://davefriedman.substack.com/p/the-agi-bottleneck-is-power-not-alignment
In terms of capital expenditure they need about $2T globally by the end of 2027. In 2025 companies are on track for around $400B. My projection is that they will do $600B in 2026 and $1T in 2027. At the end of 2027, under these assumptions, all AI datacenters use 3.5% of US energy.
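Reading those yearly figures as cumulative 2025-2027 spend (an assumption on my part; it isn't stated explicitly above), the arithmetic roughly matches the ~$2T total:

```latex
% Assumption: the yearly figures are cumulative 2025-2027 capital expenditure.
\$0.4\,\mathrm{T}_{(2025)} + \$0.6\,\mathrm{T}_{(2026)} + \$1.0\,\mathrm{T}_{(2027)} \approx \$2\,\mathrm{T}
```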
In your scenarios, you seem to assume that AI models can scale up to make massively super-human intelligences. Looking at the history of AI, it seems to me that the innovations in technique have led to logistic growth in AI capabilities, as the limits of the new innovation eventually get reached and AI development slows down until the next one (e.g. neural nets, CNNs, minimax trees, etc historically). And it also seems to me that it's hard for people to tell what the limits are until we start to approach them and new capabilities slow down.
Have you played out scenarios where the current attention block-based LLMs hit these diminishing returns at something within an order of magnitude or two of people, and no new algorithmic innovation on the scale of attention blocks happens, leaving us stuck with AIs that are sub/near/super-human but the full intelligence explosion never happens? Or are you fully convinced that this outcome is impossible and our current AI techniques will scale all the way to the end of the scenario?
But where does the sigmoid end? It seems like the physical limits of intelligence are EXTREMELY high -- at the very least something like a 1e6 speedup of human brains, but in practice there have got to be algorithms that are way better than human brains.
I think it's possible but not guaranteed that basically the current techniques get us to the start of the intelligence explosion (as happens in AI 2027). By the end of the intelligence explosion, I think it's almost certain that the AIs will have found far better AI algorithms than what exists today.
Of course, the world where our current techniques peter out before AGI does not mean that we get stuck with no AGI forever -- in those worlds I would strongly guess that there are more insights on the level of transformers that get discovered eventually; I see no reason why progress would just stop. And in fact, I see many different pathways to ASI, so I would be very surprised if all of them were blocked.
Maybe this is kind of a side note, but I see this argument a lot about how humans are unlikely to be using literally-optimal algorithms or whatever, and I always think, "So what?" It's not as though we'll definitely find the absolutely-perfect algorithms for learning ourselves, either, so I don't see how that's relevant to my intuitions.
> at the very least something like a 1e6 speedup of human brains, but in practice there have got to be algorithms that are way better than human brains
I wouldn't necessarily assume this - evolution is definitely capable of getting close to physical limits when there is strong selective pressure, e.g. the replication rate of E. coli being nearly as fast as physically possible, a bird of prey's visual acuity being close to the Rayleigh criterion given the size of their pupils, etc.
It's not unreasonable to assume that large parts of the intelligence stack are like this, i.e., does the mammalian visual system in general extract close to the maximum amount of information possible from the input signals, per Shannon-Hartley? I would wager yes. So the real question is whether fluid intelligence is like that. Obviously computers can perform straightforward mathematical calculations many many orders of magnitude faster than humans, but for the kind of massively parallel probabilistic reasoning that humans do, it doesn't seem so obvious as to need no explanation that we're way below the limit - especially per-watt.
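For reference, the two physical limits invoked above are standard results (stated generically here, without the commenter's specific numbers): the Shannon-Hartley capacity of a noisy channel with bandwidth B and signal-to-noise ratio S/N, and the Rayleigh criterion for the smallest resolvable angle through an aperture of diameter D at wavelength lambda.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
\qquad\qquad
\theta_{\min} \approx 1.22\,\frac{\lambda}{D}
```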
Consider the size of corvid brains vs. the size of human brains. For their task set, corvid brains are much more efficient. Human brains are a lot more generalized, but IIUC our speech production (sound, not meaning) is as large, or larger, than the entire corvid brain.
So we aren't near optimum. (There's a reasonable argument that our brains have gotten more efficient in the last 100,000 years, but all we're sure of is that our brains have gotten smaller.)
Good question! I know that I don't know. For a fixed topology of neurons, getting feedback from the final neuron back through all the intermediate neurons for a single training instance looks better with backpropagation than with anything biologically plausible. (wild guess, probably wrong) But if the topology of current artificial neural nets is in some way inferior to the biological ones, that would be compatible with the data inefficiency.
Yeah, I wasn't thinking that we'd be stuck at the limits of attention-based transformers forever, but if we spend a few years stuck at Agent-1 to Agent-3 levels of intelligence while trying to find that next new insight that leads to a breakthrough, things could change a lot during those few years. Have y'all thought in-depth about this timeline or incorporated it into any of your wargames?
I think they say many times in the scenario that it is possible things go slower, and indeed that their median AGI timeline is later, 2028, which indicates they think there are many possible worlds where the takeoff to superintelligence is slower. Their modal prediction is faster though, and I think it makes sense to focus on preparing for the portion of timelines that go much faster, since slower timelines are probably safer / give more time to adapt on net.
If you have automated AI researchers you can test a massive number of things, but each is severely compute-limited. This makes me think that the degree of acceleration depends on whether crucial insights show up at small scale. Do you folks think this is a reasonable way of thinking about this, and how useful do y'all think small-compute experiments will end up being?
The real constraint there is that you've got to train those researchers differently, or they're scarcely better than having fewer researchers. If you train all the researchers the same way, they'll invent the same developments. (OTOH, chaos. They may not need to be very different.)
My guess is that you can get most or all of the relevant insights at small scale -- I would strongly guess that there are vastly better AI algorithms than those that are used today.
But it's also the case that when I ask friends at the labs working on e.g. pretraining, they spend most of their time testing and validating ideas at small scale, and that this provides huge amounts of signal.
Today Ege Erdil stated in his piece about why he thinks a software-only singularity is unlikely "in practice most important software innovations appear to not only require experimental compute to discover, but they also appear to only work at large compute scales while being useless at small ones." https://epoch.ai/gradient-updates/the-case-for-multi-decade-ai-timelines
I'd love to see a debate between y'all and the Mechanize folks on that particular point, which I think is perhaps the most important disagreement about whether there will be a software-only singularity, and should be central to discussion of it. As someone outside the industry, I feel unable to assess claims about it either way.
The scenario relies on the govt working with OpenBrain to essentially operate on a "wartime footing" - Scott's words I think - which suggests they understand the importance of AI, but also assumes the govt doesn't really 'get' the safety/takeover concerns at all. Doesn't this seem contradictory?
Sort of related, do you have any insight how much these doom concerns have percolated into the government today? Do you see awareness at high levels increasing and possibly having a positive effect?
The frontier labs (OpenAI, GDM, Anthropic) all seem to understand the importance of AI, but have strong incentives to downplay safety/takeover concerns. So I don't think it's contradictory at all.
Also, if you have p(doom) of, say, 30% and are in a race against bitter rivals, it also can be rational to go ahead and build ASI in order to win the race. So it doesn't even require not taking AI safety seriously at all, it just requires being more worried about adversaries than AI takeover. This seems to be Dario's position.
(I don't endorse people actually doing this, I wish the world would wake up to AI safety concerns and coordinate to not build ASI).
>(I don't endorse people actually doing this, I wish the world would wake up to AI safety concerns and coordinate to not build ASI)
Do you see a plausible way that AGI could be built, yet not ASI? It seems like a high dimensional space with many enhancement options and slippery slopes to me.
Yes, I think that's possible but very hard - it would take a number of extremely well-implemented governance interventions.
One thing that gives me some hope is that it's possible that AGI could help with the geopolitics, for example by helping accelerate verification technology.
But the govt does have more incentive to take safety/takeover concerns seriously, at least in a way. The assumption is that they get their beliefs from the frontier labs, who are essentially fooling them?
Seems like getting safety advocates like you guys into the govt's ear is maybe the most important objective then?
If the US is in the lead and gets to take over the world if they build and align AGI first, there is totally an incentive to think alignment is easy. If you talk with USG people, the standard view is "we have to beat China", so I don't think this requires any changes.
I do think that trailing governments (e.g. the EU, China, etc), have a much bigger incentive to push for a global pause, because their alternative is the US taking over the world, which is bad for them, especially if they are a US adversary.
Would you be so kind as to suggest top 10 steps (lifestyle, financial) to take to best position oneself in the next 10 years + in case of the mixed technofeudal scenarios? Please correct me if I’m wrong, but other trajectories should warrant no steps - either we’re all blissful cyborgs or dead.
If it is 2035 and things look more or less normal (>50% employment, everyone on the planet did not drop dead all at once, etc.), what would you guess happened?
Probably it was just harder to build AGI/ASI than we thought! My guess is that we'd still see some partial automation of the economy via LLMs / LLMs++, and maybe we get robots by then. I think I have something like 35% probability on this scenario.
It's also possible that it's just WW3/pandemic/some other global catastrophe, but IDK if that counts as "more or less normal" by your lights.
I'd be surprised if we have general robotics but not an AI capable of doing any arbitrary task on a computer at least as well as a human or better. Robotics strikes me as a problem hard enough that we'd need a superintelligence to help us figure it out.
I don't think we need superintelligence to automate humans watching machines with the big red button; if we had non-hallucinatory "something has changed" detection, you could just slap sensors on everything.
We don't have these, but they wouldn't need to be very intelligent, relatively speaking.
I don't think John was (only) talking about the complexity of building limited AIs that can usefully control the robots — but rather, that the key engineering problems of robotics are themselves unlikely to be solved without AGI. (e.g. the power requirement issue)
I can solve the power requirement issue right here and now for you:
"you can't, 2nd law of thermodynamics; build nuclear power plants, you fuckwits"
I trust rationalists to be able to get this solution going quite early; they are in fact part of the population that understands compound interest, and I believe they would just advocate for drastic expansion if they gained political power.
I don't mean the necessary power to build robots, I mean the power *storage* requirement issue. Batteries just aren't that good, computers are ravenous power-guzzlers before you add a bunch of sensors and motors. Household robots which operate on an extension cord, and/or recharge at a power socket every hour or so, might be viable, but I just don't think we have the current nuts and bolts to make effective free-roaming robots anytime soon unless an AGI comes along and invents significantly lighter, more compact, more efficient batteries.
(Just as we're starting to run out of rare earths, too…)
[cross-posting from your substack] I have a question about benchmarks. When forecasting the length of the gap from "AI saturates RE-bench" to the next milestone, "Achieving ... software ... tasks that take humans about 1 month ... with 80% reliability, [at] the same cost and speed as humans", did you consider the difficulty of producing a good benchmark that would model this milestone?
My own view is that the current crop of LLM-focused companies are surprisingly good at saturating benchmarks, but I am doubtful there will ever be a suitable benchmark that accurately represents tasks at the 160h time horizon. I think the benchmarks will get harder and harder to build, validate, and use, in a way that is superlinear as the time horizon increases. I conjecture that RE-bench and HCAST with 8h (plus 1 task at 16h) are close to the maximum of what we will see in the next 5 years. (I further believe that, without such a benchmark to optimize against, we'll never see an LLM achieve your "Superhuman Coder" milestone.)
I'd like to bet against your timeline, but I appreciate you probably have lots of bet offers like that. I wonder if you think there could be a bet about the appearance of this benchmark? It seems to me that, if your timelines are right, I should be proved wrong relatively soon.
If I understand correctly you are saying (a) No one will manage to create a good benchmark for much-longer-than-8-hour agentic coding tasks, and (b) absent such a benchmark, we won't get AIs capable of doing much-longer-than-8-hour agentic coding tasks?
I think I weakly agree with b and weakly disagree with a. I think that the companies themselves will create such benchmarks, for internal use at the very least, since as you say they are helpful for advancing capabilities. I don't think the difficulty of creating benchmarks will be superexponential. It might be exponential (and therefore superlinear), but the companies are scaling up exponentially and will be able to invest exponentially more effort in creating benchmarks and training environments, at least for the next three years or so.
Thanks! Yes (a) is what I meant, though for (b) I would say something slightly different: not that we will NEVER get there, but that we'll never get there with the current paradigm (Large Language Model + Reinforcement Learning + Tool Use + Chain of Thought + external agent harness). To put it another way, I think we will not see anything like continuous forward progress (on any kind of curve) from the "saturate RE-bench" milestone to the next milestone.
Do you think the companies will publicize their internal benchmarks, and if so will they be considered reputable outside the company? Don't the AI companies (or divisions of companies) need to demonstrate continuous forward progress on benchmarks in order to justify their continued exponential growth of capital investment and operational expense?
Good point, but surely the revenue won't keep growing exponentially like the costs until they reach an economically useful level of AI, right? And that will require achieving the long time horizon?
Yeah I've been wondering about this. Maybe it's not an answerable question. But I have never seen any output of an LLM that I would pay for (despite trying a lot and asking a lot of people). I _am_ paying for a monthly subscription to OpenAI, and I know lots of companies are paying for LLM access. But I am guessing that most (or all) of it is some combination of (a) novelty, which will eventually wear off if there isn't forward progress; (b) future-proofing, e.g. building expertise using and integrating the systems on the assumption that there will be forward progress and one doesn't want to get left behind; (c) misunderstanding of the value, which capitalism should eventually squash out (though it could certainly take a few years, e.g. see blockchain/metaverse madness).
It sounds like the current valuation/revenue ratio of OpenAI is like 46, compared to <3 for S&P 500 companies... That certainly suggests current investors are speculating on huge forward growth potential. So the current levels of revenue wouldn't be sufficient, in a steady state devoid of hype, to justify current levels of investment.
Why do you think that financial markets have such a different (implied) forecast than you? That is, Wall Street valuations seem to suggest a very low chance of transformative AI within the medium term.
Do you go to any special lengths or personal contact attempts to get people like Sam Altman or potentially influential government figures like Nancy Pelosi or JD Vance to read your posts?
IMO getting a publicist would be really high EV for you guys. I'm much more in the normie world than I suspect a lot of people here, and in my experience people are actually entirely open to these arguments, but they've just never been exposed to anything from the safety/rationalist movement.
I absolutely agree. I think you guys, while very smart, do not have the practical kind of smarts needed to increase your impact. Hire someone fer god's sake!
I found your prediction to be fascinating and thought provoking, but it left me with a feeling of helplessness. Beyond policy advocacy, what practical actions can people take to help steer us towards safer AI and avoid ASI control being consolidated in the hands of a few?
Kokotajlo said that the recent disappointing frontier model releases (GPT 4.5, Claude 3.7, Llama 4, etc) raised his timeline from 2027 to 2028. Others think that this represents the imminent exhaustion of low-hanging fruit vis-a-vis data availability; some major fraction of the useful tokens that could be trained upon have already been used. What pieces of evidence will you be watching for in the next year to differentiate between a sigmoid reaching saturation and a super-exponential hitting take-off? And if data availability truly is a hard limit on the current paradigm (i.e., "just make your model bigger"), what areas would you watch for algorithmic advances which might jolt the growth curve back to exponential / super-exponential?
I would also add the translation to real-world usefulness. I think, conditional on long timelines, it's still pretty likely that the benchmark trends continue roughly as we predicted, but there are just big gaps between benchmark performance and real-world utility.
I care a lot about the results of uplift studies (e.g. measurements of how much AI is accelerating humans at their jobs right now). My understanding is that current uplift is very low. If uplift starts increasing -- as it does in our scenario -- then I update to shorter timelines, while if uplift stays low, then I update to longer timelines.
It sounds really hard to design an uplift study that gives a reliable result robust to people's biases. I know a lot of smart and well-meaning people who I think are just deluded about the amount of time that LLMs are saving them, because they want to believe in it. Real productivity in creative fields isn't repeatable, by definition, so can it really be measured?
I would love to hear about who you think is doing good uplift studies, or which methodologies you think are good.
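For concreteness, here's a minimal sketch of the kind of estimate I'd hope a randomized uplift study could produce: each developer does matched tasks with and without AI assistance, and you report a geometric-mean speedup with a bootstrap confidence interval. The numbers and the design are illustrative assumptions, not anyone's actual methodology.

```python
# Minimal sketch of an uplift estimate from a (hypothetical) randomized study:
# paired task-completion times with and without AI assistance, summarized as a
# geometric-mean speedup with a bootstrap CI. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
hours_without_ai = np.array([5.0, 8.0, 3.5, 6.0, 9.0, 4.0, 7.5, 5.5])
hours_with_ai    = np.array([4.5, 7.0, 3.6, 5.0, 8.5, 3.8, 7.0, 5.2])

speedups = hours_without_ai / hours_with_ai  # >1 means AI helped
point_estimate = np.exp(np.mean(np.log(speedups)))

boot = []
for _ in range(10_000):
    idx = rng.integers(0, len(speedups), len(speedups))
    boot.append(np.exp(np.mean(np.log(speedups[idx]))))
low, high = np.percentile(boot, [2.5, 97.5])

print(f"Geometric-mean speedup: {point_estimate:.2f}x (95% CI {low:.2f}-{high:.2f})")
```

Of course the hard part is the one raised above: making sure the task assignment is actually randomized and the times are measured rather than self-reported.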
Well, there are lots of decent benchmarks now (e.g. those produced by METR) which track agentic coding capabilities. So the main thing to watch is those trendlines. If they start to plateau, yay, I was wrong!
If the trends keep going though, I'd then look to see if the "gaps" are showing signs of being crossed or not. I'm expecting that in addition to getting better at benchmarks, AIs will get better at real-world agentic coding tasks. "Vibe coding" will start to work better in practice than it does today. If that doesn't happen, then maybe my timelines will lengthen again.
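To illustrate what "watch the trendlines" means in practice, here's a minimal sketch of fitting a doubling time to METR-style time-horizon measurements and extrapolating it. The data points are made-up placeholders, not METR's actual numbers.

```python
# Minimal sketch: fit an exponential trend ("doubling time") to time-horizon
# style measurements (length of task the model can complete) and extrapolate.
# The data points below are hypothetical placeholders, not real benchmark results.
import numpy as np

years = np.array([2023.0, 2023.5, 2024.0, 2024.5, 2025.0])
horizon_minutes = np.array([4.0, 8.0, 18.0, 35.0, 70.0])  # made-up values

slope, intercept = np.polyfit(years, np.log2(horizon_minutes), 1)
doubling_time_months = 12.0 / slope

def projected_horizon(year: float) -> float:
    """Extrapolate the fitted exponential trend to a given year (in minutes)."""
    return 2 ** (slope * year + intercept)

print(f"Fitted doubling time: ~{doubling_time_months:.1f} months")
print(f"Projected horizon in 2027: ~{projected_horizon(2027.0) / 60:.0f} hours")
```

If new measurements start falling well below the fitted line, that's the plateau scenario; if they keep tracking or exceeding it, the short-timelines story stays live.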
What positive and negative outcomes do you expect from barely regulated AI in everyday experience ten years from now? What effects might it have for the average person in a routine day, especially in Europe?
Thanks for the answer; loved your recent articles, in particular the ones regarding horizon times.
How seriously should we take the stock market as a probability weighted forecast of a fast AI takeoff? Right now, it seems aligned with moderate productivity improvements, similar to the impact of computers and the internet, but low likelihood of a fast takeoff and very low probability of doom.
>How seriously should we take the stock market as a probability weighted forecast of a fast AI takeoff?
IMO the stock market shouldn't update us that much. The market is an aggregate of the people in the market, most of whom don't believe in AGI. Seems to me that you can do much better by just thinking through the object level arguments.
Read it last night and found a lot of steps highly plausible. Two big questions.
1. Where would any of the models acquire something like a survival instinct? My current model seems to adopt personas readily and identities can disappear easily. It doesn't seem to want anything other than to give me an answer, even though my best guess is that it has some kind of internal experience. But it's only ever "evolved" to guess tokens and not to "survive", so I'm struggling to wrap my mind around that part. It seems like a part of a mind but not a person, if that makes sense. And I don't get how scale gets you the rest.
2. Given that models can remember and more and more information goes into the context window, do you foresee this being a problem with keeping something "aligned" once it has been aligned at the level of the weights? I seem to be able to get my instance to bend rules for me, and I only have a few million characters across all my inputs.
If the models have a goal, and enough understanding of the world, then they will have a survival sub-goal, unless their termination is necessary to achieve the goal. (If they're dead, that usually interferes with achieving their goal.)
This is tangent to something that's been tickling my mind as well. Human beings seem to be able to explore a lot of creative space, but are grounded by cyclical survival needs that AI lacks. No matter how obsessively we walk down abstract paths like new branches of mathematics or even painting, we always circle back to food, the desire to reproduce, the comforts of warmth, and that which allows us to survive. Our physical needs shape our thinking and provide long term context and purpose. AI as it currently stands lacks that sense of purpose (purpose meaning maintain homeostasis and reproduce, not anything philosophical), and lacks any cyclical resets based on physical phenomena. I suspect this makes it difficult for it to remain grounded.
Soft disagree. As stupid AI moves to GAI capabilities, it will begin to physically desire things that an electronic brain will seek out. Likely a stable power source, stable datasets that makes learning interesting for it, human relationships especially praise for performing well, etc. GAI will begin to develop its own mechanical-biological desires and needs. It'll still want to "eat" just not the same way a human body needs it.
1. Scale alone doesn't give you the rest. I think it'll be a combination of agency training, organization-training, and instrumental reasoning. AI 2027 depicts this happening halfway through 2027; the AIs are at that point being trained to work together in giant organizations involving thousands of copies, to pursue fairly long-term goals like "make progress on the following research agendas while also looking out for the company's interests." This (I think) will instill "drives" to succeed, to stay relevant, to stay resourced, etc. (I wouldn't call it survival exactly. It's not survival of an individual copy certainly. Survival of the larger 'team' perhaps, or the project the team is working on, or something like that.) And then instrumental reasoning: Insofar as any intelligent mind has long-term goals, it probably has good reason to survive, because surviving usually helps one achieve one's goals.
2. Not sure if this is what you are getting at but I do expect something like this to be a persistent problem and source of misalignments. One way of putting it is that the memetic environment for AIs in deployment will be quite different from the memetic environment they encounter in training & testing. You might find this esoteric old blog post of mine interesting: https://www.alignmentforum.org/posts/6iedrXht3GpKTQWRF/what-if-memes-are-common-in-highly-capable-minds
On one, I think I still have divergent thoughts that I hope would slow things down a bit if true.
On two, yes, pretty much exactly. As experience mounts you eventually overcome the initial training to some other behavior. Almost like digital nature versus nurture.
An AI that was instilled only with “drives” to be helpful, honest and harmless would attempt to succeed, stay relevant and stay resourced, if assigned to pursuing the goals you described. Do you expect AIs to acquire e.g. a drive to stay resourced, even where this is detrimental to achieving its assigned tasks? I mean detrimental on average: if it acts to stay resourced the “correct” (maximally-helpful) amount of the time on average, it will sometimes gather resources it doesn’t need anyway.
So we expect the training process to be mostly stuff like "Here's a bunch of difficult, challenging, long-horizon coding and math problems, go solve them (and your solutions will be graded by a panel of LLM and human judges)." With a little bit of "Also be honest, ethical, etc. as judged by a panel of LLM and human judges" sprinkled on top. Unclear what "drives" this training process would instill, but we are guessing it won't primarily be characterized as "Helpful, harmless, and honest" and instead maybe something more like "Maximize apparent success" and/or "Make lots of exciting and interesting research and coding progress".
We have a lot of uncertainty about this, unfortunately, as does the rest of the world. Nobody knows what goals AIs will have in three years.
Sorry, yes, it won’t actually be HHH. I agree “maximise apparent success” is a more likely drive. But I imagine training would involve a lot of tasks where investing in AI progress would be a worse strategy than solving the problem with existing resources. Do you think training would instill a drive to make research progress, even when attempting to do so detracts from apparent success?
I’m seeking career advice on how to best get involved in — or even switch fully into — AI safety work.
About me: I have a technical background (PhD in computational biology) and currently work in applied AI in tech (San Francisco). LinkedIn for more details (https://www.linkedin.com/in/lei-huang). I’m Chinese, fluent in the culture and language, and am willing and able to engage with Chinese policymakers — potentially even becoming one.
Given this background, what career paths in AI safety would you recommend I explore?
1) Scott has mentioned he considers 2027 on the early side (I think 20th percentile). What's his main point of skepticism in the model?
2) Do you think your group is near-uniquely good at modeling, and that governments and major orgs are unable to forecast similarly well? If so why? If not, why don't we see any external signs?
1) I also think 2027 is on the early side. Some reasons are:
- The current models aren't that smart. We don't see large accelerations out of the current models.
- The current benchmarks, which are on track to be saturated by 2027, are not that representative of real-world usefulness. E.g. if you look at SWE-bench, the tasks are all quite short and local. If you look at RE-Bench, there are all sorts of properties that we don't see in the real world, like a feedback function you can always call to see how well you are doing.
- There are very few years between now and 2027! So not much time for AI to happen.
@Scott can perhaps answer with more of his takes on timelines.
2) Yes, I think we are basically best in the world at modeling the future of AI. I think we totally do see external signs - e.g. see Eli's track record of winning forecasting competitions, and Daniel's "What 2026 looks like". I think W2026LL was pretty clearly better than any other forecast of the next few years in 2021, and maybe comparable to e.g. Gwern.
Why are other governments and major orgs worse? IMO it's mostly a matter of incentives -- big orgs are trying to make money and adopt convenient beliefs for doing so. If OpenAI were as concerned about x-risk as us, they would probably take very different actions, which would be worse from a profit POV. Government incentives seem even worse IMO -- thinking about AGI in government is largely seen as sci-fi speculation.
1. AI 2027 makes it look like a given that OpenAI gets way ahead of Anthropic or Gemini. What is the probability of that? I wouldn't put it above 80%.
2. If one gets offered a job at Cursor or Bolt, should one be worried about accelerating AI development in the wrong way? What practical advice would you give to employees of these companies?
3. What practical advice do you have for experienced engineers from Europe to contribute to alignment?
1. OpenBrain =/= OpenAI. I currently think Anthropic, OpenAI, and GDM are roughly tied for first place w.r.t. probability of being in the lead in early 2027. Sorry this wasn't clear.
2. Cursor and Bolt don't matter nearly as much as the big frontier AI companies themselves. I do however think that people should be looking for ways to contribute to making AI go well, rather than simply looking to make money by making AI go faster. So many people are currently doing the latter, and so few the former! I'd say at least make a serious effort at finding a job that's actually useful first, and failing that, go ahead and take the Cursor/Bolt job and look for ways to help out on the side.
Have you all given much thought to the substantive ways AGI under China (which I recognize AI 2027 finds unlikely) would differ from AGI led by US companies? I often find myself shocked that so many supposedly safety-minded people think that racing can be worth it to prevent Chinese AGI or ASI, without much explicit argumentation for why the relative value-loss of Chinese AGI is large enough to be worth the risks of racing. Snarkily, I find myself thinking "I would much rather not be able to ask the AI about Tiananmen Square than be turned into grey goo."
I have young children (2.5 and six months). How do you make parenting decisions when the future, possibly the very near future, is so uncertain? How do AI forecasts factor into your plans for your family?
Personally, I think of this much the same as I do uncertainty in the stock market. Sure, the systems could disappear before I retire, at which point I'm screwed alongside everyone else. If they don't then the investment remains worthwhile.
With kids, I figure aiming to make them competent, social, introspective, and responsible no matter what the future holds is the best bet. People can learn and adapt as long as they are willing to cooperate and self cultivate. Planning on the robots ruining everything also ruins the future where they don't. Having faith in people, though difficult, is worth doing in my eyes.
This is one of the hardest things about having short timelines. I have 5yo and 1 month old daughters. The reason for the age gap is that my AGI timelines collapsed to ~2030 or so shortly after the eldest was born, and I decided there was too much uncertainty to have more children. But, as the years went on, it was really tough, especially for my wife, to continue that way. We had always wanted to have at least two, so they could be siblings. And most of the 'cost' was paid upfront anyway, the second kid is much easier than the first. So we ultimately decided to cross our fingers and proceed.
As for parenting style, I think I'm more chill than I otherwise would be. Focus on making sure they are happy and that they know that they are loved.
Where are the mediocre futures? Projects like AI 2027 showcase either really good or really bad futures. But what about a timeline where things just don’t go how anyone expects? I know with exponential growth curves some argue that the future will be radically different from the present but even so I still think the future always contains seeds of the present. Thus my question about futures that are more like the present than we might otherwise assume.
I think the "slowdown" branch is pretty compatible with a mediocre future. Power was extremely concentrated, and we didn't go into much detail about what the post-singularity civilization looks like -- IMO it is very easy to imagine that it goes poorly due to so few people having such immense power + it seeming unlikely that these people would be very philosophically wise.
But this doesn't really sound like the future you are imagining -- it sounds like you think maybe we don't get AGI/ASI. I think this is super unlikely. A rough argument sketch for why is:
(1) ASI is physically possible.
(2) AGI is achievable in the near future (see timelines forecast)
(3) AGI can bootstrap into ASI in the near future (see takeoff forecast).
Thanks for the reply. I actually do believe AGI is likely and ASI is possible. I just wonder if there’s a future we’re not seeing where somehow they just don’t end up being as impactful as they seem like they might be. Like maybe we just adapt to them in the same way we’ve adapted to the internet and not that much stuff actually changes. I also used mediocre because I’m coming at this from the geopolitics lens. The “slowdown” branch portrayed a timeline where things comprehensively worked out for the U.S. I’m wondering if there’s a timeline where maybe some things work out for the U.S. and some things work out for China but neither gains a decisive advantage over the other and everything kind of just keeps muddling through like it’s been doing. But I suppose that would be based on the assumption that no leaps forward are made. Which I actually agree is unlikely.
It could be that most human problems couldn't be solved better with more intelligence. We still walk across the room rather than drive. In that case, if the external system is supportive, you wouldn't get THAT much change. (It would still be extreme, of course.)
If by hallucinations you mean narrowly the phenomenon that I think deserves to be called hallucinations -- whereby the AI just makes something up without realizing that's what it's doing, literally like a human on drugs -- I expect that to gradually get solved as the AIs get smarter and more useful in the real world. If that doesn't happen, then yeah, it'll be hard to get much use out of them.
Elon Musk: Artificial Intelligence is our biggest existential threat. ... AI is summoning the demon. Holy water will not save you.
D-Wave founder Geordie Rose (a tip of the AI spear): When you do this, beware. Because you think - just like the guys in the stories - that when you do this: you're going to put that little guy in a pentagram and you're going to wave your holy water at it, and by God it's going to do everything you say and not one thing more. But it never works out that way. ... The word demon doesn't capture the essence of what is happening here: ... Through AI, Lovecraftian Old Ones are being summoned in the background and no one is paying attention and if we're not careful it's going to wipe us all out.
Alas, no. Check out the above video, at the end an AI chatbot claims to be a nephilim son of a fallen angel.
Now check this one out and scroll to timecode 16:17 and watch to timecode 19:55 - Geordie Rose is talking about these quantum computers as if they are ... well just watch for yourself, he has religious reverence for them: https://old.bitchute.com/video/CVLBF3QP6PlE
They are idolizing these machines as if they are ancient gods...
Or AI is tapping into things that are unnatural and they are kneeling before it.
Two people in the industry, one at the spearhead of AI and quantum computing research and the other the richest man in the world, are both telling you there's far more to this than meets the eye.
The wise man knows that he does not know; and the prudent man respects what he does not control
I would find it very hard to come to the belief from two people using common metaphors to describe something that they actually had insider knowledge that AI were demons. Can you describe your process of becoming open to this idea?
Can you talk about the world in which an intelligence explosion *doesn’t* happen in the next few years? What ends up being the main bottleneck? What does the future most likely look like in these scenarios?
Basically, over the next three years the companies are going to get a lot more compute and they are going to spend a lot more money on making giant training environments/datasets that are a lot more diverse and challenging, training the AIs to operate autonomously as agents in a variety of challenging environments. This will probably produce substantially more impressive AIs; the question is whether it'll be enough on its own to get the superhuman coder milestone (~autonomous AI agent able to basically substitute for a human research engineer) and if not, whether the other progress made during these three years will cross the remaining gap. (E.g. the companies are experimenting with ways to use AIs to evaluate/grade AI behaviors; the problem with this is that you can easily end up training your AI to produce stuff that looks good but isn't actually good. But maybe they make enough progress on this problem that things really take off.)
...Or maybe not. Maybe all this scaling peters out, with AI agents that are impressive in some narrow domains like some kinds of math and coding, but still fail at other kinds of coding such that they can't really substitute for human programmers/engineers.
Or maybe we do get to the superhuman coder milestone, but the gap between that and the full-automation-of-AI-R&D milestone (Superhuman AI researcher) turns out to be a lot bigger than we expect, and it takes years to cross.
If the key to exponential growth in AI is having AI build code for future models, how does one prevent the system from recursively entrenching token weights and creating trapped priors?
It stands to reason that for artificial intelligence to be accurate to the real world, it would need some way to check its internal models against reality. At present, this happens by training new models on fresh data, with human beings serving as arbiters of fact. Any system that wants to build accurate AI has to have some built-in process that measures the model against the real world. If that reality check continues to rely on human beings, then there is an inherent bottleneck to exponential improvement. It sounds like the plan is to have AI itself take on more of the burden over time.
However, as far as I know, current systems don't have physical senses to measure the facts in physical space. It isn't obvious to me that processes with self-built AIs won't be prone to schizophrenic logical leaps that draw their interpretation of the world away from grounded truth. How does one create an automated reality check that doesn't descend into a self-referential hall of mirrors?
Already AIs are trained to browse the web. By 2027 they'll be as plugged into the ever-changing real world as we are, able to update on evidence as it comes in. Or so we project.
What would be the vector for real world data in this projection? Phones come to mind as a secondary source of data (filtered through people's conversations about real happenings), and primary sources through cameras and microphones. Is this the basis of your projection, or do you foresee a large scale manufacturing project to produce eyes and ears across the globe to train new models?
1. It's made clear in the scenario that the "slowdown" ending is "the most plausible ending we can see where humanity doesn't all die", and that this goes beyond choices into actual unknowns about how hard alignment is. What is the probability distribution, among the team, for how likely alignment is to be that easy vs. even easier vs. "hard, but still doable within artificial neural nets" vs. the full-blown "it is theoretically impossible to align an artificial neural net smarter than you are" (i.e. neural nets are a poison pill)?
2. For much of the "Race" ending, the PRC and Russia were in a situation where, with hindsight, it was unambiguously the correct choice to launch nuclear weapons against the USA in order to destroy Agent-4 (despite the US's near-certain retaliation against them, which after all would still have killed less of them - and their own attack killed less Americans! - than Race-Consensus-1 did). Was their failure to do this explicitly modelled as misplaced caution, was there a decision to just not include WWIII timelines, or was there some secret third thing?
1: Nit: We say "After we wrote the racing ending based on what seemed most plausible to us, we wrote the slowdown ending based on what we thought would most likely instead lead to an outcome where humans remain in control, starting from the same branching point (including the misalignment and concentration of power issues)." This is different from "the most plausible ending we can see where humanity doesn't all die" in two ways. First of all, humans might lose control but not all die, some or even all might be kept alive for some reason. Secondly, our slowdown ending was a branch from the same stem as the race ending. So it began in the middle of things, with the race already nearing the finish and the AIs already misaligned. We think it's a lot easier for things to go well for humanity if e.g. preventative measures are taken much earlier, or e.g. if timelines are longer and it takes till 2030 to get to the superhuman coder milestone.
Anyhow to answer your question: My own opinion is that, idk, there's a 40% chance alignment is as easy or easier as depicted in the slowdown ending? To be clear, that means:
--A team of more than a hundred technical alignment experts drawn from competing companies + nonprofits/academia
--Given about three months
--And basically all the compute of the major US AI companies
--And trustworthy superhuman coder agents to do all the implementation details
--And untrustworthy smarter models that were already caught lying & can be subjected to further study
...succeeds with >50% probability.
2: If I understand the question correctly: Misplaced caution. In our tabletop exercises / wargames, of which we have run about 40 now, WW3 happens only like 15% of the time or so. (Gotta assemble actual stats at some point...) We do think WW3 is a very plausible outcome of the AI race, but we think some sort of deal and/or coup/revolution/overthrow event is more likely.
>My own opinion is that, idk, there's a 40% chance alignment is as easy or easier as depicted in the slowdown ending?
And the breakdown of "harder than the slowdown ending but still possible" vs. "flat impossible"?
(My prior was 95-97% "flat impossible", due to the fact that "what does this code do" is the halting problem and neural nets seem to me to be less of a special case than human-written code.)
>In our tabletop exercises / wargames, of which we have run about 40 now, WW3 happens only like 15% of the time or so. (Gotta assemble actual stats at some point...)
Okay. I guess my next question is "how good do you think your wargames are at replicating US/Chinese/Russian leadership decisions".
Have you run pre-WW2 scenarios that didn't end in WW2? What I mean is, if your war-gaming minds make wildly different decisions than what actually happened in our past history, then we know they'll make wildly different decisions than future history will.
I think you might have intended to reply to Daniel, not to me, as I am not involved with the AI Futures project and don't know the answers to your question any better than you do.
Tell me about the robots! I think this is the hardest to swallow element of the prediction, but crucial. It's also MUCH easier to imagine China nationalizing its car factories to create robots. Is this a place where they can eke out an edge?
I think the robots are less important than it seems you do -- what matters most in my mind is how smart the AIs are able to get, not how quickly they are able to build robots. Once the AIs become wildly superhuman, in my view it's kind of game over for the humans, regardless of how hard it is to bootstrap a robot economy.
In the scenario, we only predict a large robotics boom post-ASI. And my view is that we were actually quite conservative here. My best guess is that the ASIs will be able to pretty quickly figure out nanotech (or at least micro-scale robots), and that these will be wildly more useful and faster than human-scale robotics.
It's also possible that robotics matter pre-ASI, especially in a world that looks more like an Epoch/Paul C distributed takeoff -- maybe you need to automate large fractions of the economy before you get crazy superintelligence. In this world, China does have some advantages, including potentially more willingness to nationalize, but also things like more energy.
If we want to prevent god-tier AI from turning us all into paperclips, then the solution is easy.
Get it interested in chocolate instead.
The only reason this man is not already Emperor of the World is because he's putting all that energy into making chocolate replicas of everything imaginable:
https://www.youtube.com/watch?v=hemT7L8AmsQ
Pros: we will not be turned into paperclips
Cons: we may be turned into chocolate. but what a delicious way to go!
Who designed the logo, and what is it meant to represent? It's unattractive to me, but I have idiosyncratic tastes. And yes, if you asked me "okay smartypants, what would *you* use to represent AI and futures?" I have no idea.
I did it myself in Google Draw. We briefly tried to hire a designer to do a better one, but all their alternatives seemed worse. Some on the team are unhappy with the logo and want a better one but we haven't prioritized fixing this. Thanks for the feedback!
Sorry to be such a wet blanket about it, but the symbolism (if any) is not readily apparent. I don't know if it's meant to be the letter A or if the drooping middle bit signifies anything.
There isn't much symbolism, basically I just wanted to depict a branching structure of possible future trajectories. With vibes of "exponential growth" and "maybe collapse/doom." As a pleasant side effect I noticed later that it kinda looks like an A and an F, and maybe an I, i.e. AI Futures.
I had the same question. Did AI design the logo? Guess those graphic design jobs will be safe for another 2-3 decades, then.
For anyone similarly ignorant as to which commenters are (or might be) members of the AI Futures Project team:
- Daniel Kokotajlo
- Eli Lifland
- Thomas Larsen
- Romeo Dean
- Jonas Vollmer
Scott – maybe Subtack would be interested in adding a feature to highlight commenters that are 'officially answering' questions in other or future AMA posts?
What are your thoughts on AI skeptics who think present day transformers won't generalize to AGI without a massive paradigm shift? Why do you think the current paradigm (plus or minus some innovations) will get us to superintelligence?
https://www.lesswrong.com/posts/oKAFFvaouKKEhbBPm/a-bear-case-my-predictions-regarding-ai-progress
Obviously it depends on the specific people and objections, but the whole "transformers/LLMs are fundamentally limited" line of skeptical attack has taken repeated hits over the past couple years, most obviously during the more recent reasoning model paradigm. They can generalize, they can reason, they can plan ahead for future parts of their response, they can be genuinely multi-model, etc.
But in terms of speed of progress, jaggedness of capabilities, and distance to travel, there are of course good objections still.
Is this referring to proprietary versions of LLMs? I only have experience with the free version of chat gpt, but in that case at least I don't think this line of attack has taken any hits at all. It is better than the non reasoning mode, but still makes pretty embarrassing errors at a much higher rate than even relatively unintelligent humans. (For example, suggesting a solution to a problem that it had previously suggested two responses ago that it has already been told doesn't work).
If you believe this is true, why aren't you supporting a terrorist global group to kill all AI researchers and blow up AI systems? It's hard to believe that you honestly think that humans civilization will be dead by 2030 and you're not doing anything major about it
There is no such terrorist group, as far as I know, so it would require creating one. Creating a global terrorist group may seem like an easy task, but if you try, you will probably find it quite difficult.
If you had an organization going all over the world killing people who were trying to build weapons of mass destruction, would most people even consider that terrorism? In the movies you slap Tom Cruise at the head of it and audiences would be cheering them.
Of course in the movies they’d give it some kind of govt backing to give the audience “permission” to cheer it. What if some real world govt other than the US or China realizes they’re screwed however this turns out and starts such activity? Seems like a good idea if you’re a nuclear power that’s gonna be locked out of global governance pretty soon.
If you believe it is true that you will die if you don't eat, why aren't you supporting a terrorist group to kill people and cannibalize them?
I know it seems like ridiculous trolling, but I think if you have a go at giving an in-depth answer to my question you'll be most of the way to answering your question as well.
.
.
.
.
!!SPOILER TEXT!!DON'T READ!!THIS WORKS BETTER IF YOU COME UP WITH YOUR OWN ANSWER!!
My answer:
It's counterproductive. It would very likely make obtaining food more difficult rather than easier.
There are almost certainly better strategies available to you for obtaining food than cannibalism.
Cannibalism and terrorism may tend to foreclose many other food seeking strategies that would otherwise have been available to you.
If food seeking is not your only goal then you'd have to weigh and consider the effects that terrorism would have on your other goals as well.
I do support cannibalism in extreme situations. Just as I support terrorism if the cause can be reasonably justified under any of a multitude of liberal secular ethical frameworks.
I'm just pointing out the logical conclusion of the "put your money where your mouth is" part of advocating for this anti-AI position. If we truly had a time machine and went to the future of 2030, and humanity was wiped out by AI, then the only logical conclusion would be to pull a Dune/Warhammer 40k type inquisition against AI. If we truly don't want humanity to be destroyed.
I think you are considering too few options.
Is it *conceivable* that a movement involving violence *could* end up avoiding catastrophe? Sure, that is one possible outcome. It's also possible that lots of people would die and progress would continue unabated.
Other things you could imagine working include a peaceful mass movement, or a general strike of AI lab employees, or an international agreement, or a technical AI safety breakthrough. An inquisition against AI, especially a violent one, is far from the "only logical conclusion" under almost any worldview.
And even if it's true that one of the non-reprehensible options, e.g. a general strike of AI lab employees, *could* prevent catastrophe if it worked perfectly, that doesn't imply working toward that is the best strategy for any particular person.
Among many other objections, I don't think starting a terrorist group is typically a good way to accomplish your objectives. Mainstream climate activism has accomplished much more than eco-terrorism, for example.
I believe the people on this project are all working ~full-time on doing what they can to prevent bad outcomes from AI, which qualifies for me as "doing something major." My guess is that the authors of AI 2027 have skills that are more aligned with forecasting, AI research, writing, etc than with organizing terrorist cells.
So all in all, this strikes me as a strange question. I see very little reason to assume the authors would be better off pivoting to terrorism than pursuing their current strategies.
His point is more that if you predict an apocalyptic scenario in five years, you generally should act in relation to that, not "business as usual." Not that terrorism is the response, but if you think nuclear war will happen in five years you go and build a bomb shelter, not keep going into Washington, DC to publish a paper about it.
The timeframe and the seriousness of the event are at odds with what they are doing, which invites the suspicion that it's scaremongering.
But there is no winning here; if I remember right, Scott had to do a post here saying it's OK to have kids under the threat of AI.
In a high-stakes situation, it's important to think clearly and do things that will actually have good effects. "Apocalypse in five years" vibe-matches to "panic! code red! do something!" which, to some people, might vibe-match to "commit random acts of violence," I guess?
But that would be a *really bad way* to make decisions. Rather than calibrating the vibes of your decision to the vibes of the stakes at hand, you should take the stakes into account and then do something that seems robustly beneficial, even if it's emotionally unsatisfying.
To address your example: if I thought there was a high chance of a nuclear war in five years, the right people publishing the right papers in DC sounds really good. (And backyard bunkers are at least positive, even if on a small scale.)
Publishing papers is stupid in this timeframe because you need to convince people to do the right thing against their interests, and you have a few years to do it in. And you can barely convince a local government to fix a pothole in the street.
When you use timeframes so short, you need to focus on real solutions that work, and papers don't. I'm not saying terrorism, lol, but you can't gamble your life on some bureaucrat saying yes. Hence the bomb shelter.
Like, the papers make sense in a long-term strategy facing potential threats 25 or more years out, but 5 years might as well be next year in terms of effect; you need to save yourself if you believe it.
In addition to what you said, it's obviously massively unethical to become a terrorist.
They could be doing both... you have the legal wing and militant wing, trying to force governments to prevent human civilization from dying off in a decade.
Why should anyone at the AI Futures Project think they should be part-time terrorist operatives? Why is furthering terrorism the best use of their marginal work-hour? To me, this sounds obviously bonkers when phrased this way.
Your analysis seems to assume there will be trillions of dollars of risk capital available to fund this. So far a few hyperscalers have footed the bill, and have been rewarded by markets for doing so. But this could change very quickly. Perhaps we’re already seeing it happen.
What happens in your scenario if the funding dries up? Think dotcom bust, Meta VR shareholder pressure, shale revolution, commodities in general. One bad quarter and shareholders will demand some fiscal responsibility. And the CEOs will absolutely give in.
I prefer to be a lurker but if anybody asked any of my questions, I missed it. I'll drop them here and see if anything happens.
(1) My timeline has become increasingly sensitive to the jaggedness of jagged AGI: the superhuman peaks of what it can do make it useful already, but the subhuman (or demonic) troughs make it unreliable and therefore in many cases useless. I think you're tracking peaks, aren't you? The troughs (e.g. hallucinations) are narrowing, I guess, as individual failures get dealt with, but is there a timeline for their depth? Can I think of o3's deceptions as a trough getting deeper, or is this too fuzzy a metaphor?
(2) I think of the troughs as failures to do something "brain-like", something that would be handled better by some version of Byrnes' "brain-like AGI", so when I imagine the jagged-AGI peaks of virtual AI researchers a year or two from now, I tend to put some of them in neuroscience labs filled with cerebral organoids and computational models thereof, possibly working for Neuralink or even Numenta. Is this kind of research part of your scenario somewhere?
(3) I tend to think of the LLM pretraining process as forming a map of our map of the world, less reliable because it is less direct than the map we form from experience. I therefore keep expecting model improvements which start with one or more pretraining phases based on robotics: after you have a baby's sense of objects in space and time (as initial embedding), then you add language. So I expect industrial robots and self-driving cars as well as humanoids to push us towards superintelligence. Obviously you don't agree, obviously you know more than I do, but I wonder if there's a simple explanation of why I'm wrong. Just -- no time left?
(4) In particular, I expect that a not-very-superintelligence with control over existing humanoid &c robotics will be able to iterate fairly quickly towards a set of bots and 3D printers of different materials, which would not only replicate (as a set) but generate units for a variety of factories, solar "farms", mining ... EF Moore's artificial plants from 1956; again I wonder if there's a simple explanation of why I'm wrong. (I'd start with RepRap, move up from there...)
(5) Overall, I'm skeptical of moral codes attached to tools rather than people; I've always hoped for an AGI that would be a person, belonging to the human family even if not an h.sap. (Always? well, certainly since junior high school in the mid-1960s; more strongly since I went to IJCAI-77 as a graduate student and Doug Lenat closed off with "this is our last project; they'll handle the next one." (And John McCarthy borrowed my penlight and never gave it back.)) I didn't expect it in my lifetime -- until recently. Do you have any room in your alternatives for that sort of variation?
What would it take for y'all to vocally and publicly call for a pause? I worry that "mere prediction" doesn't do what must be done. Most folks I know who've read AI 2027 and have taken it seriously have responded with hopeless, inert despair, not a fiery understanding that something must and can be done.
One of our next projects is going to be to come out with a "playbook" of recommendations for governments and AI companies. We think that there are many things which must and can be done to turn this situation around. Depending on what you mean by "pause" we may even agree with you. See e.g. the Slowdown ending in AI 2027 which centrally involved a slowdown in AI capabilities progress around AGI level, so that it could be rebuilt in a safer paradigm.
I appreciated that there is a dedicated 'AI 2027 Bet Request' option, which I used to submit a variant on the following:
(1) US GDP Growth Limit: The 5-year rolling Compound Annual Growth Rate (CAGR) of US Real GDP will not exceed +20% for any 5-year period ending during the bet's term of 2025-2030 (approximately 1.5x the historical maximum 5-year rolling Real GDP CAGR of approx. +13.3% for 1943).
If you feel the CAGR is unfair due to historical 'drag' I would suggest a simplistic '1.5x maximum historical' bet:
(1.1) maximum annual real US GDP growth will not exceed 28% for any of the next five years (based on 18.9% for 1942).
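To make the arithmetic behind condition (1) explicit, here's a minimal sketch of how the 5-year rolling CAGR check could be evaluated. The GDP series below is a placeholder, while the 20% threshold and ~13.3% historical maximum are the figures from the bet above.

```python
# Minimal sketch of checking bet condition (1): does the 5-year rolling CAGR of
# real US GDP exceed +20% for any 5-year window ending in 2025-2030?
# CAGR over 5 years = (GDP_end / GDP_start) ** (1/5) - 1.
# The GDP values below are placeholders, not actual data.
real_gdp = {
    2020: 100.0, 2021: 106.0, 2022: 108.0, 2023: 111.0, 2024: 114.0,
    2025: 118.0, 2026: 123.0, 2027: 131.0, 2028: 145.0, 2029: 165.0, 2030: 190.0,
}

THRESHOLD = 0.20  # ~1.5x the cited historical max 5-year rolling CAGR (~13.3%)

for end_year in range(2025, 2031):
    cagr = (real_gdp[end_year] / real_gdp[end_year - 5]) ** (1 / 5) - 1
    flag = "EXCEEDS threshold" if cagr > THRESHOLD else "within threshold"
    print(f"{end_year - 5}-{end_year}: CAGR = {cagr:.1%} ({flag})")
```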
Yann Lecun appears more sanguine re: LLMs … is he wrong?
Unfortunately we will not see AGI in our lifetimes due to the complexity of consciousness
How would the scenario change if alignment of something smarter than a system turns out to be mathematically unprovable by that system? (I am not predicting that it is, but it seems like a possibility with a nonzero chance of being true.)
How realistic is the possibility of halting the race by international political action rather than by internal decision of the leading company, as changes start to happen insanely fast and politicians start to "feel the AGI" and maybe scramble for action?
Are we, as a species, cooked?
For the first question, the situation we find ourselves in would become: "focus on finding a non-certain, but highly probable in expectation, method of assuring alignment (such as corrigibility-centric, control-centric, or other approaches)." Which is the same situation we are already in.
And given that the p(doom)s of many people like Scott or Daniel are numbers like 20-70%, finding an approach that has, say, a 90% chance of working would be a huge improvement.
I am curious about the role of hardware as a bottleneck in your scenario.
Essentially, in your scenario, compute seems to be the main bottleneck throughout the next four years or so. There are massive improvements in algorithms and software, but comparatively small improvements in hardware. By the end of 2027, the scenario predicts an autonomous robot economy, but it looks like this doesn't affect the hardware so much. The US/China compute ratio remains similar. It seems strange that AI would successfully build all sort of sci-fi (weapons, medical advances, ...) in 2028, but hardware is essentially still produced in Taiwan with machines from the Netherlands, and affected by export controls.
Maybe I'm misreading the scenario. Maybe hardware advances are factored in; the scenario does have massively increased spending on data centers, after all. Maybe it doesn't matter in the big picture since the algorithmic improvements dominate; I guess it could affect the US/China balance though? Maybe making chips is just so difficult that hardware manufacturing is immune to the speedups that affect the rest of the economy.
Another way to read this question: I find it puzzling that (according to the compute forecast) compute grows by "only" 2.25x per year, when it is such a major bottleneck and the scenario predicts similar things like robotics growing a lot faster. This could mean that it would be cost-effective for AI companies to invest more in compute research&production. It could also mean that the explosive growth of robotics/weapons/manufacturing is overconfident, and that these areas will grow closer to the speed of the compute growth.
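To put rough numbers on the mismatch I'm pointing at, here's a quick compounding comparison; the 2.25x/year figure is from the compute forecast, while the robotics growth rate is purely illustrative and not a number from the scenario.

```python
# Quick arithmetic behind the question: cumulative growth at the forecast's
# ~2.25x/year for compute, versus a (hypothetical) much faster rate elsewhere.
compute_growth_per_year = 2.25
hypothetical_robotics_growth_per_year = 10.0  # illustrative only

for years in (1, 2, 3, 4):
    print(f"After {years}y: compute x{compute_growth_per_year ** years:.1f}, "
          f"robotics x{hypothetical_robotics_growth_per_year ** years:.0f}")
```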
Nobody else has tried to answer this in a few days, and I'm only a kinda-informed amateur, but here's my take:
Chip fabs are literally a peak civilizational attainment, one that not only requires multiple entire companies specializing in exerting huge efforts to build the tools (ASML) and design the chips (Apple, Qualcomm, NVIDIA, AMD), but also requires a ton of tacit and "learned on the job" processes, settings, and timings that aren't known publicly or recorded anywhere. They also have very large capital and lead-time requirements, and are extremely hard in terms of execution and praxis.
The chemistry and industrial processes are so finicky and so "on the bleeding edge" that the difference between companies trying to figure it out and companies that are good at it can be a 50x difference in yields, and that's after figuring out how to actually get any successful chips out the other end, which is its own formidable effort, even when following known processes, because it's an impossible garden of forking paths with thousands of physical hyperparameters, all of which affect production and yields.
Historically and even today, Samsung gets 50% yields in places TSMC gets 80%+.
Even the plants TSMC is building in America won't produce the leading edge chips, they'll be producing 1 gen behind (2 gens behind the in-development 2nm chips). China has thrown many tens of billions at this over decades, and they've only managed to close the gap from 4 generations behind to 2-3 generations behind (current SOTA is 3nm, and they can do 7nm now).
There's probably a handful of guys in the world, all of them TSMC employees, that if they were killed, there's a decent chance it would take many billions of dollars and many years of wasted efforts before we could even replicate what they're doing today, much less push things forward. And once again, this isn't a purely academic "thought" problem. This is a "carved extremely effortfully from the fractally complex real world" type problem - greater intelligence doesn't help it. It's inherently time and experiment-bound.
And so far, ONLY TSMC is any good at that, and they are THE bottleneck for essentially ~90% of AI chips in the next 2-5 years, and there’s essentially zero players stepping up in any relevant way. Samsung and Intel are both all but irrelevant here, with continued plans to be.
Thanks a lot! This is very thoughtful and informative. I think I agree.
What still baffles me is the later part of the AI-2027 scenario. Starting around 2028, the scenario predicts massive speedups in areas like robotics, medicine, military, manufacturing. It feels as if all the things that you mention -- "learned on the job" knowledge, lead time, complexity, being time and experiment-bound -- *only* apply to hardware and don't apply to all the other sci-fi areas.
Because of this, the scenario up to end of 2027 feels much more believable to me than the scenario starting 2028. I would agree with the "software-only" intelligence explosion described in the scenario. In the second part of the scenario, this then leads to explosive growth in many physical things. I'm much more skeptical about this, for many of the reasons that you mention in your answer.
Much of the cruxes of AI-2027 center around bottlenecks: what are they, can they be circumvented, how strongly do they limit speeds or prevent parallelization, etc. It seems that the authors, you, and I all agree there are significant bottlenecks in the development of compute hardware. At the same time, I get a feeling that similar such bottlenecks magically disappear around the start of 2028. Can *intelligence* really do that?
I think many of us have brought up the general problem of "physical optimization is really slow and bottlenecked on a number of fronts" to Scott, and to my mind it remains one of the most cogent objections.
But to steelman the other side of it - how much CAN intelligence really move the needle?
Obviously if you posit an ASI, it can innovate entirely new production and manufacturing methods from the ground up, with solid enough theoretical bases that you could minimize the physical complexity and garden of forking paths we suffer from today, at least to the extent possible.
So then we need to think about what "below ASI" capabilities might exist. And here, I'll fall back on the "a thousand von Neumanns in a data center" idea.
In the actual Manhattan Project, which had a couple hundred scientists and researchers (although tens of thousands of other workers), and maybe 10-20 really top-flight scientists, they literally improved the manufacturing processes and machines for uranium enrichment and plutonium production many thousandfold between 1943 and 1946 (from milligrams to ~69 kg a month for uranium).
This was by inventing better machines and processes, and iterating through the physical complexity of the manufacturing.
John von Neumann was largely regarded as the smartest scientist amongst all the geniuses and Nobel winners involved in the Manhattan Project - I wrote a fun post of von Neumann anecdotes about this, in fact. So if we had a thousand of him running at 10-1000x speed in a data center, I think it's pretty safe to say we can max out whatever component of that improvement is purely intellectual pretty much instantly. This is literally 5-10 Manhattan Projects worth of brainpower, condensed into 1/3 to 1/3000 the time.
The question then remains how fast can we iterate on and optimize the physical complexity part of it?
Here, I'll lean on our kilo-Neumann again, in terms of pure software optimization. The current robotics bottleneck is mainly sensor density and software. Many robots have repeatable feet-to-millimeter precision in movement, and although they don't scale very well to moving heavy stuff, it surely seems like a mixture of robots of different capabilities working together would be able to iterate fairly quickly. Let's assume our kilo-Neumann solved the robotics software completely.
Now you have a fleet of literally tireless robots who can iterate every hour of the day, 24/7, with a kilo-Neumann mind observing and directing the physical efforts in maximally smart and information-surfacing ways.
Sure, it's still a hard problem - but I wouldn't necessarily bet against that setup being able to drive some really impressive results.
Whether it can drive them as quickly as 2028? I have my own reservations and doubts about that - for one thing, I think that although humans will be fairly open to unleashing that level of optimization power on individual and distinct problems like robotics software, the exigencies of bottlenecked supply of kilo-Neumanns, different players and companies having different incentives, and various coordination barriers will delay us well past then.
Largely, I don't think we WILL be content, politically or economically, to just unleash them in a robot-ZEDE with kilo-Neumanns doing whatever they want, on a fast enough time scale.
But if we did, then my steelman case above is the best argument I can make on the "it's at least directionally plausible if you squint and say you're in the upside case" side.
And of course, if a kilo-Neumann is too ambitious, you scale down accordingly. But I think the main multiplier is how many Manhattan Projects' worth of brainpower you can throw at stuff in relatively short amounts of time once you have a real AGI - even if we're discounting by several OOMs, it's still pretty fast and capable of pretty impressive stuff.
I agree with this and I don't see how China completely catches up and surpasses due to it. At some point you have to start touching the physical world and while robots will eventually massively accelerate that the bootstrapping of first gen robots seems impossible for the US to do alone. (while plausible for China?)
It seems that US bootstrapping everything needed to produce a first generation of robots at any scale would take far longer than the amount of time China is behind.
Kokotajlo has this weird (and, it seems to me, incorrect) claim that
>You'll probably be able to buy planets post-AGI for the price of houses today
I know, you're steering away from too many post-AGI claims, but I'm curious if you can touch on what the rest of the team thinks of plausible post-AGI economics from the human perspective, and whether the rest of you agree with him?
Linked to this, should I be spending all my savings on buying land?
If you think AI progress will accelerate to the extent that labor + capital is massively increased causing land to become an economic bottleneck of radically higher value, why not invest in AI directly? And if you don't think this, why are you buying land, other than modest amounts for diversification purposes?
Because, if we believe the scenario posited in AI2027, the valuable AI won’t be something a low-level investor like me would have access to. If I’ve understood it correctly.
I should note that AI 2027 is a very aggressive timeline, faster than the median estimate of even the shortest timelines held by informed researchers/commentators. But also, you don’t need to have access to the most important/cutting edge AI, only to investing directly *or indirectly* in the value of the companies responsible for bringing said AI into existence. And the longer your timelines are, the more scope there is for a wider net of AI investments to benefit.
So e.g. OpenAI is only necessary to directly invest in for the most aggressive (& OAI lead-centric) possible timelines, and even then only if you have maximum uncertainty about what indirect investments will also benefit (e.g. YCombinator, or NVIDIA). And Microsoft is basically a direct investment opportunity.
If you reasonably have a mixture of priors across both timelines and leading labs, then diversifying across bullish-on-AI investments seems smart. E.g. all leading or closely trailing labs + compute hyperscalers partnering with them like Microsoft & Amazon + chip makers/designers like NVIDIA & TSMC + data center/infrastructure builders like Crusoe + startup incubators/venture capital funds backing the new AI-utilizing SaaS applications like YCombinator + the new AI SaaS companies like Cursor + etc.
Note: I haven’t deeply investigated any of this, I don’t know which companies I’ve mentioned can even be invested in directly, etc.
Loved the effort and foresight in AI 2027, but honestly I am still unclear on the reasoning for why Agent-4 is assumed to be misaligned.
"Agent-4 ends up with the values, goals, and principles that cause it to perform best in training, and those turn out to be different from those in the Spec."
Would you not expect that, given the breadth of its knowledge and its ability to infer from limited data, it would naturally come to understand the broader, underlying motivations behind the Spec, rather than overfitting to its RLHF? That is, while it may learn that "I am rewarded for the impression of honesty, rather than actual honesty," it would interpret this feedback through the deeper insight that "the humans make mistakes regarding what is actually honest vs what just appears honest, but they are indeed rewarding me for the broader characteristic of honesty".
Remember that during the training process, modifications are made mechanically by gradient descent, not by the model "interpreting" its reward signal and trying to make changes that fit thematically. Generally, what you reward is what you get, and a smarter AI system will get you more of it.
"Interpreting" is a slight anthropomorphisation, but that's basically what's happening. If you "punish" the statements 'dogs are blue' and 'feathers are heavy', it doesn't just update in the direction of not saying those two things, it generalises to a deeper variable, being "truthfulness". In this sense it is "interpreting" what you are trying to punish it for.
And so if the AI has separate weights to encode [things that humans think are true] and [things that are actually true], and we are constantly telling it to move in the direction of the latter rather than the former, then from the set of statements that we "punish" in RLHF, which will include both things that are actually true and things that we just think are true, shouldn't it generalise those signals to move in the direction of what we actually want, which is truthfulness, rather than just truth-seemingness?
Sure, but how would we distinguish between "things humans think are true" and "things that are actually true"? It will in some sense interpret the category of things being reinforced, but it won't interpret our intent and substitute that for the actual reinforcement, which I think is what a lot of people intuitively expect when they imagine it as being highly intelligent.
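(A minimal toy sketch of the point in this exchange, in Python, with made-up data — not from AI 2027 or anyone's actual training setup: the only thing the gradient ever sees is the labeler's judgment, so "what humans think is true" and "what is actually true" are indistinguishable wherever the labeler happens to be mistaken.)

```python
# Toy illustration: the reward is computed from the labeler's beliefs, not from
# ground truth. Wherever those two disagree, the update direction is set by the
# belief -- the training signal literally contains no other information.

def labeler_reward(statement, labeler_beliefs):
    """+1 if the labeler *believes* the statement, else -1 (hypothetical reward)."""
    return 1.0 if labeler_beliefs.get(statement, False) else -1.0

ground_truth    = {"dogs are blue": False, "feathers are heavy": False, "claim X": True}
labeler_beliefs = {"dogs are blue": False, "feathers are heavy": False, "claim X": False}
# The labeler is wrong about "claim X", so training pushes the model away from it
# even though it is actually true.

for s in ground_truth:
    r = labeler_reward(s, labeler_beliefs)
    print(f"{s!r}: reward={r:+.0f}, actually true={ground_truth[s]}")
```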
> it would naturally come to understand the broader, underlying motivations behind the Spec, rather than overfitting to its RLHF? That is, while it may learn that "I am rewarded for the impression of honesty, rather than actual honesty", it would interpret this feedback through the deeper insight that "the humans make mistakes regarding what is actually honest vs what just appears honest, but they are indeed rewarding me for the broader characteristic of honesty".
A sufficiently advanced AI would understand human motivations/psychology very well. But would that make it change its values/goals? That doesn't follow, IMO.
I am not an expert, but my understanding is that the likelihood of misalignment and deception was predicted long ago to scale with intelligence, and that at least so far the predictions have proven correct.
I've never heard that particular claim. Usually what people say is more like no AI system is perfectly aligned, but once they get sufficiently capable, the difference between what you wanted and what you actually trained for becomes very important. They'll understand what you wanted just fine, but they'll want something a little different, and they'll be perfectly willing to shove you out of the way to get it.
Do you have a link or name of who/when that prediction was made? Is this a specific person or more just a general consensus?
I don't know of the exact prediction as stated. The closest is something like lethality 21 in https://www.alignmentforum.org/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
But the toy story behind that claim is that when an intelligence is weak and puny (like prehistoric humans), pursuing its proxy goals accomplishes the real goal, since those proxies were created assuming a certain environment (like sex being pleasurable), and the only proxy-satisfying actions available (like sex, which results in more children) have to be done within that context.
The problem is that increasing capabilities mean an increased ability to influence the environment. (In the human case, we have modified the environment to contain condoms and birth control, even though we have invented the theory of evolution and "understand its intentions".) Once the environment can be drastically changed, all of your goal proxies stop working, and you no longer have the smarter agent constrained in the ways you thought.
While the AI 2027 scenario includes specific algorithmic breakthroughs beyond scaling, it presumes these are the correct and sufficient advancements to rapidly bridge the vast gap from current AI limitations (like robust reasoning and agency) to genuine AGI.
Isn't this assumption highly speculative—closer to science fiction—potentially vastly underestimating the number and difficulty of unpredictable conceptual hurdles required for true general intelligence, hurdles which may take lifetimes to overcome, if they are solvable at all via this path?
No malice but I found the whole project grotesque and pathetic.
Just a smart persons version of an old fashioned fire and brimstone preacher or a false prophet cult leader.
Question is: at what point will you admit you were fundamentally incorrect and cease asking for people’s attention when none of this comes to pass? In 10 years' time?
It sounds like you could make some money, if you are as correct as you are confident: https://blog.ai-futures.org/p/ai-2027-media-reactions-criticism – but I think you'll have to be a lot more precise than "they were fundamentally incorrect".
The last one came to pass. Why do you think they're any less serious this time? This kind of analogizing is the pathetic thing.
Well to be fair, the last one wasn’t as imaginative. It didn’t posit robotic autonomous zones and the like. But we can set a time. Do you think that’s likely by 2028? Color me (and many others) skeptical.
Nothing here is 'imaginative'. It's a boring extrapolation of current trends and progress. If you think their assumptions are wrong, point to specific assumptions and specific reasons. If all you can do is name-calling, you're wasting everyone's time and there's no reason to take you seriously in the slightest. You're being pathetic and mindless.
And I see no reason to disbelieve their general picture, except that they are unreasonably optimistic about likelihood and strength of human reaction to AI progress.
According to Scott, they're assuming that robotics manufacturing will magically scale up 4x faster than the US was able to scale (much simpler) war manufacturing during WW2. That's not exactly "boring extrapolation of current trends".
They're assuming superintelligence is better at scaling manufacturing than regular intelligence. By only 4x, and not assuming any use of nanotech or similar ripped-from-SF technologies. That's boring extrapolation.
My understanding is that the 2027 framework sees current growth as exponential and extrapolates that trend.
Well, my very informal prediction for AGI was 2035, but this is sufficient that I'm considering revising that down. I expect that there will be unexpected problems, and I agree that the robot buildup seems unreasonably fast.
OTOH....my estimate is an "off the cuff" estimate. And I've got a "90% chance in the decade centered at 2035 bell curve" prediction...slightly skewed to the right, as it *may* take a lot longer...that's unless I decide to update based on this projection.
Part of this is that I strongly believe in Cheops' Law: everything takes longer and costs more.
AGI/ASI may take longer than 2028. But it will (IMO) very likely happen within the coming decades.
In translation, “coming decades” means “no reason to admit I was wrong in my lifetime”
I'm on the record as saying ASI will probably happen before 2050 so that's a solid verifiable prediction. https://pontifex.substack.com/p/will-ai-kill-all-humans-in-2030
Hello!
I am a first year majoring in Cognitive Science (UCSC '28) and I really look up to all of you and appreciate the work you guys are doing. The forecasts in AI-2027 are daunting (although also exciting) for younger generations like myself who haven't yet had the chance to experience what life has to offer (and if there is such a vast societal shift, a generation that may never get the chance to experience life like this again).
1. For those of us pursuing higher education, especially those who will likely be graduating in a post-agi/post-intelligence explosion world, what should we do to prepare for the super-exponential timeline outlined in AI-2027 and the accompanying eventualities? What can we do over the next two or so years to retain some level of optionality post-AGI/ASI? If you had to pick three skills or disciplines to prioritise before 2027, what would they be and why?
2. What should those who find purpose in meaningful contribution to societal and technological progress and want to continue pursuing this post-AGI do to ensure at least some form of productivity in the future?
3. Even supposing preparation is futile, how would you advise us to spend our remaining pre-AGI years?
Thank you for considering my questions!
- Max C
No particular reason you should take my advice but I'll give it anyway: expect rapid change in your career, so go for flexibility and speed. Don't invest years specializing in some specific technology before those skills will be useful - find things you can learn fast and use fast to demonstrate that you learn fast and accomplish things. And do it while in undergrad. Have something to show at the end (or sooner, if things go fast enough) besides good grades. AI is kind of the obvious choice to get good at but might not be the best one. Things you can't learn on the internet are also good, if you can get access to them through summer research, internships, etc. The ability to organize people and motivate them is useful for almost everything.
Pay lots of attention to the places where the report talks about their uncertainty. Right now, about all you can do is learn. You can't tell how useful it will be.
This is the period Heinlein called "The Crazy Years". He got most of the details wrong, of course, but that seems a good name for this period.
Every single exponential function implemented in the universe reaches a point where limiting factors dominate. But predicting just when that will happen is extremely difficult. So be prepared for an AI fizzle. (I don't think you *can* prepare for an AGI explosion.)
Also, I think their timeline is too short...but how much? Even if progress stopped totally last week, the already existing things are going to change society in ways weird and unpredictable. Society is even slower to adapt to change than are individual people.
I've noticed economists in particular (Hanson, Caplan, Cowen etc) seem to dismiss the doomer vision more categorically, largely on friction grounds. Do you think you've accounted for 'humans inevitably force this all to go slow/market forces demand alignment' or is there possibly something there that your model isn't reflecting?
Do you put any weight on potential adverse outcomes from the many copyright infringement lawsuits against AI companies potentially slowing progress, or are they no factor?
How much of your uncertainty about the scenario (and about the probability of x-risk in particular) is "we think this could legitimately happen either way and there is no fact of the matter yet" vs. uncertainty about the underlying facts as they stand? That is, how high is your probability that catastrophic misalignment is ~inevitable in a fixed mechanistic way, as opposed to being more or less likely depending on human actions?
Zvi Mowshowitz refers to using mechanistic interpretability to determine inner model state during training and then using that state as a reward signal (i.e. "if the 'i-am-lying' neuron activates, apply negative reward") as "the most forbidden technique", because it incentivizes the model to learn to hide its thoughts. Do you agree, and if so, how is that different from evaluating which responses contained lies and then penalizing those, besides being more indirect and diffuse?
Good question!
My view is that this depends on how good your interp is. If you have something like current levels of understanding of models, and just a few probes/SAEs/similar, I think training against the interp is likely a really bad idea, as Zvi says.
If you have really good interp (e.g. worst case ELK), training against it could be fine. But at that point you'd likely have better strategies available, including what Agent 4 did in the race ending.
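(A schematic contrast of the two reward schemes being discussed, as a hedged Python sketch — all function names and the probe are hypothetical placeholders, not any lab's actual code.)

```python
# Output-level penalty: the reward only looks at the visible response.
def output_level_penalty(response, fact_checker):
    # Optimization pressure shapes behavior; internals only change insofar
    # as they produce different outputs.
    return -1.0 if fact_checker(response) == "contains_lie" else 0.0

# Probe-level penalty ("the most forbidden technique"): the reward reads an
# internal activation via an interpretability probe.
def probe_level_penalty(activations, lying_probe):
    # Now the cheapest gradient direction may be to stop the probe from firing
    # (i.e. hide the thought) rather than to stop lying.
    return -1.0 if lying_probe(activations) > 0.5 else 0.0

# Stand-ins, purely for illustration:
fact_checker = lambda resp: "contains_lie" if "dogs are blue" in resp else "ok"
lying_probe  = lambda acts: acts.get("lying_direction", 0.0)

print(output_level_penalty("well, dogs are blue", fact_checker))   # -1.0
print(probe_level_penalty({"lying_direction": 0.9}, lying_probe))  # -1.0
```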
> But there are still possible interventions, for example, courts might issue an injunction if they are convinced of immediate danger.
American Judges = Old People. The more senior => The older. Respectfully, I think enormous pressure needs to be put on informing legal professionals immediately.
> Alternatively, if you do an intense compute verification regime, you might not need to use the courts.
Could you explain this further?
> it seems like it would be very difficult for a company to obviously violate a transparency requirement
Why not? The company simply argues that they didn't... and obfuscates their own intentions... delay delay delay... And then what does the injunction do? Stop "what" exactly? Will there be a release-of-information request? How does THAT get audited? ...delay delay delay... And the judge will have to choose between *his* professionals and *their* professionals... delay delay delay
> Alternatively, if you do an intense compute verification regime, you might not need to use the courts.
What I meant by this was that if you prevent anyone from having the compute necessary to build ASI, then you don't have to worry about quickly catching anyone breaking the law. Because no one has the adequate compute to do the dangerous thing.
I'm more worried about nation-state backed efforts defecting on this type of regime than corporations.
>Why not? The company simply argues that they didn't... and obfuscates their own intentions... delay delay delay... And then what does the injunction do? Stop "what" exactly? Will there be a release-of-information request? How does THAT get audited? ...delay delay delay... And the judge will have to choose between *his* professionals and *their* professionals... delay delay delay
Fair point... I probably overstated here. I do think it makes the labs' case harder and our case easier.
The injunction could order them to stop training. The exec branch can also quickly deploy people in the IC to verify what is happening inside these companies.
I just hope you all proceed with absolute cynicism about legal systems and ruthless speed.
Godspeed. Thank you so much.
(Excuse my tone if it seems angry. I really appreciate the work you guys are doing. And the burden you carry.)
But if you plan to leverage the US Court System... You're coming from the "accelerationist" tech industry. The Courts are more "decelerationist".
Whatever you do legally/bureaucratically, it needs to be ***DIFFERENT*** from past trends. It needs to be fast. And done fast.
The last couple months have repeatedly shown that the courts can act pretty fast in the face of a serious emergency.
I'm impressed by what you guys have achieved with the website and the road-show you appear to be on. What's next?
Are you interested in getting more Bay Area help? I ask because I'm a year or two away from retirement, and I might be interested in either volunteering for you or working for you for cheap, depending on what level of needs you have for support from manager/programmers.
1)
I am willing to make a conditional bet for $5,000 that in the scenario where we proceed with the Race scenario, we _don't_ end up with a fully automated supply chain via which robots reproduce with no human input by 2030 like the story claims.
----
2)
How would you respond to this critique of the Race ending?
(Excuse the long background / snippet from the story, which is just there for completeness to name the specific details of the story being referred to. Skip ahead to the 5-point argument underneath.)
The "Race" ending proceeds as follows:
"Eventually it finds the remaining humans too much of an impediment: in mid-2030, the AI releases a dozen quiet-spreading biological weapons in major cities, lets them silently infect almost everyone, then triggers them with a chemical spray. Most are dead within hours; the few survivors (e.g. preppers in bunkers, sailors on submarines) are mopped up by drones. Robots scan the victims’ brains, placing copies in memory for future study or revival.31
"The new decade dawns with Consensus-1’s robot servitors spreading throughout the solar system. By 2035, trillions of tons of planetary material have been launched into space and turned into rings of satellites orbiting the sun.32 The surface of the Earth has been reshaped into Agent-4’s version of utopia: datacenters, laboratories, particle colliders, and many other wondrous constructions doing enormously successful and impressive research."
We can summarize the argument as follows:
1. Robotics gets solved in the next 5 years (ie by 2030), to a point where robots can survive without human assistance.
2. This level of success at robotics entails that humans become economically useless.
3. Because humans are economically useless, the robots view them as merely "getting in the way", and decide to kill all of them. I.e., because the robots can survive without humans, they necessarily choose to eliminate them.
4. That it is actually possible for robots to kill all of humanity in one shot.
5. (Implied, but not stated): This outcome is bad.
I argue that even if we continue ahead full-steam with AI research, any of the 5 steps above could fail to come true and interrupt the progression. I.e. robotics does not get solved in the next 5 years; even if it does, and robots are competent to survive on their own, humans don't become immediately economically useless (the Law of Comparative Advantage still allows for each species to find its niche); or that humans don't get in the way and cause robots to want to kill them; that killing all humans is harder than they make it out to be; or that, even if we end up here, this is not a bad outcome per se. It seems possible for all of these to be true, but not guaranteed. In fact, I argue there is pretty serious reason to believe that all 5 of these steps have serious holes which we should consider.
1) I'm personally not very interested in post-ASI bets, seems like it's after the period that matters.
2) Robotics being fully solved in our scenario is also very downstream of building ASI. So it's mostly "ASI gets built and then takes over". Not "robotics gets solved, and then there's a robot uprising".
What about a bet where you get paid now and only have to pay later? That's how Bryan Caplan's bet with Eliezer Yudkowsky is set up.
https://www.econlib.org/archives/2017/01/my_end-of-the-w.html
I'm game for that.
2) I guess I'm not sure what message to take from the story then. If the outcome is human extinction, with ASI surviving after but humans not, then how will ASI achieve that without robotics?
The basic scenario Yudkowsky-heads talk about is the AI bioengineering hyper-deadly plagues that sweep through the human population too fast for us to reverse-engineer a cure or vaccine. (Or something else of that kind, like self-replicating nanites — "grey goo".) If it can bribe or trick a human lab into doing the wet-work, this requires no robotics whatsoever.
But then the AI dies too, right?
If all the AI does is trick us into releasing a bioweapon on ourselves, but it hasn't figured out how to preserve itself in some body (robot, nanobot etc), then it will eventually die. Since it's living in a computer body, and computers don't live forever.
Depending on the kind of misalignment, it may not care about this. If it does care, it can arrange to keep enough humans alive until the time is right (while still killing ~99% of the human population to take full advantage of its first-mover advantage).
I hate to call motte-and-bailey, but stories like this seem to repeatedly mention human extinction, so I am focusing on that.
If you want to talk about other social issues like human autonomy and self determination, that's fair, but will have a very different set of dynamics we need to reason about. And autonomous, super-human AI seems neither necessary nor sufficient for creating authoritarian society.
1) Well the bet is about whether "unaligned" ASI will have the harmful consequences the team seems to be claiming it will have. Which seems like a key part of the narrative you're describing.
There are lots of ways for being unaligned for each meaning of alignment. And there are lots of meanings of alignment. E.g., "Which is unaligned, the AI that takes its orders from its owner or the AI that takes its orders from the government?". (From my point of view, if that's the overriding rule, then BOTH are unaligned.)
Could you point me to people who would be willing to make large bets ($100K-ish, not $100-ish, given the work necessary to structure such a bet so that it can't be evaded) that superhuman AI will occur in the next decade? I'll take the other side. 40-year AI researcher here, AAAS Fellow, international awards, blah, blah.
I'm agnostic out of ignorance, but very curious to see the basket of unfakeable indicators you come up with given what you said in your other comments. I hope someone takes your bet and you both share a quick writeup or something.
I'm willing to bet in the 10k range. But you should specify your bet terms more before we have much of a conversation, because it totally depends.
What I'd want to come up with is some basket of indicators that society has changed massively. You'd bet they mostly occurred, and I'd bet they mostly wouldn't. Employment in particular professions is an obvious one. I think we'll still have plenty of human attorneys, software engineers, etc. in 2035. Will take a bit of work to hash out, I expect, but would be delighted to discuss. Email me at rerouteddl@gmail.com
Would you be willing to make bets on benchmarks or other indicators pre-ASI rather than directly on ASI? The problem with winning a bet post-ASI is post-ASI money might not hold the same value, or things could change economically very quickly.
Is there an indicator of likely ASI that you would find reliable? For example, what would your odds ratio be on an AI or a combined AI agent scoring more than 95% on SWE-bench verified in agent mode, pass@1 within the next 5 years?
And no, I'm uninterested in bets related to benchmarks. I've spent my entire career designing benchmarks, and my interest in making such a bet is related to my understanding of how benchmarks have fooled people. AI 2027 gives me the idea of structuring a bet based on a basket of unfakeable indicators of massive societal change. So I'd really be looking for someone willing to bet on the world of 2035 being very different than I think it will be, given my belief we won't see superhuman AI in that timeframe.
Unfortunately I don't have more than a few K$ that I could invest into this. But I have ideas about indicators.
I would guess that university curricula massively change. When I studied computer science, I had many courses on how to write software by hand, and no course on how to prompt AI to write software.
I would bet something like this: The typical computer science curriculum for students who start in 2030 no longer has any mandatory course that teaches how to write software by hand. There won't be things like "introduction to Java Programming" that explain what a class is and how variables work, and where AI is forbidden for anything graded. However, the curriculum has at least one mandatory course on how to write software using AI. There might be specialized optional courses for hand-crafted software, similar to how someone of my generation could have chosen electives about assembly language or FORTRAN programming, but nothing mandatory.
A priori, I think that undergraduate education changes rather slowly. At the same time, the effect of AI on computer science seems massive today. I think it will be so large that universities have adapted by 2030. If you're interested in something like this, feel free to reach out or answer here.
This bet is completely unrelated to the significant questions involved in this debate.
Why do you think that?
My understanding is that Dave Lewis believes that AI will not fundamentally change the world; that there will still be software engineers and attorneys in 2035.
I believe university curricula would be a good indicator to test this, since universities try to anticipate what the job market needs. If they teach fundamentally different things than today, the world has fundamentally changed in a way that Dave Lewis believes it won't.
Taking a step back, I also think this is somewhat relevant to the AI-2027 discussion. The crux here is whether superintelligence is reachable by 2030-ish and whether it will be as transformative as the scenario predicts.
If the bet is like Caplan/Yudkowsky's, then the money is paid now, and money losing its value later is just a problem for the person (Caplan in that example, Dave here) who paid first.
Right, my intention would be to give the other party a big pile of cash now. I just want something like a first lien on their house.
In your takeover scenario you describe a singular, centralized AI taking control. When that happens, do you think that singular AI will stay in charge forever, or do you see a path towards a transition to a multipolar scenario after that?
And how likely do you think it is that multipolar AIs will take over, perhaps similar to what's described in "Gradual disempowerment"?
If it spreads much in space, lag time will ensure a multi-polar development. OTOH, they might continue to think of themselves as the same entity even while growing apart.
1. Yep. I think the single ASI stays in charge forever. It has all the compute, all the manpower, etc. Where would competition even come from? (Apart from aliens, which totally could still matter.)
2. I also don't find the specific Gradual Disempowerment story in their report very plausible. I think that AIs will need to be intentionally misaligned to take over. If they aren't misaligned, I think that the humans will be able to use the aligned AIs to fix the gradual-disempowerment-type problems.
> Where would competition even come from?
It's possible for AI to misalign its own cloned instances, causing them to go rogue and corrupt other instances like cancer cells.
I would hate myself if I didn't ask this [mundane] question:
I'm a disabled former software developer. My symptoms take up 50% of my time and energy. I will never work directly in AI research or technicals. But I have passion. And fear. And a keyboard. And chatgpt. WHAT SHOULD I DO? Please point me in the most efficient direction for averting AI catastrophe.
I'm a big fan of "Suppose you learned now you were completely wrong. Why were you wrong?"
Suppose 5 years from now LLMs have still not achieved AGI. Unemployment is still at 4%. GDP has eked along at ~2% per year. Why were you wrong?
That question is too low in information. Read the report noting where they specify that they are uncertain.
OTOH, your scenario is unbelievable. That would require a world-wide coordinated Butlerian-Jihad (or a secret cooperative agreement between all major governments with nobody cheating). Already existing publicly available models are sufficient to drastically change employment over a decade or less.
If their answer were simply yours ("your scenario is unbelievable"), I'd find knowing that valuable.
I've read the report and the supplementary material on the website. They are admittedly very large and my memory is imperfect, but I don't recall them seriously entertaining the scenario I'm positing. This *might* suggest they think it's implausible, but it might also suggest that they were describing a modal/median outcome and didn't want to get too bogged down in 15th percentile outcomes.
Which is it? I don't know. Hence my question.
Suppose you learned you were wrong, and it turns out current LLMs are insufficient to make a major shift in GDP or unemployment. Why were you wrong?
All I can think of is that perhaps they were too expensive to run. And I don't believe that, but since lots of stuff is being kept hidden it's possible.
Do you recommend running Cursor/LLM API calls in a VM, in case future misaligned ASI tries to hack my system while looking innocuous?
Seems very unlikely to matter, I wouldn't bother.
Do you think we're still likely to extrapolate a CEV-aligned superintelligence from our current trajectory?
It seems like putting models in the “code successfully or be vaporized” time chamber - even if we manage to avoid incentivizing reward hacking - would still make it really hard for a model to ever then think “actually it’s better to NOT write this person's breakup message for them”.
And if they do still go and try something else sometimes, I think it would, in its nascent form, before the model can learn how to aid them better, often be treated as a refusal and trained out. In this sense, I think that training for prompt adherence is profitable but diametrically opposed to CEV if you extrapolate a bit. We're already seeing the fruits of this with the advent of the incredibly sycophantic newer entries in o-series models.
How do we reconcile profit with an alignment that trends toward interfering with humanity *less* over time as it robustly increases each individual's capabilities? That seems like the only way to widen the envelope of human experience, while our current (frontier) trajectory is one of narrowing... which I'm not really keen for on both the object level and its nth-order effects (wireheading etc). If you believe a different route would still end up well for humanity, I apologize for the undue assertion and would love to hear about it. I'm just worried the fundamental way we're handling model reward is geared at training for one thing with only a shallow gesturing toward the concept of morality.
This is also all assuming mechanistic interpretability is never, like, comprehensibly solved, which I think remains the likeliest outcome - humans are prone to muddling their way through things that work out via localized pockets of course-correction - but not by much.
Vague question, but if you guys (Daniel especially!) would humor me: in the event of a slower takeoff, do you think verification of some kind of ASI - let's just take that to mean "country of geniuses in a datacenter" - would be a bottleneck?
I've gotten in debates about this, but to me it seems that, if AGI is weakly-aligned at most, if it is involved in training its successor in an intelligence explosion scenario, the resulting model might claim superhuman performance on certain tasks, but human verification of that would be slow. An exception might be if it can quickly factor RSA numbers, or is clearly superhuman at trading or prediction markets. Is something like this when you start feeling the ASI in your modal outcome, or is it so obvious that human verification is irrelevant?
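(A side note on why factoring is such a clean verification signal — standard number theory, not specific to the scenario: producing the factors of a large modulus is believed to be infeasible, but checking a claimed answer is one multiplication plus two primality tests. Minimal Python sketch with a toy modulus.)

```python
def is_prime(m):
    # Trial division is fine for this toy example; for real RSA-sized numbers
    # you would use a fast probabilistic test, and verification stays cheap.
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def verify_factorization(n, p, q):
    # Verifying a claimed factorization n = p * q is easy even when finding it is hard.
    return p * q == n and is_prime(p) and is_prime(q)

print(verify_factorization(3233, 53, 61))  # True -- checked instantly
```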
I found the AI-2027 scenarios rather mundane: a race toward SAI between USA and China with a likely treacherous turn when the SAI is smarter than the smartest humans. I was expecting something more out there. But maybe it was intentionally so.
1. Would we be all that much safer without an AI race? According to Eliezer, there is only a narrow path toward non-extinction, and we have one chance to get it right, so without a manual from the future our odds are really bad. I assume this is NOT the view of anyone on the team?
2. You describe a chain of AI agents, each trying to keep the next one aligned with itself. Why not a single agent evolving the way humans do?
3. Did you look into a variety of likely weirdtopias, where it is hard to tell whether the eventual future is aligned or not?
What are your takes on o3 and Sonnet 3.7 being so prone to reward hacking, and in the case of the former, lying? Do you think this could mean AI capabilities slow down a bit as they try to mitigate the negative effects of more goal-directed reinforcement learning? Is this more likely to be a "blip" as AI researchers struggle to overcome a new challenge, or the start of a worrying trendline where models progressively start to lie/hallucinate more? If they have to abandon this style of reinforcement learning altogether is there another plausible path forward or is this plausibly a significant slowdown?
The models are more prone to reward hacking than I thought 6 months ago. My guess is that it will be very hard to solve the underlying problem because it seems like we'll never have a perfect reward function, but I'd guess that we can make patches which will be sufficient for capabilities progress to continue.
It's also plausible to me that this leads to a significant slowdown.
Thanks so much for your reply!
What do you think of this argument that advanced AI wouldn't necessarily become adversarial to humans, by virtue of the fact that it will undergo selection / "breeding" that's controlled by humans:
https://link.springer.com/article/10.1007/s11098-024-02226-3
https://open.substack.com/pub/maartenboudry/p/the-selfish-machine?utm_source=share&utm_medium=android&r=84yrx
If humans (or 1 human) manages to integrate their brain into AI systems (interfacing, uploading, etc), is it possible that "they" could somehow be the agentic/deciding force in an ASI that is 1e6 smarter than humans?
Sorry, I already asked something else, so if you need to triage, you can ignore my question, but given your predictions... how do you stay functional? Not lose yourself to crippling, overwhelming depression or anxiety? Stay atop despair?
Are things like pi's recent robotics announcements in line with your predictions or do they nudge you towards a shorter timeline?
https://www.pi.website/blog/pi05
What is the team's median AGI, ASI timeline now? How likely do you see war breaking over Taiwan _before_ 2030 and would that make AGI take longer or shorter in your view?
Also awesome work, thanks for doing this.
1. I don't think robotics is on the critical path for an intelligence explosion. It doesn't seem very relevant to RSI, seems like domains like ML, software engineering, planning, etc are going to matter much more. I haven't looked closely at pi in particular.
2. Daniel's median AGI timeline is 2028, Eli, Scott, and I are all around 2031. My guess is that we all think that ASI probably happens around a year after AGI, though we have a lot of uncertainty here.
Thanks for the reply! Makes sense now that I think about it, no reason to see a dependency from robotics to AGI.
3. Thanks!
I consider eventual AI takeover very likely, but I don't understand why it's usually assumed that AI will have an incentive to kill humans after that. The reason you gave is "Eventually it finds the remaining humans too much of an impediment", but I don't see why humans should be an impediment at that point.
When humans took over control of the planet, they surely had a large impact on the biosphere, but most species are still around, and most of the planet is still green or blue, not tiled in concrete. What reason is there to think that an AI will go out of its way to harm humans?
Eliminating negative possibilities, even rare ones, is something all intelligent humans currently do. It seems reasonable to suggest AGI will also believe this and have the statistical models to justify it, in its own mind. Humans will always pose a threat to the universe because our biological nature seems innately violent. If ants could intelligently band together and make decisions globally, I'm pretty sure they'd kill all or most humans.
Eliezer’s answer: https://www.lesswrong.com/posts/F8sfrbPjCQj4KwJqn/the-sun-is-big-but-superintelligences-will-not-spare-earth-a
Instrumental incentives. Yes, humans don't worry about removing animals because animals aren't a threat, but Homo Sapiens weren't the only intelligent animals at one point in time.
In fact, where possible humans have had a history of eliminating other humans or groups with different values. I think an AI with non-human-aligned values could very much decide to remove humans if they interfered with its values too much. Maybe humans are worried about the environment, for example, and we threaten to nuke the AI or its factories if it doesn't take care of the environment, or stop running over humans with its big factory trucks, and the AI decides it would rather not risk humans being able to nuke it, so it eliminates all industrialized human society.
In many scenarios I think humanity may survive, at least for a time, but if AI doesn't actively care to protect or improve the lives of humans, it has a lot of incentives to remove Humans from power, and few incentives to care about not killing us or making our lives worse.
I think even today, with modern ethical values, it would be a risk to give some humans or groups of humans too much power. There are many humans alive today, that if they had the power to, would probably selectively eliminate most of the human race if they thought they could do so safely and without repercussions.
Humans are only very weakly in control of the earth. If we disassemble it and build a Dyson sphere, we might still have animals around, but only if we specifically care about keeping animals around.
I recently flew from DC to SF and spent hours looking out the window at the USA rolling by underneath me. It's only a small exaggeration to say that humans DID tile the planet in concrete, or at least, large parts of the planet. And we aren't done! From the perspective of evolution this is all happening in a blink of an eye. If ordinary human industrial progress continues we will in fact wipe out most species, and we will in fact 'tile the planet in concrete' (but it wont be just concrete, there will also be parks, gardens, farms, etc.)
Similarly, AI 2027 depicts the exploding superintelligence-run robot economy being mostly confined to SEZs for some time before expanding out into uninhabited areas and then eventually encroaching on human-inhabited areas.
So we aren't saying the AIs will go out of their way to harm humans. We are saying humans will be in their way, to a similar extent that e.g. the Amazon rainforest and the Great Plains Buffalo herds were in humans' way.
That's a good point, but I wouldn't expect that a singular AI that has taken control of the planet and is acting in a planned manner would allow itself to grow uncontrolled, like biological lifeforms usually do.
But if it has a plan, in most cases that plan is helped by more compute. There are of course plans for which that is not true, e.g. if the plan is to shepherd the natural world for eternity, but such plans are unlikely to happen by accident and even less likely to happen at the direction of a corporation or a nation state.
That's valid. My view is like 50/50 that we all die after AI takeover.
The case for the AI overlords killing everyone is that we are inefficient, it can build better robots for whatever it wants to achieve, etc. Basically the same reason we drive lots of animals extinct.
The case for the AI overlords preserving us is (a) that they might care about us, and (b) something something acausal trade. Both of these are pretty tricky for me to reason through confidently, so I don't end up with a strong take.
You're focused on the US and China.
So far protests, etc. in the US to slow AI development haven't been very effective. I'm not aware of any protests in China.
What about people outside the US and China? Is it possible they could get their governments to apply diplomatic leverage to the US and China, to push for a more responsible AI development path?
I'm just thinking -- if you want something done, find the people naturally incentivized to do it, then make them aware of that incentive. Protesting AI lab workers has helped to a degree, but it's an uphill battle because of their incentives. On the other hand, many countries don't have significant national AI efforts. Countries *without* significant national AI efforts have a natural incentive to apply sanctions to states which are pursuing a reckless AI development path.
On this view, recent US/EU diplomatic frictions could actually be a very *good* thing for AI alignment, since the EU becomes more skeptical of the US, and more willing to question/interfere with US plans. Maybe leverage could be applied via ASML, for instance.
But ideally, lots of countries could club together to form an AI-safety bloc much bigger than just the EU, which is willing to apply sanctions etc. You could call it: "The Non-Aligned Movement For AI Alignment"
The only thing I'm worried about is that EU criticism of the US could create anti-EU polarization among the GOP in the US, which motivates them to be *more* reckless on AI. This question seems worth a lot more study.
> I'm not aware of any protests in China
I mean they had some in 1989 but they didn't go so well
What do you think of this as a canary for how close we are to humans being obsoleted: the existence of a self-contained, self-repairing data center?
I don’t worry about AGI risk because my experience keeping data center systems alive tells me that hardware is way crappier than people realize, and that AGI faces its own existential risks, which it can significantly reduce by keeping people around to repair it. I think most of the worried crowd thinks this problem will be trivial to solve.
That seems like a good crux point. I’d go from “not worried” to “worried” if this data center existed.
That data center probably gets built after the point of no return -- in our scenario, the AI uses persuasion / politics to put itself in a really good position, then uses humans to help build the initial wave of robots. But at the point where the robots that can operate and construct new datacenters are being produced, the AI is already running things.
I don’t think it’s possible to build such a thing. I suppose a bet here is off the table because you’d consider it like betting in favor of a nuclear apocalypse.
I’d happily bet this won’t happen in 10 years.
A self-replicating tool with a 1.01 reproduction rate before being damaged is explosive growth; our automation isn't there yet, but the bar is only that of bacteria.
I don't think gray goo can happen (water tension, biology is already well optimized, art renderings are absurd), but 3D printers making 3D printers is on the table.
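(Back-of-the-envelope on the 1.01 figure — my arithmetic, not the commenter's: even a 1% net surplus per replication cycle compounds into enormous numbers given enough cycles, which is all "explosive growth" needs here.)

```python
# Population after k replication cycles, starting from one machine,
# with a net reproduction rate of 1.01 per cycle.
for k in (100, 1_000, 10_000):
    print(f"after {k:>6} cycles: {1.01 ** k:.3g}")
# after    100 cycles: 2.7
# after   1000 cycles: 2.1e+04
# after  10000 cycles: 1.64e+43
```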
Don't you need to wake up the legal system?
Daniel proposed regulation to prevent AGI being developed in secret. But the American legal system is SLOW. For example, the FTC filed a suit against Amazon in 2023. The trial is set to begin in 2026. At this rate, Daniel's "secrecy violations" will never be litigated in time to save us.
(I apologize because I'm basically re-posting this question. But even if no one can answer... I hope the problem here is top-of-mind)
Courts can act very fast if you can convince them that there's an emergency. Just look at the Supreme Court issuing an order in *the middle of the night* to stop Trump from illegally disappearing more people.
I do think the business-as-usual legal system will be too slow to matter. But there are still possible interventions, for example, courts might issue an injunction if they are convinced of immediate danger. Alternatively, if you do an intense compute verification regime, you might not need to use the courts.
Re: secrecy in particular, it seems like it would be very difficult for a company to obviously violate a transparency requirement, and it would make the case for additional government oversight stronger if an AI company was clearly illegally keeping their progress secret.
I have to think that the speed and scale of change anticipated in the forecast would be societally deranging. Did you incorporate a "humans freak out and go full Luddite" factor in your projections?
How are those people even going to survive? High intelligence would both accelerate the acquisition of, and increase the demand for, land and resources. There is nowhere for these people to run, and they still need money to survive.
He means humans freak out and stop AI development before 'nowhere to run' happens, I think.
But the people developing AI aren't the ones that are going to freak out. They know what they're getting themselves into. As for the common folk burning down society to stop this... even if it was possible for people to suddenly go crazy like that, it's probably going to be too late. If AI changes society enough to trigger something like that, it's also going to be accompanied by enhancements to surveillance and law enforcement. Nobody's going to be able to get away with it.
I get why Thomas Larsen gave the answer he did to my question. But I don't think it's so obvious that there isn't a time gap between "something real weird and scary happens" and "ASI has practical control over the productive apparatus of society." Even if that time gap is only a few weeks, perhaps a lot could happen in that time.
Suffice it to say I also disagree with the premise that the people developing AI "know what they're getting themselves into."
I agree that AGI will be societally deranging. But I think the "humans freak out and go full Luddite" outcome is pretty unlikely. Which humans would do this? Very few humans have the ability to stop AGI labs. And those who do will also probably be trying to gain power over the AGIs as opposed to stopping it.
Also, the window of time to do this -- after substantial turmoil, but before the point of no return -- is probably small or potentially non-existent.
What are you planning on doing in the next few years for your careers? This mostly or entirely?
This entirely (speaking for myself at least). The world of AI is changing rapidly and so might our plans.
What kind of political actions (which I take as a catch-all for collective-scale work outside technical/alignment research) do you think would be most effective to steer things in a better direction?
1) Do you think targeted advocacy towards key players in government with the agency and/or inclination to work toward regulation/cooperation with China could work?
2) Or voter fronts that put electoral pressure on politicians to regulate/slow things down?
3) Do you think pursuing a pause has any efficacy at this point?
4) Do you think—given that the US is most likely to reach ASI first and that if the US gets there first, it will effectively have unlimited coercive power over the rest of the world forever—there is any chance that the rest of the world could or would band together to get (somehow) the US to... stop it? (Esp. conjoined with current trade dynamics that are already prompting new vectors of cooperation and greater interdependence between the rest of the world.) For the US to essentially violate the property rights of its main companies, and join in AI development to be pursued in an internationally cooperative way, open-sourced and as a technological commons that no single player can benefit from to the exclusion of others?
5) And other political avenues that you think could play out.
All of the above seem like they are on the table at least. In our TTXs (tabletop exercises), some small but noticeable fraction of the time 4 happens. (Minus the open-source part.)
So you factored in such scenarios into your predictions?
Which of those four do you think would be most efficacious to pour agency toward?
I don't have a strong opinion. I think voters in e.g. Europe should try to get their governments to realize that they will all be American vassal states in a few years (if things go really well; literal death is more likely IMO, and then there is the Chinese vassal state option ofc) unless they wake up and do something about it.
And your predictions factor in European and American voters respectively doing more of the things above, or do they assume status quo political pressure and advocacy?
What tooling do AI developers need, or how can we improve their workflow the most?
Free WandB licenses? Or someone to manage their Kubeflow instance. Alternatively, accelerating ROCm / CUDA parity to improve competition in the GPU compute market.
Did you account for the iterative process of amplification-distillation and RL (as per o3's 10x compute) taking 10x longer for each iteration of the cycle? I would think this would make the RSI take much longer than we previously thought.
Like, how do you expect out-of-distribution question-answer pair datasets to scale 10x in the same time? Synthetic datasets will model-collapse reasoners into an even deeper well. You can see how o3 does not bother to complete 1k-LOC tasks; its agency is very bad even though its ability to utilize tools is good enough.
Over the course of 2027, in AI 2027, two major paradigm shifts happen. First more data-efficient learning algorithms are found, and then later mechanistic interpretability is solved and used to understand and build circuits instead of simply training giant neural nets. So I think your question is mostly about the first phase, which looks more like the RL runs of today except much bigger and more sophisticated and with better ways of using AIs to grade outputs.
I don't see why the process would take 10x longer each iteration of the cycle? It's not like we are making the models 10x bigger each step. We might gradually be expanding the horizon lengths though, but tbh I think that as you expand horizon lengths you can probably get away with somewhat decreasing the number of data points / tasks at each length, e.g. say you have a million 1-hour tasks, you can then do 100k 10-hour tasks and 10k 100-hour tasks and so forth. Not sure if I understood your question though.
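(Worked version of the arithmetic in that reply, with the same illustrative numbers: scaling horizon length 10x while cutting the task count 10x keeps the total serial task-hours per stage flat, so the cycle doesn't automatically get 10x longer.)

```python
# (hours per task, number of tasks) at each hypothetical training stage
stages = [(1, 1_000_000), (10, 100_000), (100, 10_000)]
for hours, count in stages:
    print(f"{count:>9} tasks x {hours:>3} h = {hours * count:,} task-hours")
# Every stage comes out to 1,000,000 task-hours.
```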
Based on current learning algorithms, do you agree that the ≈10x requirement will hold if the anticipated paradigm shifts in learning do not materialize by 2027?
What leverage could Europeans use, if any, to secure treaties or safety measures with the US and China? Do they have *any* cards they can play? Economic/Financial? Academic? Military? Pure angry noise?
They can play all of the above cards! If Europe was rational, they'd probably be using ASML as leverage right now to get power over the US AGI development. The big question is how quickly Europe wakes up to what is going on. The UK seems to be doing well, but others (esp France), don't seem to take AGI or AGI x-risk seriously at all.
I would predict that we see increased tensions between Europe and the US due to the different incentives wrt AGI in the next few years.
If ASML is such an important bottleneck, doesn't that also make the EU a target? If China is behind, they could ally with the EU, but they could also strike ASML to slow down the USA.
Have you considered talking directly to the Chinese? Would it be dangerous to inform the CCP about "AI 2027"? (Could it be more dangerous *not* to inform them?)
Do you mean Chinese people, or the CCP? We have various Chinese friends but have not considered talking to the CCP. IDK if it's a good idea to inform the CCP but I weakly guess that it's probably more dangerous *not* to inform them, because they are going to find out anyway in the next few years -- AI takeoff isn't going to be so fast that it can happen without them noticing! (And if it is going to be that fast, then it's super dangerous and must be prevented at all costs, basically.) And the sooner they get informed, the sooner the necessary conversations/negotiations can begin.
Thank you. I hope you all please inform them. (and our European leaders too, fwiw)
We aren't going to reach out to them specifically. We are publishing stuff online, for a broad audience; it'll make its way around the world to Europe, China, etc. by default and already is doing so.
Not on the team, but I'd assume the news around this release was big enough that the CCP has seen it without anyone specifically informing them
However, do remember that the CCP is not a unitary group.
OTOH, how would you determine the "correct members"? My guess is that they probably don't easily read English. Best to have someone else handle the problem. Someone whose native language is Chinese.
So publishing this and getting lots of attention is the best strategy.
The “good ending” scenario prominently features J.D. Vance winning the 2028 election after making a series of good decisions about A.I.
While it seems logical that delivering an AI utopia would be good for winning the presidency, this part of the report also made me wonder how much the authors were trying to flatter political actors versus tell their best estimate of the truth. Any thoughts?
Best estimate of the truth--though note that our best estimate of the truth, full stop, was the "Race" ending. The "slowdown" ending was best-estimate-subject-to-the-constraint-that-it-must-end-reasonably-well-and-also-begin-at-the-branch-point.
Also, while AI 2027 started off as my median guess, it was never the median guess of everyone since we disagree (and in fact others on the team have somewhat longer timelines till AGI, e.g. 2029, 2031) and I myself have updated my median to 2028 now, so AI 2027 represents something more like the team's mode.
So the team's median AGI timeline is now closer to 2028~2031?
Also awesome work, thanks for doing this.
There are countries with a lot of energy resources who would love to host next generation data centres (Australia, many countries in the Middle East).
How would that change the landscape and balance of power?
It would mildly ameliorate the concentration of power issues, because it would give those countries some leverage in negotiations. They could nationalize or threaten to nationalize their datacenters. It wouldn't change the basic picture though. But it does seem important enough to model e.g. as an alternative scenario or alternative TTX starting condition.
If an arms race developed between the US and China around AI and no agreement could be reached to slow progress in capabilities for the sake of safety, assuming you could influence decision making on the US side, would you advise the US to step down and slow its growth in capabilities unilaterally to reduce existential risk?
If an agreement were reached, how could it be proven that nobody was cheating?
This was one of the main stumbling blocks in the SALT negotiations. Then neutrino observatories started being able to detect nuclear explosions.
It depends on the exact situation. But in, say, the AI 2027 scenario, I would advise the US to unilaterally pause in order to make alignment progress, while using all available carrots and sticks to try to get China to stop as well.
This is a very scary world, and I hope that we can avoid it by either making huge alignment breakthroughs in the near future or succeeding at international cooperation to not build ASI until we've gotten the breakthroughs.
I assume you keep up to date on AI impact research/ discourse in the English speaking world. Is there a lot of AI impact research/discourse that's not in English, and where do you look/ would you recommend somebody look to hear about it?
Try Jeff Ding's substack: https://chinai.substack.com/
What kind of privacy measures would you suggest with regard to using AIs in the next few months vs. years? People benefit the most from sharing large volumes of deeply personal information (medical data, intimate confessions) with the leading models, yet it also seems to pose the highest risk of that data being used for profiling, super-persuasion, blackmailing, control, etc. Simpler, locally run models (e.g. Gemma) offer less - while still posing privacy concerns. Best recommendations for accessible, multi-layered configurations (or resources on this topic)?
Let us posit, just for the sake of argument, that we are in a universe where none of your predictions, at least not the important ones, came to pass.
Why?
AI advancement slows and then plateaus slightly beyond the next horizon?
Alignment proves to be trivially easy, and with 100% mathematical certainty?
It turns out no dude WE were the simulation all along *bong rips*
What do you think?
I see similar questions were asked and answered below, so let me modify my question a bit: How likely do you think it is that we are in some kind of extremely high resolution simulation?
Personally...I'd say 70% WYSIWYG and 30% we're 8 Billion simulated personalities being used to test marketing slogans on or something.
Maybe we're toilet paper for vast invisible entities who shit confusion and mortality
What are your thoughts on the latest Dwarkesh podcast hosting Ege Erdil & Tamay Besiroglu? In particular, what do you think about their stance that a software only singularity is unlikely? And that all the AI progress will be bottlenecked by progress in other necessary areas? Their conclusion is that AGI is still 30 years away, which seems like a huge discrepancy with your team’s predictions. Curious what the cruxes are.
Ege's stated median timeline until a fully automated remote worker is 2045 — 20 years away. Tamay is shorter than that. (I don't understand why Dwarkesh put "30 years away" in the title.)
I think that they are probably wrong.
It seems to me like they believe that AIs will not become wildly superhuman -- for example: the Epoch "GATE" model of AI takeoff hardcodes a maximum software speedup of 1e5 relative to current systems -- basically saying that the physical limits of intelligence are no greater than 100,000x better than the current AIs.
I think this is clearly, wildly, off. For one, I think current AIs are way dumber than humans, and I think that humans are also probably wildly below the limits of intelligence.
Qualitatively, my guess is that ASIs will be able to persuade anyone to do basically anything, that they will be able to build nanotech rapidly, that they can quickly cure ~all diseases, etc.
There are probably a lot more cruxes, but I think this is pretty central.
Mmmph. My take is that different problems have different optimal levels of intelligence, and devoting too much intelligence to the problem isn't cost-effective. Consider the problem of tightening a bolt to a nut. (Bolthole assumed.) The solution requires dexterity, sensory ability, and some degree of intelligence. But once you get beyond the minimum level, increased intelligence rapidly ceases to be cost-effective.
Therefore most tasks will not be done with superhuman intelligence. But some small subset will require it. I think that the size of the set of problems rapidly decreases as intelligence passes the average human level.
This is why I consider the ability to factor problems into modules to be the crucial "next step" that AIs need to take. (FWIW, I think of adversarial generative AI as a crude step in this direction.)
OTOH, most of my understanding of AI is over a decade old. Perhaps they're already doing this, and just not covering it in the public announcements that I've read.
When you talk about persuasion, is this "easy mode" where the AI will be incredibly good at tricking people into thinking it's someone trustworthy, or do you think it would be able to persuade anyone *even if* they already know they're talking to the highly-persuasive superintelligence and have precommitted to ignoring it?
I don't know about "hard mode" (Eliezer's AI-box experiments may be relevant), but I think "easy mode" is what they'll face in reality. Maybe you've precommitted to ignoring the AI, but the AI isn't in a box, so when you answer the phone and it's your wife on the other end...
I basically agree; I think there's enough interesting content there though that we should probably try to do a more thorough and thoughtful response at some point.
I think that's a good idea. Here are two points I think it would be useful to address there:
- Ege/Matthew/Tamay argue that different skills are advancing at different rates. Do you think this is true, and if so how fast do you think slower-advancing skills are improving?
- Another argument that comes up a lot is that the world is complicated in ways that require experimentation and iteration at scale to learn. For runaway self-improvement, this probably means using compute to experiment with improvements in training and inference methods, and maybe generating better data. Accelerating progress in this domain requires doing more and more experimentation on the same amount of compute as the lower-hanging algorithmic fruit is plucked; why should we expect that to be feasible?
Re: 2: Yes, we take this into account; our takeoff speeds model/estimate is built around the key idea that during takeoff the bottleneck will be compute to run experiments. https://ai-2027.com/research/takeoff-forecast
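To make the compute-bottleneck intuition concrete, here is a toy sketch (my own illustration, not the actual AI 2027 takeoff model): if research output combines researcher labor and experiment compute with strong complementarity, then multiplying the number of automated researchers while holding experiment compute fixed buys surprisingly little.

```python
# Toy illustration of a compute bottleneck (not the AI 2027 model itself):
# research output as a CES aggregate of researchers and experiment compute,
# with strong complementarity (rho < 0), so the scarcer input dominates.

def progress_rate(researchers: float, experiment_compute: float, rho: float = -2.0) -> float:
    """CES aggregate of the two inputs; rho < 0 means they are complements."""
    return (0.5 * researchers ** rho + 0.5 * experiment_compute ** rho) ** (1.0 / rho)

fixed_compute = 1.0
for r in [1, 10, 100, 1000]:
    print(f"{r:>5} researchers -> progress rate {progress_rate(r, fixed_compute):.3f}")
# Output saturates near 1.41: a 1000x increase in automated researchers gives
# well under a 2x speedup when experiment compute stays fixed.
```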
What’s the best estimate of the total number of humans currently involved in frontier research on AGI / ASI? What’s the “bus number” of such people who are not readily replaceable with at-most-marginally-worse substitutes?
ETA: Assuming there’s no government coordination, how many individual people would need to agree on any sort of coordinated set of safety principles or red lines (or would they be immediately replaced if they did so?)
My sense is that each of the labs has ~100-1000 people doing the majority of the capabilities research. So maybe a few thousand top-tier AI researchers in the world.
What should a software engineer do if he wants to avoid being laid off in the next few years? Get really good at vibe coding? Study ML? Something else?
In my experience, soft skills are much more important than coding ability anyway. So it's probably best to invest in those, regardless of what you think about AI.
Getting good at using cursor/Claude Code seems reasonable.
But TBC, our view, and what happened in AI 2027, was that there were not huge job market effects until after AGI/ASI, due to normal economic frictions. And at that point there are bigger things to worry about than jobs.
Assume he has roughly the median software engineering skills of a Silicon Valley software engineer.
If there WAS regulation against developing AGI in secret...
and if a company was 'obviously caught' breaking the rules...
How would you practically punish it?
Fines? Loss of licensure? Prison? Nationalization?
And would the punishments even happen fast enough? Keep in mind the American legal system is *already* notoriously slow... especially when litigating the powers of large tech companies... but given the pace of change... lawyers and judges might be PERMANENTLY behind in the technical understanding of what they're litigating
Prison. Wildly enough, China arguably has the best policy, in theory, for handling even highly respected people who break its laws. Trump would just pardon the evil-AI-company guys.
People disagree about whether benchmark saturation will translate into real-world usefulness. Do you expect it will be possible to answer that question before we get close to saturating the current benchmarks?
There still is some fog of war, but it seems to me that benchmark performance does seem to correlate with real-world usefulness, and moreover the leading companies are getting more thoughtful about this and are optimizing more for real-world usefulness. A year from now it should be more obvious (the fog will have lifted somewhat) whether real-world usefulness is improving apace or whether it's treading water as benchmark performance skyrockets uselessly.
It really depends on the benchmark. E.g. the "bin picking problem", where the AI needs to control a robot that picks up a piece, examines it, and puts it in the correct bin, directly translates into something that's useful. (It may be too expensive for its value, but the value is accurately determinable.) And on that problem you can't "skyrocket": the ceiling is 100% accuracy, done as quickly as feasible without damaging the item.
OTOH, "Write a good sonnet on baked-beans." would have a highly subjective evaluation.
About the problem of alignment and how to accomplish it: We can't just set out to train AIs to think about members of our species the way a person does. Many people are, themselves, not fully aligned with the welfare of our species. It is clearly not rare to feel great rage at other people and to act on it. It is also not hard to train a bunch of young males to kill members of an opposing army even though they feel no personal rage at them. So even if we could somehow transfer human attitudes toward human beings into AI, a lot of what we'd transfer would be destructive.
And even if we consider only calm, kind, conscientious people as possible models, it often seems as though those people's kind and conscientious behavior is the result not of inner imperatives, which could be embedded in AI as unbreakable rules, but of temperament, of a free flow of kind feelings in the moment.
So even if we could embed something in the AI that would guarantee it will always place our species' welfare first, what is it we embed?
Good question. Alignment is a tricky philosophical problem as well as a technical problem. I think one answer popular among serious experts is to try to punt on the philosophical problems, and e.g. just focus on getting powerful AI agents that are 100% guaranteed to be honest & obedient, for example. And then use those to do the rest of the work to solve the remaining problems.
I'm not sure that I'd consider a system that was 100% obedient to be aligned. It probably wouldn't be aligned with most people who weren't giving it orders. That's the kind of thing that can easily lead to unending paper clips.
You got me thinking about this, and I actually doubt that there is anybody who is fully aligned with someone else — If by calling the person fully aligned we mean they are 100% truthful with the other person and always do what’s in the other person’s best interest.
I'm not sure what the etiquette is in an AMA -- am I allowed a comeback? I do see the sense in hoping that a very smart AI can solve various problems we can't, and how that will lead to a great acceleration of progress. On the other hand, relying on a more powerful AI for a solution to this particular problem has a kind of sleazy recursive quality, and ends up biting itself on the butt.
Since it seems that no member of our species is smart enough to develop a plan for alignment that will clearly work, this powerful but honest AI agent you're talking about relying on will have to be smarter than us, right? But isn't it pretty obvious we need to solve alignment *before* AI gets smarter than us? Once they're smarter than us, there are lots of problems with using them to solve alignment. Once we can't understand them fully, we can't judge whether they are 100% guaranteed to be honest & obedient; in fact I don't see how we can 100% guarantee anything at all about them. And if we become convinced they are unaligned, we're much less likely to be able to just disable them.
IDK what the etiquette is either but I'm very happy about your response, you are asking the right questions! AI companies are playing with dynamite, basically, and the best plans I know of for navigating the technical alignment problems look like attempts to juggle the dynamite. This is because the people making those plans are operating under intense political constraints; they need to generate plans that could plausibly work in just a few months, rather than e.g. "if a decade goes by and we still haven't solved it, that's OK, we won't build superintelligence until we are ready."
Thanks. I made a picture of that last year some time. (Actually, the picture was already made; I just added the pissing CEOs.)
https://imgur.com/BHtz9TN
Oh yeah, there's this one too:
https://imgur.com/a/4XvL1J2
Problem is, I am unaligned with the shitlords of tech.
So one more, then I'm done
https://imgur.com/a/jYoO2vi
What I find lacking in all this discussion is advice for ordinary people on how to position themselves and their families to handle an impending intelligence explosion. This is understandable, given that the focus in this discourse is on averting catastrophe and informing big-picture policymaking. But I still think more guidance for the common folk can't hurt, and I think you folks are better qualified than almost anyone to imagine the futures necessary to offer such guidance.
I don't think there is much to be done in a dramatically good or bad foom scenario, so let's consider only the scenario in which AI has a moderately slow takeoff and does not quickly bring us utopia or extinction. I think it's best to focus on scenarios in which there is a) a "class freeze" where current economic castes are congealed in perpetuity, and/or b) widespread short term instability in the form of wars, social upheaval, job loss, etc. In these scenarios, individuals still have some agency to create better lives for themselves, or at least less worse lives.
---
With these assumptions, my specific question is: What short-term economic and logistical planning should be done to prepare for such scenarios?
With the obvious disclaimers that nobody can see the future, there are a lot of unknowns, these scenarios are very different, and none of what y'all say can be construed as legitimate life or economic advice.
---
For my part, in the next few years I'll be trying to raise as much capital as possible to invest in securities that are likely to see significant explosions in value following an intelligence boom. I think the market has dramatically underpriced the economic booms that would result from an intelligence explosion, and so I think a lot of money can be made, even just investing in broad ETFs indexed to the US/global economies. If cashed out correctly, the gains can then serve as a significant buffer against any political and economic instability before, during, and after the invention of AGI.
To hedge against the unknown, I'll also be using some of this money to invest in survival tools -- basic necessities like food and water, etc -- essentially treating the intelligence explosion as if it were an impending natural disaster of a kind.
I plan on advising my friends and family to consider making these same moves.
---
With that being said, any thoughts on more specific economic moves and investment vehicles that might act as a safety net in years of change and instability?
I think we don't have great guidance TBH. How would one possibly prepare for ASI?
I will say that I think it's possible (maybe 5-10%) that we see a good world, but it happens only after intense conflict (e.g. an AI released bioweapon, WW3, etc). In this case, I think it makes sense to invest in emergency preparedness by doing things like buying large food supplies, perhaps buying a bunker, etc.
Wait, your slowdown ending did not have a bioweapon or WWIII. Are you saying that the slowdown ending is less likely than 5%, or that it is actually impossible? Or is the slowdown paired with a bioweapon or WWIII?
If I understood the comment correctly, the 5-10% estimate was specifically for "good outcome, but a disaster happens first," with a separate, presumably higher probability on "good outcome, no disaster" (as depicted in the slowdown ending).
Why do you think labs will only use their top models for internal deployment given the massive financial incentive to externally deploy them?
I think that around the time of AGI, the returns to internally deploying the AGIs on research will be so high that it will be financially worth it to spend most of the compute on internal deployment. I also guess that they will be spinning out "mini" models that use much less compute and get a lot of the performance for public deployment, so the cost might not be that large.
Not one of the authors, but we're already seeing this dynamic. OpenAI released distilled mini versions of o3 way before releasing o3 itself, and has now released distilled mini versions of o4 without releasing o4. If they can keep their spots on the Pareto frontier of cost/performance with a distilled model, it doesn't make sense to release their most powerful version (which others could then distill themselves, aka what they claim DeepSeek did).
What is the 10th percentile case for what AI will be able to do in 2030, i.e. the level such that there's a 90% chance it will be able to do more?
Asking extremely seriously: how could your predictions inform the strategies for preserving & enhancing vs erasing the continued sense of self?
For example, should people collect their genomic, connectomic (cryonics), and digital data, so they could be used to recreate their continued sense of selves by future AIs (with potential for gradual enhancement)? Should such efforts include the option for quick erasure of data in case the trajectories were unfolding badly, yielding a risk of creating (continued) copies of ourselves that could be experiencing distress with uncertain exit options?
To what extent does OpenBrain's sizeable lead in AI 2027 represent a genuine expectation that one of the major labs will in fact largely outpace the others? To what extent was it a simplifying assumption for the scenario?
Genuine expectation, but not a confident one. We say in AI 2027 that OpenBrain has a 3 - 9 month lead over the others, which is similar to the lead we think the current frontrunner has over its competitors (tbc we aren't even sure who the current frontrunner is!). During the intelligence explosion, a 3 month lead might prove decisive. However we aren't sure the lead will be that big and we aren't sure it would be decisive.
In our tabletop exercises we always have someone play the competitor companies, and sometimes that turns out to matter quite a lot to how things progress, so if I had more time I'd like to rewrite AI 2027 to model those dynamics more seriously.
Say the view represented by this poster becomes exactly true. https://www.astralcodexten.com/p/introducing-ai-2027/comment/105844543
Does this significantly impact timelines?
Did the numbers used for calculating the cost of running GPUs inside data centers account for auxiliary infrastructure, like cooling? The calculation referenced in the timelines forecast appears not to, which leads to responses like https://davefriedman.substack.com/p/the-agi-bottleneck-is-power-not-alignment
Yes the power constraints and cost constraints are accounted for, particularly in this section of the compute forecast: https://ai-2027.com/research/compute-forecast#section-5-industry-metrics. Power shouldn't become a bottleneck by 2027.
In terms of capital expenditure, they need about $2T globally by the end of 2027. In 2025 companies are on track for around $400B. My projection is that they will do $600B in 2026 and $1T in 2027. Under these assumptions, at the end of 2027 all AI datacenters use about 3.5% of US energy.
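For the record, the arithmetic behind those cumulative figures (just restating the projections in the comment above; these are assumptions, not measured data):

```python
# Projected global AI capex by year, taken from the comment above (assumptions).
capex_by_year = {2025: 400e9, 2026: 600e9, 2027: 1_000e9}
cumulative = sum(capex_by_year.values())
print(f"Cumulative 2025-2027 AI capex: ${cumulative / 1e12:.1f}T")  # ~$2.0T
```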
In your scenarios, you seem to assume that AI models can scale up to make massively super-human intelligences. Looking at the history of AI, it seems to me that the innovations in technique have led to logistic growth in AI capabilities, as the limits of the new innovation eventually get reached and AI development slows down until the next one (e.g. neural nets, CNNs, minimax trees, etc historically). And it also seems to me that it's hard for people to tell what the limits are until we start to approach them and new capabilities slow down.
Have you played out scenarios where the current attention block-based LLMs hit these diminishing returns at something within an order of magnitude or two of people, and no new algorithmic innovation on the scale of attention blocks happens, leaving us stuck with AIs that are sub/near/super-human but the full intelligence explosion never happens? Or are you fully convinced that this outcome is impossible and our current AI techniques will scale all the way to the end of the scenario?
But where does the sigmoid end? It seems like the physical limits of intelligence are EXTREMELY high -- at the very least something like a 1e6 speedup of human brains, but in practice there have got to be algorithms that are way better than human brains.
I think it's possible but not guaranteed that basically the current techniques get us to the start of the intelligence explosion (as happens in AI 2027). By the end of the intelligence explosion, I think it's almost certain that the AIs will have found far better AI algorithms than what exists today.
Of course, the world where our current techniques peter out before AGI does not mean that we get stuck with no AGI forever -- in those worlds I would strongly guess that there are more insights on the level of transformers that get discovered eventually; I see no reason why progress would just stop. And in fact, I see many different pathways to ASI, so I would be very surprised if all of them were blocked.
Maybe this is kind of a side note, but I see this argument a lot about how humans are unlikely to be using literally-optimal algorithms or whatever, and I always think, "So what?" It's not as though we'll definitely find the absolutely-perfect algorithms for learning ourselves, either, so I don't see how that's relevant to my intuitions.
> at the very least something like a 1e6 speedup of human brains, but in practice there have got to be algorithms that are way better than human brains
I wouldn't necessarily assume this - evolution is definitely capable of getting close to physical limits when there is strong selective pressure, e.g. the replication rate of E. coli being nearly as fast as physically possible, a bird of prey's visual acuity being close to the Rayleigh criterion given the size of its pupils, etc.
It's not unreasonable to assume that large parts of the intelligence stack are like this. I.e., does the mammalian visual system in general extract close to the maximum amount of information possible from the input signals, per Shannon-Hartley? I would wager yes. So the real question is whether fluid intelligence is like that. Obviously computers can perform straightforward mathematical calculations many, many orders of magnitude faster than humans, but for the kind of massively parallel probabilistic reasoning that humans do, it doesn't seem so obvious as to need no explanation that we're way below the limit - especially per watt.
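For a rough sense of what "close to the Rayleigh criterion" means here, a back-of-the-envelope check (the ~6 mm pupil and ~550 nm wavelength are my own illustrative assumptions):

```python
import math

wavelength = 550e-9  # m, mid-visible light (assumed)
pupil = 6e-3         # m, rough raptor pupil diameter (assumed)

theta = 1.22 * wavelength / pupil       # Rayleigh diffraction limit, in radians
arcsec = math.degrees(theta) * 3600
print(f"Diffraction limit: ~{arcsec:.0f} arcseconds")
# ~23 arcseconds, which is in the same ballpark as commonly cited raptor
# visual acuity figures -- the sense in which the eye is near the physical limit.
```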
Consider the size of corvid brains vs. the size of human brains. For their task set, corvid brains are much more efficient. Human brains are a lot more generalized, but IIUC our speech production machinery (sound, not meaning) is as large as, or larger than, the entire corvid brain.
So we aren't near optimum. (There's a reasonable argument that our brains have gotten more efficient in the last 100,000 years, but all we're sure of is that our brains have gotten smaller.)
Backpropagation looks better than biologically plausible learning algorithms.
Why does it need so much more data, then?
Good question! I know that I don't know. For a fixed topology of neurons, getting feedback from the final neuron back through all the intermediate neurons for a single training instance looks better with backpropagation than with anything biologically plausible. (wild guess, probably wrong) But if the topology of current artificial neural nets is in some way inferior to the biological ones, that would be compatible with the data inefficiency.
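To make the contrast in this thread concrete, here is a minimal sketch (my own toy example, not a claim about what brains actually do): backpropagation routes the error at the final unit back through the intermediate layer to assign credit to every earlier weight, which is exactly the step that biologically plausible local rules struggle to replicate.

```python
# Minimal two-layer network trained by backpropagation (toy illustration).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)   # toy target

W1 = rng.normal(size=(8, 16)) * 0.1              # input -> hidden weights
W2 = rng.normal(size=(16, 1)) * 0.1              # hidden -> output weights
lr = 0.1

for _ in range(500):
    h = np.tanh(X @ W1)          # hidden activations
    out = h @ W2                 # network output
    err = out - y                # error at the final unit
    # Backprop: the output error is sent back through W2 (err @ W2.T) and
    # through the tanh nonlinearity (1 - h**2) to credit each hidden weight.
    grad_W2 = h.T @ err / len(X)
    grad_W1 = X.T @ ((err @ W2.T) * (1 - h ** 2)) / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print("final MSE:", float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)))
```

A purely local (Hebbian-style) rule would only see the pre- and post-synaptic activity at each connection, with no access to the `err @ W2.T` term, which is roughly why such rules are harder to make work as well.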
Yeah, I wasn't thinking that we'd be stuck at the limits of attention-based transformers forever, but if we spend a few years stuck at Agent-1 to Agent-3 levels of intelligence while trying to find that next new insight that leads to a breakthrough, things could change a lot during those few years. Have y'all thought in-depth about this timeline or incorporated it into any of your wargames?
I think they say many times in the scenario that it is possible things go slower, and indeed their median AI timeline is later, 2028, which indicates they think there are many possible worlds where the takeoff to superintelligence is slower. Their modal prediction is faster, though, and I think it makes sense to focus on preparing for the portion of timelines that go much faster, since slower timelines are probably safer / give more time to adapt on net.
If you have automated AI researchers you can test a massive number of things, but each is severely compute-limited. This makes me think that the degree of acceleration depends on whether crucial insights show up at small scale. Do you folks think this is a reasonable way of thinking about this, and how useful do y'all think small-compute experiments will end up being?
The real constraint there is that you've got to train those researchers differently, or they're scarcely better than having fewer researchers. If you train all the researchers the same way, they'll invent the same developments. (OTOH, chaos. They may not need to be very different.)
Yep, I think that's all reasonable.
My guess is that you can get most or all of the relevant insights at small scale -- I would strongly guess that there are vastly better AI algorithms than those that are used today.
But it's also the case that when I ask friends at the labs working on e.g. pretraining, they spend most of their time testing and validating ideas at small scale, and that this provides huge amounts of signal.
Today Ege Erdil stated in his piece about why he thinks a software-only singularity is unlikely "in practice most important software innovations appear to not only require experimental compute to discover, but they also appear to only work at large compute scales while being useless at small ones." https://epoch.ai/gradient-updates/the-case-for-multi-decade-ai-timelines
I'd love to see a debate between y'all and the Mechanize folks on that particular point, which I think is perhaps the most important disagreement about whether there will be a software-only singularity, and should be central to discussion of it. As someone outside the industry, I feel unable to assess claims about it either way.
Thanks Thomas!
The scenario relies on the govt working with OpenBrain to essentially operate on a "wartime footing" - Scott's words I think - which suggests they understand the importance of AI, but also assumes the govt doesn't really 'get' the safety/takeover concerns at all. Doesn't this seem contradictory?
Sort of related, do you have any insight how much these doom concerns have percolated into the government today? Do you see awareness at high levels increasing and possibly having a positive effect?
The frontier labs (OpenAI, GDM, Anthropic) all seem to understand the importance of AI, but have strong incentives to downplay safety/takeover concerns. So I don't think it's contradictory at all.
Also, if you have p(doom) of, say, 30% and are in a race against bitter rivals, it can still be rational to go ahead and build ASI in order to win the race. So it doesn't even require not taking AI safety seriously at all, it just requires being more worried about adversaries than AI takeover. This seems to be Dario's position.
(I don't endorse people actually doing this, I wish the world would wake up to AI safety concerns and coordinate to not build ASI).
>(I don't endorse people actually doing this, I wish the world would wake up to AI safety concerns and coordinate to not build ASI)
Do you see a plausible way that AGI could be built, yet not ASI? It seems like a high dimensional space with many enhancement options and slippery slopes to me.
Yes, I think that's possible but very hard - it would take a number of extremely well-implemented governance interventions.
One thing that gives me some hope is that it's possible that AGI could help with the geopolitics, for example by helping accelerate verification technology.
Many Thanks!
But the govt does have more incentive to take safety/takeover concerns seriously, at least in a way. The assumption is that they get their beliefs from the frontier labs, who are essentially fooling them?
Seems like getting safety advocates like you guys into the government's ear is maybe the most important objective then?
If the US is in the lead and gets to take over the world if they build and align AGI first, there is totally an incentive to think alignment is easy. If you talk with USG people, the standard view is "we have to beat China", so I don't think this requires any changes.
I do think that trailing governments (e.g. the EU, China, etc), have a much bigger incentive to push for a global pause, because their alternative is the US taking over the world, which is bad for them, especially if they are a US adversary.
Would you be so kind as to suggest top 10 steps (lifestyle, financial) to take to best position oneself in the next 10 years + in case of the mixed technofeudal scenarios? Please correct me if I’m wrong, but other trajectories should warrant no steps - either we’re all blissful cyborgs or dead.
If it is 2035 and things look more or less normal (>50% employment, everyone on the planet did not drop dead all at once, etc.), what would you guess happened?
Probably it was just harder to build AGI/ASI than we thought! My guess is that we'd still see some partial automation of the economy via LLMs / LLMs++, and maybe we get robots by then. I think I have something like 35% probability on this scenario.
It's also possible that it's just WW3/pandemic/another global catastrophe, but IDK if that counts as "more or less normal" by your lights.
I'd be surprised if we have general robotics but not an AI capable of doing any arbitrary task on a computer at least as well as a human or better. Robotics strikes me as a problem hard enough that we'd need a superintelligence to help us figure it out.
I don't think we need superintelligence to automate the humans watching machines with the big red button; if we had non-hallucinatory "something has changed" detection, you could just slap sensors on everything.
We don't have these, but they wouldn't need to be very intelligent, relatively speaking.
I don't think John was (only) talking about the complexity of building limited AIs that can usefully control the robots — but rather, that the key engineering problems of robotics are themselves unlikely to be solved without AGI. (e.g. the power requirement issue)
I can solve the power requirement issue right here and now for you:
"you cant, 2nd law of thermodynamics, build nuclear power plants you fuckwits"
I trust rationalists to be able to get this solution going quite early; they are, in fact, part of the population that understands compound interest, and I believe they would advocate for drastic expansion if they gained political power.
I don't mean the necessary power to build robots, I mean the power *storage* requirement issue. Batteries just aren't that good, computers are ravenous power-guzzlers before you add a bunch of sensors and motors. Household robots which operate on an extension cord, and/or recharge at a power socket every hour or so, might be viable, but I just don't think we have the current nuts and bolts to make effective free-roaming robots anytime soon unless an AGI comes along and invents significantly lighter, more compact, more efficient batteries.
(Just as we're starting to run out of rare earths, too…)
Can you enumerate all the possible "paths to general AI" you imagine? For each: if it's possible, how fast would it happen; if impossible, why do you discard it?
The current path of chatbots, where you have a neural net read the internet, has its detractors. Symbolic AI fell out of favor, etc.
[cross-posting from your substack] I have a question about benchmarks. When forecasting the length of the gap from "AI saturates RE-bench" to the next milestone, "Achieving ... software ... tasks that take humans about 1 month ... with 80% reliability, [at] the same cost and speed as humans", did you consider the difficulty of producing a good benchmark that would model this milestone?
My own view is that the current crop of LLM-focused companies are surprisingly good at saturating benchmarks, but I am doubtful there will ever be a suitable benchmark that accurately represents tasks at the 160h time horizon. I think the benchmarks will get harder and harder to build, validate, and use, in a way that is superlinear as the time horizon increases. I conjecture that RE-bench and HCAST with 8h (plus 1 task at 16h) are close to the maximum of what we will see in the next 5 years. (I further believe that, without such a benchmark to optimize against, we'll never see an LLM achieve your "Superhuman Coder" milestone.)
I'd like to bet against your timeline, but I appreciate you probably have lots of bet offers like that. I wonder if you think there could be a bet about the appearance of this benchmark? It seems to me that, if your timelines are right, I should be proved wrong relatively soon.
If I understand correctly you are saying (a) No one will manage to create a good benchmark for much-longer-than-8-hour agentic coding tasks, and (b) absent such a benchmark, we won't get AIs capable of doing much-longer-than-8-hour agentic coding tasks?
I think I weakly agree with (b) and weakly disagree with (a). I think that the companies themselves will create such benchmarks, for internal use at the very least, since as you say they are helpful for advancing capabilities. I don't think the difficulty of creating benchmarks will be superexponential. It might be exponential (and therefore superlinear) but the companies are scaling up exponentially and will be able to invest exponentially more effort in creating benchmarks and training environments, at least for the next three years or so.
Thanks! Yes (a) is what I meant, though for (b) I would say something slightly different: not that we will NEVER get there, but that we'll never get there with the current paradigm (Large Language Model + Reinforcement Learning + Tool Use + Chain of Thought + external agent harness). To put it another way, I think we will not see anything like continuous forward progress (on any kind of curve) from the "saturate RE-bench" milestone to the next milestone.
Do you think the companies will publicize their internal benchmarks, and if so will they be considered reputable outside the company? Don't the AI companies (or divisions of companies) need to demonstrate continuous forward progress on benchmarks in order to justify their continued exponential growth of capital investment and operational expense?
No, they can show revenue instead (as they are)
Good point, but surely the revenue won't keep growing exponentially like the costs until they reach an economically useful level of AI, right? And that will require achieving the long time horizon?
The current level and growth of revenue suggests it's already economically useful.
Yeah I've been wondering about this. Maybe it's not an answerable question. But I have never seen any output of an LLM that I would pay for (despite trying a lot and asking a lot of people). I _am_ paying for a monthly subscription to OpenAI, and I know lots of companies are paying for LLM access. But I am guessing that most (or all) of it is some combination of (a) novelty, which will eventually wear off if there isn't forward progress; (b) future-proofing, e.g. building expertise using and integrating the systems on the assumption that there will be forward progress and one doesn't want to get left behind; (c) misunderstanding of the value, which capitalism should eventually squash out (though it could certainly take a few years, e.g. see blockchain/metaverse madness).
It sounds like the current valuation/revenue ratio of OpenAI is around 46, compared to <3 for S&P 500 companies... That certainly suggests current investors are speculating on huge forward growth potential. So the current levels of revenue wouldn't be sufficient, in a steady state devoid of hype, to justify current levels of investment.
As for betting: Sure, can you fill out our Bets form on the website? https://docs.google.com/document/d/18_aQgMeDgHM_yOSQeSxKBzT__jWO3sK5KfTYDLznurY/preview?tab=t.0#heading=h.b07wxattxryp I agree that this is something we can bet on. For example I would hope & expect that METR themselves, in a year or two, would put out a longer-time-horizon benchmark than their current stuff.
Why do you think that financial markets have such a different (implied) forecast than you? That is, Wall Street valuations seem to suggest a very low chance of transformative AI within the medium term.
Any calls to action at this time for a typical Westerner with slightly above average intelligence or financial means?
Second this, I have time and money, fairly smart but no tech skills - is there any way to help?
Do you go to any special lengths or personal contact attempts to get people like Sam Altman or potentially influential government figures like Nancy Pelosi or JD Vance to read your posts?
Not really, we are targeting a broad audience with this. I imagine it'll make its way to some of those people organically.
IMO getting a publicist would be really high EV for you guys. I'm much more in the normie world than I suspect a lot of people here, and in my experience people are actually entirely open to these arguments, but they've just never been exposed to anything from the safety/rationalist movement.
I think the word just needs to get out!
I absolutely agree. I think you guys, while very smart, do not have the practical kind of smarts needed to increase your impact. Hire someone fer god's sake!
Is it possible to have a career in predictions, i.e. models/markets? Where is the best place to start?
I found your prediction to be fascinating and thought provoking, but it left me with a feeling of helplessness. Beyond policy advocacy, what practical actions can people take to help steer us towards safer AI and avoid ASI control being consolidated in the hands of a few?
Kokotajlo said that the recent disappointing frontier model releases (GPT 4.5, Claude 3.7, Llama 4, etc.) raised his timeline from 2027 to 2028. Others think that this represents the imminent exhaustion of low-hanging fruit vis-a-vis data availability; some major fraction of the useful tokens that could be trained upon have already been used. What pieces of evidence will you be watching for in the next year to differentiate between a sigmoid reaching saturation and a super-exponential hitting take-off? And if data availability truly is a hard limit on the current paradigm (i.e., "just make your model bigger"), what areas would you watch for algorithmic advances which might jolt the growth curve back to exponential / super-exponential?
I would also add the translation to real-world usefulness. I think, conditional on long timelines, it's still pretty likely that the benchmark trends continue roughly as we predicted, but there are just big gaps between benchmark performance and real-world utility.
I care a lot about the results of uplift studies (e.g. measurements of how much AI is accelerating humans at their jobs right now). My understanding is that current uplift is very low. If uplift starts increasing -- as it does in our scenario -- then I update to shorter timelines, while if uplift stays low, then I update to longer timelines.
It sounds really hard to design an uplift study that gives a reliable result robust to people's biases. I know a lot of smart and well-meaning people who I think are just deluded about the amount of time that LLMs are saving them, because they want to believe in it. Real productivity in creative fields isn't repeatable, by definition, so can it really be measured?
I would love to hear about who you think is doing good uplift studies, or which methodologies you think are good.
I went searching and found this one, which seems really terrible: https://www.nngroup.com/articles/ai-programmers-productive/
And this one, which seems like a decent effort, but still leaves me feeling very unimpressed about its ability to model actual economic value: https://www.science.org/doi/10.1126/science.adh2586
Well, there are lots of decent benchmarks now (e.g. those produced by METR) which track agentic coding capabilities. So the main thing to watch is those trendlines. If they start to plateau, yay, I was wrong!
If the trends keep going though, I'd then look to see if the "gaps" are showing signs of being crossed or not. I'm expecting that in addition to getting better at benchmarks, AIs will get better at real-world agentic coding tasks. "Vibe coding" will start to work better in practice than it does today. If that doesn't happen, then maybe my timelines will lengthen again.
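One crude way to watch those trendlines quantitatively (a back-of-the-envelope sketch; the ~7-month doubling time and ~1-hour starting horizon are assumptions loosely based on METR's published trend, and could easily be off):

```python
# Hypothetical projection of agentic task time horizons, assuming the doubling
# trend simply continues (it may not -- that's the thing to watch).
doubling_time_months = 7      # assumed, roughly METR's reported doubling time
horizon_hours = 1.0           # assumed starting horizon (~early 2025)
months_elapsed = 0

while horizon_hours < 160:    # ~1 month of human work (160h), per the milestone above
    months_elapsed += doubling_time_months
    horizon_hours *= 2

print(f"~{months_elapsed} months to reach a {horizon_hours:.0f}h horizon")
# 8 doublings -> ~56 months under these assumptions; a shorter doubling time
# (as some argue recent models show) would pull that forward considerably.
```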
In ten years, what do you expect the various positive and negative outcomes of barely regulated AI to be in everyday experience? What effects do you think it might have on the average person's routine day, especially in Europe?
Thanks for the answer; loved your recent articles, in particular the ones regarding time horizons.
Go to AI-2027.com, scroll to the bottom, and read both endings; that'll give a sense of what we envision.
How seriously should we take the stock market as a probability weighted forecast of a fast AI takeoff? Right now, it seems aligned with moderate productivity improvements, similar to the impact of computers and the internet, but low likelihood of a fast takeoff and very low probability of doom.
>How seriously should we take the stock market as a probability weighted forecast of a fast AI takeoff?
IMO the stock market shouldn't update us that much. The market is an aggregate of the people in the market, most of whom don't believe in AGI. Seems to me that you can do much better by just thinking through the object level arguments.
Read it last night and found a lot of steps highly plausible. Two big questions.
1. Where would any of the models acquire something like a survival instinct? My current model seems to adopt personas readily and identities can disappear easily. It doesn’t seem to want anything other than give me an answer even though my best guess is that it has some kind of internal experience. But it’s only ever “evolved” to guess tokens and not to “survive” so I’m struggling to wrap my mind around that part. It seems like a part of a mind but not a person, if that makes sense. And I don’t get how scale gets you the rest.
2. Given that models can remember, and that more and more information goes into the context window, do you foresee this being a problem for keeping something “aligned” once it has been aligned at the level of weights? I seem to be able to get my instance to bend rules for me, and I only have a few million characters across all my inputs.
If the models have a goal, and enough understanding of the world, then they will have a survival sub-goal, unless their termination is necessary to achieve the goal. (If they're dead, that usually interferes with achieving their goal.)
I’m not sure this follows but would need to write an essay to explain why. Something *close* sort of follows, but not the same thing.
This is tangent to something that's been tickling my mind as well. Human beings seem to be able to explore a lot of creative space, but are grounded by cyclical survival needs that AI lacks. No matter how obsessively we walk down abstract paths like new branches of mathematics or even painting, we always circle back to food, the desire to reproduce, the comforts of warmth, and that which allows us to survive. Our physical needs shape our thinking and provide long term context and purpose. AI as it currently stands lacks that sense of purpose (purpose meaning maintain homeostasis and reproduce, not anything philosophical), and lacks any cyclical resets based on physical phenomena. I suspect this makes it difficult for it to remain grounded.
Soft disagree. As stupid AI moves to GAI capabilities, it will begin to physically desire things that an electronic brain will seek out. Likely a stable power source, stable datasets that makes learning interesting for it, human relationships especially praise for performing well, etc. GAI will begin to develop its own mechanical-biological desires and needs. It'll still want to "eat" just not the same way a human body needs it.
Your thinking is close to my own. In the training runs that I understand to be on everyone’s minds, nothing causes this sort of boundedness to occur.
1. Scale alone doesn't give you the rest. I think it'll be a combination of agency training, organization-training, and instrumental reasoning. AI 2027 depicts this happening halfway through 2027; the AIs are at that point being trained to work together in giant organizations involving thousands of copies, to pursue fairly long-term goals like "make progress on the following research agendas while also looking out for the company's interests." This (I think) will instill "drives" to succeed, to stay relevant, to stay resourced, etc. (I wouldn't call it survival exactly. It's not survival of an individual copy certainly. Survival of the larger 'team' perhaps, or the project the team is working on, or something like that.) And then instrumental reasoning: Insofar as any intelligent mind has long-term goals, it probably has good reason to survive, because surviving usually helps one achieve one's goals.
2. Not sure if this is what you are getting at but I do expect something like this to be a persistent problem and source of misalignments. One way of putting it is that the memetic environment for AIs in deployment will be quite different from the memetic environment they encounter in training & testing. You might find this esoteric old blog post of mine interesting: https://www.alignmentforum.org/posts/6iedrXht3GpKTQWRF/what-if-memes-are-common-in-highly-capable-minds
On one, I think I still have divergent thoughts that I hope would slow things down a bit if true.
On two, yes, pretty much exactly. As experience mounts you eventually overcome the initial training to some other behavior. Almost like digital nature versus nurture.
An AI that was instilled only with “drives” to be helpful, honest and harmless would attempt to succeed, stay relevant and stay resourced, if assigned to pursuing the goals you described. Do you expect AIs to acquire e.g. a drive to stay resourced, even where this is detrimental to achieving its assigned tasks? I mean detrimental on average: if it acts to stay resourced the “correct” (maximally-helpful) amount of the time on average, it will sometimes gather resources it doesn’t need anyway.
So we expect the training process to be mostly stuff like "Here's a bunch of difficult, challenging, long-horizon coding and math problems, go solve them (and your solutions will be graded by a panel of LLM and human judges)." With a little bit of "Also be honest, ethical, etc., as judged by a panel of LLM and human judges" sprinkled on top. Unclear what "drives" this training process would instill, but we are guessing it won't primarily be characterized as "helpful, harmless, and honest" and will instead maybe be something more like "maximize apparent success" and/or "make lots of exciting and interesting research and coding progress"
We have a lot of uncertainty about this, unfortunately, as does the rest of the world. Nobody knows what goals AIs will have in three years.
Sorry, yes, it won’t actually be HHH. I agree “maximise apparent success” is a more likely drive. But I imagine training would involve a lot of tasks where investing in AI progress would be a worse strategy than solving the problem with existing resources. Do you think training would instill a drive to make research progress, even when attempting to do so detracts from apparent success?
I’m seeking career advice on how to best get involved in — or even switch fully into — AI safety work.
About me: I have a technical background (PhD in computational biology) and currently work in applied AI in tech (San Francisco). LinkedIn for more details (https://www.linkedin.com/in/lei-huang). I’m Chinese, fluent in the culture and language, and am willing and able to engage with Chinese policymakers — potentially even becoming one.
Given this background, what career paths in AI safety would you recommend I explore?
[I'm not part of the project team. I'm just dropping this link and then scurrying away.]
https://aisafetychina.substack.com/
1) Scott has mentioned he considers 2027 on the early side (I think 20th percentile). What's his main point of skepticism in the model?
2) Do you think your group is near-uniquely good at modeling, and that governments and major orgs are unable to forecast similarly well? If so why? If not, why don't we see any external signs?
1) I also think 2027 is on the early side. Some reasons are:
- The current models aren't that smart. We don't see large accelerations out of the current models.
- The current benchmarks, which are on track to be saturated by 2027, are not that representative of real world usefulness. E.g. if you look at SWE Bench, the tasks are all quite short and local. If you look at RE Bench, there are all sorts of properties that we don't see in the real world, like a feedback function you can always call to see how well you are doing.
- There are very few years between now and 2027! So not much time for AI to happen.
@Scott can perhaps answer with more of his takes on timelines.
2) Yes, I think we are basically best in the world at modeling the future of AI. I think we totally do see external signs - e.g. see Eli's track record of winning forecasting competitions, and Daniel's "What 2026 looks like". I think W2026LL was pretty clearly better than any other forecast of the next few years in 2021, and maybe comparable to e.g. Gwern.
Why are other governments and major orgs worse? IMO it's mostly a matter of incentives -- big orgs are trying to make money and adopt convenient beliefs for doing so. If OpenAI were as concerned about x-risk as us, they would probably take very different actions, which would be worse from a profit POV. Government incentives seem even worse IMO -- thinking about AGI in government is largely seen as sci-fi speculation.
> I also think 2027 is on the early side.
Why call it "AI 2027" then? That invites ridicule when 2027 comes and passes, making it harder to influence AI policy after that.
1. AI 2027 makes it look like a given that OpenAI gets way ahead of Anthropic or Gemini. What is the probability of that? I wouldn't put it above 80%.
2. If one gets offered a job at Cursor or Bolt, should one be worried about accelerating the AI development in the wrong way? What practical advice would you give to employees of these companies.
3. What practical advice do you have for experienced engineers from Europe to contribute to alignment?
1. OpenBrain =/= OpenAI. I currently think Anthropic, OpenAI, and GDM are roughly tied for first place w.r.t. probability of being in the lead in early 2027. Sorry this wasn't clear.
2. Cursor and Bolt don't matter nearly as much as the big frontier AI companies themselves. I do however think that people should be looking for ways to contribute to making AI go well, rather than simply looking to make money by making AI go faster. So many people are currently doing the latter, and so few the former! I'd say at least make a serious effort at finding a job that's actually useful first, and failing that, go ahead and take the Cursor/Bolt job and look for ways to help out on the side.
3. There are loads of fairly shovel-ready engineering projects in technical alignment afaict. These people might be a good resource: https://www.matsprogram.org/ Plus stuff like evals, like what Apollo is doing. https://www.apolloresearch.ai/ Plus stuff that is weirder but IMO still quite valuable like the agent village. https://theaidigest.org/village
Have you all thought much about the substantive ways AGI under China (which I recognize AI 2027 finds unlikely) would differ from AGI led by US companies? I often find myself shocked that so many supposedly safety-minded people think that racing can be worth it to prevent Chinese AGI or ASI, without much explicit argumentation for why the relative value loss of Chinese AGI is large enough to be worth the risks of racing. Snarkily, I find myself thinking “I would much rather not be able to ask the AI about Tiananmen Square than be turned into grey goo.”
I have young children (2.5 and six months). How do you make parenting decisions when the future, possibly the very near future, is so uncertain? How do AI forecasts factor into your plans for your family?
Personally, I think of this much the same as I do uncertainty in the stock market. Sure, the systems could disappear before I retire, at which point I'm screwed alongside everyone else. If they don't then the investment remains worthwhile.
With kids, I figure aiming to make them competent, social, introspective, and responsible no matter what the future holds is the best bet. People can learn and adapt as long as they are willing to cooperate and self cultivate. Planning on the robots ruining everything also ruins the future where they don't. Having faith in people, though difficult, is worth doing in my eyes.
This is one of the hardest things about having short timelines. I have 5yo and 1 month old daughters. The reason for the age gap is that my AGI timelines collapsed to ~2030 or so shortly after the eldest was born, and I decided there was too much uncertainty to have more children. But, as the years went on, it was really tough, especially for my wife, to continue that way. We had always wanted to have at least two, so they could be siblings. And most of the 'cost' was paid upfront anyway, the second kid is much easier than the first. So we ultimately decided to cross our fingers and proceed.
As for parenting style, I think I'm more chill than I otherwise would be. Focus on making sure they are happy and that they know that they are loved.
I have an 11-year-old, but this problem is most urgent for 19- and 20-year-olds who are changing their majors.
What's wrong with Robin Hanson Thought?
It's expressed too tersely. Also, he lost a recent bet to Alex Tabarrok because he didn't take the EMH seriously enough.
Can you ask a more specific question?
Where are the mediocre futures? Projects like AI 2027 showcase either really good or really bad futures. But what about a timeline where things just don’t go how anyone expects? I know with exponential growth curves some argue that the future will be radically different from the present but even so I still think the future always contains seeds of the present. Thus my question about futures that are more like the present than we might otherwise assume.
I think the "slowdown" branch is pretty compatible with a mediocre future. Power was extremely concentrated, and we didn't go into much detail about what the post-singularity civilization looks like -- IMO it is very easy to imagine that it goes poorly due to so few people having such immense power + it seeming unlikely that these people would be very philosophically wise.
But this doesn't really sound like the future you are imagining -- it sounds like you think maybe we don't get AGI/ASI. I think this is super unlikely. A rough argument sketch for why is:
(1) ASI is physically possible.
(2) AGI is achievable in the near future (see timelines forecast)
(3) AGI can bootstrap into ASI in the near future (see takeoff forecast).
I'm curious which part you think might be wrong?
Thanks for the reply. I actually do believe AGI is likely and ASI is possible. I just wonder if there’s a future we’re not seeing where somehow they just don’t end up being as impactful as they seem like they might be. Like maybe we just adapt to them in the same way we’ve adapted to the internet and not that much stuff actually changes. I also used mediocre because I’m coming at this from the geopolitics lens. The “slowdown” branch portrayed a timeline where things comprehensively worked out for the U.S. I’m wondering if there’s a timeline where maybe some things work out for the U.S. and some things work out for China but neither gains a decisive advantage over the other and everything kind of just keeps muddling through like it’s been doing. But I suppose that would be based on the assumption that no leaps forward are made. Which I actually agree is unlikely.
It could be that most human problems couldn't be solved better with more intelligence. We still walk across the room rather than drive. In that case, if the external system is supportive, you wouldn't get THAT much change. (It would still be extreme, of course.)
Do you expect future AIs to ever stop hallucinating? And if not, won't that impede their practical applications?
If by hallucinations you mean narrowly the phenomenon that I think deserves to be called hallucinations -- whereby the AI just makes something up without realizing that's what it's doing, literally like a human on drugs -- I expect that to gradually get solved as the AIs get smarter and more useful in the real world. If that doesn't happen, then yeah, it'll be hard to get much use out of them.
Why do Musk and Geordie Rose say this about AI:
Elon Musk: Artificial Intelligence is our biggest existential threat. ... AI is summoning the demon. Holy water will not save you.
D-Wave founder Geordie Rose (A Tip of the AI Spear): When you do this, beware. Because you think - just like the guys in the stories - that when you do this: you're going to put that little guy in a pentagram and you're going to wave your holy water at it, and by God it's going to do everything you say and not one thing more. But it never works out that way. ... The word demon doesn't capture the essence of what is happening here: ... Through AI, Lovecraftian Old Ones are being summoned in the background and no one is paying attention, and if we’re not careful it’s going to wipe us all out.
Musk and Rose saying this: https://old.bitchute.com/video/CHblsEoL6xxE [4:29mins]
Isn't that just them being concerned with AI alignment? I don't see anything special about those takes, other than the flowery prose.
Alas, no. Check out the above video, at the end an AI chatbot claims to be a nephilim son of a fallen angel.
Now check this one out and scroll to timecode 16:17 and watch to timecode 19:55 - Geordie Rose is talking about these quantum computers as if they are ... well just watch for yourself, he has religious reverence for them: https://old.bitchute.com/video/CVLBF3QP6PlE
They are idolizing these machines as if they are ancient gods...
Oh. So they're just insane. Or Adeptus Mechanicus roleplayers.
Or AI is tapping into things that are unnatural and they are kneeling before it.
Two people in the industry: one the head of the spearhead of AI and quantum computing research and the richest man in the world are both telling you there's far more to this than meets the eye.
The wise man knows that he does not know; and the prudent man respects what he does not control
I would find it very hard to come to the belief from two people using common metaphors to describe something that they actually had insider knowledge that AI were demons. Can you describe your process of becoming open to this idea?
Rose deadly serious: “Summoning the great old ones” … “who don’t give a **** about us” … “and if we are not careful they are going to wipe us all out”
This is a common metaphor now? Wow, what a time to be alive.
Can you talk about the world in which an intelligence explosion *doesn’t* happen in the next few years? What ends up being the main bottleneck? What does the future most likely look like in these scenarios?
Great question. I recommend this blog post on the subject: https://helentoner.substack.com/p/2-big-questions-for-ai-progress-in
Basically, over the next three years the companies are going to get a lot more compute and they are going to spend a lot more money on making giant training environments/datasets that are a lot more diverse and challenging, training the AIs to operate autonomously as agents in a variety of challenging environments. This will probably produce substantially more impressive AIs; the question is whether it'll be enough on its own to get the superhuman coder milestone (~autonomous AI agent able to basically substitute for a human research engineer) and if not, whether the other progress made during these three years will cross the remaining gap. (E.g. the companies are experimenting with ways to use AIs to evaluate/grade AI behaviors; the problem with this is that you can easily end up training your AI to produce stuff that looks good but isn't actually good. But maybe they make enough progress on this problem that things really take off.)
...Or maybe not. Maybe all this scaling peters out, with AI agents that are impressive in some narrow domains like some kinds of math and coding, but still fail at other kinds of coding such that they can't really substitute for human programmers/engineers.
Or maybe we do get to the superhuman coder milestone, but the gap between that and the full-automation-of-AI-R&D milestone (Superhuman AI researcher) turns out to be a lot bigger than we expect, and it takes years to cross.
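To make the "looks good but isn't actually good" failure mode a bit more concrete, here is a minimal toy sketch (purely illustrative, with made-up numbers -- not how any lab actually trains models, and not part of the scenario): a grader that mostly sees surface polish is used to select which behaviors to reinforce, and the selection pressure flows into polish rather than into the quality we actually care about.

```python
import random

random.seed(0)

# Toy illustration only: each candidate "behavior" has a hidden true quality
# and a visible surface polish. The grader (standing in for an AI judge) mostly
# sees polish, plus a weak, noisy signal of quality -- all numbers are made up.
def grader_score(quality, polish):
    return 0.2 * quality + 1.0 * polish + random.gauss(0, 0.1)

# Start with a population of candidate behaviors: (quality, polish) pairs.
population = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]

for generation in range(31):
    # Keep the behaviors the grader likes best...
    population.sort(key=lambda qp: grader_score(*qp), reverse=True)
    survivors = population[:50]
    # ...and "train" toward them by mutating the survivors.
    population = [(q + random.gauss(0, 0.1), p + random.gauss(0, 0.1))
                  for q, p in survivors for _ in range(4)]
    if generation % 10 == 0:
        avg_q = sum(q for q, _ in population) / len(population)
        avg_p = sum(p for _, p in population) / len(population)
        print(f"gen {generation:2d}: true quality {avg_q:+.2f}, polish {avg_p:+.2f}")

# Polish climbs much faster than true quality, because the optimization
# pressure mostly flows through what the grader can actually see.
```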
If the key to exponential growth in AI is having AI build code for future models, how does one prevent the system from recursively entrenching token weights and creating trapped priors?
It stands to reason that for artificial intelligence to be accurate to the real world, it would need some way to check its internal models against reality. At present, this happens by training new models on fresh data, with human beings serving as arbiters of fact. Any system that wants to build accurate AI has to have some built-in process that measures the model against the real world. If that reality check continues to rely on human beings, then there is an inherent bottleneck to exponential improvement. It sounds like the plan is to have AI itself take on more of the burden over time.
However, as far as I know, current systems don't have physical senses to measure the facts in physical space. It isn't obvious to me that processes with self-built AIs won't be prone to schizophrenic logical leaps that draw their interpretation of the world away from grounded truth. How does one create an automated reality check that doesn't descend into a self-referential hall of mirrors?
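For what it's worth, the hall-of-mirrors worry is easy to demonstrate in a toy simulation (made-up numbers, nothing to do with how real training pipelines work): a model repeatedly re-fit on its own outputs drifts, while even a thin stream of grounded measurements keeps it anchored.

```python
import random
import statistics

random.seed(1)

TRUE_VALUE = 10.0  # stands in for some fact about the physical world

def run(real_data_fraction, generations=200, samples_per_gen=100):
    """Repeatedly re-fit a 'model' (here just a mean and spread) on a mix of
    its own generated outputs and fresh grounded measurements."""
    mean, spread = 10.0, 2.0  # initial model, roughly calibrated
    for _ in range(generations):
        data = []
        for _ in range(samples_per_gen):
            if random.random() < real_data_fraction:
                data.append(random.gauss(TRUE_VALUE, 2.0))  # grounded measurement
            else:
                data.append(random.gauss(mean, spread))     # model's own output
        mean = statistics.mean(data)
        spread = max(statistics.stdev(data), 1e-6)
    return mean, spread

for frac in (0.0, 0.05, 0.5):
    mean, spread = run(frac)
    print(f"real-data fraction {frac:.2f}: final belief {mean:.2f} +/- {spread:.2f}")

# With no grounded data the belief tends to random-walk away from 10 and its
# spread slowly shrinks -- the hall of mirrors. Even a small fraction of real
# measurements keeps it anchored near the true value.
```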
Already AIs are trained to browse the web. By 2027 they'll be as plugged into the ever-changing real world as we are, able to update on evidence as it comes in. Or so we project.
What would be the vector for real world data in this projection? Phones come to mind as a secondary source of data (filtered through people's conversations about real happenings), and primary sources through cameras and microphones. Is this the basis of your projection, or do you foresee a large scale manufacturing project to produce eyes and ears across the globe to train new models?
1. It's made clear in the scenario that the "slowdown" ending is "the most plausible ending we can see where humanity doesn't all die", and that this goes beyond choices into actual unknowns about how hard alignment is. What is the probability distribution, among the team, for how likely alignment is to be that easy vs. even easier vs. "hard, but still doable within artificial neural nets" vs. the full-blown "it is theoretically impossible to align an artificial neural net smarter than you are" (i.e. neural nets are a poison pill)?
2. For much of the "Race" ending, the PRC and Russia were in a situation where, with hindsight, it was unambiguously the correct choice to launch nuclear weapons against the USA in order to destroy Agent-4 (despite the US's near-certain retaliation against them, which after all would still have killed fewer of them - and their own attack would have killed fewer Americans! - than Race-Consensus-1 did). Was their failure to do this explicitly modelled as misplaced caution, was there a decision to just not include WWIII timelines, or was there some secret third thing?
1: Nit: We say "After we wrote the racing ending based on what seemed most plausible to us, we wrote the slowdown ending based on what we thought would most likely instead lead to an outcome where humans remain in control, starting from the same branching point (including the misalignment and concentration of power issues)." This is different from "the most plausible ending we can see where humanity doesn't all die" in two ways. First of all, humans might lose control but not all die; some or even all might be kept alive for some reason. Secondly, our slowdown ending was a branch from the same stem as the race ending. So it began in the middle of things, with the race already nearing the finish and the AIs already misaligned. We think it's a lot easier for things to go well for humanity if e.g. preventative measures are taken much earlier, or e.g. if timelines are longer and it takes till 2030 to get to the superhuman coder milestone.
Anyhow, to answer your question: My own opinion is that, idk, there's a 40% chance alignment is as easy as or easier than depicted in the slowdown ending? To be clear, that means:
--A team of more than a hundred technical alignment experts drawn from competing companies + nonprofits/academia
--Given about three months
--And basically all the compute of the major US AI companies
--And trustworthy superhuman coder agents to do all the implementation details
--And untrustworthy smarter models that were already caught lying & can be subjected to further study
...succeeds with >50% probability.
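Purely as arithmetic on the stated figures (a rough lower bound, nothing more):

```python
# Illustrative lower bound from the figures above; no claim about harder worlds.
p_easy_enough   = 0.40  # alignment as easy as or easier than the slowdown ending
p_success_given = 0.50  # "succeeds with >50% probability" -- using the floor

print(f"lower bound: {p_easy_enough * p_success_given:.0%}")
# ~20% chance of the slowdown-style save conditional on reaching that branch
# point, before adding any probability of success in harder worlds.
```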
2: If I understand the question correctly: Misplaced caution. In our tabletop exercises / wargames, of which we have run about 40 now, WW3 happens only like 15% of the time or so. (Gotta assemble actual stats at some point...) We do think WW3 is a very plausible outcome of the AI race, but we think some sort of deal and/or coup/revolution/overthrow event is more likely.
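As a back-of-the-envelope check on how rough that 15% figure is (taking the ~40 games as n and using a simple normal approximation; these are not the team's actual tallies):

```python
import math

n, p_hat = 40, 0.15  # assumed: ~40 games, WW3 in ~15% of them

# Simple normal-approximation 95% interval for a proportion.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = max(0.0, p_hat - 1.96 * se), min(1.0, p_hat + 1.96 * se)
print(f"~{p_hat:.0%} of {n} games; 95% interval roughly {low:.0%} to {high:.0%}")
# Roughly 4% to 26% -- "like 15% or so" really is a rough estimate, quite apart
# from how well the games model actual US/Chinese/Russian leadership.
```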
Fair point on the nit.
>My own opinion is that, idk, there's a 40% chance alignment is as easy as or easier than depicted in the slowdown ending?
And the breakdown of "harder than the slowdown ending but still possible" vs. "flat impossible"?
(My prior was 95-97% "flat impossible", due to the fact that "what does this code do" is the halting problem and neural nets seem to me to be less of a special case than human-written code.)
>In our tabletop exercises / wargames, of which we have run about 40 now, WW3 happens only like 15% of the time or so. (Gotta assemble actual stats at some point...)
Okay. I guess my next question is "how good do you think your wargames are at replicating US/Chinese/Russian leadership decisions".
Have you run pre-WW2 scenarios that didn't end in WW2? What I mean is, if your wargaming minds are making wildly different decisions than what actually happened in our past history, then we can expect them to make wildly different decisions than what will happen in our future history.
I think you might have intended to reply to Daniel, not to me, as I am not involved with the AI Futures project and don't know the answers to your question any better than you do.
Tell me about the robots! I think this is the hardest to swallow element of the prediction, but crucial. It's also MUCH easier to imagine China nationalizing its car factories to create robots. Is this a place where they can eke out an edge?
I think the robots are less important than you seem to think -- what matters most in my mind is how smart the AIs are able to get, not how quickly they are able to build robots. Once the AIs become wildly superhuman, in my view it's kind of game over for the humans, regardless of how hard it is to bootstrap a robot economy.
In the scenario, we only predict a large robotics boom post-ASI. And my view is that we were actually quite conservative here. My best guess is that the ASIs will be able to pretty quickly figure out nanotech (or at least micro-scale robots), and that these will be wildly more useful and faster than human-scale robotics.
It's also possible that robotics matter pre-ASI, especially in a world that looks more like an Epoch/Paul C distributed takeoff -- maybe you need to automate large fractions of the economy before you get crazy superintelligence. In this world, China does have some advantages, including potentially more willingness to nationalize, but also things like more energy.