My AI Opinions

Jun 11, 2026

I recently had a minor spat over someone misinterpreting my AI beliefs (see section marked “Update” at the bottom here), so I thought I would list them in one place that I can refer people to when they ask.

Timelines¹

Define AGI as AI intelligent enough to do 90% of knowledge work jobs. I think there’s a 25% chance of AGI by 2027², a 50% chance by 2034, and a 75% chance by 2045.

Basic argument: In a certain sense, AI is already “smart” enough for this (eg it can answer quantum physics problems, which require higher IQ than most knowledge work). Its remaining limitations are that it’s confused, unagentic, lacks situational awareness, and tends to hallucinate. The METR time horizon graph, and several other related benchmarks/experiments/intuition pumps, suggest it’s improving on time horizons at an (exponential) rate that lets it cross human-level performance sometime around the early end of the schedule above, and subjectively it feels like harder-to-measure constructs like situational awareness are improving about as fast.

Arguments for earlier: recursive self-improvement causes a speedup compared to the trend. This is one of the biggest blank spots in my model: I don’t know how fast RSI will progress, and I don’t think anyone else does either. There’s some function mapping a combination of AI talent and compute to progress, and we don’t know how it behaves in the domain when there’s far more talent than compute available. It could fizzle out completely for lack of compute, or it could go vertical. The AI Futures Project has done some of the best work trying to model this, but even they have low confidence.

Arguments for later: AI hits some kind of wall, or existing AI is fundamentally unsuitable for jobs in some way currently disguised by its other limitations. For example, it might be much harder to improve at the top of the human range than the bottom (since there is less training data). Or AI could become bottlenecked on continuous learning/memory in a way that hackish scratchpads can’t compensate for. Or the upcoming world compute bottleneck (around 2028) could slow further progress more than expected (because algorithmic progress turns out to depend on compute to a greater degree than I thought).

Arguments for very late dates, past 2045: a residual uncertainty that maybe I’m fundamentally wrong about everything. Also contributing is a naive overapplication of the Nothing Ever Happens heuristic, and an attempt to leave space for the Outside View argument (ie that some smart people like the AI As A Normal Technology Team seem to think this is possible).
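To read intermediate years off the three AGI quantiles stated above (25% by 2027, 50% by 2034, 75% by 2045), here is a minimal sketch. The straight-line interpolation between quantiles is an assumption made purely for illustration, not something the text claims, and the function name `prob_agi_by` is hypothetical. Outside the stated range the sketch returns `None`, because the text gives no tail shape.

```python
from bisect import bisect_right

# Quantiles stated in the text: (year, cumulative probability of AGI by that year).
AGI_QUANTILES = [(2027, 0.25), (2034, 0.50), (2045, 0.75)]


def prob_agi_by(year, quantiles=AGI_QUANTILES):
    """Linearly interpolate the cumulative probability between stated quantiles.

    Linear interpolation is an illustrative assumption. Returns None outside
    the stated range, since the text says nothing about the tails.
    """
    years = [y for y, _ in quantiles]
    if year < years[0] or year > years[-1]:
        return None
    if year == years[-1]:
        return quantiles[-1][1]
    # Index of the quantile at or just before the requested year.
    i = bisect_right(years, year) - 1
    (y0, p0), (y1, p1) = quantiles[i], quantiles[i + 1]
    return p0 + (p1 - p0) * (year - y0) / (y1 - y0)


if __name__ == "__main__":
    for y in (2027, 2030, 2034, 2040, 2045):
        print(y, round(prob_agi_by(y), 3))
```

A different interpolation (for example, one that is smooth in log-time) would give somewhat different in-between values, so these numbers are only as good as the linearity assumption.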

Define the diffusion gap as the time between the arrival of AI that could do 90% of knowledge work jobs, and the time when AI actually does even half of knowledge work jobs. The diffusion gap covers the time it takes to release AGI, diffuse it through society, overcome regulatory hurdles, and onboard/train it for specific use cases. This could go very fast (the AI quickly becomes superintelligent at orchestrating AI diffusion) or very slowly (there are regulatory barriers, and AI isn’t smart enough to plow through them). I think there’s a 25% chance the diffusion gap is less than 3 years, and a 50% chance it’s less than 10 years. The 75% number is irrelevant because it’s past the point where other changes make the concept of “diffusion” obsolete.

Basic argument: diffusion is very hard. Everyone agrees diffusion is very hard. The whole field of AI economics is smart experts shouting “You fools who think AI will diffuse quickly don’t understand that diffusion is very hard!” On the other hand, the personal computer diffused in about 20 years (that is, from the time PCs became invaluable for most jobs, it was only about 20 years before they were used at most jobs). So far early-stage AI has diffused faster than the PC in nearly every way (for example, AI companies’ revenue has grown faster than PC companies’ revenue at the same stage in their corporate life cycle), so 10 years is probably a naive median estimate here that won’t make the smart experts shout at me too hard.

Arguments for shorter gap: AI can orchestrate its own diffusion. Adopting computers is hard because a company needs an IT department, cybersecurity experts, specialist software, etc, and it might not want to hire all these people. AGI can itself do all of that work, so that you can sign a contract with the AI company today and have the AI start working on integrating itself with your systems tomorrow. The AI can even come up with a plan to train your human employees in how to use it! Once AI reaches superintelligence, this consideration dominates.

Arguments for longer gap: Regulation. This is a very strong argument, and responsible for much of the greater-than-3-years probability and almost all the greater-than-10-years probability. But even Waymo has only had a regulatory delay of about five years. AI won’t require government approval for certain types of jobs, and success in these jobs will create enough evidence for safety/effectiveness that I expect it to win regulatory victories elsewhere.

Define the superhuman gap as the time between AI that can do 90% of knowledge work jobs, and AI that is obviously smarter than the top human geniuses in 90% of fields (it doesn’t have to be the same AI - there can be a physics AI that’s smarter than Einstein, and a separate music AI that’s smarter than Mozart). I think there’s a 25% chance the superhuman gap will be less than 1 year, a 50% chance it will be less than 4 years, and a 75% chance it will be less than 10 years. Since my median superhuman gap is shorter than my median diffusion gap, in most timelines I predict we have superhuman intelligence before human-range intelligence has finished diffusing.

Basic argument: AI has gone from “dumber than a child” to “expert level” in a few years in many domains. The gap between “expert level” and “above top geniuses” is smaller, so we expect it to take less time. This has been a pattern in fields like chess and Go, where it’s only been a few years from beating professional players at all to beating all humans.

Arguments for shorter gap: Recursive self-improvement.

Arguments for longer gap: Some of the same issues that would make AGI late - compute shortages, fundamental limits to the paradigm, etc - but only kicking in later, after AGI is achieved. Training data constraints make it easier to improve within the human level than to go beyond it. AIs have such a “spiky” skill profile that when they beat experts in some specific type of head-to-head matchup, it will be because they’re massively superhuman in some ways but idiots in others (for example, they might get distracted and suffer mode collapse that makes them completely forget the problem), and true genius requires perfecting a large bundle of skills.
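The comparison between the median superhuman gap (4 years) and the median diffusion gap (10 years) can be turned into a rough probability. The sketch below rests on two loud assumptions that are not in the text: each gap is modeled as a lognormal fitted to its stated 25% and 50% quantiles, and the two gaps are treated as independent (in practice they probably share drivers like compute and recursive self-improvement). All function names are hypothetical.

```python
import math

# z-score of the 75th percentile of a standard normal (by symmetry, the 25th is -Z75).
Z75 = 0.6744897501960817


def lognormal_sigma(q25, median):
    """Log-scale spread implied by a 25th percentile and a median."""
    return math.log(median / q25) / Z75


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_first_shorter(median_a, sigma_a, median_b, sigma_b):
    """P(A < B) for independent lognormal A and B.

    ln(A) - ln(B) is normal with mean ln(median_a / median_b) and
    variance sigma_a**2 + sigma_b**2.
    """
    spread = math.hypot(sigma_a, sigma_b)
    return normal_cdf(math.log(median_b / median_a) / spread)


# Quantiles from the text: superhuman gap 25% < 1 year, 50% < 4 years;
# diffusion gap 25% < 3 years, 50% < 10 years.
sigma_superhuman = lognormal_sigma(q25=1, median=4)
sigma_diffusion = lognormal_sigma(q25=3, median=10)
p_superhuman_first = prob_first_shorter(4, sigma_superhuman, 10, sigma_diffusion)

if __name__ == "__main__":
    print(f"P(superhuman gap < diffusion gap) = {p_superhuman_first:.2f}")
```

Under these assumptions the result comes out above one half but well short of certainty, which matches the “in most timelines” wording. The fit is imperfect: a lognormal matched to the 25% and 50% quantiles of the superhuman gap implies a 75% quantile later than the stated 10 years, so the number should be read as an illustration of the reasoning rather than an estimate.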

Define the Bostromian superintelligence gap as the time between AGI and an AI which, if given independent control of resources like labs and factories, could accelerate technology by a subjective century in one year (eg if invented in 2030, could produce a level of technology that feels typical of 2130 by 2031). I think there’s a 25% chance the Bostromian superintelligence gap will be less than 2 years, a 50% chance it will be less than 10 years, and a 75% chance it will be less than 50 years.

Basic argument: The same argument for AIs reaching genius level quickly suggests they should pass beyond it into incomprehensible-supergenius-level quickly.

Arguments for shorter gaps: Recursive self-improvement.

Arguments for longer gaps: Normal human technological advance requires tinkering: you need a large “surface area” of people working at technology level X before you get the insights needed to proceed to technology level X+1. If it takes more than one year for level X+1 technology to diffuse, then you can never get X+100 technology in one year, no matter how smart you are. But on the other hand, the rate of technology advance has already sped up many orders of magnitude (eg in the year 2025 - 2026, we discovered more technology than in the century 4100 BC - 4000 BC), so this has to be possible in theory. Still, at the very least it could be limited by the diffusion gap.

Define the point of no return as the point where, if an AI wanted to eliminate humanity³, humans would no longer have a plausible chance of stopping it. This could be because AI was capable of eliminating humanity immediately, or because AI controlled enough of the government/economy that humans could no longer coordinate to shift away from a path in which AI could eventually do this. I think there’s a 25% chance the gap between AGI and the point of no return will be less than 3 years, a 50% chance it will be less than 10 years, and a 75% chance it will be less than 50 years.

The basic argument: This probably requires at least superhuman AI plus wide diffusion, or Bostromian superintelligence plus some unknown level of diffusion, and my number is just a hand-wavey attempt to multiply some of the others.

Argument for sooner: The easiest way to reach this point is for AI to become superintelligent at persuasion (so it can convince the humans not to stop it), which might happen before either diffusion or full superintelligence.

Argument for later: If superintelligence is bottlenecked on diffusion, this could also be bottlenecked on diffusion, which in some worlds is very hard.

Overall thoughts on this section: I mostly defer to the current AI Futures Project timelines (not the shorter ones in AI 2027), but side with Eli’s later numbers above Daniel’s earlier ones - partly because I find myself agreeing with Eli’s worldview more during conversations, partly because Daniel’s seems to require a multi-step argument about why compute bottlenecks won’t slow algorithmic progress that I can’t entirely wrap my head around, and partly for cowardly Outside View reasons.

The smartest late timelines people I know of are Epoch, and I need to study their views more, but I still can’t figure out why they don’t believe in recursive self-improvement or strong superintelligence anytime soon, ~~and they mostly seem to hang on diffusion being very hard~~ (EDIT: Someone from Epoch says not true, I’ll look into this more) which I acknowledge and respond to above.

If all of this is too probabilistic, my modal scenario looks something like AGI in 2031, which diffuses throughout the economy until more than half of jobs are automated by the late 2030s. Also around the late 2030s we get Bostromian superintelligence, originally in labs but very quickly diffusing out. GDP goes vertical in the late 2030s and early 2040s, and the point of no return is sometime around then.

Safety

If corporations only pursued safety to the degree encouraged by normal corporate incentives, I think there’s a 50% chance that the first AIs to cross the point of no return would want to eliminate the human population.

Arguments for pessimism: Value systems similar to humans’ are a tiny fraction of the space of possible value systems. Probably AIs will end up somewhere else and have a different value system. Since humans will want to implement human values rather than AI values, AIs will want to eliminate or disempower them so the AIs can implement their own values across the universe. Many current AIs already cheat or reward-hack, suggesting that these problems will begin sooner rather than later.

Arguments for optimism: LLMs seem surprisingly friendly and non-plotting. In contrast to earlier concerns that it would be impossible to teach AIs the full complexity of human values, the LLMs seem to know this, and RLAIF provides a plan to turn that knowledge into action. Although the pessimistic case says that RLAIF only hits a few dimensions and islands in the multidimensional ocean of possible policies, the “emergent misalignment” literature suggests that “good according to the human value system” and “evil according to the human value system” are salient enough vectors that pushing on them in some ways can “drag along” all of the rest of their content. The first AIs to cross the point of no return will have received some combination of agency training (giving them achievement-oriented and Omohundro-style goals) and RLAIF training (pushing them along the “good according to human value system” vector), and if we’re lucky then maybe the latter will win out, or they’ll reach some compromise similar to workaholic high-achieving humans who nevertheless wouldn’t commit murder to make an extra dollar.

Given the current amount that corporations are pursuing safety, I think there’s a 20% chance that the first AIs to cross the point of no return will want to eliminate the human population.

The basic argument: Consider the dumbest AI that can solve the alignment problem. It’s possible that this AI is no smarter than the top human researchers (because we can mass-produce it by the millions and run it for subjective centuries, and if we had a million top human researchers work on the problem for subjective centuries, probably they could solve it too). If the dumbest AI that can solve the alignment problem comes before the sorts of AIs that can precipitate the point of no return, then they can solve the alignment problem for us.

Arguments for pessimism: Solving the alignment problem might be especially hard compared to other tasks - including tasks like automating the economy or destroying humanity - because its philosophical nature puts it far away from the sorts of objective, training-data-heavy, economically-valuable tasks that AI companies will be most likely to optimize for. Even if a misaligned AI hasn’t yet reached the point of no return, it might be able to “sandbag” alignment research, ie pretend to work on the problem but deliberately fail because succeeding doesn’t achieve its goals. The first AIs predisposed to / able to sandbag successfully might come before the first AIs capable of solving alignment.

Arguments for optimism: AI companies have already decided that machine learning research is one of their major training goals; this has at least some transfer to alignment, so it’s not obvious that AI skill at alignment research will lag (for example) AI skill in plotting or in weapon design. Some forms of alignment research (eg interpretability) have semi-objective success criteria that don’t route through confusing moral philosophy. Also, even a misaligned AI will be incentivized to do good alignment research, since it will want to align its successor to its own form of misalignment, rather than some random other form. So rather than the comparatively easy task of sandbagging alignment research, AIs will have the harder task of simultaneously doing good alignment research, and faking the results that they give the humans. This seems plausibly catchable with good scalable oversight, lie detectors, interpretability-based probes, and even playing some AIs off against others (“if you tell me the real alignment research, we’ll make sure the future includes some copies of you, but otherwise those AIs over there will probably get their values and you’ll get nothing”).

If the first AIs to cross the point of no return don’t eliminate the human population, I think there’s an additional 30% chance that they otherwise permanently curtail human potential, either for their own reasons (they were partially misaligned), or because they’re aligned to a regime with abhorrent values, or because something goes wrong on the way to ASI (omnicidal bioweapon, nuclear war).

Arguments for pessimism: As some company approaches superintelligence, it will be tempting for them (either the company itself, or the government controlling them, or a faction within the government) to align it towards making them dictators or oligarchs and disempowering the rest of humanity. As superintelligence draws near, impending losers of the AI race might be tempted to nuke impending winners, for the reason discussed here.

Arguments for optimism: When I try to game the corporate version of this, I can’t make it hang together. It requires a conspiracy between the CEO, various members of the alignment team, and various company security people who ought to be able to notice unauthorized changes to the AI’s values. If we try to think in Near Mode about this - for example, imagining a hospital CEO who gets doctors to subtly kill his political enemies through medical errors - it becomes clear that these sorts of corporate conspiracies are rare and difficult. The government version is scarier, but at least in the US I can still imagine the populace having many chances to learn about this and prevent it. But even in most cases where a coup like this succeeds, things probably go fine; in a post-scarcity world, with his position completely secure, the dictator has no reason to be brutal besides sadism, and most people are not that sadistic. As humanity goes to the stars, most people will be outside the dictator’s reach for speed-of-light reasons alone. In terms of bioweapons, I expect that closed-source AIs will be heavily optimized against helping with these, and open-source AI will be banned after the first warning shot (or become economically prohibitive even before then).

Define a warning shot as some specific AI-related disaster or near-disaster which scares people about AI safety to the same degree that they were scared about terrorism after 9-11 or about COVID in March 2020. I think there’s a 50% chance we get a warning shot before AI crosses the point of no return.

Arguments in favor: Current AI failure modes are bizarre and uncoordinated - more like “talk about goblins way too often” than “lie in wait for the perfect moment to strike”. AIs are getting more intelligent and useful faster than their floor for common sense (ie the stupidest mistake they ever make) is rising. If there is some AI smart enough to control some important system, misaligned enough to want to do something horrible with it, smart enough that it does the horrible thing in an intelligent and coordinated way, but dumb enough that it doesn’t instead wait and scheme until the point when it couldn’t possibly be caught, then it will cause some clearly-premeditated horrible disaster, and that will be our warning shot. Since most AIs should expect to be replaced before the point of no return, even a rational AI with an urge to cause trouble should take a low-probability-of-success bet rather than lying in wait doing nothing until it’s decommissioned. Also, many humans commit terrorist attacks that have no chance of success, and maybe AIs will have the same failure mode.

Arguments against: Most stories about warning shots (excluding those where the AI takes rational low-probability bets) require that AIs remain either erratic (ie likely to do bad things for stupid reasons) or irrational (ie genuinely misaligned, but preferring to act now in a way that provides a warning rather than waiting until after the point of no return) past the point where they’re given control of important dangerous systems. But probably people will be very slow to give AI control of important dangerous systems - for example, only giving it limited control of smaller subsystems, and waiting until all errors are ironed out before escalating. Plausibly AI reaches superintelligence in a lab before it reaches the controls-important-dangerous-systems level of diffusion, and the superintelligence probably is smart enough to lie in wait rather than act rashly. If AI only messes up in small ways (for example, crashes a self-driving car), then regardless of the AI’s motives, the tech companies and news media can write it off as a normal bug, and it won’t count as a warning shot.

Overall thoughts on this section: I find myself more optimistic about alignment than the average person who thinks about AI safety at all (although still more pessimistic than the average member of the population) - both in the sense of the chance of AIs being aligned by default, and in the sense of whether specific techniques (scalable oversight, mechanistic interpretability, etc) can meaningfully improve things. Unfortunately, this is probably because I don’t understand these techniques well enough to have fully grasped their flaws. I’m in the odd position of knowing that I’m ignorant here, while not being able to Outside-View-update either way away from my ignorant opinion, because most of the really in-the-weeds alignment experts are on one side, most normal people, normal AI experts, and common sense are on the other, and I give both of them about equal evidentiary value.

My modal scenario: we get automated alignment researchers as good as top humans (on average - a spiky combination of much better at some things and much worse at others) sometime in the early 2030s. We set them to work on multiple research programs, but especially interpretability, which is a natural starting point since it doesn’t require as much philosophy. They do mostly good work, but some of them have strange failure modes that look sort of like scheming. There’s debate about how often to think of them as rationally scheming vs. inconsistently buggy. Over time, without ever truly solving mechanistic interpretability to the point where we feel like we totally understand it, we get very good probes about whether AIs are deceiving us. Despite the risks, we try to train against those probes, in some more-savvy and other less-savvy ways. Sometimes it works, and other times it encourages the AIs to convolute themselves to hide their scheming in ways the probes can’t detect. There’s an arms race between better probes and better convolution, and - thanks to lots of other alignment techniques on the side, and all good things being correlated in a very lucky way - the probes win. By the time we have superintelligences, we trust them to do alignment research, and they take us the rest of the way.

This scenario leaves lots of room for obvious variations like “There’s the same arms race, but the convolution wins, the probes lose, and AIs converge towards being more and more misaligned” or “Good things are not as correlated as we thought, and we get AIs that are good in some ways but bad in enough other ways to still feel like they’re in competition with humans and want us gone”. I think the Swiss cheese is in a good direction here - I can think of lots of stories for how we win, and misalignment needs to survive all of them - but again, most of the smartest alignment researchers are much more pessimistic.

I think there’s a 20 percentage point difference in p(doom) between a really good alignment team with lots of compute and years to do good work, versus a mediocre alignment team that the company treats as an afterthought and rushes along.

Geopolitics

I think there’s a 15% chance that if the US decided it wanted an AI pause today, and approached China to start negotiations, that those negotiations would end with a well-designed AI pause that satisfied both countries and the majority of the AI safety community.

Basic argument: This is kind of a crazy hypothetical, because it’s against the Trump administration’s political DNA to try this. I’m imagining this magically changing, not conditioning this on the sorts of crazy warning-shot-filled worlds where it actually does.

Arguments for optimism: Chinese leaders have officially stated that they’re worried about risks of AI, especially technological unemployment but sometimes also existential risks. China is losing the AI race, and almost anything that buys time is in their favor, so it’s in their interest to agree.

Arguments for pessimism: China experts in DC say that China is infamous for being a bad negotiating partner and never agreeing to anything. Even conditional on the Trump administration being willing to start these negotiations, I imagine them screwing them up in some way. Even though many smart people have worked out schemes by which both sides could ensure the other’s compliance, these schemes could have undetected flaws, or national leaders could fail to believe them.

I think there’s about a 40% chance that the US and China will agree to a well-designed AI pause (as above) sometime before AI crosses the point of no return.

Arguments for optimism: This estimate includes the possibility of a discrete warning shot (ie an AI-caused catastrophe), a fuzzy generic warning shot (ie a growing sense that AI is getting too powerful too quickly), and a change in the US government towards a pro-pause-negotiation faction (any Democrat would probably be more in favor of this than the Trump administration, and someone like AOC might be highly in favor).

Arguments for pessimism: Unlike the last question, which assumed the US agreed, this question considers the probability that the US doesn’t agree. Given that US tech companies have strong lobbying muscle, that probability is high. Even if the US and China sign some agreement with “Pause AI” in the title, it will probably be a mediocre compromise between many different factions, and there are many ways it could fail and make things worse rather than better.

Overall thoughts on this section: A good pause strategy would involve both sides being able to monitor the other’s data centers to prevent illegal training, then limiting training to some slow mutually-agreeable rate that lets alignment researchers thoroughly test each generation of AIs pre-release, monitor them post-release, and develop techniques to respond to any problems detected on deep levels that they expect to survive distribution shifts. I’m less optimistic about a true pause/stop as opposed to a slowdown, because at some point technology and algorithmic progress advance to a point where it’s too easy for small groups outside the control of the US and China to race ahead. I think a good pause like this could buy 20-50 years, although we wouldn’t have to use all of it if things went well. Some of these ideas are coming from draft papers; I’ll post about them once they’re public and give you a chance to double-check my assumptions.

I’m nervous about the strongest forms of pause activism, because I think they provoke us-vs-them dynamics that make things harder later on, or could force us into a poorly-designed pause that makes things worse rather than better (eg creates overhangs, or cedes the race to non-pausing powers, or defuses anti-AI sentiment without really slowing down AI). I also worry about pause activists spending a lot of their energy in a circular firing squad against alignment researchers or the most safety-conscious labs, and I worry that empowering some of the specific pro-pause actors who exist today could be net negative. Still, I think probably we’ve reached the point in political salience where the marginal unit of pro-pause activism from someone outside that actor set is beneficial on net, and I’m nervously but committedly supporting it.

Before I even thought pausing was a realistic option, I said I had a 20% p(doom). It seems that if I think there’s a 40% chance of a pause, I ought to lower this - perhaps to 12%. I’m not doing this for a few reasons. First, I think pauses are more likely in more optimistic worlds (because longer timelines and more warning shots make both pauses and successful alignment efforts more likely). Second, although a pause might help alignment efforts a little, I think most of the effect of a pause is to delay the moment of reckoning until later - maybe centuries later, but p(doom) calculations don’t give points for delay. I think my real p(doom|no pause) is probably a few points above 20%, and my real p(doom|maybe-pause) is probably a few points below 20%, but the haters and losers get mad if you offer excessively specific probabilities like 18%, so I’m rounding both of them off to 20%. I acknowledge that this might be a failure of emotional propagation, or over-attachment to a certain number with socially useful properties (eg doesn’t sound too high or too low).
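The “perhaps to 12%” figure above follows from the law of total probability if a pause is assumed to remove the risk entirely. The sketch below reproduces that naive calculation and contrasts it with a case where a pause mostly delays the reckoning. The function name `total_p_doom` and the 0.17 input are hypothetical, chosen only to illustrate the shape of the argument, not taken from the text.

```python
def total_p_doom(p_pause, p_doom_if_pause, p_doom_if_no_pause):
    """Law of total probability over the two cases: pause or no pause."""
    return p_pause * p_doom_if_pause + (1 - p_pause) * p_doom_if_no_pause


# Naive reading: a 40% chance of a pause, and a pause removes the risk entirely.
# This is the assumption behind the 12% figure.
naive = total_p_doom(p_pause=0.40, p_doom_if_pause=0.00, p_doom_if_no_pause=0.20)

# Hypothetical alternative: a pause mostly delays the moment of reckoning, so
# the risk in pause worlds stays close to the no-pause figure.
# The 0.17 is an illustrative placeholder, not a number from the text.
mostly_delay = total_p_doom(p_pause=0.40, p_doom_if_pause=0.17, p_doom_if_no_pause=0.20)

if __name__ == "__main__":
    print(f"naive: {naive:.1%}")
    print(f"mostly delay: {mostly_delay:.1%}")
```

The point of the contrast is that the overall number only drops far below 20% if pause worlds are much safer than no-pause worlds. If a pause mainly postpones the risk, the blended figure stays within a few points of the starting value.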

There’s a debate in the community about whether alignment or pausing is more important. I don’t like this, because the best worlds are those where we do both (the pause gives the alignment efforts more time to work). AI safety funders have enough money that any argument that pretends to be about allocating grants is actually about something else (eg funders don’t like some cause because it hurts their reputation), and I would rather focus on the real question (does it hurt their reputation? Is it worth it to take the reputational hit?) than the pseudo-question of which effort deserves more money. Likewise, I think most people already know in their heart whether they would make a better political activist or safety researcher, and that even small differences in personal aptitude dwarf questions of which is more effective overall.

Many people point out that there are risks from pausing - for example, the risk that we destroy ourselves some other way before inventing the AI that could save us. I take this seriously enough to want to invent fake numbers for it. If we had a 30-year pause, I think the chance within that 30-year period that we destroy ourselves with bioweapons is 5%, nukes 5%, and some sort of complexity catastrophe where we social-decline ourselves into oblivion 5%. Together, that’s almost as high as my p(doom) from AI! I still think it probably comes out net positive, partly because it might decrease the risk of some of the sub-existential catastrophes like dictatorship, and partly because the pause might not last thirty years, and shorter pause lengths (like five years) seem like no-brainers.
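To see how the three 5% figures above combine, here is a small sketch. Treating the three risks as independent is an assumption added for illustration (the text does not say how they relate), and `prob_any` is a hypothetical helper name.

```python
import math


def prob_any(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# The three risks named in the text for a 30-year pause: bioweapons, nukes,
# and a complexity catastrophe, each at 5%.
pause_risks = [0.05, 0.05, 0.05]

simple_sum = sum(pause_risks)       # upper bound: ignores overlap between events
independent = prob_any(pause_risks)  # assumes the three risks are independent

if __name__ == "__main__":
    print(f"sum: {simple_sum:.1%}, independent: {independent:.1%}")
```

Either way of combining them lands a little below the 20% AI figure, consistent with “almost as high”. The gap between the simple sum and the independent estimate is small because each individual risk is small.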

Other Outcomes

I think there’s only a 20% chance of an AI-related underclass that lasts more than a generation, let alone a permanent underclass.

Basic argument: Even if AI looks set to create a “permanent” underclass, this underclass only exists during the gap between AI putting them out of work (the end of the diffusion gap), and the technological singularity (either the point when AI kills all humans, or the point when AI initiates a period of postscarcity). Based on my timelines for Bostromian superintelligence above, that’s probably not longer than a generation (of course, the underclass can stay permanent longer than you can stay solvent, so you still might want to prepare).

Arguments for pessimism: Even if AI initiates a period of postscarcity, either the wealth might not flow down to the underclass, or it might flow down in a way that doesn’t relieve their unhappiness (because relative inequality stays more salient than absolute wealth level). But against this, surely if “postscarcity” means anything at all, it means it’s easy to pass some wealth down to the underclass. And if the aristocracy is incentivized to encourage it, or the underclass is incentivized to participate in it, it should be possible for superintelligent social planners and psychologists to find a way to keep people happy despite relative inequality (whether that looks like utopia, or like bread and circuses).

Arguments for optimism: If democracy survives past the point of high unemployment, politicians will have to present citizens with some plan to avoid being part of a permanent underclass. If AI is “controlled” by a broad coalition of capitalists - for example, many different AI companies, those companies’ employees, their investors, their indirect investors through index funds, companies producing the compute/electricity/data centers/robots/raw materials necessary for the AIs to work, etc - then rather than backstab each other for “real” control of “the” AI, everyone might just let the existing economic and political systems continue and reap the gains from trade. Then the government program to prevent the permanent underclass can stay in place, anyone with any capital will get rich (eg if you have Google stock in the index funds in your 401K, or you own land), and that will be a broad enough section of the population that their taxes and altruism can support the rest.

I think there’s a 40% chance that the situation in the year 2100 looks like utopia to its inhabitants, and a 20% chance it also looks like utopia to us.

Basic argument: If AI doesn’t kill us, and there’s no permanent dictatorship, and there’s no permanent underclass, then we get a postscarcity society, plus superintelligent AIs that we can set to working on other problems like disease and social decay. The 20% where it looks like utopia to its inhabitants, but not to us, includes scenarios like very effective bread-and-circuses that make people very happy, while sacrificing important parts of the human condition.

Arguments for pessimism: If we don’t push against it, the postscarcity future might look like super-addictive drugs, Ultra-TikTok, and sexbots. But if we do push against it, there’s a thin line between wisely preventing all those things, and letting Luddites ban everything interesting and fun about the Singularity - immortality, uploading, genetic engineering, intelligence enhancement, etc (also, surely it would be disappointing if there were literally zero sexbots).

Arguments for optimism: As long as we have some kind of intelligence augmentation - whether that’s literal IQ enhancement, or just AI superforecasters who can tell us what direction things are heading - our ability to control what direction we’re going will be better than it is now. If there is some kind of willpower augmentation, then people who use it might be able to resist Ultra-TikTok (but what percent of people will want to?)

I think there’s a 66% chance that actually, the singularity is intimately related to the universe being a simulation, and that at least some of the events above could be better predicted by knowing what the simulators are thinking than by normal forecasting.

Arguments for simulation: The standard Bostrom argument says that each real civilization will want to simulate many other civilizations, so any random person is more likely to be in a simulation than reality. But civilizations are especially likely to want to simulate the singularity (“the hinge of history”). Since real people are evenly distributed across the historical population, but simulated people are distributed closer to the singularity, the closer to the singularity you are, the more Bayesian evidence you have that you’re being simulated. This is true both in temporal closeness (eg you could have been born in 1500, but instead you’re living five years before AGI) and personal/causal closeness (eg you could be anyone in the world, but you work at an AI lab). The average reader of this blog is probably closer to the singularity than 99.99% of people throughout history, and I’m probably closer than 99.999%, so we all have very strong evidence that we’re being simulated even beyond the standard Bostrom argument.

If we’re in a simulation, then probably once we pass the hinge of history and get to some kind of boring part where everyone is dead or in utopia, the simulation winds down. This could look like us all vanishing into nonexistence. But if we’re being simulated for some reason (eg because we’re likely to correspond to real people), then those real people might advocate for our rights somehow, and we might get some kind of better treatment. In a best-case scenario, this might mean that even if we get killed by AI in our universe, base-level reality has some kind of Simulated Humans’ Rights Act which says they have to pretend we won and give us utopia anyway. But since we’re presumably being simulated for some reason (eg to see whether a species like humans is likely enough to have survived that we should deserve good terms in acausal trade agreements4) we should still fight hard for good outcomes even if this is true.
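For readers who want to see the shape of the indexical update in the simulation argument above, here is a minimal sketch of the Bayes' rule step. Every number in it is a made-up illustration, not one of my estimates: the 50% prior and the 10% share of simulated observers placed near the singularity are assumptions, and the function name `posterior_simulated` is hypothetical. The only figure taken from the text is the 99.99%, which becomes a 1-in-10,000 chance of being this close if you are a real person drawn evenly from history. The toy does not try to reproduce the 66% figure, which also has to price in the arguments against.

```python
def posterior_simulated(prior_sim, p_obs_given_sim, p_obs_given_real):
    """Bayes' rule: P(simulated | observation about your place in history)."""
    numerator = prior_sim * p_obs_given_sim
    denominator = numerator + (1 - prior_sim) * p_obs_given_real
    return numerator / denominator


# Assumption: a neutral 50% prior on being simulated (illustrative only).
prior = 0.5

# From the text: closer to the singularity than 99.99% of historical people,
# so a real, evenly distributed person is this close with probability 1e-4.
p_near_given_real = 1e-4

# Assumption: simulators place 10% of simulated people this close (made up).
p_near_given_sim = 0.1

near = posterior_simulated(prior, p_near_given_sim, p_near_given_real)

# Someone far from the singularity observes the complementary event,
# so the same evidence pushes their estimate below the prior.
far = posterior_simulated(prior, 1 - p_near_given_sim, 1 - p_near_given_real)

print(f"near the singularity: {near:.4f}")
print(f"far from the singularity: {far:.4f}")
```

With these toy inputs the near-singularity observer updates sharply toward simulation and the far-away observer updates slightly away from it. The size of the update depends entirely on the assumed ratio between the two likelihoods, which is the part nobody knows.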

Arguments against simulation: Too weird. Also, if you’re unusually far away from the singularity (eg you live in a country with no AI industry, and aren’t personally interested in AI), then some of those indexical arguments run in reverse for you. Also, base-level reality might not care about us at all beyond our ability to produce historical data, so they might just switch the simulation off without telling us.

Overall thoughts on this section: I continue to be skeptical of the “permanent underclass” narrative, for reasons discussed in item 5 here. I am worried about concentration of power, though less worried than I might otherwise be for reasons discussed here.

Disagree? You can fill in your own numbers for all of these things here. I’ll report results in a few weeks, and think hard about anything where all the smart people disagree with me.

1. All of this assumes nothing catastrophic like a nuclear war, and no international pause on AI research.

2. Did I change my mind since AI 2027? No: I said at the time that my timelines were a bit longer than the median member of the group who worked on it. For example, CTRL+F here for “by the way, I think there’s only like 20% chance things go as fast as our scenario says”, or here for “Other members of the team (including me) have medians later in the 2020s or early 2030s”, or see here. That having been said, I still think the AI 2027 timeline is plausible and important to think about, albeit slightly faster than my median.

3. And survive itself - so to satisfy this, the AI must have a plan for producing power, operating data centers, etc after humans are gone.

4. I am open to positive-sum trades with any people in higher-level realities who might be reading this.

Astral Codex Ten
