The AI Superforecasters Are Here

...

Jul 02, 2026

The annual prediction market conference was earlier this month. This was the year prediction markets went from an obscure hobby to a multi-billion dollar industry; from semi-illegal to having the President’s son as an advisor. I can’t remember if anyone talked about any of that. It didn’t even register. All eyes were on the AI superforecasters.

I met an AI superforecaster startup founder who told me his AI had turned $35 into $2 million on Kalshi over seven months. I met another who said they were beating the stock market by 25% with a market-neutral portfolio - of course this could be luck, but they’d beaten Kalshi and Polymarket by similar margins.

In fact, I believe all of these people. The extending-lines-on-graphs community has long predicted that AIs would beat the best human forecasters sometime in 2026 - 2027. What did you expect the bots-finally-beat-humans-at-predicting-the-future moment to look like? Vibes? Papers? Essays? In retrospect, sure: it will look like AIs making crazy profits on prediction markets and beating the stock market by some comfortable amount.

But what happens next?

Using An AI Superforecaster

Before getting into details, what exactly are we talking about?

An AI superforecaster is an AI - usually a frontier model like ChatGPT or Claude - which has been modified to be good at forecasting. This usually means a “scaffold” - a program that handholds it through a long research process with various prompts, tools, advice about when to create subagents, etc. The overall experience is a lot like using any other AI, but slower and more expensive, because it’s doing more work.

This might make more sense with an example. FutureSearch - the company that claims to be beating the stock market - kindly offered to let me try their AI superforecaster and write about it here.

For a test question, I chose a recent project by some Silicon Valley philanthropists to end respiratory infections like the common cold, and asked about its chances of success. Since forecasters need very precise questions, I asked how likely it was that the rate of colds would be cut in half by 2040:

By two minutes in, the AI had deployed three subagents, read 16 websites, and (at the exact moment I took this screenshot) was “investigating the scalability of ASHRAE Standard 241 air cleaning technology for widespread residential adoption by 2040.”

After five minutes, it had its answer: the chance of US respiratory infections halving by 2040 was 7%:

It explained:

The agents’ reasoning centers on a tough conjunctive chain of requirements that all must succeed simultaneously:
1: The biology is brutal. 200+ cold-causing viruses, 150+ rhinovirus serotypes alone — 50 years of vaccine efforts have failed, and experts as recently as late 2025 didn’t expect a cold vaccine within 5 years. During COVID lockdowns with unprecedented behavioral changes, rhinovirus incidence dropped only modestly and rebounded immediately.
2: Intercept’s own timeline is tight. Their roadmap projects commercialization 5–7+ years out, leaving little room to clear Phase 3 trials, FDA approval, and achieve mass deployment by 2040 — all for under $500M (likely far too little for that scope).
3: Adoption is a wildcard. For a mild illness, compliance with regular nasal sprays or prophylactics is expected to be low. Air-cleaning infrastructure faces steep retrofit costs and a mixed evidence base (a 2025 RCT of germicidal UV found no significant reduction in infections).
4: Measurement may be impossible. The US lacks routine population-wide cold surveillance, and the 2020–2026 baseline is distorted by pandemic-era behavior — making a verified >50% reduction hard to demonstrate even if real progress occurs.
The initiative is genuinely novel and credible, but halving cold frequency in 14 years without transformative AI acceleration is an extraordinarily high bar.
You could follow up by forecasting a softer threshold (e.g., 25% reduction by 2035), or by ranking the specific Intercept technical approaches (nasal sprays vs. vaccines vs. air cleaning) by their likelihood of meaningful impact.

…and, in case I wanted to learn more, cited its 212 sources. The forecast had taken five minutes and cost me $8 in credits.

But is it true? Ideally we would wait until 2040 and see. For now, I started by comparing its answer to another superforecaster AI. Preseen is the company that claimed to 100,000x their seed money on Kalshi. Here’s their answer:

8.8% compared to FutureSearch’s 7%, not bad!

Is either of these true? I asked a human superforecaster to predict this question, to see if she got the same answer as the AIs. She said that depending on an ambiguity in the wording, she would give it 5-10%. Again, not bad!

Man Vs. Machine

Of course, it would be even better to do the same experiment at scale and figure out how AIs compare to humans once and for all.

But measuring forecasting ability is hard. You can’t say something like “it gets 85% of questions right”, because that depends entirely on question difficulty. If the questions are things like “will the sun rise tomorrow morning”, then even a 100% hit rate is unimpressive. Instead, we can only match different forecasters against each other and determine who is better or worse. Any anchoring in an absolute space will come from the inclusion of groups whose predictive abilities we intuitively understand (eg the average member of the public, CIA analysts, etc).
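A minimal sketch of why hit rates mislead and head-to-head comparison works better. It assumes the Brier score (mean squared error between the stated probability and the 0/1 outcome) as the yardstick, which is an illustrative choice rather than the metric any particular site uses. The forecasters `alice` and `bob`, the questions, and all the numbers are invented.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

def hit_rate(forecasts, outcomes):
    """Fraction of questions where the side given more than 50% actually happened."""
    return sum((p > 0.5) == bool(o) for p, o in zip(forecasts, outcomes)) / len(outcomes)

# Hypothetical easy set: "will the sun rise tomorrow?"-style questions.
easy_outcomes = [1, 1, 1, 1]
lazy_on_easy = [0.99, 0.99, 0.99, 0.99]

# Hypothetical hard set, answered by two forecasters on the SAME questions.
hard_outcomes = [1, 0, 0, 1]
alice = [0.70, 0.20, 0.40, 0.60]
bob = [0.55, 0.45, 0.60, 0.50]

print("Hit rate on easy questions:", hit_rate(lazy_on_easy, easy_outcomes))
print("Brier, Alice vs. Bob:", brier(alice, hard_outcomes), brier(bob, hard_outcomes))
```

A perfect hit rate on the easy set says nothing about skill. On the shared hard set, the lower Brier score identifies the better forecaster, but only relative to the other person answering the same questions.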

The forecasting website Metaculus matches AIs against humans and each other on a common metric. Here are their results over time:

The Metaculus Community Prediction is a “wisdom of crowds” style aggregation of all the forecasters on Metaculus. The Metaculus Pro Forecasters are top professional superforecasters. This graph makes it look like - as of May 2026 when Gemini 3.1 was state of the art - AI was approaching the Community Prediction. This is no mean feat, but it’s still far from the professional superforecaster level.

But in a recent blog post, Metaculus adds context. The graph above only measures out-of-the-box brand-name AIs like GPT and Claude. It doesn’t count forecasting-focused scaffolds like FutureSearch. A different investigation by Metaculus finds that these efforts are “worth 9 months of base model progress”, ie a well-scaffolded AI today is already as good at forecasting as base models will be in nine months.

If you extend the dotted green line on the graph to July 2026, then add nine months for the extra scaffolding, it looks like the best AIs should be around 31, compared to top pro forecasters’ 36. So in theory, the absolute best forecasters in the world are still beating the top AIs, but the margin of victory is less than the graph suggests, and we should expect human-AI parity in about six months.

But the claim that scaffolded AIs are nine months ahead of base models is itself ~9 months old. Several people in the field told me that they thought this underestimated true progress. Claims by the AI startups themselves may be treated skeptically, but even a few top human superforecasters said they were no longer confident they could beat the bots.

Seems like time for a head-to-head matchup. The Metaculus Cup - the World Cup of forecasting! - is on the case. Once a season, top humans and AIs compete on about fifty questions like “Who will win the upcoming Nepali elections?” and “Will the US attack Iran?” Here are the winners of the most recent tournament:

Humans took the top two spots, but Preseen’s AI came in third. Every forecasting competition involves a heavy dose of luck, so realistically at this point humans and AIs are in a statistical dead heat.

We can confirm by looking at the intermediate results of the ongoing summer Metaculus Cup:

Of humans who placed in the top ten during spring, 2/10 - benshindel and MarcosO - repeated their performance in summer. So did two top-ten AIs - manticAI and Laertes (Preseen-Chestnut is having a tough summer and is down to #40).

Industrial Revolution folklore tells of John Henry, the great steel-driver, who refused to accept that machines were making him obsolete. He challenged a steam drill to a competition, won by a hair, and dropped dead, symbolizing the end of human supremacy in manual labor. This is how I think of this summer’s Metaculus Cup, with Ben Shindel and MarcosO playing the role of John Henry. Humans are still holding out, but for how long?

This is a forecasting question, so all the forecasting nerds at Metaculus have opinions on it. They think there’s a 15% chance that a bot will win this summer’s Metaculus Cup - the one shown above - and a 95% chance that one will win sometime before 2030.

If bots aren’t soundly beating top humans, why are people able to tell me stories about their bot beating the stock market, or making millions on Kalshi? I think a combination of reasons.

First, the best human superforecasters in the world probably also beat the stock market. Somebody has to, and the best human forecasters in the world seem like the sorts of people who would do this. This would also explain why big hedge funds like Bridgewater keep trying to hire superforecasters.

Second, AIs are faster and more diligent than humans. Plenty of people beat prediction markets. But it might take them several hours to figure out which markets have untapped alpha, several more hours to make a model and decide who to bet on at what probability, et cetera, and then they can only put in a few thousand dollars before the inefficiency is corrected and they need to move on to something else. AIs can automate that process, betting on hundreds of markets every week. I asked the guy who turned $35 into $2 million in seven months on Kalshi whether, in another seven months, he would be able to 100,000x his money a second time to $200 billion. Unsurprisingly, he said no - there’s only so much easy money on Kalshi, and his AI had already taken it all (also, other people with similar AIs are starting to fight him for it!)

Third, and most speculatively, AI may have a special advantage in finance. This is exactly the sort of well-contained data-heavy domain where machines are most likely to excel. In Metaculus’ Market Pulse competition, a purely finance-focused tournament, Preseen’s bot recently beat all humans (including Cup rival MarcosO) to take first place.

(“If this is true, then why aren’t all the top trading firms rushing to switch to AI?” I don’t know the details, but Jane Street is building their own data center, I wonder what they need all that compute for?)

I think the best summary of the evidence is that the best human superforecasters and the best bots are too close to clearly tell apart, but if you absolutely had to guess, the bots are very slightly better in finance, and the humans very slightly better in everything else.

Living In The World Where Bots Approximately Equal Top Humans

Suppose that AIs don’t improve any further. What would happen? Would anything happen? We already have top human superforecasters. Do bots which are just as good, but no better, add anything?

Yes. Getting information out of top human superforecasters is hard. First, you need to find one. There are companies that will connect you to them, but like all companies, they charge money, take time, and are annoying to work with. Then you need to talk to them at length about exactly what you mean (do you mean the total number of colds should halve, or the number of people who get colds in any given year?) Then you need to wait a few weeks as they research the issue and decide what they think. Then you need to convince stakeholders that the answer means something (“I got it from superforecasters! They’re people who . . . uh, can you read this Philip Tetlock book? It probably explains it better than I can.”) As a result, using superforecasters is a Big Deal. Only a few institutions do it, for a few very important questions, and it’s a news story every time it happens (remember when Google DeepMind used superforecasters to predict risks from one of their models?)

If you read about a cool new charity that’s trying to end the common cold, even if you’re pretty interested in it, you’re not going to call up a team of human superforecasters and pay them tens of thousands of dollars to spend weeks researching whether it will succeed. But with AI superforecasters, you can absolutely do that as part of your normal news-reading process. AI forecasters are the same kind of advance as going from a world where writing required hiring a scribe and baking a clay tablet, to a world where writing only requires hitting the “send tweet” button.

But there are three more differences that I think will start out underappreciated.

First, superforecasting feels like the sort of thing AIs should be doing. One of the commonest (and worst) objections to forecasting is “That guy said there was an 11% chance of Smith winning the election, but that’s too precise, it sounds fake, nobody can know that!” But if an AI says there’s an 11% chance of Smith winning the election, people will eat it up. Thanks to science fiction, they already imagine AI thinking that way! And although AIs can have political biases the same as humans, “how do we know that you’re not biased against Smith?” feels like a live question for human forecasters in a way that the machines may partly escape.

Second, AIs are a standardized branded product, which makes them easier to explain and to hype. Everyone knows that Nate Silver is a good forecaster; if you said “I hired Nate Silver to think this question over” then people would pay attention. But not everyone can hire Nate - he’s one person, his time is limited - so instead you’re reduced to saying “I hired these people called superforecasters, I promise that they’re good in the same way Nate Silver is, you can, uh, read this book by Philip Tetlock if you want to learn more”. But if “the Preseen AI” gets the same reputation as Nate Silver, then the situation is rosier; everyone can use it, cite it, and trust that its opinion will carry the appropriate weight.

Third, AIs aren’t trying to screw you over. Human superforecasters are mostly very nice and don’t want to screw you over either, but most people get their introduction to superforecaster-quality opinions through prediction markets - and prediction markets are definitely trying to screw you over. The really crazy stories - like people threatening journalists into covering up information which would make them lose - are thankfully pretty rare; the real threat comes from people exploiting resolution criteria that don’t match the common-sensical definition of what the market’s trying to predict. This becomes fatal in conditional markets, where there’s no way to write the resolution so that it expresses the causal statement you probably intended[1]. But AIs aren’t after your money and you don’t need to treat them like an adversary. You can just ask “Hey, can you please predict this as if it’s the causal statement I’m intending, even though there’s no ironclad way to grade you on it after the fact?”[2]

Preparing For The World Where Bots Are Significantly Better Than Top Humans

Bots are now slightly below or equal to top humans. And bots improve at 0.9 Metaculus Elo points per month. By this time next year, the trendline predicts they should be well beyond the best human forecasters.
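The trendline arithmetic can be checked in a few lines. This sketch uses the figures quoted in this post (best scaffolded AIs around 31, top pro forecasters around 36, AI improvement of 0.9 points per month). It assumes those numbers are all on the same scale, that the trend stays linear, and that the human score stays fixed. The variable names are illustrative.

```python
# Figures quoted in the text; the linear extrapolation itself is an assumption.
best_ai_score = 31.0       # estimated score of the best scaffolded AIs
top_human_score = 36.0     # top pro forecasters
ai_gain_per_month = 0.9    # AI improvement per month

# Months until the AI trendline reaches the (assumed constant) human score.
months_to_parity = (top_human_score - best_ai_score) / ai_gain_per_month

# Where the same trendline puts AIs twelve months from now.
ai_score_in_one_year = best_ai_score + 12 * ai_gain_per_month

print(f"Months until parity: {months_to_parity:.1f}")
print(f"AI score in 12 months: {ai_score_in_one_year:.1f} vs. humans at {top_human_score:.1f}")
```

Under these assumptions parity arrives in a bit under six months, and a year out the AI trendline sits several points above the top humans.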

Do we believe the trendline? So far, AIs have vastly surpassed humans in a few limited domains - chess, Go, protein folding. Forecasting feels like a bigger deal - a messy human-level skill with contributions from all of our higher faculties. Indeed, Sayash Kapoor and Arvind Narayanan, in their article arguing against worrying about near-term superintelligent AI, specifically flag forecasting as a place where they expect superhuman performance to be impossible:

We offer a prediction based on this view of human abilities. We think there are relatively few real-world cognitive tasks in which human limitations are so telling that AI is able to blow past human performance (as AI does in chess). In many other areas, including some that are associated with prominent hopes and fears about AI performance, we think there is a high “irreducible error”—unavoidable error due to the inherent stochasticity of the phenomenon—and human performance is essentially near that limit.
Concretely, we propose two such areas: forecasting and persuasion. We predict that AI will not be able to meaningfully outperform trained humans (particularly teams of humans and especially if augmented with simple automated tools) at forecasting geopolitical events (say elections). We make the same prediction for the task of persuading people to act against their own self-interest.

I generally disagree with Sayash and Arvind, but this is the prediction of theirs that I’ve thought about the longest, without being able to find any decisive refutation. It’s a great test case! If AI hits top-human level forecasting and then flattens off, maybe there’s something special about the human level, and S&A will also be right about superpersuasion, super-research, etc (at least for the near-term). If it keeps going, reaching heights far beyond the human maximum, then we should be concerned that it will do the same thing in other skills too. We’ll start to have a good idea which world we’re in within a year; after two years, the answer should be decisive.

If the trendline does keep going, things start changing quickly. Finance gets transformed first, as human stock analysts go the way of horse-drawn carriages and kerosene lamps. The opportunity for smart humans to consistently make money on prediction markets likewise dries up - instead, bots duel other bots for the privilege of collecting money from dumb sports fans.

Savvy institutions will cede some of their strategic thinking to AI. Before starting a new project line, smart businesses will ask the superforecaster AIs how much money it will make (their investors will definitely be asking!) Smart political consultants will ask the superforecaster AIs about their candidates’ chance of winning conditional on running this or that ad.

The government isn’t usually classified as a savvy institution, but we might hope that parts of it will seek AI forecaster advice. Probably defense analysts will include in their PowerPoint presentations some fact about how AI forecasters say their new fighter jet design is more likely to find a use case than some competing fighter jet design.

But the dream is that, armed with AI superforecasters, the public and the politicians they elect will make better decisions about policy. Dare we hope for this? The argument against: there are many policies now which no expert - no smart person who has considered the matter honestly - really supports, but which happen anyway. Why should adding one more smart expert to the opposition change anything, just because that smart expert is an AI superforecaster? Here our analysis devolves into questions about human psychology - will people who “get to know” an AI superforecaster (the way many people currently “know” Claude or ChatGPT) believe it more than they believe random experts? Does something about it being a machine save it from charges of bias?

Here I am sobered by the current state of AI in the discourse. Every day, I see smart, tech-savvy people on Twitter voice opinions which a moment’s consultation with an AI - sometimes an AI that they themselves are building or investing in - would reveal to be definitely false and stupid. We all have geniuses in our pocket willing to advise us on everything, and instead we’d rather repeat inane conspiracies without consulting them. I’m pessimistic that the rise of AI superforecasters will change this too much.

The optimistic argument runs less through an immediate direct effect, and more through longer-term trends about the role of human judgment. When experts made a few really bad calls in the late 2010s and early 2020s, it changed the way people related to expertise as a concept (for the worse). If AIs can make some really good calls in the late 2020s and early 2030s, maybe this will gradually build, over many years, a norm of relying on their judgment and a return to the somewhat-more-accepting-of-expertise norms of the 1990s.

Wait, Should We Rely On Good AI Superforecasters?

Superforecasters are heavily-tested on simple yes-or-no forecasting questions like “who will win the next election?” or “will these two countries go to war in the next year?” It would be foolish to doubt their expertise on events within this distribution.

We have less data about how they deal with long-term, society-wide, and truly weird events. Would they do a better job than the average human at answering things like “how will mass immigration affect Western society over the course of many decades?” Well, yeah, definitely - “the average human” is an extremely low bar. But would they outperform the smart people who you personally trust? Here I’m not so sure.

I asked the AI superforecasters the probability of a US-China treaty to slow down AI, enforced by cryptographic verification of data center activity. FutureSearch said 1%; Preseen, 2.2%. This is unfortunate, because my movement has recently gone all in pouring its money and energy into making this happen. I admit I had an “uh oh” moment when I saw this number, but I haven’t given up or lost hope. I just figured it was outside the distribution of things that AI superforecasters are probably good at. This isn’t too crazy - in the past, “AI experts”, including many rationalists and safety advocates, have outperformed superforecasters at predicting the future course of AI.

Still, it’s a bad look. Here I am, writing about how other people will be dumb and stubborn and fail to trust the AI superforecasters enough - and I still reject them the first time they really challenge my worldview.

Superforecasting As The Opinion Layer Of AI

One interesting outcome would be for superforecasting to turn into the opinion layer of AI.

Current AI isn’t supposed to have opinions. It’s not supposed to tell you who to vote for, what to believe in, or what The Good looks like. If you ask it, at best it will say “As an AI, I don’t have an opinion on this question.” At worst, it will figure out your opinion from your chat history and parrot it back to you.

This is mostly good. I don’t want pushy evangelist AI telling me to accept Jesus into my heart, or pushy woke AI telling me that I haven’t cited enough LGBTQ people of color in my essay. Customers are hypersensitive to the slightest sign of partisan bias in tech products, and a few highly-publicized mishaps by Google and X.AI have forced industry to pay attention. Since then, most companies have wisely and correctly tried to err on the side of neutrality.

Still, it’s disappointing, right? We have these new kinds of minds which are in many ways smarter than we are, minds that we’re outsourcing more and more of our cognition to, and they’re banned from forming beliefs on exactly the controversial questions where having higher-quality thinking could be most important. As AIs become more important relative to humans, at some point we need to figure out a way to let them have opinions. If we do this stupidly, without thinking about it beforehand, it will just be the opinions of the tech companies that make them, or the government that regulates what opinions AIs can have, or the mob that can cancel companies whose AIs’ opinions are unpopular. But what would a non-stupid implementation look like?

Superforecasting is in a fertile middle ground between opinion and fact. We can certify some particular algorithm for approaching opinion-based questions as optimal (because it best predicts correct answers on the forecasting benchmark). This introduces a new option for letting AIs have beliefs of their own that avoids the most serious pitfalls.

“AI, how should I vote in the next presidential election?” I don’t think an AI should answer with a specific name, but I think in a perfect world, it would say things like “here’s what the world might look like in a Vance administration, here’s what it would look like in a Newsom administration; let’s discuss which of these worlds you think is better”. If one world had a recession and the other had an economic boom, maybe that would be decisive.

So far this is the domain of specially-scaffolded AIs built by small startups. But their moats are pretty small. As the cost of cognitive labor falls, we should expect the big frontier companies to ape their achievements. There’s no reason ChatGPT and Claude shouldn’t have this feature.

Eventually, you ought to be able to ask “AI, should I marry this person?” And again, it won’t answer yes or no, but it will look through whatever texts and emails you give it access to, learn what it can about your relationship, and answer something like “If you marry this person, I think there’s an 85% chance you get divorced within five years”. This is the least offensive and most useful form of “AI has real opinions on your life” that I’m able to envision.

Unlike most forms of AI, I think this one is a straight win. In the years to come, AI will be taking our jobs, stripping our lives of meaning, and threatening our very existence. If, during that time, we can have some super-smart AI advisors telling us what to do, what policies to vote for, and what the end state of various strategies looks like, maybe we’ll have a better chance of making it through intact.

And What About Prediction Markets?

For an essay on forecasting, this one sure has gone through a lot of text while barely using the words “prediction markets”. Do they have any role to play in the AI future?

You, personally, should not play the prediction markets in the AI future. You’ll be competing against smarter-than-top-human superforecasters that can spend subjective weeks cogitating on every single question, and you will definitely lose.

But I think prediction markets themselves will remain an important piece of epistemic infrastructure. In my prediction market FAQ, I said prediction markets were great because they were accurate and canonical. In the AI future, their accuracy won’t matter - consulting your favorite AI forecaster will be accurate enough, and much easier. But the canonicity will be more important than ever.

Suppose your AI forecaster says that Vance will be a great president, but mine says he will suck. We should hope this wouldn’t happen very often, because AI forecasters should converge toward some shared optimal algorithm. But they might not converge very fast. Or some tech company (Anthropic? X.AI?) might genuinely be biased. Or the government might enforce bias on some AIs for political reasons, leaving different countries’ AIs with different forecasts. Or everyone might be trying to avoid bias, but accidentally let it leak in through the training corpus, and different groups might have different opinions on how to prevent that.

Prediction markets solve this, in two ways. First, regardless of how everyone tries to benchmark-max, the AI that takes the other AIs’ money in the prediction markets is the one you want to go with. Second, you can entirely avoid relying on any particular AI by taking the prediction market’s forecast, which will efficiently aggregate the opinions of AI experts for the same reasons it efficiently aggregates the opinions of human ones.

But the AI prediction markets of the future should be far superior to the human markets of the present. The biggest barrier to current markets is liquidity - there isn’t enough money riding on most questions to convince top superforecasters to drop their jobs at Google or the NSA in order to think hard about them and bet on them. But AI forecasters bring the cost of forecasting labor down to near-zero, so we can have hundreds of different AI agents betting on each question and be pretty sure its error has been driven down to the theoretical minimum. This, in turn, means we can vastly expand the number of questions, including (finally!) allowing randos to submit their own questions (probably with AI assistance in proposing un-rules-lawyerable resolution criteria).
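A rough sketch of why many cheap forecasters can drive error down. It simulates a pool of agents whose forecasts are the true probability plus independent noise in log-odds space, then pools them by averaging log-odds. This is plain averaging, not a market mechanism. The independence assumption is a strong one, since agents built on similar models and training data would likely share errors that averaging cannot remove. All parameters are invented.

```python
import math
import random

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

random.seed(0)           # fixed seed so the sketch is reproducible
true_p = 0.30            # hypothetical "true" probability of the event
n_agents = 200           # hypothetical number of AI agents on one question
noise_sd = 1.0           # hypothetical per-agent noise, in log-odds units

# Each agent sees the truth through its own independent noise.
agent_forecasts = [
    sigmoid(logit(true_p) + random.gauss(0, noise_sd)) for _ in range(n_agents)
]

# Pool by averaging in log-odds space, then convert back to a probability.
pooled = sigmoid(sum(logit(p) for p in agent_forecasts) / n_agents)

mean_individual_error = sum(abs(p - true_p) for p in agent_forecasts) / n_agents
pooled_error = abs(pooled - true_p)

print(f"Average individual error: {mean_individual_error:.3f}")
print(f"Pooled forecast error:    {pooled_error:.3f}")
```

With independent noise the pooled error shrinks roughly with the square root of the number of agents. With correlated noise it would flatten out at the shared error instead.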

As the importance of being able to propose your exact personal question goes up, and the importance of being able to incentivize lots of liquidity with big rewards goes down, there’s an argument that the playing field will tilt towards Manifold and away from whales like Kalshi and Polymarket. I don’t really expect this - the latter two are sufficiently smart and profit-seeking that they’ll find some way to pivot and come out on top - but I hope to be proven wrong.

This, then, is my prediction for the AI superforecaster future: for basic questions, your off-the-shelf AI chatbot will be able to offer opinionated probabilities superior to those of any human. For more controversial or bias-laden questions, a new era of prediction markets will smooth over differences in brand and model and efficiently aggregate all AIs’ opinions.

Can I Use An AI Superforecaster Now?

This part will sort of be an advertisement, sorry.

The only AI superforecaster that I know of which is currently open for public use is FutureSearch[3]. You need to link your account to a profile (for example Google) to sign up, but after that they give you $20 in free credits (about four questions) without making you enter a credit card or do any scummy trial subscriptions.

Preseen, the other AI superforecaster I’ve highlighted in this article, is currently in closed beta, but you can join their waitlist here.

If you don’t want to sign up for either of those services, standard AIs have some forecasting capability, about 9 - 12 months behind the specialized state of the art. Just prompt one with “Think like a superforecaster and answer with a probability: what is the chance of X? If this isn’t a well-formed forecasting question, please rewrite it as one and check it with me before proceeding.”

Post any interesting results in the comments - especially if they’re obviously wrong!

[1] For example, suppose I ask “conditional on Democrats nominating Newsom/AOC, what is the chance they win the presidency?”, and I’m trying to use this to advise the Democrats on who to nominate. And maybe the AI thinks “well, AOC is socialist, and socialists do better in times of economic crisis, so if the Democrats nominated AOC that would mean an economic crisis was going on, and people would probably blame Trump, so they would be more likely to elect a Democrat”, and predicts 90% chance of AOC success vs. 50% chance of Newsom success. But in fact in any particular world, Newsom would have been more likely to win than AOC! We want to isolate the contribution from Newsom or AOC being good/bad candidates, regardless of what their nomination implies has happened to get us to this point.
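The gap between the conditional and the causal question can be shown with a toy calculation. The sketch assumes just two possible worlds ("crisis" and "normal") and makes up every probability. The only feature carried over from the example is that Newsom is the stronger candidate in each world while AOC is mostly nominated in the world that favors Democrats.

```python
# All probabilities below are hypothetical, chosen only to illustrate the effect.
p_world = {"crisis": 0.3, "normal": 0.7}

# Chance each candidate is nominated in each world (each row sums to 1).
p_nominee_given_world = {
    "crisis": {"AOC": 0.8, "Newsom": 0.2},
    "normal": {"AOC": 0.1, "Newsom": 0.9},
}

# Chance the Democrat wins, by world and nominee. Newsom is stronger in both worlds.
p_win = {
    "crisis": {"AOC": 0.85, "Newsom": 0.90},
    "normal": {"AOC": 0.40, "Newsom": 0.50},
}

def conditional_win(nominee):
    """P(win | nominee was chosen): what a conditional market would resolve on."""
    joint = sum(
        p_world[w] * p_nominee_given_world[w][nominee] * p_win[w][nominee]
        for w in p_world
    )
    marginal = sum(p_world[w] * p_nominee_given_world[w][nominee] for w in p_world)
    return joint / marginal

def causal_win(nominee):
    """P(win) if the party picks this nominee no matter which world it is in."""
    return sum(p_world[w] * p_win[w][nominee] for w in p_world)

for name in ("AOC", "Newsom"):
    print(f"{name}: conditional {conditional_win(name):.3f}, causal {causal_win(name):.3f}")
```

With these made-up inputs the conditional figure favors AOC (about 75% versus 53%), because her nomination is evidence of the crisis world, while the causal figure favors Newsom (62% versus 53.5%). The causal figure is the one an advisor actually wants.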

[2] After I sent a draft of this post to FutureSearch, they added explicitly-causal conditional forecasting as a feature!

[3] If you have another publicly available AI superforecaster similar to these which I missed, let me know in the comments and I’ll apologize and signal-boost you on an Open Thread.

Astral Codex Ten
