Probably No Superintelligent Forecaster Yet
FiveThirtyNine (ha ha) is a new forecasting AI that purports to be “superintelligent”, ie able to beat basically all human forecasters. In fact, its creators go further than that: they say it beats Metaculus, a site which aggregates the estimates of hundreds of forecasters to generate estimates more accurate than any of them. You can read the announcement here and play with the model itself here.
(kudos to the team for making the model publicly available, especially since these things usually have high inference costs)
The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways.
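To make that shape concrete, here's a minimal sketch of what a retrieve-then-reason pipeline like this could look like. To be clear, everything in it (the search_news helper, the prompt wording, the model name) is my own stand-in for illustration, not FiveThirtyNine's actual code:

```python
# Hypothetical sketch of a retrieve-then-reason forecasting pipeline.
# None of these names are FiveThirtyNine's; they just illustrate the shape.

from openai import OpenAI

client = OpenAI()

def search_news(question: str) -> list[str]:
    """Placeholder: fetch recent news articles relevant to the question."""
    raise NotImplementedError("plug in a news search API here")

def forecast(question: str) -> float:
    """Retrieve context, prompt the model to reason like a superforecaster,
    and parse out a probability."""
    articles = search_news(question)
    prompt = (
        "You are a careful superforecaster. Consider base rates, recent "
        "news, and reasons you might be wrong.\n\n"
        f"Question: {question}\n\n"
        "Relevant articles:\n" + "\n---\n".join(articles) + "\n\n"
        "End with a single line of the form 'Probability: X%'."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the real system is a modified ChatGPT
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # Pull the number off the final "Probability: X%" line.
    percent = float(text.rsplit("Probability:", 1)[1].strip().rstrip("%. "))
    return percent / 100
```

All the interesting engineering hides in search_news and the prompt; as the examples below show, the reasoning step is where this kind of system tends to fall down.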
The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%.
Manifold is skeptical:
The commenters, especially Neel Nanda, found that enforcing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions, all from after November 2023, FiveThirtyNine substantially underperformed the Metaculus average.
But also, my attempts to play around with the bot haven’t been encouraging:
I asked it to predict the chance that Prospera would have a population of at least 1,000 in 2027. Like FutureSearch on the same question, it cited many interesting news articles on Prospera’s chances but failed to do the basic step of figuring out its current population and growth rate. It eventually concluded 35% chance, which is reasonable enough. But when asked whether Prospera would have a population of 100,000 in 2028, it also said 35% chance, which is absurd.
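For what it's worth, the sanity check it skipped takes about five lines. Here's the back-of-envelope, assuming purely for illustration a current population of around 100 - I don't know the real figure, and finding it is exactly the step the bot should have done:

```python
# Back-of-envelope: what annual growth does each target imply?
# The starting population of 100 is an illustrative assumption, not a real figure.

current_pop = 100
targets = {"1,000 by 2027": (1_000, 3), "100,000 by 2028": (100_000, 4)}

for label, (target, years) in targets.items():
    # Required compound annual growth rate: (target / current) ** (1 / years) - 1
    rate = (target / current_pop) ** (1 / years) - 1
    print(f"{label}: needs ~{rate:.0%} growth per year")

# 1,000 by 2027: needs ~115% growth per year (more than doubling annually)
# 100,000 by 2028: needs ~462% growth per year (nearly 6x annually)
```

Two targets two orders of magnitude apart can't sensibly get the same probability.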
A Twitter user pointed out (and I confirmed) that upon being asked “What is the probability that Joe Biden is still President in October 2025?”, it goes through a lot of reasoning about his age and dementia and finally concludes 55% because he’s not that demented. I originally thought this might be due to the knowledge cutoff (it doesn’t know Biden dropped out in favor of Harris), but if I ask the AI about October 2029, it says that Joe Biden has dropped out in favor of Harris (even though in that question it doesn’t matter). So now I think it’s more like ChatGPT’s tendency to round anything that sounds vaguely like the surgeon riddle off to the surgeon riddle - in the same way, FiveThirtyNine rounds anything that sounds vaguely like the popular question “is Biden too old and demented to stay president?” off to that question, even though there are much stronger non-dementia-related reasons he can’t be president next year.
The FutureSearch team wrote a LessWrong post generalizing these kinds of observations, Contra Papers Claiming Superhuman AI Forecasting. They examine four claims, including the one above, and find similar problems with all of them. Sometimes the teams involved missed potential data contamination (ie their LLM wasn’t forecasting, it just already knew the answers). Other times the LLM failed but - in the spirit of technologists everywhere - the researchers invented finicky definitions of “above human level” by which even mediocre AIs qualified.
They conclude:
Today's autonomous AI forecasting can be better than average, or even experienced, human forecasters…but it's very unlikely that any autonomous AI forecaster yet built is close to the accuracy of a top 2% Metaculus forecaster, or the crowd.
Still, FiveThirtyNine is a big advance in at least one way: as far as I know, it’s the first high-quality AI forecaster which is free to the general public. Try it out!
r/MarkMyWords
This is a subreddit for people who want to record bold predictions. There’s nothing formal - nobody gives probabilities, and some of them don’t even have end dates. It’s just people going out on a limb to say they’re sure something will happen.
…most of them are “mark my words, time will prove Democrats right about everything, and reveal Republicans to be disgusting criminal hypocrites”.
…so much so that it kind of fails as a potentially interesting institution and becomes just another monument to how sad the Internet’s gotten.
Still, it might be fun to keep scrolling back until you find an old post where the prediction has already “resolved”, and see how it turned out. Here are some of the highest-upvoted posts from at least a year ago (minus pop culture and dumb in-jokes):
MMW: Somebody very important will be killed by an AI in 2021.
MMW: For the next 20 years, news outlets will claim a new virus is the next COVID
…okay, that wasn’t fun or interesting either. Also, finding old posts is really hard (there are a lot more new posts than old ones). But I bet it’ll be fun to try the same thing a year or so after the election.
Polymarket Is Rolling In Cash
We talk about a lot of topics here. AI forecasters. Brier scores. Fixing science. But the average person is in forecasting for one thing: betting on presidential elections.
Here’s Polymarket’s volume (in dollars bet) over time (source):
Some of this is no doubt due to the hard work of Shayne and his team improving the site. But let’s be honest. It’s mostly because people really want to bet money on Trump/Harris 2024. The presidential market has a total volume of $910 million, far above eg markets about the Super Bowl ($50 million), the World Series ($5 million), and the bird flu epidemic ($141,000).
Even a 1% fee on all this trading would make Polymarket a lot of money (a 1% cut of the $910 million presidential market alone would be about $9 million). But they . . . don’t really seem to charge fees? According to Forbes (paywalled):
Polymarket doesn’t charge fees, and [CEO Shayne] Coplan remains elusive about how the platform will generate revenues, but hints that fees are coming. “We're focused on growing the marketplace right now and providing the best user experience,” he says. “We'll focus on monetization later.”
They’re rolling in money, it’s just not their money. Yet.
Still, it’s hard to overstate their dominance. Remember, their presidential election market has $910 million. For their competitor, PredictIt, the same number is $37 million. Kalshi doesn’t have election bets (more on this later) but their biggest markets look to be in the $2 - $5 million range.
Along with the cash, they’re collecting prestige and endorsements. Nate Silver recently joined their advisory board. And their Substack newsletter is lots of fun:
I don’t talk about Polymarket much because they’re not doing anything too far-out or experimental. They don’t have the strongest accuracy track record, and they don’t have the most diverse markets.
Still, they’ve executed the fundamentals really well, with great UI, solid market making, and an ability to navigate legal storms. From a business perspective, they’re the standout winners of the early 2020s bumper crop of prediction markets.
This Month In The Markets
1: You knew it was coming:
See also various slightly-weaker or slightly-stronger versions of the same question (includes wildlife, includes any immigrants, includes only Springfield). I actually appreciate this a lot, because most of the debate around Catgate has focused on how there’s “no evidence” it’s happening, but “no evidence” is cheap and I prefer an outright forecast.
2: Why did this go down so much in April 2024?
3:
I originally thought this was about Strawberry, but the timing is wrong: it’s about the Google DeepMind AI that fell just short of the gold threshold back in July. People seemed genuinely surprised by this!
4:
5: I hadn’t even heard of this theory before; you can learn more here:
6: Finally, prediction markets returning to their roots:
7:
Forecasting Links
1: Trouble in England as politicians are accused of betting on political topics. In July, some MPs bet on when an election would be held; during the election, one bet £8,000 that he would lose his seat (he did). It’s illegal for people with nonpublic information to bet on political topics, but so far nobody is formally accusing the people involved of having nonpublic information. And the sums involved (£100 for one of the most scandalous election bets) suggest these aren’t exactly grand schemes. I file this under “need to avoid appearance of impropriety” more than “criminal mastermind”.
2: Dean Ball has a sort of vague vision of LLMs betting on prediction markets at massive scale. I agree something like this is interesting and plausible; I agree that it’s hard to pin down exactly how it would work. One suggestion he makes is to have the bots shadow public intellectuals - for example, a bot “trained on” my writing would ask itself “how would Scott Alexander bet in this market?”, and if it made more money than a bot asking “how would Tyler Cowen bet in this market?”, then maybe you would trust me more than Tyler. This is cute, but there are a lot of wrinkles to work out. For example, because I talk more about superforecasting and probability calibration than Tyler, my bot might simulate me as making good bets; if Tyler sometimes uses extreme or ideological language, his bot might make worse bets not because his ideas are worse, but because it “simulates” him as an incautious bettor.
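For concreteness, here's one guess at what such a shadow-bot could look like. The prompt framing and the Kelly-style bet sizing below are my assumptions, not anything Ball specifies:

```python
# Hypothetical "shadow" bot that bets the way a given public intellectual might.
# The prompt framing and the bet-sizing rule are my guesses, not Ball's proposal.

from openai import OpenAI

client = OpenAI()

def persona_probability(persona_corpus: str, question: str) -> float:
    """Ask an LLM what probability a particular author would assign."""
    prompt = (
        "Below are writings by a particular author. Based on their views and "
        "reasoning style, what probability would THEY assign to this question? "
        "Answer with just a number between 0 and 1.\n\n"
        f"Writings:\n{persona_corpus}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())

def bet_size(p_bot: float, p_market: float, bankroll: float) -> float:
    """Quarter-Kelly stake on YES when the bot thinks the market price is too low."""
    if p_bot <= p_market:
        return 0.0  # no edge on YES; a fuller version would also consider NO
    kelly = (p_bot - p_market) / (1 - p_market)  # Kelly fraction for a binary market
    return bankroll * 0.25 * kelly  # fractional Kelly to damp overconfidence
```

The wrinkle above bites in persona_probability: whatever the LLM infers about an author's caution gets baked into the probability, then amplified by the bet sizing, whether or not the underlying ideas are any good.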
3: Kalshi vs. CFTC, round one million: after the CFTC banned Kalshi from hosting political contracts last year, Kalshi appealed. Earlier this month, the judge sided with Kalshi, saying that the CFTC’s attempt to define elections as “gaming” so it can regulate them under anti-gaming laws is an illegal power grab. The judge claims this has no relevance to the CFTC’s broader anti-political-market push, but since the whole thing is based on the elections = gaming theory, I think it has a lot of relevance indeed. The CFTC has since appealed, and Kalshi is blocked from hosting the contracts until the appeal is resolved (it’s 49 days until the election; at this point even a pro-Kalshi ruling might be a Pyrrhic victory). Also, why is Kalshi trying to get Congress contracts up, but not a Presidency contract? More sympathetic test case?