Happy One Millionth Prediction, Metaculus
Metaculus celebrated its one millionth user forecast with a hackathon, a series of talks, and a party.
This was a helpful reminder that Metaculus is a real organization, not just a site I go to sometimes to check the probabilities of things. The company is run remotely; catching nine of them in a room together was a happy coincidence.
Although I think it still relies heavily on grants, Metaculus’ theoretical business model is to create forecasts on important topics for organizations that want them (“partners”) - so far including universities, tech companies, and charities. A typical example is this recent forecasting tournament on the spread of COVID in Virginia, run in partnership with the Virginia Department of Health and the University of Virginia Biocomplexity Institute (this year’s version here).
The main bottleneck is interest from policy-makers, which they’re trying to solve both through product improvement and public education. In December, Metaculus’ Director of Nuclear Risk, Peter Scoblic, published an article in Foreign Affairs magazine about forecasting’s “struggle for legitimacy” in the foreign policy world. It’s paywalled, but quoting liberally:
Organizational change is difficult under the best of circumstances and is close to impossible when powerful insiders actively resist it. National security experts with decades of experience and access to classified information see little reason for deferring to the upstart winners of forecasting tournaments, contests that allow the public to compete at putting realistic odds on future events. Perhaps they are concerned that as forecasters get better at geopolitical analysis, they will threaten the notion of expertise and the professional identities of those who supply it. But forecasting should be seen as a complement to expert analysis, not a substitute for it.
The same situation obtains among the corps of foreign-policy columnists, think tank fellows, and former government officials who wield more influence for the confidence of their convictions than for the precision of their predictions. There is little incentive for such analysts to ask when they have been wrong and why—questions that top forecasters must constantly confront if they are to maintain their place in the accuracy hierarchy. Instead, the “thought leader” ecosystem insulates the careers of people who would have washed out of any geopolitical forecasting tournament.
It concludes:
All this suggests that to make forecasting a resource that policymakers use, the quality of both supply and demand needs to improve. The former requires giving subject-matter experts a role in producing forecasts—in formulating questions (because they know which indicators are most germane) and in vetting the rationales that inform forecasts (because they can gut-check causal claims and fact-check evidence). The latter requires making the national security establishment more numerate or at least more open to quantitative appraisals of the future.
These are challenging tasks, but forecasting scholars are already testing methods for not only measuring the best forecasts but also judging the most persuasive rationales for those forecasts. For example: What story best conveys that there is a 10–15 percent chance of between one and three million people dying in the Ukraine war by the end of 2024? Where forecasters provide probability, subject-matter experts can provide plausibility, making well-calibrated quantitative future estimates more convincing and palatable to policymakers—and therefore making their decisions a little less wrong. And in national security, being a little less wrong can be a lot less dangerous.
These are the kinds of questions Metaculus-the-organization is thinking about, and the kinds of problems it’s trying to solve.
They’ve also got some exciting ideas for making their product more policy-relevant. For example, they’re working on causal modeling, where forecasters not only predict the chance of (eg) a Russian nuclear strike, but also the inputs feeding into that prediction. Say there’s a 10% chance of a strike, which comes from a 15% chance if the war in Ukraine continues vs. a 5% chance if it doesn’t. And they think there’s a 50% chance the war will continue, which comes from a 60% chance if the US stops arms shipments and a 33% chance if it doesn’t - and so on. Policymakers can play around with the causal graph, investigate which factors make a strike more vs. less likely, and check how their preferred policy would affect things they care about.
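To make that concrete, here’s a toy version in Python, using the numbers from the paragraph above. This is my own sketch of the concept, not anything Metaculus has actually built:

```python
# Toy two-node causal model with the probabilities quoted above.
# Totals follow from the law of total probability; nothing here is
# Metaculus's real implementation.

def p_war_continues(us_stops_shipments: bool) -> float:
    """P(war continues | US shipment policy)."""
    return 0.60 if us_stops_shipments else 0.33

def p_strike(war_continues: bool) -> float:
    """P(nuclear strike | war status)."""
    return 0.15 if war_continues else 0.05

def p_strike_given_policy(p_us_stops: float) -> float:
    """Marginal strike probability for a given chance the US stops shipments."""
    p_war = (p_us_stops * p_war_continues(True)
             + (1 - p_us_stops) * p_war_continues(False))
    return p_war * p_strike(True) + (1 - p_war) * p_strike(False)

# A policymaker "plays with the graph" by toggling the upstream node:
print(f"{p_strike_given_policy(1.0):.3f}")  # US stops shipments -> 0.110
print(f"{p_strike_given_policy(0.0):.3f}")  # US continues       -> 0.083
```

With the shipment input set so the war node lands at the quoted 50%, the strike node comes out at exactly the quoted 10%; the point is that the output updates automatically whenever any upstream input moves.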
For more on the intersection of forecasting and policy, see this EA Forum post. To learn more about Metaculus, follow them on Twitter or Facebook. And here’s to many millions more predictions!
Taking Stock
Prediction market users really want stocks.
“Stock” in this sense means an instrument that measures the status of a person, group, or idea. When their status goes up, the stock goes up. When their status goes down, the stock goes down. It feels like a natural way to bet on things like “I’m bearish on Elon Musk and think everyone else is overestimating him.”
It’s hard to turn this vague idea into a real financial instrument. You could try tying it to their Twitter follower count, or Google search trends, or net worth, but none of these exactly track “status”. If Musk commits murder in broad daylight, his search volume will go up, his Twitter follower count will stay about the same, his net worth might not be affected, but his status will have gone way down.
The current solution is to make no effort whatsoever to moor stocks to the real world and just hope they work out. This could work! It’s kind of like a Ponzi scheme or crypto token. Some big influencer endorses MoonCoin, and MoonCoin goes up, because MoonCoin has gained status, which means more people will want to buy it, because it’s even more likely that more people will want to buy it later. Crypto tokens kept a fig leaf of “and maybe in the cyberpunk future when all transactions everywhere have switched to crypto this will really pay off”, but over time that fig leaf has become increasingly threadbare, and a fun low-stakes instrument like Manifold stocks might do fine without it.
But the 0% to 100% prediction scale is a bad match for stocks. If Elon had started at 50% in 2000, then when Tesla made it big his stock surely should have doubled - and that brings him up to 100% and leaves nowhere for him to go. Also, people who bet on Elon Musk in 2000 might be miffed that their prescient choice only doubled their money. Probably the solution is some kind of cardinal number. But which one, and at what scale? Again, the lesson from crypto is that maybe it doesn’t matter. Just start at 10 or something and see where it ends up.
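A toy illustration of that ceiling problem (made-up numbers, not any platform’s actual mechanism):

```python
# Multiplicative gains on a bounded 0-100% "price" vs. an unbounded index.

def gain_bounded(price_pct: float, multiplier: float) -> float:
    """Apply a gain on the 0-100% scale, clipping at the ceiling."""
    return min(price_pct * multiplier, 100.0)

def gain_unbounded(index: float, multiplier: float) -> float:
    """Apply the same gain on an unbounded cardinal scale."""
    return index * multiplier

price, index = 50.0, 10.0          # the same stock, priced two ways
for m in (2.0, 3.0, 5.0):          # a string of big wins
    price = gain_bounded(price, m)
    index = gain_unbounded(index, m)

print(price)  # 100.0 -- pinned after the first doubling; later wins invisible
print(index)  # 300.0 -- every win still shows up, and early bettors got 30x
```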
Manifold leadership isn’t totally resigned yet to having stocks be meaningless Ponzi schemes. If you have a better idea for how to run stocks, leave it in the comments here and they’ll probably see it.
CFTC vs. PredictIt Update
A court has given PredictIt permission to keep operating past its planned shutdown date while its lawsuit against the CFTC proceeds. So far it’s not clear if this means indefinite normal operation, or if they’ll spend the extra time trying to wind existing markets down. The overall chance of them winning their lawsuit remains unchanged at around 25%.
PredictIt has gotten some sympathetic news coverage, including from the Washington Post. In the process, the Post tried to get some clarity on what terms of the no-action letter PredictIt violated, apparently without success.
I guess they’ll have to give some kind of explanation during the hearing, right?
Related: Richard Hanania has an article on How To Legalize Prediction Markets. The actual advice isn’t very surprising, and mostly boils down to “write letters to the government officials in charge of this”, but like other people I learned something new from the details:
In the United States, prediction markets are, with a few minor exceptions, against the law. If you don’t have a legal background, you might think that means that Congress at some point considered the issue, decided people shouldn’t be able to bet on real world events, and passed a law to that effect, which was then signed by the president. But this is not what happened.
As with most things, Congress has never directly considered the matter. Rather, prediction markets are illegal due to the discretion of a government agency called the Commodity Futures Trading Commission (CFTC). Why does it have this right? And on what basis has it made prediction markets illegal? […]
In 1936, Congress passed and FDR signed the Commodity Exchange Act. In 1974, Congress created the CFTC to enforce the original law, which has been amended on multiple occasions over the years. The CFTC has authority to regulate what are called “derivatives markets.” A derivatives contract derives its value from some kind of underlying asset or benchmark in the real world.
The thing to understand about derivatives is that the baseline is that they’re legal. That’s why you can “bet” on the price of oil through a futures contract. The CFTC wasn’t created to ban derivative markets, but to regulate them, though this can involve prohibiting certain kinds of markets altogether. Current law includes the following provision on event contracts, [banning]:
activity that is unlawful under any Federal or State law;
terrorism;
assassination;
war;
gaming; or
other similar activity determined by the Commission, by rule or regulation, to be contrary to the public interest
So the CFTC may ban certain categories of event contracts. With this statutory authority, it has decided to take advantage of its power to the maximum extent possible and create a blanket ban on all markets that involve “terrorism, assassination, war, gaming, or an activity that is unlawful under any State or Federal law.” Prediction markets for elections are therefore banned because, according to the CFTC, they are a type of “gaming,” that is, gambling. To repeat, and summarize for those whose eyes glaze over when faced with legalese, the steps are:
Congress says the CFTC can prohibit event markets that involve “gaming” if it’s in the public interest.
The CFTC says fine, we ban all gaming.
The CFTC says that prediction markets are a kind of gaming, and therefore the default is that they’re banned.
Read the full article if you want to learn more, or hear about how you should contact your representative to change this.
From The Department Of “Probably Not A Superforecaster”
One goal of forecasting technology is to incentivize good predictions and create accountability for bad ones. Meanwhile, former Russian President Dmitri Medvedev greeted the new year with a list of predictions for 2023 (a collapsing EU, an American civil war, President Elon Musk) that earned this section its title.
I don’t think he’s being completely serious, but if it’s a joke then I don’t get it.
None Dare Call It . . . CONSPIRACY!
We’ve previously talked about prediction markets as solutions to misinformation and conspiracy theories. What happens if you skip all the intermediate deconfusion steps and just try to predict if misinformation is true? I guess you get this. It’s pretty limited, since it can only predict the chance the conspiracy will be discovered; a conspiracy theorist might agree that there’s only a 3% chance we will get proof of this in the next few years.
When I first looked into this a few weeks ago, a few conspiracy-related prediction markets gave pretty high predictions, because people weren’t sure if the market creators would resolve them honestly, which naturally pushes the price towards 50. I can see this going badly if someone who doesn’t know this failure mode posts one of them on social media as “proof” that prediction markets support their conspiracy theory. But right now the worst example I can find is still only 11%. Also a good sign: the lizardman prediction market is within 1 pp of the Lizardman Constant.
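The drift toward 50 falls out of simple expected-value arithmetic: the fair price blends the real probability with whatever an untrustworthy resolution would pay. A minimal sketch, with invented trust numbers:

```python
def implied_price(p_true: float, p_honest: float,
                  p_yes_if_dishonest: float = 0.5) -> float:
    """Expected resolution value when the creator might not resolve honestly.

    p_true: traders' actual probability the event is proven true
    p_honest: probability the creator resolves in good faith
    p_yes_if_dishonest: expected payout of a bad-faith resolution
        (0.5 if traders treat it as a coin flip)
    """
    return p_honest * p_true + (1 - p_honest) * p_yes_if_dishonest

# A conspiracy traders privately put at 3%, run by a creator trusted
# only 80% of the time, trades near 12% -- four times the "real" number:
print(f"{implied_price(p_true=0.03, p_honest=0.80):.3f}")  # 0.124
```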
My 2022 Calibration
Most of my 2022 predictions got folded into the 2022 contest, but not all of them, so I still need to score the remainder.
Of my 50% predictions, 5 were right and 5 wrong, for a score of 50%
Of my 60% predictions, 17 were right and 11 wrong, for a score of 61%
Of my 70% predictions, 17 were right and 10 wrong, for a score of 63%
Of my 80% predictions, 26 were right and 9 wrong, for a score of 74%
Of my 90% predictions, 11 were right and 4 wrong, for a score of 73%
Of my 95% predictions, 10 were right and 0 wrong, for a score of 100%
Of my 99% predictions, 4 were right and 0 wrong, for a score of 100%
Some real overconfidence this year, especially at 90. Nothing like that last year, so it might be coincidence. I’ll try to average these all out sometime for a better read.
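For the curious, the scores above are just the fraction correct at each stated confidence level; here is the arithmetic, with the counts copied from the list:

```python
# (right, wrong) counts at each claimed confidence level, from above.
results = {
    0.50: (5, 5),
    0.60: (17, 11),
    0.70: (17, 10),
    0.80: (26, 9),
    0.90: (11, 4),
    0.95: (10, 0),
    0.99: (4, 0),
}

for claimed, (right, wrong) in results.items():
    observed = right / (right + wrong)
    flag = "overconfident" if observed < claimed else "ok"
    print(f"claimed {claimed:.0%}: observed {observed:.0%} ({flag})")
```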
Scandal Markets Revisited
In November, just after the FTX collapse, we talked about scandal markets - markets that might predict whether some person or group was secretly doing something terrible. With such a market in place, people could either avoid associating with them (if it was high) or prove that they were blameless and couldn’t possibly have known (if it was low).
Since then, several things have happened that have made me less optimistic about these:
A (as far as I know) good person who is not involved in any scandals asked to have their scandal market taken down, because people were citing the existence of the market as evidence that they were suspected of wrongdoing.
Someone dug up a several-year-old long-since-debunked false rape allegation against someone in order to move their scandal market.
Someone made a market on whether X’s investigation of Y’s sex life would conclude that Y had done unethical things. I don’t know either of the people involved, but commenters seemed to think X hated Y, was “investigating” in bad faith, and planned to find dirt whether it was there or not. Although everyone betting on the market knew this, just showing the market to other people would unfairly tank Y’s reputation.
I opened a newspaper and remembered that most scandals are basically fake but can ruin the lives of the people involved.
The specific updates I made on this are:
Having a scandal market on someone incentivizes people to dig up every scandal they’ve ever been involved in or accused of (including false allegations). Outside Encyclopedia Dramatica and KiwiFarms, most people don’t have a convenient list of every embarrassing and reputation-lowering thing they’ve ever done in their lives in one place. Gathering such a list makes those people look worse than people who don’t have such a list about them (even if a good Bayesian would adjust for such a bias, most people aren’t good Bayesians) and gives free ammunition to their enemies. Doing this for the people we currently like and trust the most is asking for trouble.
The median scandal is extremely stupid, but can still ruin someone’s life. Things like “he wore blackface in a Halloween costume when he was 16 twenty years ago and didn’t know any better” or “he donated to that California proposition against gay marriage back when that was a live issue”. I don’t care about this, I don’t think it affects my opinion of someone, but having this information get out can absolutely ruin someone’s life or force them to resign from positions that they were doing well at. Scandal markets incentivize production of these incidents at least as well as real malfeasance, and there are probably more of them, and they’re easier to find.
Normies don’t understand prediction markets and will see the word “scandal” next to someone’s name, maybe with a nonzero percent after it, and freak out, even when that’s inappropriate.
Are there ways around this? One way is extremely limited markets, like “will X be found to have committed fraud?”
…but this obviously restricts what kinds of problems it can pick up. I could go on a murder spree and this would still resolve negative.
This market tries to steamroll through the problem. It says:
This doesn't count a scandal like the NYT article, since most people within Scott's communities were in favor of Scott. It must be something where a significant number of Scott's friends, readers, and/or professional acquaintances are angry at him or want to stop associating with him. It must also be public enough that I hear about it through public channels.
I agree that this is better than the vague framing, since it restricts the problem to scandals that we-the-abstract-set-of-people-making-and-using-the-market care about, but it doesn’t solve the KiwiFarms problem. You could solve it by banning comments, but half the benefit of these markets is incentivizing whistleblowers to come out.
For now, I think direct scandal markets are a bad idea, and more careful scandal markets like the one above should be started only with caution. I think there’s more of an argument for doing this with public figures and people you’re depending on (the way charities that took FTX funding depended on FTX); public figures are probably more used to having random allegations about them thrown around. Still, the border between public and nonpublic figure is fuzzy, and I would err on the side of caution.
This Month In The Markets
Useful input to our previous discussion on whether all crypto things are scams or scams are actually quite rare. I think this one lands somewhere in the middle: a 10%-per-two-years failure rate for the most trustworthy crypto institution is still pretty bad.
Another useful contribution to the discussion of whether the crazy financial things a lot of people talk about will or won’t actually happen.
I think no. If he does really well, everyone will shrug and not care, because of course he would; if he does badly, people will make a big thing of it. Of course, the alpha move would be to join, bet all of his starting money on “yes” in this market before anyone else knows about it, collect his winnings, then never bet again.
Current level is 4%.
I think this is moving forward at a rate of one year per year.
Selection bias!
Shorts
1: The American Civics Exchange sells contracts on political futures. Does that make it a prediction market? Not exactly:
We don't operate an exchange model, so we don't maintain an order book and your orders aren't matched to other traders. Instead, we serve as the sole liquidity provider and counterparty. Every executed contract is individually negotiated between you and us through our platform and can be customized to meet your needs.
And they don’t give consistent, public prices for any of their contracts. In other words, they have every aspect of a prediction market except the predictions! I have to admit, I thought “some people might genuinely need to hedge political events” was kind of a fig leaf for prediction markets’ other good qualities, but this seems to take it very seriously. Related: their Twitter account.
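To make the contrast concrete, here’s a toy sketch of the two structures (invented numbers, nothing like anyone’s real matching engine). An exchange crosses customer orders against each other and prints a public price that doubles as a forecast; a dealer quotes each customer privately, so no public number ever forms:

```python
# Exchange model: customer orders cross, and the trade price is public.
bids = [0.62, 0.60, 0.55]   # buyers' limit prices for a YES contract
asks = [0.58, 0.63, 0.65]   # sellers' limit prices

if max(bids) >= min(asks):
    public_price = (max(bids) + min(asks)) / 2
    print(f"trade prints at {public_price:.2f}")  # everyone sees the forecast

# Dealer model: the firm is the counterparty on every contract and
# negotiates each one individually, so there is no order book and
# no public price to read a prediction off of.
def dealer_quote(customer_id: str, contract: str) -> float:
    return 0.61  # whatever the dealer privately offers this customer

print(dealer_quote("alice", "some political contract"))
```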
2: How much money is in forecasting, where? Crypto vaporware is worth 100x as much as beloved, widely-used non-crypto sites, but I think that’s probably either an artifact of the market-cap measure, or obsolete.
3: “Squiggle is a special-purpose programming language for generating probability distributions, estimates of variables over time, and similar tasks, with reasonable transparency.”
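For flavor, here’s the kind of multi-stage uncertain estimate Squiggle is designed to express, sketched in ordinary Python with numpy; the quantities are invented, and Squiggle’s own syntax is far more compact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # Monte Carlo samples

# Invented Fermi estimate: yearly cost of some project, with each
# input a distribution rather than a point value.
hours_per_week = rng.lognormal(mean=np.log(10), sigma=0.5, size=n)
dollars_per_hour = rng.normal(loc=60, scale=15, size=n)
weeks_per_year = 48

yearly_cost = hours_per_week * dollars_per_hour * weeks_per_year

# The output is itself a distribution, reported as percentiles.
lo, mid, hi = np.percentile(yearly_cost, [5, 50, 95])
print(f"5th / 50th / 95th percentile: ${lo:,.0f} / ${mid:,.0f} / ${hi:,.0f}")
```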