Astral Codex Ten

https://www.lesswrong.com/posts/rT8AkEcBnfX8ZdSLs/2022-acx-predictions-market-prices

Looks like Sam grabbed the probabilities as of February 14, 2022.

Thinking about this now, I'm worried that the dataset for entrants doesn't record the date each entry was submitted. That could make a big difference if most human submissions were weeks before the Feb 14 hard cutoff.

Expand full comment

Comment deleted

Comment deleted

Expand full comment

Comment deleted

Jan 24, 2023Edited

Comment deleted

Expand full comment

There could be a hidden variable that has already fixed the outcome but that we don't know about. If e.g. Vladimir Putin rolled a d10 yesterday and resolved to end the war as soon as his diplomats can come up with a cease-fire agreement if it came up 1-4, then either all universes branching off from the present moment will result in a quick ceasefire or none of them will. The die roll happened yesterday, but it's too soon for Russia's diplomats to have put together the cease-fire proposal. But if our spies reported that Putin was planning to roll the die and then got chased out of the Kremlin before they could see the result, we'd be correct to assess the cease-fire as a 40% probability.

We can still save something like your formulation by replacing "the present moment" with "the ensemble of possible universes that would be indistinguishable from the present moment if examined as closely as we examined this one while making the prediction".

Expand full comment

Jan 24, 2023Edited

You've mostly answered your own question, I think - individually assigning a probability to an event is a statement of personal belief in the likelihood of the outcome. If that makes sense to you and your concern is on the resolution, then there are several reasons that might be convincing to you, which I'm pulling from Tetlock's research:

- For good forecasters, brier scores improved with the specificity of the prediction. Rounding to the nearest 10%, or 1%, caused loss in accuracy.

- "I'm confident that the outcome is more likely than not" can communicate a wide range of certainty to different people, and the same can be said of most words people use to convey confidence. Using numbers increases precision.

Most people using percent chances for outcomes aren't making a metaphysical statement on the nature of reality, just using precise language to convey confidence levels. A personal statement of probability on nontrivial outcomes is generally a combination of aleatory and epistemic uncertainty.

Expand full comment

valencia_o

I think this post answers this well - "Probability is in the Mind" -https://www.lesswrong.com/s/p3TndjYbdYaiWwm9x/p/f6ZLxEWaankRZ2Crv

Expand full comment

I think this entry gives a good overview of the philosophy behind it: https://plato.stanford.edu/entries/epistemology-bayesian/.

Expand full comment

https://plato.stanford.edu/entries/probability-interpret/

I think this article is actually more useful on this specific topic, since it draws the distinction between the interpretations rather than just talking about one:

Expand full comment

Agreed, read Kenny's article instead :)

Expand full comment

Also, a very simple way to think about it is that if someone is an ideal forecaster, among the events that they put 80% probability on, 8 out of 10 of them happen (and similarly for all other probabilities).

Expand full comment

That is only calibration (weather forecasting term, "reliability"), which is different than the accuracy / resolution.

Expand full comment

Could you explain? What CounterBlunder said makes sense to me.

Expand full comment

Replied to CounterBlunder

Expand full comment

They're super related, though, right? In my head, Brier score (or log loss score or whatever) is basically a combination of calibration + making stronger predictions (like closer to 0 or 1) while staying calibrated

Expand full comment

Jan 25, 2023Edited

Was thinking of decompositions, see this https://en.wikipedia.org/wiki/Brier_score . Also shouldn't have used word "accuracy", as "resolution" is related to strength and variance of predictions relative to the overall "uncertainty" of observations. Correct, the components are related because they must sum up to Brier score and are bounded. But resolution it is more variance-like tern than mere "strength" of predictions, in my eyes it is more noisy than something than simply increases with stronger predictions.

Let's imagine a small prediction contest of 5 questions that resolve [1, 1, 0, 0, 0]. This set of question has uncertainty of 2/5 x (1-2/5) = 0.24. Enter Alice and Bob.

Alice predicts f1 = [0.6,0.6, 0.2,0.2,0.2], Brier score is 0.088. Bob predicts f2 = [0.9,0.8, 0.1, 0.1, 0.1], Brier score 0.016; both Alice and Bob have same resolution 0.24, but Bob has better reliability (mean squared calibration), 0.016, which is better than Alice's, 0.088. Because of perfect resolution (arbitrary small example make it easy), their reliability equals Brier scores. But Alice's underconfident predictions result in exactly same resolution as Bob's.

Charlie has secret inside information on questions 1, 4 and 5 and is strategically ignorant on the rest. His predictions are [0.99, 0.5, 0.5, 0.01, 0.01]; Brier score 0.10006 ~~practically same as Alice~~ edit. not same While his resolution is worse (0.14) but his calibration is near perfect (0.00006) and can have two 50% predictions and still have a good BS.

(edit Made a decimal mistake above. However, consider Cecilia, [0.99, 0.99, 0.5, 0.01,0.01], BS =0.058, which is better than Alice.)

Mallory knows the answers but just wants to watch the world burn. His submits in a prediction [2/5, 2/5, 2/5, 2/5, 2/5]. Brier score is equal to the uncertainty, calibration is perfect zero, and resolution also zero, worst possible. (This is not the worst possible prediction: Mallory's second prediction is [0,0,1,1,1] which has Brier score and reliability 1, but resolution 0.24.)

Expand full comment

John Wittle

Jan 24, 2023Edited

welcome to bayesianism

here is the standard parable

you are given a coin, and told that it is biased, but not what direction it is biased in. it could be biased towards heads, or towards tails, and you have absolutely no evidence in either direction.

what probability do you assign to seeing heads when you flip the coin?

the bayesian, who believes that probability represents degrees of anticipation and certainty in the mind of a predictor, says "50%, duh. i have no data distinguishing these outcomes from each other, so they have equal probability."

the frequentist, who believes that probability represents the actual frequency at which events occur, says "anything BUT 50%! that's the one probability we know for sure it ISN'T!"

then you get to steal the frequentist's lunch money by offering to bet money on the outcome

Expand full comment

>the frequentist, who believes that probability represents the actual frequency at which events occur, says "anything BUT 50%! that's the one probability we know for sure it ISN'T!"

I'd be interested in reading a steelman of frequentism from a self-identified frequentist. (I don't think this is one.)

Expand full comment

Nine Dimensions

Agreed. If they were required to use P(Heads) as part of a larger equation, what would they consider the best value to substitute in for it, given imperfect knowledge? Surely their options are "50%" (IE the same as the Bayesian answer) or "Stop the maths, we can go no further here" (an odd position for a probability theorist).

Or to ask the question another way - replace the coin with a biased dice that favours one (unknown) number. The frequentist is told that on a roll of 1, the bomb to his left will explode. On a roll of 2 to 6, the bomb to his right will explode. The frequentist would ideally like to run in the opposite direction to avoid the bomb. Does he run to the left, the right, or does he stand in the middle saying smugly "there's no way of knowing"?

Expand full comment

Jacob Steel

Jan 24, 2023Edited

An obvious argument for frequentism is that pure bayesianism is totally useless, and almost everyone is a frequentist at base.

Bayesianism is the correct way to update pre-existing opinions in light of new evidence, but it doesn't let you form evidence-based opinions, and while Frequentism doesn't either it's the intuitive starting point must people use.

For any set of evidence and any non-degenerate (i.e. no divide-by-zero errors) probability distribution on world-states and futures that you like, I can construct a prior that gives me that distribution in light of that evidence. And there's no scientific or rational way of choosing between priors, only to say how we should have updated them.

So, ultimately, on some level almost everyone defaults to frequentism, and says that if so far 70 of 100 coins have come up heads the chance that the 101st does is probably about 0.7.

Expand full comment

Jan 24, 2023Edited

The frequentist says the probability of the coin coming up heads is an unknown quantity somewhere between 0 and 1 but not 0.5. Frequentist inference refuses say anything more, so not sure how the bayesian is supposed to get any money by betting.

The situation gets more interesting after one single coin flip.. Suppose the coin comes up heads. What is the probability that coin comes up heads *again*?

The frequentist still says the true probability is unknown, but his maximum likelihood parameter estimate is now 1. But if null hypothesis is "less than 0.5", his p-value is 0.5. That is, if the true probability is less than 0.5, this sort of result can be expected 50% of time. It the null hypothesis "greater than 0.5", p-value is 1. Neither null hypothesis can be rejected. And he will helpfully draw two-tailed 95% confidence interval from 0.025 to 1 around (/ up to) the point estimate 1.

The bayesian's answer depends on his prior. If it is uniform, he answers that his posterior mean is 2/3.

Still not sure how they bet.

Expand full comment

>The frequentist says the probability of the coin coming up heads is an unknown quantity somewhere between 0 and 1 but not 0.5.

I actually don't see why a frequentist would say this, and I honestly suspect John Wittle made it up. Eager to be corrected if I'm wrong on this.

Expand full comment

Because in the joke we have absolutely sure information it isn't 0.5. Further, the frequentist theory generally will refuses to make probability statements about parameters as they are not considered random variables but fixed unknown non-random quantities. Thus inferences about them are based on "likelihood function", whereas Bayesian generally will give a posterior distribution and interpret is as probability.

I admit I am also confused how a frequentist should think of coin flips. Parameter p in a binomial distribution is a non-random overall innate propensity of a coin to land heads or tails in a single experiment. Yet event X = "coin lands heads" is a random variable.

Joke falls flat as I see no reason for a frequentist to pick any other estimate before seeing data, because I think all common estimates will be undefined.

Also, now that I am thinking it more, I think a careful Bayesian should choose a prior that is zero at 0.5, too. In practice, it doesn't really matter because excluding a single point (measure zero) in a range [0,1] does not really contribute any change.

Expand full comment

This is pedantic, but if you are given a coin and told that is biased, you know you have been misinformed and the probability of heads *is* 50%. There are no biased coins. https://www.tandfonline.com/doi/abs/10.1198/000313002605

Expand full comment

fion

I like to think of a dart as basically being a biased coin

Expand full comment

None of the Above

Man are you ever going to be surprised when someone uses a double headed coin....

Expand full comment

Huh. Yes, of course. That follows from Galileo, doesn't it?

Expand full comment

Can you explain what bet you could offer the frequentist to steal their money? I'm having trouble coming up with one.

Expand full comment

John Wittle

oh, i mean, the reductio ad absurdum would be that they would want to accept odds at something other than 1:1 since they know that must not be the "true" odds

but what actually happens is that they say "oh what? if you were gonna bet on it, of course the break even point would be 1:1"

then you say "hmm, it's almost like the way you're actually using them, the odds represent the degree of uncertainty about the outcomes in your mind, not the actual real-world frequency of given outcomes."

then they say "woah now hold on, that's not quite true, there is some true probability of the biased coin landing heads, when I talk about the probability of seeing heads I'm actually talking about the probability of that true probability (or 'frequency') being different values" or something like that

then it devolves into a long and hopefully productive if dry conversation on the nature of anticipation and epistemology and prediction

but obviously that's not as clever and pithy as just pretending they're going to stick to frequentist purism when money's on the line

(maybe? i am not good at steelmanning frequentism tbh, i feel like I don't understand it, it seems to make predictions about repeated events into a different datatype from predictions about one-off events, in a way that makes them incomparable, when to me it's pretty obvious they're the same kind of thing)

Expand full comment

Jacob Steel

I think this is more about epistemic than aleatoric uncertainty.

I'm a die-hard Laplacian determinist - I cannot believe in a god who plays dice, even if that means accepting action at a distance and Copenhagen; when I say "I think there's a 40% chance of X happening" I mean something like "in 40% of the notional universes that fit the evidence I have, X is definitely going to happen, and in the other 60% it definitely isn't".

Expand full comment

Jan 24, 2023Edited

I think of it as if there existed an objective answer to the question of "given information X, what's the probability of such event happening?". And when we assign a probability to this, we are estimating this probability for X = "everything we experienced, know, research, etc". The way you got that info also has to be included in the info.

I don't claim this is philosophically correct if you think about it too much, but so far I feel fine enough with this idea.

So different people have different Xs so they give different probabilities to the same event, which can all be objectively correct because they are estimations of different things. A smart aggregate might use info from all of our Xs.

We can suck at estimating the thing, in which case we are not well calibrated. If our when we claim 80% probability, the events happen 40% of the time only, then we know we are not estimating well all the time.

We can also choose to use Xs other than "all information at our reach". For example, we can always say 50% if we choose X as empty, no info, and it will be a perfect estimate of a thing no one cares about.

Expand full comment

Jan 24, 2023Edited

I think it's more or less equivalent ot the size of the bet one is willing to put on the outcome, e.g. if I say something has a probability of 80% it probably means I'm willing to bet more on the outcome than if I think it has a probability of 20%. How much more can probably be precisely stated by a complex algorithm that is fiercely debated at enormously length by philosphers of epistemology, but it hardly matters, as we're just talking about whether ineffable human confidence should be expressed on a linear or log scale, and what the units should be[1].

I agree something inherently subjective has dick to do with probability the way we usually define it, which is an objective statement about what fraction of an ensemble has a given value of a property that can take on more than one value. But it has value, kind of the way the price of a stock has value, inasmuch as it allows people who want to take opposite sides of a bet to find each other and agree on the terms of the bet.

Maybe people use probability instead of a literal option price in order to normalize for complicating demographic factors, like how wealthy the estimator is. My saying "I think this has a 99% chance of coming true" might indicate something like "I would bet what for me is an uncomfortable but not suicidal amount of money $X on the outcome," and that statement translates well whether I am Jeff Bezos and $X = $25 billion or I'm a college student and $X = 1000.

-------------

[1] It's kind of like asking someone how he feels and having him say "I'm 25% happier than I was yesterday, but 50% happier than I was Monday." The quantification is a little weird, but it does give us a sense of relative amplitude, particularly if our respondent is not tip-tip with words and can't use a rich vocabulary and active imagination for metaphors to more clearly convey the shades of distinction in his moods.

Expand full comment

beleester

I thought that it means "If this person predicts a 40% probability for 100 different events, approximately 40 of them will happen and 60 will not." It's a measure of accuracy over many similar events. Similar to how a statement that "a coin has a 50% chance of landing heads" means that if you toss a coin 100 times, you'll get about 50 heads.

I guess you could say that this is still something behavioral ("this person knows enough about world events to be accurate as often as they say they are") but to me it seems the same as the coin. I don't see how "this coin has a 50% chance of landing heads" is a mathematical proposition but "based on the flips I've seen so far, I think the coin has a 50% chance of landing heads" is a behavioral one.

I guess you could take the first statement as implying some sort of absolute certainty, like "the author of this stats textbook has declared that the coin is fair, so it definitely has 50% probability." But any real-world statement of probability is implicitly saying "this probability is based on what we know so far."

Expand full comment

Steven Chicoine

"It seems like it has to say something behavioural about the person rather than something mathematical about the proposition but I can't put my finger on it"

I think that is correct.

From your posts it looks like you believe people can have internal mental states with different levels of confidence of things happening. Your question is then, how do people convert this confidence into a discrete percentage?

I think for most people, myself included, there is no exact algorithm for converting internal mental states of confidence into a specific percentage. When I end up saying there is a 40% chance something occurs, normally I think it's probably somewhere between 30%-50%, almost definitely between 20%-60%, but I think it is less likely to happen the more likely to happen, so let's go with 40%. This process is a skill more akin to chicken sexing https://www.phon.ucl.ac.uk/home/PUB/WPL/02papers/horsey.pdf, then it is to working out a math problem.

This then can feed into an iterative process of adjustments. If my initial prediction of something in Category A is 80%, I may go back and look at how all my other high confidence predictions for things like Category A did (e.g. a lot of the times I had a high confidence a stock was going to go up it did not) and adjust my predicted percentage based on historical data.

Expand full comment

Comment deleted

Comment deleted

Expand full comment

Korakys

Using nuclear weapons serves nobody's interests. Ukraine attacking into Russia would surely galvanise a lot more Russians to actually fight in this war and it would also cost Ukraine the moral advantage. They might do it (15%?), but only to trade it for Crimea.

Expand full comment

I'm less worried about nuclear war now than I was six months ago; Russia has accepted some major setbacks without going nuclear, and evacuating Kherson without at least an explicit threat of nuclear escalation is a good sign. So maybe now 10-12%, down from 20% at the peak.

But that's the total risk of anything that would count as a "nuclear war", including a single strike at a Ukrainian logistics depot followed by loud threats. That's the sort of thing that *might* plausibly benefit Putin and Russia, if things break well for them. A large-scale nuclear war between Russia and NATO, does not plausibly benefit Russia. Or China or India for that matter, and they do matter. It *probably* doesn't benefit Vladimir Putin, and if it does it's because his position in Moscow is not secure, which correlates with a high probability that the Russian military would stop taking his orders if he tried to go Full Nuclear.

So the risk of Global Thermonuclear War is probably now down to ~1% or so. Well, 1% above the baseline for a nothing-interesting-happening-right-now year.

Expand full comment

Jan 24, 2023Edited

Well, Putinism is generally self-destructive by all appearances, but some of this is likely still delusional optimism, and it doesn't seem to be short-term suicidal just yet. I'm sure that China and India have abundantly communicated to Putin that they will completely cut him off if he presses the button, and it's unlikely that the small triumphant war is worth it for him, and even that is by no means guaranteed.

He also can't truly lose, even if Ukraine recaptures all territory he can just say that it was all NATO/mercenaries, lick his wounds and relaunch the invasion in a few years. A major Ukrainian invasion beyond its borders is very unlikely, and won't be supported by the West. Ukraine getting accepted into NATO is a different story, but I very much doubt that there would be an appetite in the West for that either.

Expand full comment

Comment deleted

Comment deleted

Expand full comment

The Poland-Soviet Union war right after WWII saw a lot of back and forth across large distances. Same with the Russian Civil War. Same with the Greece-Turkey war of the same era in more rugged terrain.

Expand full comment

beleester

WWI ended before allied troops entered Germany. You could argue that this was because Germany surrendered, since the Allies were clearly willing to push farther if they needed to, but I don't see why it wouldn't work just as well with a negotiated peace.

It's also worth noting that when Ukraine took back the area around Kharkiv, they did stop at the border rather than, say, counter-invading towards Belgorod.

Expand full comment

OffaSeptimus

In Russia the president doesn't have the unilateral power to launch nukes, he needs the Defence minister to order it as well, while I am sure Shoigu is loyal to Putin, but I am not sure he is suicidally loyal.

Expand full comment

As far as I can tell, the official hierarchy/checks and balances mean nothing in practice in Russia, if an underling refuses to follow an order he is simply and quietly replaced. Of course, there are a few people whose opinions Putin actually needs to consider, mostly in his FSB/KGB close circle, but I wouldn't bet on them to successfully coordinate and perform a coup in the last possible moment.

Expand full comment

Even if the order is given, would Russian missileers obey? Or would they pretend not to have received the order, raise questions about its authenticity, "discover" that something critical is inoperable, &c.?

Expand full comment

Jan 26, 2023Edited

They probably would. They aren't out there watching CNN tell them all about insane tyrant Putin's latest antics, the propaganda they're swimming in every day is instead about Mother Russia being beset by enemies on all sides, and that any day they might be called on to perform their grave but necessary duty. Those who aren't up to it generally don't end up with such jobs in the first place.

Expand full comment

There is the question of whether Putin even has a few years left after which to try again, or if his health is too deteriorated.

Expand full comment

Jan 24, 2023Edited

There is a separate issue of Russian nuclear arsenal potentially being in just as terrible condition as everything else. Claiming that Russia doesn't actually have working nukes at all seems to be overly optimistic. But it seems quite likely that there are much less of them than it's claimed to be, and huge uncertanty which are ponent and which are not.

This situation prevents Russia not only from using nukes but even from performing nuclear tests, because a failure would expose the problem making all the implicit nuclear blackmail much less credible.

Expand full comment

Comment deleted

Jan 24, 2023Edited

Comment deleted

Expand full comment

I think it's pretty unlikely that whole civilized world is going to deplete its resources faster than Russia, especially considering the state of Russian economy and industry. But thanks for your public prediction. On the countrary, I predict that in a few month we will see new results of Ukraine's offensive.

Missles is one thing, properly functioning nuclear warheads - the other.

Expand full comment

Comment deleted

Comment deleted

Expand full comment

I counted this as a prediction:

>Now think about the attitude of in a few months the Ukrainians deplete even more of the capacity of Europe to defend itself and starts to lose territory. It will be panic again.

But now I see that you treat it just as one of the possible scenarios, not necessary one you put your money on.

> my main point was that they have relatively good missile systems which you seem to have ceded in moving the goalposts to a discussion about nuclear warheads

I don't think it can be categorised as goalpost moving if I didn't mention missles in the first place and was talking specifically about nukes.

Expand full comment

Maybe later

Did you respond to this comment from the substack email and/or the email link to the comment?

I've noticed that substack makes it quite difficult to see the entire context.

Expand full comment

Comment deleted

Comment deleted

Expand full comment

It is plausible that the "whole civilized world" is going to deplete the particular resource that is "artillery ammunition" faster than Russia and its handful of allies. It's not *likely*, but it is possible and it's why I'm not at 90+% for Ukraine to win this war.

Everybody seriously underestimates the amount of artillery ammunition it takes to win a modern war, even if they've got the data from all the previous wars sitting right in front of them. "Civilized" nations deal with this by not "squandering" their taxpayer's money buying more than they think they're going to need, disposing of the old stuff as soon as it becomes dangerous or expensive to keep in storage, and not pre-emptively building megafactories that can crank out thousands of shells a day that would then sit mostly-idle until they rust out. Russians (and Ukrainians, who learned warfighting from the Russians), deal with this by saying "so what; more artillery is better, build it all and don't throw anything away".

Ukraine reportedly used more artillery ammunition in the first week of this war, than the British army had in total. I *think* that when we throw in the whole rest of the "civilized world", we'll be able to scrounge up enough to get Ukraine through to the end, but it's not a sure thing.

Expand full comment

SAMs & ICBMs are two very different creatures.

Their hypersonic missiles are either not new (Iskander) or bullshit (Zircon).

Expand full comment

StrangePolyhedrons

I see people say this a lot, but under the START treaty the United States is allowed to perform up to 18 on site inspections per year on Russian nuclear sites. There's also a lot of indirect means of verification allowed. The same parts of the treaty devoted to making sure Russia doesn't have MORE weapons than it claimed likely do a good job of ensuring that they don't have LESS weapons than they claimed. And while the only way to know for sure if they work is to fire them, the US inspection teams can look around and see how much money and effort is being put into these sites and put some measure of confidence in "how well maintained are these systems". Acting like the state of the Russian nuclear arsenal is a black box to the United States is simply not true.

Expand full comment

LHN

The problem with nuclear weapons, especially on the scale of the Russian arsenal, is that you don't need a lot of it to ruin millions of people's days.

If they couldn't launch 95% of their missiles (which seems wildly unlikely-- they have a lot of problems with their hardware, but it's not as if that fraction of their tanks just won't start, and their civilian rocketry still works reasonably reliably pace the current ISS Soyuz issue), that's still enough to expose 80 locations we're fond of to instant sunshine and start an unpredictable cascade of events (including the likely retaliation that serves as our deterrent from doing that, the response to that, etc.) from there.

Being degraded to the point that they're literally not a nuclear threat seems basically impossible.

Expand full comment

Well, is substantial part of your nuclear arsenal is mailfunctioning or even if you just can't be sufficiently confident that it's not, and you don't know which nukes are which, it works as an extra reason not to use your nuclear weapons.

Yes, in theory you still can incinerate a lot. But you definetely won't be able to do it without retaliation, and you can't properly prioritise your targets.

Expand full comment

LHN

If all their nukes are working, they still can't strike the US without retaliation. By the same token, they have enough nuclear weapons to deter us from putting troops on the ground in Ukraine and from directly attacking actual Russian territory either way.

Expand full comment

Jan 24, 2023Edited

Round 1 of the prediction contest says 7% chance of a nuclear weapon being used in war this year, although this question poses some problems (in the event of nuclear armageddon, the contest is unlikely to be scored).

That particular scenario strikes me as unlikely. Ukraine has nothing to gain and much to lose by invading Russia. Also, Ukraine has already pushed Russia back to the Kharkiv/Belgorod border without crossing it.

Expand full comment

Taking Belgorod doesn't have much incremental strategic value over Kharkiv, and carries geopolitical risk while much Ukrainian territory is still contested.

Sochi, Krasnodar, & Rostov-on-Don, however would provide a buffer for Crimea (once secured) and denying Russia access to the Black Sea (and simultaneously reducing Erdogan's leverage over NATO) might be sufficiently attractive for the West to continue material support to Ukraine.

Expand full comment

I'm no strategist, but a glance at a map suggests that the Kerch Strait may be more defensible than the arc through those cities.

Expand full comment

Perhaps, but having uncontested sea lanes at your back might make up the difference. But I was Air Force*, so take my opinions about surface combat with a salt mine.

*My old AFSC is now in Space Force, so an even further remove from real expertise than that.

Expand full comment

The Ukrainians have done well, but they're still quite a ways from being able to push Russia back to the 2014 borders, let alone invade. I think the failure to date to take Kreminna is telling.

It's also become clear through this winter's robot Battle of Britain that Russia can't shock 'n' awe the Ukrainians into surrender, so just one nuke hit isn't going to accomplish anything outside of its direct military significance.

Thirdly, the Russians have absorbed a large number of casualties to date, and some painful reverses, and there hasn't been any revolution or even significant unrest. Putin can still apparently count on half a million yokels from the East who'll serve as cannon fodder without major complaint.

Finally, what *new* holes in Russian maintenance and operational reliability would be revealed by a widespread use of nuclear forces? Russia's credibility as a military power has already suffered a serious reverse, only the existence of (untested) nuclear forces prevents Russia from being treated with as much contempt as the Iranians.

All four factors suggest to me the probability of Putin going nuclear in the near future is low, and lower than it was. I also think the success of the conventional weapon supply chain to Ukraine[1] has reduced the perception in the West that NATO or the US directly need to get further involved, so that reduces the probability of Western use, even in response to Russian first use.

--------------

[1] And not just this. I'm dubious the Ukrainians have been as well-informed and strategically smart as they seem to have been entirely on their own. I think they're getting a lot of behind the scenes...help. That "help" can be scaled up as necessary. There can be many more inexplicable explosions at Russian military bases, for example.

Expand full comment

"2014 borders"

To change the topic, I never know what terms like 2014 borders mean: Do they mean the borders on January 1, 2014 or on 12/31/2014. Same with all the references to Israel's 1967 borders. If people mean before the war of that year, they should say 2013 borders and if they mean after, they should say 2015 borders, or 1966 vs. 1968 borders for Israel.

Expand full comment

They mean the ante bellum borders.

Expand full comment

I meant the borders before the invasion, i.e. including Luhansk and Donetsk, plus Crimea.

Expand full comment

Mr. AC

Very very different cases. Loss of Luhansk and Donetks would be tolerable (and in certain circumstances, preferable) to Putin. I cannot imagine a loss of Crimea in any situation except a total collapse of the Russian state à la 1917 or 1991, or anything similarly dramatic.

Expand full comment

Why? A nuclear retaliation would swiftly lead to a total collapse with a higher probability. Do you expect Putin's fascist cheerleaders to depose him if he abandons Crimea? Seems unlikely.

Expand full comment

If Russia loses Crimea it is permanently neutered in a significant way. Russia will hold it if at all possible.

Expand full comment

I don't go as far as that, but I do find it odd that the first round prediction averages have 34% for Ukraine to hold Sevastopol and 44% to hold Luhansk. I'm not sure what the absolute numbers should be, but the second must be significantly more likely. If Ukraine holds Sevastopol at the end of the year, then Russia has either surrendered or suffered a total military collapse.

Expand full comment

Yes, I agree with you. There are a lot of ethnic Russians in Crimea, and it has a history belonging to pre-Soviet Russia going way back, even leaving aside the strategic value of Sevastapol. The Russian claim to Crimea is both more reasonable and probably more sincere than the claim to the rogue republics.

Which is why I don't think the Ukrainians will actually invade Crimea. I think the talk of it is (1) distraction from what they actually intend to do, which my vague guess is press aggressively forward in the Lysychansk area, maybe enfilade the Russians in the Bakhmut area and fuck the Wagner Group[1], and (2) a threat to keep Russian assets located in this region (and unable to reinforce northern and northeastern theaters), and finally maybe (3) a bargaining chip to reluctantly give away at the table in exchange for something more plausible, like Russia cutting loose the breakaway republics.

----------------

[1] Although...arguably Prigozhin is useful to them at present, because his rivalry with Putin keeps both men off balance and not cooperating ideally, and the tension between the Wagner Group and the Russian mlitary might also reduce the military effectiveness of either.

Expand full comment

Paul Brinkley

Couldn't the lack of progress be attributable to it being winter? It's still early for mud, I think (spring?), unless fall mud is maybe taking its toll? Plus, of course, it's cold.

Expand full comment

Oh absolutely. And the Ukrainians have shown that they have the discipline to bide their time, e.g. until they get improved armor and/or better air defense systems so they don't have to spend so much effort protecting civilians. They could easily be planning a major offensive in this area after the mud dries up (in April/May?), and indeed I hope they are.

I'm just saying the fact that they need to plan a major operation to get this to succeed, which is only a small part of what they need to do to win back the breakaway republics, is a sign of how long the road is ahead of them to their pre-2014 borders (excluding Crimea).

Expand full comment

The existence of mud means Ukraine would have been highly incentivized to launch a January offensive to beat the mud rather than waiting until April and giving Russia three extra months to prepare. As does the bit where Russian soldiers are literally freezing to death in the trenches while Ukrainians are better able to care for troops fighting on their home ground.

So the lack of a January offensive (so far...) counts as evidence that the Ukrainian army is stretched close to the limit.

Expand full comment

Kenny

Jan 29, 2023

I heard that the lack of a January offensive was due to the abnormally warm temperatures, not them being "stretched close to the limit".

Expand full comment

Jan 29, 2023

It took longer than expected for the ground to freeze, but it did freeze across most of eastern Ukraine by the first week of January. And frozen ground + not terribly cold + little snow is ideal for a winter offensive, if you're going to have one.

Expand full comment

Kenny

Jan 30, 2023

Peter Zeihan is the source from which I heard that the ground had not frozen and, IIRC, there wasn't, as-of whenever I heard it (the video itself was published 1/11) sufficiently cold temperatures (below freezing), nor sufficiently cold temperatures for long enough, that the Ukrainian tanks could be moved (or subsequently maneuvered).

You might be right that conditions are good enough now for such an offensive (and, therefore, the lack of one is evidence that the Ukrainians don't think they can or should try to pull one off), but I can imagine reasons why either conditions aren't good enough, e.g. the ground still isn't sufficiently frozen solid to support tanks or the delay in the ground freezing sufficiently means the offensive would have to be intolerably compressed into too-short of an operational window.

I'd guess it's also possible that a significant issue might be that the tanks (or other heavy vehicles) can't be safely moved into the relevant areas in the east where they could safely operate, i.e. the Russians could target them while they're traveling on the roads in areas where the ground isn't sufficiently frozen solid.

The relevant Peter Zeihan video: https://www.youtube.com/watch?v=o49oovm1fRs

Expand full comment

Jon Cutchins

I agree. This thing has already gone sooooo much farther than I expected. I can't find any 'hard limits' on the escalation. The Russians see this as an existential matter for them and may act to 'protect the homeland' before that act seems rational or likely to us.

Of course, the Kiev regime has already signaled their willingness to shell a nuclear power plant, so if they get their hands on some sort of nuke we should not be surprised if they use it.

Expand full comment

Randomstringofcharacters

Is the 2023 contest still open?

Expand full comment

Metacelsus

yes, it closes Feb. 1

Expand full comment

Evin Morris

Link to 2023 questions?

Expand full comment

valencia_o

https://astralcodexten.substack.com/p/stage-2-of-prediction-contest

Expand full comment

People with self-reported IQ above 150 doing better is pretty surprising to me. I would have expected them to do worse.

Expand full comment

Reply (5)

Comment deleted

Comment deleted

Expand full comment

Comment deleted

Comment deleted

Expand full comment

It's quite possible or even likely that the self-reported 150+ IQ cohort *do* have very high IQs, and that it's just a matter of some substantial portion of them reporting some high water mark from an online test.

Expand full comment

I'm not surprised that people with exceptionally high IQs are good at intellectual tasks. I'm surprised that people who self-report exceptionally high IQs actually have them.

Expand full comment

Cody Rushing

Really? It seems to me that the most people might exaggerate their IQ as being slightly higher than it actually is, but not greater-than-150 exaggerate. Do a lot of people who self-report high IQs, but don't have them, see lots of evidence pointing them to think so?

Expand full comment

Reply (4)

Comment deleted

Comment deleted

Expand full comment

Comment deleted

Jan 24, 2023Edited

Comment deleted

Expand full comment

Comment deleted

Comment deleted

Expand full comment

GreyEnlightenment

you're right. probably an IQ of 130 minimum needed to make any major contribution to an intellectual field. for pure math or theoretical physics, more like 140-150.

Expand full comment

Ch Hi

Depending on the IQ test, one of the things that a really high IQ is correlated with is an excellent memory. So one can remember small details that might turn out to be important in making a prediction. So that the correlation exists isn't surprising. What's surprising is that it's so weak. This may indicate that the future is chaotic, and that there aren't that many attractors. Or it may be the nature of the questions asked that picked out a subset of features that are chaotic and without strong attractors. (The Ukraine war is a good example becaise the participants should work to be unpredictable in many ways.)

Expand full comment

Ben

AFAIK you need either a super long test (impractical) or a combination of a "normal" test with shorter, but extremely difficult "genius" tests to get a reasonable comparison for extremes. The tests given in schools to identify "gifted" students are (in my experience in the US) often the only proper IQ tests anyone takes in their lives.

The SATs might be an apt comparison. A small percentage, (what like 1-5% i don't know I'm in my fuckin 30s now) get a perfect score. A small enough percentage that it's still impressive, but it literally doesn't let you differentiate between the *best* performing students. It's *only* useful for sorting the rest. And colleges want it that way.

Expand full comment

Jan 24, 2023Edited

My guess would have been that because an actual IQ of 150 is so rare, some combination of trolling (my IQ is 150, I'm 6'9", and I'm a billionaire) answers and delusional (rather than just slightly exaggerating) answers would have compensated for the real and slightly exaggerated answers. Surveys find small percentages of people self-reporting all kinds of crazy things.

But it looks like I guessed wrong.

Expand full comment

Just about anywhere other than among Scott's readers, self-reported IQs over 150 are likely to be bogus, but this site has a lot of smart readers and a lot of people who are quite careful about things like their IQs.

Expand full comment

GreyEnlightenment

Not all IQs are the same. 150 on a childhood ratio test is not like 150 on a normed test. I don;t think they go that high.

Expand full comment

Melvin

There's also a bunch of online IQ tests which flatter you with an exaggerated score. I think I got 150+ in one of those back in the day.

I would not attempt to take one these days. I don't think I've got dumber, but I have definitely got out of practice at test-taking.

Expand full comment

If you compare the SAT scores and the IQ scores you can see that there must be many instances of people exaggerating their IQ (or, quite possibly, reporting the results of some hinky online test even though the instructions say not to do that).

*In general*, anyone with a 140+ IQ should be able to crush the modern, post-1995 SAT. You can miss multiple questions and still score 1600, and the questions are not exceptionally difficult.

Of course some may be reporting pre-1995 scores, and there are any number of possible but unlikely reasons that someone could severely underperform on the SAT, but when I see an entry like '145 IQ, 1300 SAT' (it's in there, along with a fair number of other similar cases), I just assume the IQ score is inflated.

The implication that people are more likely to inflate their self-reported IQ than their SAT score is psychologically interesting, I guess.

But anyway I wouldn't use the IQ data for anything serious.

Expand full comment

Micah Zoltu

It has been a long time since I have taken the SAT, but IIRC it is largely knowledge based, and someone with a low IQ who spent a lot of time training can ace it while someone with a high IQ who never paid attention in school could fail it.

This is different from something like the LSATs (IIRC), which does a lot more testing of reasoning and require minimal accumulated knowledge.

Expand full comment

EAll

Jan 24, 2023Edited

They're reasonably well-correlated because people who have developed skills to be good at one measure usually have had the educational exposure to be good at the other. This is further amplified by the fact that in a survey like this, people who have scored high on some form of IQ test are highly likely to have been given superior educational resources as a main use of IQ testing is to qualify people for accelerated learning programs.

A person good at what formal IQ tests are measuring doesn't have to be good at a trigonometry test, but these aren't independent variables and you'll find that people in the former category probably ending up having a solid math education.

Expand full comment

flipshod

Jan 24, 2023Edited

I took all of these tests long ago and would probably seem like an outlier (or a liar). 140 and 1220. But my SAT was over 250 points higher than the next highest score in my class (my classmates who were in gifted class, IQ>120 all scored in the low 900s.).

Expand full comment

Yejun Y.

Based on my personal sample of n=1, if you miss a single question on the SAT you won't get a perfect score. But the point holds in general I think.

Expand full comment

orthonormalbasis

Depends. I know I missed a question on the SAT verbal (indelible/ineffable, in 2011 IIRC) and got a perfect score. But maybe you can't miss even one on math.

Expand full comment

This is ACT. The readership is... not representative of the wider population in more ways than one.

Expand full comment

Given the context of a discussion including college entrance exams, I think abbreviating the blog as ACX has a clear advantage over the pure initialism.

Expand full comment

Pete

There's a big problem with reporting IQ caused by the (in)ability to properly measure it. Is testing for IQ something that's done in a semi-standartised manner in USA? For example, I have no reasonable ability to get a "proper" IQ test, since that simply isn't done where I live, and the various options that are available (or were - it's probably been over 10 years since I last looked for that) online including the quite long serious-looking ones generally "top out" and report something like 140-150 if you can answer the questions quickly, since they aren't calibrated for people doing that; so while I'd presume that the super-high number given by such online quests this isn't a correct measurement due to how normal distribution works and IQ should work, that's the only IQ number I can get, and thus the only IQ number I can report.

Expand full comment

Ferien

If a person gets all predictions for year right, the mundane work of estimating one's own IQ is easy. They used same good predicting methods to obtain their IQ that they later used to ace the prediction contest.

Expand full comment

GreyEnlightenment

Not really that demanding. Predicting involves making inferences based on pieces of information. It's not on the same level of rigor or exactness as trying to understand some complicated math or physics concept.

Expand full comment

It seems like a fine example of something that is neither rigorous nor exact, and yet can still be extraordinarily demanding if being done at a high level.

Expand full comment

Which of the following statements (if any) are you most skeptical of?

1. People who are +3 SDs on ability-to-take-IQ-tests exist.

2. Ability-to-take-IQ-tests correlates with a lot of meaningful stuff, including forecasting ability.

3. People who are +3 SDs on ability-to-take-IQ-tests will often take an IQ test (e.g. due to being identified as gifted when a child) and learn of their high IQ.

4. Having learned of their high IQ, people will be willing to share it for a competition such as this.

Or alternatively, are you just claiming that the population of SSC readers who will falsely claim a high IQ outweighs the population with actually high IQs? It certainly seems true that as the IQ score in question gets higher and higher, the ratio of liars to actual geniuses gets higher and higher too.

I think a key issue here is that of self-selection. Someone who claims a high IQ unprompted is likely a bloviator. But a survey question which asks you for your IQ doesn't select for bloviators in the same way.

Expand full comment

I'm not surprised that people with 1 in 1000+ IQs exist, or that those people are really smart, or that those people will answer honestly. I'm surprised that, given how rare they are (1 in 1000+), the people who really are in that group, or are at least genuinely really smart, are not overwhelmed by trolls and internet geniuses on a self-report survey.

Expand full comment

Martin Blank

Jan 24, 2023Edited

I think this blog is a lot more technical and odd that you are thinking. My wife who went to a very good college got good grades, probably has good test scores, I would guess IQ 125 or something….

Would never spend leisure time reading something this dry.

A huge portion of even doctors, professors, and lawyers are not that exceptional intellectually. That isn’t even getting into cashiers and delivery men.

Being 1/1000 isn’t that high bar when the readership is only a couple tens of thousands of people out of billions.

Expand full comment

Shaked Koplewitz

TBH I'm guessing this is mostly noise, given that it only works at that arbitrary threshold

Expand full comment

Yeah, I honestly find that more believable than the alternative.

Expand full comment

Read the footnote. It's correlated all the way through in the raw scores.

Expand full comment

"only works" could just mean "is only statistically significant"?

Expand full comment

Matt

Possibly those (if any) who wildly exaggerated their IQs in the survey were less likely to participate in the prediction contest? I imagine the sort who make false self-aggrandizing claims (even if only to themselves) are not the sort who would be keen to seek evidence that could undermine their self-image.

Expand full comment

Having just run the numbers, the IQ>150 answers on Round 1 look kinda garbage (even after throwing out the person claiming to have an IQ of 212), in that they're frequently moving the global average in the opposite direction from both the superforecasters and the current Manifold predictions.

E.g., what is the probability Ukraine will hold Sevastapol? Superforecasters say 23%, Manifold says 15%, IQ>150 say 40%. I'm biased, because I myself said 15%, but 40% looks well out of line. Similarly, will any new country join NATO? Superforecasters and Manifold both say 71%, IQ>150 says 58%.

Expand full comment

SEE

Woohoo, I beat putting 50% on every single answer, Vox Future Perfect, and roughly one out of four other participants!

(Sigh. I knew from GJ Open that I'm not great at predictions, but I thought that long experience over there would have actually helped. Apparently not.)

Expand full comment

See the words of the winner Ryan

> last place was no worse than second

We are all second. That's amazing!

Expand full comment

Garald

Just out of curiosity, how do you score answers that are not "yes" or "no" but rather Bayesian-style percentages? One take is that, if someone who guessed that Gorg will do Blah with probability 10%, and Gorg does do Blah, that person should get one tenth of a point. Another take is that it shouldn't be linear, and someone who guessed 0% should be executed immediately.

Expand full comment

They said they used a log loss rule which means that someone who guessed 0% would be executed immediately (i.e. have a score of +infinity regardless of their other answers).

Expand full comment

magic9mushroom

>One take is that, if someone who guessed that Gorg will do Blah with probability 10%, and Gorg does do Blah, that person should get one tenth of a point.

This take doesn't work. The problem is that, if there's a 49% chance Gorg will do Blah, and I know that, my expected gain from predicting 49% is 0.49^2 + 0.51^2 = 0.5002, but my expected gain from predicting 0% is 0.49*0 + 0.51*1 = 0.51 > 0.5002. Hence, people will only predict 0 or 1.

Log-loss is one of the ways to get "perfect incentives" i.e. if the correct answer is 49% and I know that, I should predict 49%.

Expand full comment

Sniffnoy

The term to look up here is "proper scoring rule".

Expand full comment

The two most commonly used proper scoring rules are quadratic loss (i.e., if you say something has a 40% chance of happening, then you lose .36 points if it does and .16 points if it doesn't) and logarithmic score (i.e., your score is the log of the probability you give to the thing that happened - logs of numbers less than 1 are negative, so this is again interpreted as a penalty for being far from the truth).

A "proper scoring rule" is any scoring rule where your expected score is maximized if you report your true probabilities. If your true probability is p and your reported probability is x, then a linear scoring rule gives you an expected score of p(1-x)+(1-p)x = p-px+x-px = p-x(2p-1). Thus, if p>1/2, you maximize your expected score by reporting a probability of 1, and if p<1/2 you maximize your expected score by reporting a probability of 0. Under the quadratic rule, your expected loss is p(1-x)^2+(1-p)x^2 = p-2px+px^2+(1-p)x^2 = x^2-2px+p = (x-p)^2-(p^2-p). It's clear that this expected loss is minimized when x=p. (And since p^2-p is your expected loss when reporting your true probability, this has the useful feature that your expected penalty for reporting a probability other than your own is equal to the square of the difference between your true probability and reported probability.)

It's a little less straightforward to show that the logarithmic scoring rule is proper, but it is. There are a whole family of other proper scoring rules, and they correspond to different ways of thinking about the "directed urgency" of changing your probability estimate by a little bit, when you currently estimate something less than optimally. (The quadratic score gives directed urgency proportional to your probability; the logarithmic score gives directed urgency proportional to the reciprocal of your probability; others do it differently.)

This "directed urgency" turns out to be the same as the idea used in gradient descent training of neural networks. (In each training round they change their weights in a way proportional to how much it would have improved their score according to some particular scoring rule.)

Expand full comment

Well, one consequence of the prediction markets overperformance is it suggests that, at least if you only have a few minutes to research, you should look things up on prediction markets rather than trying to do your own research.

Expand full comment

I'm thinking of averaging over the superforecasters (and people with IQ>150?) in round 1, comparing that to a prediction market, then taking a view in cases where they disagree.

Expand full comment

Actually having just looked at it, I think the Manifold numbers look hard to beat.

On both Donald Trump questions they seem high (74% to tweet vs 48% superforecasters; 70% for criminal charges vs 46% superforecasters). These don't seem like questions where having time to research will help much, and it seems possible the (play money) markets are distorted because people have strong feelings about that particular person.

Manifold also seems bullish on all of S&P (74% to be up on year vs 58% superforecasters), SSE (72% to be up vs 59% superforecasters) and BTC (71% to be up vs 48% superforecasters). Those predictions run into the usual problem that any known factors tending to increase the valuations should already be priced in. Perhaps relatedly, Manifold has Tether depegging at 21%, vs 41% for superforecasters, but I don't feel able to express a view on this question.

Something odd is going on with question 38, which asks whether any FAANG or Musk company will accept crypto at the end of the year. AFAIK, this question already resolves to true, because Tesla accepts Dogecoin and Apple accepts USDC, and it doesn't seem very likely that they will both cease to do so by the end of the year, but we have superforecasters 40%, Manifold 45%, which leaves me wondering whether I'm misinformed or misunderstand the resolution criteria.

On the other 44 (or 43) questions, I don't see any reason to dissent from the Manifold answers.

Expand full comment

Actually all of S&P, SSE and BTC are up year to date, so it is reasonable *now* to predict they will be up on the year, even though it wasn't for round 1, so I now think Manifold is well calibrated on those questions.

Expand full comment

Don't stock indexes go up most years because of the general trends of economic growth and inflation?

Expand full comment

Yes, but this gets swamped by the noise. So your prediction of whether a stock index will be higher in a year's time than it is today should be only slightly higher than 50%.

Expand full comment

S&P has gone up in 68% of the years since 1928 if I counted right, which is closer to 74% than to 58%, and probably an earlier or later start point would give a higher % since this starts right before the Great Depression.

Expand full comment

Good point.

I'm also tending towards thinking Manifold well calibrated on the other questions I mentioned.

On question 21 (will DT tweet), there was a big increase in the probability on 18/19 January. I'm not sure why, but it looks like the difference between the Manifold and superforecaster predictions is due to timing, and I'm willing to accept that something happened on 18 January that made tweeting more likely.

On question 23 (will DT be indicted), I can't see any particular reason for the difference, but on reflection working out what is likely to happen with the various investigations would take some time. Plus, some people may have insider knowledge.

On question 38, there is some relevant discussion on Manifold, and I now think I had misunderstood the position. Tesla only accepts Dogecoin for a few products and not "one of its major services" and Apple facilitates payment in USDC through ApplePay, rather than itself taking payment in USDC.

So at the moment I don't see a way to beat copying the Manifold answers on the last possible day (assuming I'm optimising for expected score rather than probability of winning). I take some comfort that, by that standard, my first round predictions look not too bad.

Expand full comment

David Gurri

Jan 24, 2023Edited

Huh, I scored much higher than I expected (on par with the "median superforecaster" apparently).

Now I'm curious to see what specific answers I put. But the Google doc is confusing. Is there some way of finding this out? (I can see my "score" for each question, but I don't know what that means.)

Expand full comment

If you go to the "Predictions" tab, you'll see your predictions in the same column.

Expand full comment

David Gurri

Ah, didn't notice that. Thanks!

Expand full comment

I think it would help vividly illustrate the mystical-ness if you turned the log-loss numbers back into something intuitively interpretable. For the best forecasters, when they say there's an 80% chance something happens, how often does it happen? Can you draw the calibration graphs that you did when scoring your own predictions?

Expand full comment

This was a great idea, I meant to do it when I was making the applet, but I completely forgot. I've quickly added the feature -- you should be able to see your calibration chart now!

Expand full comment

Thank you!! :)

Expand full comment

Thank you!

Expand full comment

Thanks. I pretty much always think in 80-20 terms, so this would be helpful.

Expand full comment

Calibration is actually conceptually independent of good score. You can be perfectly calibrated by reporting 50% for every claim and its negation, or by reporting 1/3 for each of three exhaustive and incompatible outcomes on every topic, while someone who is very poorly calibrated can outperform this fairly easily. (For instance, someone who makes a 70-30 division on every proposition, and puts the high probability on the truth 90% of the time.)

I think that if you have a high scoring method, and then tune its calibration, you can generally improve its score. But calibration is more like an internal coherence check, while the score is about how well you can differentiate things that are more likely from things that are less likely.

Expand full comment

Jan 24, 2023Edited

I see your point. Do you know of some intuitive way, then, to correctly visualize/interpret the magnitude of a Brier or log-loss score? Like, something I could show to someone who knows very little math and they could still be like "wow that person really can predict the future"?

Expand full comment

Part of the problem is that the impressive-ness of Brier scores is really domain-specific. Some things are really hard to get much confidence away from 50% on (will the temp be above the historical mean on a given day), whereas some things are really easy to predict with 90%+ confidence on even if there's still uncertainty (will it rain in Sedona Arizona on July 8th next year). The brier scores across highly difficult predictions will be higher, making higher scores contextually more impressive.

Ultimately comparisons to benchmarks like "random guessing" and to other people who attempted predictions on the same question set is generally what people go for, but I agree there might be room for improvement here.

Expand full comment

Jan 24, 2023Edited

One could try to do what professionals call a "sensitivity analysis" i.e. fiddle with the numbers in the formula:

Predicting 50% on 61 questions results in summed logloss score of 61 x - log(0.5) = 42.3 (which btw is larger than 40.2 which means either the graph is wrong or the footnote about number of questions is wrong).

Suppose you were 60% correct on all 61 questions? (Predicted 60% for all events that happened and 40% for things that didn't happen.) Score: 61 x -log(0.6) = 31.16. (About the same as the fancy aggregate.)

Suppose you were 65% correct on everything? 61 x - log(0.65) = 26.28. (Close to prediction markets / winner.)

Suppose you were 50% on 60 questions and 99% on one question that happened? Score: 60 x -log(0.5) + 1 x -log(0.99) = 41.60.

Suppose you were 50% on 50 questions and 90% correct on 11 questions? 35.82.

What if you were 50% on 50 questions, 90 % correct on 10 questions and 90% wrong on question (ie 10 correct)? 38.01.

Finally, you can ask, could this happen by chance? What if you fill in random numbers between 0 to 100 %? On average, your log score on 61 questions would be about 61 (don't ask, logarithms are funny). Probability of getting score of 36 or better is only about 1%. Probability of getting score near 26 are less than one million.

Expand full comment

A very rough way to do it would be to assume (1) everyone is perfectly calibrated and (2) they put the same probability for every question, i.e. so one person predicted 80% or 20% probability for everything and got exactly 80% of them right, while another predicted 60% or 40% for everything and got exactly 60% of them right. That lets you convert a score into "percentage correct".

Expand full comment

Isha Yiras Hashem

I definitely don't understand it, but it's very cool!

Expand full comment

Jan 24, 2023Edited

The median score in the table of this post is 34.63, but I was given a 68th percentile for a score of 36.96.

How do I reconcile this? Am I missing something obvious? The results page says higher is better for percentiles so I don't think I'm misinterpreting the ordering of the percentiles.

Side note: Would really love a calibration plot on the personal results page as well!

Expand full comment

George H.

There's a long tail out to people who did really bad... I think. I got ~38 and 51 percentile.

Expand full comment

Ad Infinitum

The link said I got a 32.29 and 98th percentile, but that seems at odds with the bar chart.

Expand full comment

The bar chart in the post was made on the subset of predictions for which there was a market comparison (see the "subscore on questions with market prediction" row in the results spreadsheet).

Expand full comment

Ad Infinitum

Thx, got 28.7 on that one. Apparently I'm good at predicting everything but stock prices :-) What a fate.

Expand full comment

To be clear, "subscore on questions with market prediction" means "sum of your scores on questions which prediction markets also had probabilities for," which is almost every question (including things like "Starship reaches orbit," that have nothing to do with stock market prices).

Expand full comment

Ad Infinitum

It was meant to be humorous, man. Like "guy can pick things where money isn't involved". Though I have to face the possibility the mark was missed.

Expand full comment

Ahh. I wonder why they added the question scores instead of averaging them. The subscore being lower made it a bit confusing.

Expand full comment

Ah gotcha, makes sense!

Expand full comment

LGS

A few comments:

1. It is really easy to apply a linear transformation to the scores to make them more interpretable; for example, a common transformation sets "guessing 50% for everything" at a score of 0, and "giving the correct answer to everything with perfect confidence" at a score of k (where k is the number of questions). Note that this linear transformation also makes larger scores better than smaller ones, which is more intuitive.

Why not apply such a linear transformation? The current scores seem strictly worse.

2. Aside from the linear transformation, another degree of freedom is the choice of scoring rule. I understand that you want a proper scoring rule to incentivize people to reveal their true Bayesian estimates. But there are a LOT of proper scoring rules, and the log-score or Brier are not the only ones.

If you want to disincentivize the strategy of "be overconfident on purpose", you can do this (at least partially, not perfectly) by picking a proper scoring rule that harshly penalizes overconfidence -- even more harshly than the log score. There are several of those! One choice is

sqrt(1/p-1)

where p is the probability assigned to the correct outcome. (For comparison, the log score is log(1/p) and the Brier score is (1-p)^2, up to linear transformations.)

Note that as p->0 (that is, if you very confidently predict something wrong), the log score goes to infinity, but the score I proposed above goes to infinity *much faster*, at a rate of around sqrt(1/p) instead of log(1/p). So being overconfident is much more expensive! But it is still a proper scoring rule.

Expand full comment

warty dog

but wouldn't a harsher rule then incentivize falsely predicting closer to 50%? I wonder if there could be a solution, if you formalized it as a game, so that everyone being honest is an equilibrium.

Expand full comment

LGS

No, it's still a proper scoring rule, so if you are maximizing your expected score (minimizing your expected loss) you still predict your true Bayesian probability; and if you do not care about your expected score and only care about your chance of winning, you'll still report over-confident numbers. They'd just be less overconfident.

It's an interesting question, though, of whether there is a proper scoring rule that minimizes the amount of overconfidence or not. It might be that some scoring rules incentivize a lot of overconfidence for 60% predictions while others incentivize a lot of overconfidence for 90% predictions, with no single scoring rule that's uniformly optimal. I'm not sure.

Expand full comment

Carlos

I guess I don't find forecasting very interesting because it's all bounded by the fact you have to know which questions to ask to begin with, meaning you can't predict something you didn't think to ask.

Expand full comment

This seems like an argument that it isn't perfect. But there are a lot of questions we know how to ask that we care a lot about the answer to, like "What is the risk of nuclear war?" or "If our company launched such-and-such a product, would it be successful?"

If there was a literal magic oracle who always answered every question truthfully, this would be pretty great, even though we could still only get answers to the questions we knew to ask.

Expand full comment

Carlos

Well, it's more that it's a limited tool, and all the questions I have seen asked of it have been rather mundane. If there's any hope for the future, it's in blue ocean thinking. I strongly believe we need some utterly mindblowing thing no one saw coming to happen. Something like that vibe collapse article you linked to once, but in reverse, or just the establishment of a new vibe. I really don't think we are in a good vibe right now.

Expand full comment

I think if an idea like that comes up, we'll face lots of questions like "will it really work?" and "will implementing it be good or bad?" or "would implementation strategy A be better than implementation strategy B?" and all of those questions can be forecast.

Expand full comment

Carlos

Hmm, if it comes up, it's not like the predictions will be useful to whoever is driving it. Imagine trying to convince Jesus to stop preaching based on what some superforecasters predicted about the effect he would have. Or a more grounded example, I don't think there is any amount of forecasting that could have persuaded Linus Torvalds to ditch developing Linux. There is much to be said for the "never tell me the odds!", switch off the targeting computer attitude.

Expand full comment

I think that that's sometimes the case, and sometimes not.

I have a lot of pie-in-the-sky ideas that I don't bother to follow up on because they don't seem to be worth the risk of trying to implement them. "Neat idea, but it probably wouldn't work." If I had some way to better determine the actual probability of success (that doesn't involve me investing a lot of time and effort into it, which is just what I'm trying to avoid), that would be super-awesome.

Expand full comment

I mean, to be fair, one reason I don't follow up on them is that I'm more focused on my own "don't tell me the odds" projects, but side projects are a thing.

Expand full comment

Does the singularity get to count as something that no one saw coming? Since the whole point of the idea of the singularity is that it's impossible to know what will happen, that's why it's called the singularity

Expand full comment

How are you going to extrapolate from the accuracy of someone (or an algorithm) in predicting the popularity of Joe Biden one year ahead to accuracy in predicting whether a nuclear war will break out?

Expand full comment

I mean, one interesting thing to do on this front would be to look at (say) the superforecasters from 2022, and see whether they are

A) equally good at predicting different kinds of events (I suppose you would need to sort your events in to categories of some kind first)

B) have similarly wide or narrow ranges of opinion on different kinds of events

For the former you could look at average score per question in comparison to some baseline. Did the superforecasters make most of their gains agaiinst median players by being experts at predictiing climate and political outcomes, or is their advantage more uniform across the board?

For the latter you could look at variance in their scores compared to everyone else (after tossing the outliers who deliberately chose to use very high and low probabilities).

Expand full comment

A very good point. The most important new events are usually those that come out of the blue, wholly unexpected, black swans and such. Stuff everyone can see is possible enough to ask probability questions about usually turns out to have a pretty moderate effect.

I think of it like the difference between trying to make money in the market by predicting next year's price for XOM or some other dividend queen, versus being an angel investor. Everyone knows the price of BigCorp stocks will go up or down some amount next year, and if you can guess it right you can make a modest amount of money. But the way to get fabulously rich -- or lose everything -- is to bet on some start-up that turns out to be the next Google (or Toys.com). Something nobody sees coming, because they dismiss it as weird, inconsequential, wrong-headed.

Expand full comment

warty dog

typo: with a background in economist (should be economics)

Expand full comment

Dweomite

The bar chart says the _median_ superforecaster scored in the 84th percentile; the text 2 paragraphs below it says the _average_ superforecaster. Are both true, or is one of those a mistake?

Expand full comment

"Median" is correct. (Note that "average" is a bit ambiguous -- do you average the scores and then turn that into a percentile, or do you take the average percentile? I think if you do the former, then by coincidence you get right around the 84th percentile as well.)

Expand full comment

Wow what can I say...I mean I would like to thank Scott but I am not sure if it's appropriate :D... but still, a fantastic contest and I am flabbergasted that I am in the top 5...

Expand full comment

Well, still, thank you Scott, for giving me (and others) the opportunity to take part in a contest like this...(I hope that's ok, otherwise I will delete it :D)!

Expand full comment

I am still not sure where I can see how I answered each individual question?

Expand full comment

https://people.math.harvard.edu/~smarks/acx-prediction-contest/results.html

If you go to the "Predictions" tab of the released data and look in your column (which the linked applet will tell you), you'll see your predictions.

Expand full comment

OK, I see...but which row is mine then (in the document)?

Expand full comment

"The linked applet will tell you"

Expand full comment

Oh ok now I get it...:D.

Expand full comment

Lech Mazur

Would there be any interest in the following contest? I could offer a $1000 prize.

- Participants will submit one prediction for 2023 on any topic or event

- If there are many participants, the top 30 most interesting predictions, as chosen by me or by Scott if he wishes to publicize it, will be selected

- People will vote on whether they expect each of these predictions to come true

- The winner will be the person with the least likely correct prediction

Basically, I am not fond of the format of pre-selected predictions. Any potential improvements?

Expand full comment

I agree this would be interesting.

Expand full comment

Martin Blank

Sounds fun, sounds like it also incentivizes gambling on low likelihood things.

Expand full comment

Lech Mazur

Yeah, maybe incentivizes very low likelihood a bit much. I don't know if participants would be motivated much by the prizes, but something like two $500 prizes could be better.

Expand full comment

Well, the prediction *does* have to be correct!

I would say it incentivizes counterintuitive or 'unpopular' predictions... ideally you'd want something that isn't actually extremely unlikely, but that people will uniformly vote 'no' on.

I think you might need to refine the voting mechanism a bit, since otherwise there could easily be a many way tie for 'least likely' in the form of 100% of people voting that the prediction(s) would not come to pass. Maybe have people choose their top five least likely. Ideally you might like to have voters rank order all 30 of them in order of likelihood, but this might be too much work, and result in limited participation.

Alternatively you could have voters choose one 'yes' and some number of 'nos', and determine 'least likely' on that basis. This would reduce problems with people focusing or voting disproportionately on sexy or interesting predictions.

Expand full comment

Lech Mazur

I was thinking of also letting people vote on how "interesting" each prediction is.

But it looks like Scott might be too busy to handle it for this year, so unfortunately this idea might have to be put on hold since it will be harder to get enough votes otherwise.

Expand full comment

Sounds interesting...

Expand full comment

Micah Zoltu

Has anyone ever looked at super-antiforcasters (people who reliably forecast worse than random betting) in aggregate like they do super-forcasters? I'm curious how effective of a strategy it would be to bet strictly *against* the most reliably terrible forecasters.

Naively, one would expect that the worst forecasters would aggregate to be equivalent of "bet 50% on everything", but perhaps in reality the worst forecasters actually have a strategy that is just upside down, like they apply a NOT to what otherwise would have been a correct result or something.

Expand full comment

Dan

Everyone* who made predictions on at least half the questions had a positive correlation between their forecasts and whether the event happened. So they all at least met the bar of "events that you say are more likely to happen do happen more often than events which you say are less likely to happen". (* excluding the one person who answered 50% for everything, for whom the correlation is not defined)

You can score worse than the "bet 50% on everything" approach by making extreme forecasts. e.g., If you say 99% for two questions, and one of the two happens, then your score is much worse than someone who said 50% for both questions.

Expand full comment

Yes. I expect that the people who scored badly on this were not consistently wrong, but rather consistently overconfident.

The spreadsheet of blind predictions includes a number of people who entered all 1 or 99 values in I guess the mistaken belief that this increases the odds of a first place finish. I would expect that the odds that one of them ends up with the highest (worst) score are pretty good, though.

Expand full comment

Michael Watts

> The spreadsheet of blind predictions includes a number of people who entered all 1 or 99 values in I guess the mistaken belief that this increases the odds of a first place finish.

That is not a mistaken belief. As the number of entrants increases, the odds that the first place finisher is drawn from that group go to 1.

Expand full comment

Vitor

Exactly. For the same reason, in rock-paper-scissors competitions the winner is usually not a randomist.

Expand full comment

Technically that's true. However, with current parameters (number of questions, actual skill at prediction, odds of events happening), the number of players would have to be over 10^12 or so (rough estimate) for the 'only 1 and 99' strategy to even have a reasonable (1/P where P is the niumber of players) chance, and the strategy doesn't become fully dominant until you go up another order of magnitude or two.

With the size of the current player pool or indeed any foreseeable future pool, it is not a winning approach.

If all of the predictions were 50/50 coin flips, a reasonable variance-increasing strategy would be to predict log 2 P events (flips) at 1 or 99 instead of 50, though the equilibrium includes prediction of some slightly smaller number.

In the actual contest it is more murky because there is actual prediction ability in the mix and not all of the events are 50/50, but in general the amount of accuracy (log probability) you want to trade for variance is proportional to the log of the number of players, and with ~50 questions and a few thousand players, trading all of it is not a winning play.

Expand full comment

Michael Watts

Jan 26, 2023Edited

Well, among the top finishers we already see one who explicitly noted that he made an informed decision about how much accuracy to sacrifice for variance, while simultaneously mocking the idea that the correct answer to that question could conceivably be zero. You and he are right that when you have a chance to win, the answer is not "all of it", but the mockery was deserved.

You are modeling a player pool that all have what are, in their own opinion, justifiable, reasonably accurate estimates of the probabilities they're being asked about. It is certainly worthwhile to ask how much accuracy those players should obscure when entering a prediction contest. But many entrants to actual prediction contests cannot be described that way. For our hypothetical clueless entrants, it is not actually obvious that, even among the pool of 500 people, providing their honest best-guess probabilities gives them a better chance of taking first than rounding to the nearest integer does.

Before you can exaggerate your best guess in a controlled, reasonable way, you need to have a best guess that is roughly on target.

Expand full comment

Bwhilders

Being reliably wrong would seem to be equally as difficult as being reliably correct.

Expand full comment

Jim Cramer? /s

Expand full comment

Rob L'Heureux

I am glad I submitted for 2023 and regret not submitting for 2022. If all my predictions for the year are worse than chance, I am going to have some hard conversations with myself. I think at that point you have to employ the famous Costanza strategy of just doing the exact opposite of what you think is a good idea and figure out why later.

Expand full comment

HumbleRando

https://old.reddit.com/r/TheMotte/comments/s2gk0v/will_nato_expansionism_lead_to_a_war_between_the/?sort=controversial

Well, I predicted the Ukraine invasion in advance. Here's proof.

Sadly, I was banned from the Motte because I don't show enough respect for other people's feelings. I guess making sure idiots feel validated is more important than being able to predict and stop a potential war, at least in the eyes of Reddit mods

Expand full comment

Reply (5)

1 other

How would you have stopped the war?

Expand full comment

Comment removed

Jan 24, 2023Edited

Comment removed

Expand full comment

1 other

Too bad that getting banned from a subreddit foiled your political ambitions. At least you've got your many millions from shorting cruise operators and airlines back in January '20.

Expand full comment

Soothsayer

Sounds like a good call by the mods, who‘re supposed to foster community

Expand full comment

Comment removed

Jan 24, 2023Edited

Comment removed

Expand full comment

Paul Brinkley

Community is also worthless if it ceases to exist because the people who think they're smart run everyone off by being rude.

Not being stupid isn't enough. Believing you're not stupid is *definitely* not enough, because it's something stupid people could also believe. If you're the only one who believes you're not stupid, and everyone else doesn't know you from Adam, then your posts are going to be functionally indistinguishable from those of a stupid person who believes he's not. They can't read your mind; all they'll see is "rude person on the Internet", and rightly move on. There are plenty of people who are both not stupid and also polite enough to convince everyone other than themselves of that.

Expand full comment

https://questioner.substack.com/p/how-to-make-enemies-and-influence

HumbleRando

Jan 24, 2023Edited

I agree, intelligence is something that should be provable through measurable objective metrics. Right now, we don't do that - instead we use "the expert consensus" to determine who is smart and who isn't. The trouble is that most of the experts are themselves narcissistic retards who value their own reputation over truth-seeking. Therefore the experts won't acknowledge the validity of any other paradigm if that paradigm disagrees with the theories that the experts have staked their credibility on. This is why "science advances one funeral at a time" - because the scientists themselves are typically too stubborn to admit that they are wrong unless you gently stroke their egos while correcting their misconceptions.

This is also why the best methodology that we currently have for advancing science is by accelerating those funerals. For example, I believe that most modern sociologists are snake-oil salesmen and their "science" is so politicized that it has become worthless (hence the reason we have a replication crisis). Obviously the "expert" sociologists wouldn't agree because they would lose a lot of credibility and reputation by admitting that I'm right. This is why I worked so hard to get Trump elected and spread conspiracy theories that delegitimized science itself - so that I can put people who agree with me into office and eventually get the sociology "experts" killed for refusing to bend the knee and acknowledge that I'm right and they're wrong. If we used objective measurable criteria to determine the intelligence and accuracy of a scientific theory rather than just "the expert consensus" then I wouldn't need to do this, but in the absence of such superior scientific best practices, we are currently forced to use "the extermination of our scientific rivals" as a temporary substitute for more objective metrics.

It's an excellent point, by the way - thank you for bringing it up. I write more about this topic here if you're interested.

Expand full comment

Pshaw. You are part of a social species that has constructed an intricately interconnected highly specialized economy, and community cohesion matters more than anything in keeping that humming along efficiently. If you don't think so, collect your superintelligence and move to the backwoods and do without the community -- build your own dwelling with your own two hands, hunt your own food with sticks and rocks you can find, or such tools as you can devise from them, clothe yourself, and deal with your occasional infections, traumatic injuries, and eventual chronic disease by making use of medicinal herbs or whatever else your superintelligence can dream up.

If your Swiss Family Robinson of One ends up clearly better than what the community of boring IQ 100s has built in Seattle or Toronto -- why, your point is made. No one could argue with such success. We would all become disciples and pilgrimage to your castle to drink of your wisdom.

But if (as alas I in my curmudgeonly skepticism suspect) you end up living like a squalid Neanderthal, spending every waking hour scrounging for grubs or chipping scrapers out of flint, and dying of a stubborn infection or treatable skin cancer in your early 50s, then you will appreciate that in a species as social as ours even extreme intelligence without community is of almost no practical value.

Now, I would personally say that a truly intelligent person already understands this, understands that his intelligence, if it significantly exceeds the mean, can only really be given wings sufficient to realize its ambitions by recruiting and retaining the loyalty and cooperation of a large number and variety of less intelligence (and maybe sensitive) people. And therefore the really smart person realizes the necessity for pro-social behavior, if his intelligence is ever to achieve anything more interesting and satisfying than self-congratulatory epistles or sterile recriminations posted on the Internet.

Expand full comment

Jan 24, 2023Edited

"I guess making sure idiots feel validated"

With such dove-like cooing discourse, how could anyone claim you are lacking in respect?

"being able to predict and stop a potential war"

So, how were you going to stop the war by having people on a Reddit-style platform not banning you? Who was going to be the major official of the Biden administration reading The Motte and saying "By Jinkies, this feller has his head screwed on right, let's get in touch with him right away to set our diplomatic policies!"

You were indeed correct about the war, but possibly for the wrong reasons. However, I don't see anyone objecting to you in that linked thread on the grounds of hurt feelings, so whatever you did to get banned, you are not showing us.

Expand full comment

Comment removed

Jan 24, 2023Edited

Comment removed

Expand full comment

So your feeling is "Thing that I am good at is most important thing". That's what everyone thinks.

I'll give you credit for forecasting the war, but I don't know how right on the particulars you were. Your post seems to be "it is all the fault of the wicked West, forcing poor Vladimir into this with their insistence on NATO" but I think Putin is the kind of guy who will expand territory and influence any way he can, even including war, anyway.

And that doesn't say that your forecasting ability is equally good for every prediction. I can see why "I'm right and I'm always right and you lot are stupid" does get on people's nerves.

Expand full comment

Comment removed

Jan 24, 2023Edited

Comment removed

Expand full comment

"I have no trouble apologizing when I'm wrong and I both expect and demand the same behavior from others. If they have a problem with my attitude and ostracize me from their community because they can't tolerate rudeness, I will join another community - preferably a community of their enemies - and work to destroy them. I already did this with Q-anon and I'm quite proud of my results."

Well that doesn't sound as tough-minded as you claim: "they hurted my fees-fees so I will DESTROY THEM". Honestly, you make yourself sound worse with each vaunt about how if nobody kisses your ass, you will stomp your footsies and make them pay, make them all pay!!!!

Expand full comment

Comment removed

Jan 24, 2023Edited

Comment removed

Expand full comment

Jan 24, 2023Edited

Okay, I'm going to try to ask this *without* starting a whole contentious debate on Putin's motivations:

How would we discern between these two hypotheses: "Putin is warlike and expansionist" vs. "Putin is defensive and acting in Russia's interests"? What evidence, real or hypothetical, would distinguish between these two positions? In short, how would people with these different views predict things differently (whether now, in 2019, or at some other time)?

Expand full comment

Comment removed

Jan 24, 2023Edited

Comment removed

Expand full comment

https://en.wikipedia.org/wiki/Poisoning_of_Sergei_and_Yulia_Skripal

> If a person who believes that Putin is warlike and expansionist can predict Putin's behavior better than somebody who believes that Putin is defensive and acting in Russia's best interests, then the warlike expansionist theory is closer to reality. If it works the other way around, then the defensive theory is more accurate.

Um…yes? That's why I'm asking the question?

Expand full comment

"What evidence, real or hypothetical, would distinguish between these two positions?"

Well, not poisoning the heck out of Salisbury would be a start:

Or, you know, people who fell out with him then going on to literally fall out of windows. Little things like that.

Expand full comment

Could you explain why these events support a contention that Putin is warlike and expansionist?

Expand full comment

Your prediction appears to have come at a time when Russia was literally mobilizing for war as a matter of public knowledge. The post largely concerns a sympathetic view of Russia's motives that Russian apologists were offering at a time as preemptive justification for their invasion.

It'd be more impressive if you predicted this in 2003 within an appropriate timeline. As it stands, you predicted something people marginally informed of the news would also be likely to predict.

Expand full comment

Comment removed

Comment removed

Expand full comment

Viliam

By "almost everybody", do you mean mainstream American journalists, or prediction markets? If the latter, do you perchance have a link?

Expand full comment

Comment removed

Jan 25, 2023Edited

Comment removed

Expand full comment

Vitor

It is also my experience that humble people go through life saying "humble, humble, huuuumble" like some sort of pokemon.

Expand full comment

Comment removed

Comment removed

Expand full comment

I agree that you are better at predicting than most of TheMotte; maybe the best.

We probably disagree on how much that means. In my opinion, TheMotte selects for people who want to post *edgy* opinions, not for being correct. That is why I mentioned prediction markets, which select for making correct predictions (or having enough money to waste on making incorrect ones). I do not remember exactly, but I think that prediction markets were at that moment giving about 50% chance of Russia invading Ukraine.

Specifically in that situation, being *edgy* was anticorrelated with being right, because people who optimize for being edgy often swallow the Russian narrative hook, line, and sinker. And the Russian narrative at that moment was "no, Russia is not going to attack anyone, it is the belligerent Americans who always accuse Russia of bad things for no reason". Therefore -- utterly predictably -- most people at TheMotte believed that Russia is not going to attack, and it's all just American propaganda. Congratulations for figuring this out! (Though you still bought the part about this all being "our mistake" as opposed to simply Russia doing what Russia always does.)

Expand full comment

https://www.washingtonpost.com/national-security/russia-ukraine-invasion/2021/12/03/98a3760e-546b-11ec-8769-2f4ecdf7a2ad_story.html

Comment removed

Jan 25, 2023Edited

Comment removed

Expand full comment

>Well, I predicted the Ukraine invasion in advance. Here's proof.

>Actually, at that time almost everybody (including the so-called "experts") was saying that this was a bluff on Russia's part and they wouldn't really do anything. I was one of the very few people to predict this accurately.

Your post is from Jan 12, 2022?

The most mainstream experts/sources like US Government or US President were "We believe Russia is likely going to invade, for real, no joke, actual invasion." Examples:

Dec 3, 2021, Washington Post on what US intelligence believes:

>U.S. intelligence has found the Kremlin is planning a multi-front offensive as soon as early next year involving up to 175,000 troops, according to U.S. officials

Jan 19, 2022, Biden saying Putin is really going through with it.

>President Biden said on Wednesday that he now expected President Vladimir V. Putin of Russia would order an invasion of Ukraine, delivering a grim assessment that the diplomacy and threat of sanctions issued by the United States and its European allies were unlikely to stop the Russian leader from sending troops across the border.

>He added, almost with an air of fatalism: “But I think he will pay a serious and dear price for it that he doesn’t think now will cost him what it’s going to cost him. And I think he will regret having done it.”

>Asked to clarify whether he was accepting that an invasion was coming, Mr. Biden said: “My guess is he will move in. He has to do something.”

https://www.nytimes.com/2022/01/19/us/politics/biden-putin-russia-ukraine.html

Expand full comment

>If you analyze raw scores, IQ correlates with score pretty well. But when you analyze percentile ranks, you need to group people into <150 and >150 to see any effect.

How is this possible? I would expect raw scores and percentile ranks to correlate very well.

I'm guessing Scott just means that the effect was only statistically significant with raw scores?

Expand full comment

Shaked Koplewitz

I wonder how I would've done at predicting what percentile I'd end up taking. Feels like I'd have been about right, but that's easy to say now that I know.

Expand full comment

Milli

Can you give the percentiles for people with >130, 140 and 160 IQ (self reported) to determine that the 150 IQ cutoff is not cherrypicked? Does it work like that?

Expand full comment

AfterValue

IQ is fake. https://www.sciencedaily.com/releases/2012/12/121219133334.htm

Expand full comment

Milli

Jan 24, 2023Edited

IQ might not be an end all be all metric for mental tasks but IQ tests measure *something*. Maybe that something correlates with the ability to make predictions.

And why should we not test hypothesis when it's cheap to do so?

Relevant: https://slatestarcodex.com/2014/12/12/beware-the-man-of-one-study/

Expand full comment

Jan 24, 2023Edited

If I'm not wrong, almost 1 in 5 people predicted probabilities of Lula being elected and Bolsonaro being re-elected as adding up to 110% or more (3 out of 12 superforcasters)

Expand full comment

If they did the kind of density analysis that the contest winner did, then they might have been trying to modify their predictions to increase their chance of a first place finish.

I'm not sure how likely that is. But it is generally true that a perfectly calibrated set of predictions can be ... occluded? blocked? by some nearby/similar set of predictions that is slightly more overconfident on a few outcomes. If these events come to pass, they get a slightly better score; if they don't, well, they're worse off (by more), but in that case neither forecaster was going to be in first place anyway.

I am considering running a Monte Carlo simulation on my own set of predictions. That is: assume I'm perfectly caliibrated. Run ten thousand sets of outcomes based on my chosen probabilities. See how the scores (based on the blind dataset) sort out. If perfect calibration doesn't yield some significant number of wins, identify the vectors or sets of predictions that are winning frequently instead, and modify my predictions so that these are no longer blocking a possible first place finish.

This is a bit more specific than just avoiding areas of 'density' in the prediction space, because it actually may not matter too much if there are *similar* predictions, so long as a large number of possible outcomes give you a slightly better score.

Expand full comment

Jan 24, 2023Edited

" Maybe it’s possible to say with confidence that a 41% chance to be better than a 40% chance, and for us to discover this, and to hand it to policy-makers charting plans that rely on knowing what will happen."

That's the bit that gives me the heebie-jeebies about using prediction markets. A change to 41% from 40% isn't a huge increase so probably the policy-makers, if they take any notice of it at all, won't adjust their plans too far from what they originally thought. And of course people who make policies are already using experts and trying to shave more and more uncertainty off predictions.

But what about when predictors start going "We're 70% sure. We're 80% sure. We're 90% sure"? That's a big divergence and *if* the policy-makers trust the predictors, that means a big change from what they were originally intending to do.

Some predictions are simple yes/no - there will or there won't be a cease-fire in Ukraine. But what concerns me is the logic-chopping even in the toy example: did Nancy Pelosi retire or not? Is Eric Lander a Cabinet-level official or not?

It's not much consolation to the smoking crater where a village used to be that "Well, *technically* the question was resolved correctly if you re-do the wording" which basically means "Not our fault the policy-makers picked the wrong decision based on our predictions".

If a simple term like "what does 'retire' mean?" can't be concluded without arguing "She did retire" "But not as a Congresswoman so she didn't in fact retire retire", or "No Cabinet official quit" "By a technical definition this obscure guy is Cabinet level", then why expect any policy-maker to give you the time of day?

Expand full comment

We're just talking about fancified polls and financial markets, and policy makers already observe these closely and take them into account. No politician fails to consult the polls, no President fails to keep an eye on the NASDAQ. Establishing a market in things more general than "the value of Exxon Mobil" or "whether people approve/disapprove of Dobbs" isn't going to change much, as politicians already keep a weather eye on what people generally think will happen.

If anything, it would probably just continue the slow trend in the 20th-21st century of decreasingly effective leadership. Leadership kind of by definition consists of guessing the right answer when everyone around you is guessing wrong, defying public opinion, and sticking it out until you're eventually proved right, and then people build statues to your vision et cetera. To the extent leaders pay closer attention to vox populi, they are definitionally less "leaders" and more executives just implementing the board's vision. The payoff is they make fewer clearly disastrous decisions, like invading Russia in the winter, but the drawback is they also make fewer inspired decisions, like landing at Inchon or raising the prime rate to 21% to smash inflation.

My impression is that we've just become more conservative in the background state of our lives[1], so we are willing to trade fewer brilliant successes for fewer stupendous fuckups, so having more polls 'n' markets seems like a natural -- conservative, ha ha -- trend.

---------------

[1] Which maybe is why we embrace flamboyance in things that don't matter at the core of our lives, like spaceflight to Mars or fierce culture debate over issues that have no direct personal bearing for almost all of us. Like Romani circa AD 150, coddled within a vast safe Pax, yet thronging to the bloody violence of the colosseum.

Expand full comment

> Leadership kind of by definition consists of guessing the right answer when everyone around you is guessing wrong, defying public opinion, and sticking it out until you're eventually proved right, and then people build statues to your vision et cetera.

Thank you for this. I think I needed to hear that today.

Expand full comment

What are the questions that entrants got wrong the most?

As Skerry said above, 2022 didn't seem as weird as, say, 2020.

Personally, I didn't think Russia would invade Ukraine (figuring it would do something more debatable like 2014), but I was wrong and the U.S. government (as of 12/31/21) was right, so I wouldn't rank that as too weird, more just me being wrong.

One problem with forecasting contests is that the the really weird events of importance don't have any questions about them because nobody saw them coming. For instance, I doubt if anybody at the end of 2014 asked if Angela Merkel would let a million Muslims in in 2015. Of course, she did, and that wound up making more likely more weird events in 2016 like Brexit and Trump.

Expand full comment

proyas

Jan 24, 2023Edited

You should form a focus panel of the top 10 forecasters for 2022 and have them make joint predictions for 2023, and out to 2028. Dissenting opinions would be noted.

I could help you do it.

Expand full comment

Alex Power

So I did not do well last year; somewhere around the 20th percentile.

My failed predictions were highlighted by "while Russia is probably going to do something militarily in Ukraine, it certainly isn't going to do (the exact invasion we got)". After I adjusted for "Putin has gone mad and is going to pursue losing strategies consistently" in March, my predictions on the topic have gotten better.

At a per-question level, I lost the most points on "will any state legalize a psychedelic in 2020"; I said 20% (average 75%, and it did happen), and still think that was a defensible guess even though it did happen in Colorado.

Unfortunately the Google spreadsheet is too large and cumbersome for me to find my other specific predictions.

Expand full comment

Nancy

My university is inviting us to attend a 2-day "foresight fundamentals workshop" offered by the Institute for the Future (https://www.iftf.org/). I never see this organization mentioned in discussions of forecasting. Does anyone know anything about this group? Would attending their trainings be worthwhile?

Expand full comment

Can we have linked footnotes, please?

Expand full comment

Aren't they linked already?

Expand full comment

Ah. This seems to be an app thing. I'll report it to the devs.

Expand full comment

[1] https://twitter.com/i/lists/83102521

> A person who estimates a 99.99999% chance of a cease-fire in Ukraine next year is clearly more wrong than someone who says a 41% chance.

Technically, if there is a cease-fire in Ukraine next year, the person who gives a probability of 99.99999% is *less* wrong than someone who gives a probability of 41%. At least, in terms of probability as a thing that is scored with reference to reality.

Some epistemologists think there is an objective notion of "evidence" that makes some probabilities be a "correct" report of the evidence. But if there is such a thing, you can't use calibration or scoring rules to measure it (at least not directly).

I don't believe in an idea of an objectively correct report of evidence. Instead, I think the way we do this work is by asking whether a person's *method* of forming probabilities does well in terms of score (as match with reality) not just in the actual world, but in nearby possibilities. I think that reliably getting relatively accurate is the only evidence-type thing that we can have.

Expand full comment

Fang

Jan 24, 2023Edited

> Actually, if you analyze raw scores, liberals did outperform conservatives, and old people did outperform young people. [...] some people did extremely badly, so their raw scores could be extreme outliers

This seems to imply that conservatives/young'uns have a greater number of individuals who are confidently *very wrong* (countered by a segment that are slightly more correct than average).

So in other words, the takeaway is that the high-temperature right wing influencer takes that look like dumb predictions probably *are* dumb, but reasonable conservatives that don't base their predictions on Ben Shapiro are likely to be grounded in reality.

Well, either that or liberals are just boring centrists as usual.

Edit: Zvi's twitter list[1] gave me an example of the exact weird bullshit predictions I was talking about: https://twitter.com/RichardHanania/status/1617765690693521408

Expand full comment

Maybe later

Stop ruining my intricate user interaction theories with your lived experience.

Expand full comment

What do you mean? Was this supposed to be a response to something else?

Expand full comment

Maybe later

…I wondered where this ended up: it showed up initially under the correct comment, and then disappeared. I refreshed, didn't show up anymore, so I reposted it (https://astralcodexten.substack.com/p/who-predicted-2022/comment/12180324).

Expand full comment

Rupert Freeman

https://pubsonline.informs.org/doi/10.1287/mnsc.2022.4410

It's interesting to see Ryan explicitly calling out that he sought to maximize the probability of winning the contest, not to minimize his expected log loss. In case anyone is interested, colleagues and I have a paper on forecasting contest design. We show that any contest where the winner is chosen deterministically will suffer from a similar problem (truthfully reporting probabilities might not be an optimal strategy). If you're willing to choose a winner randomly (typically: non-uniform), you can get around this problem, at the cost of selecting a bad forecaster as your winner with some probability. Given independence of event outcomes, this probability gets smaller and smaller the more events are in the contest.

We'd be interested to hear anyone's thoughts.

Expand full comment

Ryan Kupyn

Really interesting work! Pretty much aligns with how I was thinking about the contest, though of course you bring in a lot more nuance. Having Scott's answers already available and set as the default really helped me here - I was able to assume that there'd be a lot of clustering around those values.

Expand full comment

PPV should to go to school and work in finance or something

Expand full comment

Given that superforecasters seem to be fairly accurate in their predictions, what do they predict regarding AGI risk?

Expand full comment

Can I check: your footnote refers to group people into <150 and >150, but which group are you putting people who record an IQ of exactly 150? It makes a surprisingly big difference, because the first round responses have 29 people claiming an IQ>150 and 8 people claiming an IQ of exactly 150. Incidentally, one person claims an IQ of 212, which I think must be either a typo or a lizard: I've removed it by hand.

Expand full comment

So, I looked at my scores (in the excel file) again...overall, I wasn't that different from average, but where I really did better was in election predictions , particularly for the US midterms. Seems like most people were much more bullish on the GOP to win both the Senate and House, which was reflected in the prediction markets...Idk why I was more cautious here, but I guess it might have to do with me taking into account the education realignment benefiting the Democrats more than the GOP (especially at midterms)?

Expand full comment

KW NORTON