165 Comments
David Gross's avatar

You might consider unicode icons (e.g. ✔, ✘ or ☑, ☒) rather than links-to-nowhere to mark things that did/didn't happen.

Expand full comment
Vaclav's avatar

Yeah, I can understand the desire not to imply correctness/incorrectness, but my brain seems reluctant to accept 'links mean it didn't happen' without repeatedly asking for confirmation.

Expand full comment
Matt A's avatar

It helps if you start by noticing that, at the start of the post, 2014 through 2019 didn't happen last year.

Expand full comment
Rana Dexsin's avatar

It's definitely going against the grain of habit to reinterpret formatting that's normally used for something so specific that way. I actually like the Japanese-style ⭕/❌ because they're very visually distinguishable.

In the strange hypothetical universe where I'm the style editor here, I think I'd use ⭕/❌ for “correct/incorrect direction of prediction” and ⊤/⊥ for “outcome occurred / did not occur”, like so:

Fewer than 100,000 US coronavirus deaths (⊥): 10% ⭕

Bay Area lockdown will be extended until election day (⊤): 10% ❌

I do another Nootropics survey this year (⊤): 70% ⭕

I do another SSC Survey this year (⊥): 90% ❌

Bonus:

Conditional on [redacted] being published, it gets at least 40,000 pageviews (∅): 50%

Fewer than 300,000 US coronavirus deaths (⊥): 50%

Expand full comment
Pontifex Minimus 🏴󠁧󠁢󠁳󠁣󠁴󠁿's avatar

> I actually like the Japanese-style ⭕/❌

The problem with ⭕ is it looks like 0, which has connotations of nothing, zero, false. Essentially, ⭕ partially activates my "false" category. For that reason I think ✅ and ❌ are better; they also have the advantage of being different colours. Also, red means "alert/danger/broken/wrong" and shouldn't be used for something meaning correct/true.

Expand full comment
Rana Dexsin's avatar

Your culture has such silly ideas about red circles though!

By which I mean “alas, my connotations are unfashionable, oh well”.

Expand full comment
Kenny Easwaran's avatar

For most of the predictions he listed, it's fine to just say over 50% and true or under 50% and false is "correct", while under 50% and true or over 50% and false is "incorrect". But something like "Oral Roberts wins March Madness, 40%, T" seems better to count as "correct" rather than "incorrect".

Fortunately, the calibration metric doesn't care whether you call things "correct" or "incorrect" - it just cares what fraction of the 40% predictions were T. (And same with doing it using the Brier score or log score or some other proper scoring rules.)
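To make that concrete, here's a minimal Python sketch of both ways of grading the same set of predictions (the numbers are invented for illustration, not Scott's):

```python
# Each prediction is (stated probability, whether the event happened).
preds = [(0.4, True), (0.4, False), (0.4, True), (0.4, False), (0.4, False),
         (0.9, True), (0.9, True), (0.9, True), (0.9, True), (0.9, False)]

# Calibration: what fraction of each bin's predictions came true?
for level in sorted({p for p, _ in preds}):
    hits = [happened for p, happened in preds if p == level]
    print(f"{level:.0%} bin: {sum(hits)}/{len(hits)} came true")

# Brier score: mean squared gap between stated probability and outcome
# (lower is better); no notion of "correct"/"incorrect" is needed.
brier = sum((p - happened) ** 2 for p, happened in preds) / len(preds)
print(f"Brier score: {brier:.3f}")
```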

Expand full comment
Rana Dexsin's avatar

This is a good point which I had not considered. I wonder if it'd be possible to go △/▽ for cases in which there's a depressed base rate to consider due to being part of a class of multiple options, but that gets ill-defined and complicated quickly.

I do find that reading the list is significantly harder if I have to do what amounts to the XOR myself on every item, which is why I was speculating on this in the first place; but finding some non-link formatting could be just as adequate, anyway.

Expand full comment
Rana Dexsin's avatar

(“base rate”, really, self? I need more coffee.)

Expand full comment
Tristan's avatar

I told my brain that the links were all links to proof that something didn't happen and that sort of worked as a mental shortcut to keep my brain from swapping to System 2 after reading each prediction.

Expand full comment
Kenny Easwaran's avatar

That's what I was thinking, but I couldn't keep it up when we got to "[redacted] are still dating at the end of the year" and there was a link, that really *looked* like it should be a public post announcing the names of the individuals who broke up but was just www.example.com.

Expand full comment
Korakys's avatar

A better solution would be to have each post just be one link that points to a wordpress or whatever blog post that hosts the actual content.

Expand full comment
Pontifex Minimus 🏴󠁧󠁢󠁳󠁣󠁴󠁿's avatar

Or a mixture of icons and words, such as:

✅ happened

❌ didn't happen

❓ withdrawn

Expand full comment
Ben's avatar

Perhaps one check or X for every 10% in the right direction, plus the word “happened” or “didn’t happen?”

Expand full comment
gordianus's avatar

You could still strike through text to mark incorrect predictions by adding a combining stroke character after each character of text (Unicode codepoint U+0336; t̶e̶x̶t̶ ̶m̶o̶d̶i̶f̶i̶e̶d̶ ̶t̶h̶u̶s̶ ̶l̶o̶o̶k̶s̶ ̶l̶i̶k̶e̶ ̶t̶h̶i̶s̶).

Expand full comment
Lincoln Shade's avatar

Well, to get the conversation about 50% predictions started... I think they're fine. The actual phrasing of each individual prediction could be flipped, sure, but what's important is that you actually publish a particular framing of such positive/negative predictions, and then grade that set's outcomes. The results of each set of predictions at a particular confidence level should be drawn from a binomial distribution with theta at that confidence level. With theta at 50% and a large N of predictions, there are plenty of different ways to end up near N*0.5. What you don't want is for your particular set of outcomes to end up at, say, N*0.22 :P
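For what it's worth, that binomial check is a one-liner with scipy (a sketch; the N = 20 is an invented example size):

```python
from scipy.stats import binomtest

n = 20  # invented number of 50% predictions
for correct in (10, 4):  # 10 = n*0.5; 4 ≈ n*0.22, the bad case above
    p = binomtest(correct, n, 0.5).pvalue
    print(f"{correct}/{n} correct at the 50% level: two-sided p = {p:.3f}")
```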

Expand full comment
DaneelsSoul's avatar

I think that assigning a 50% probability to something (rather than 10% or 90%) is definitely meaningful. Whether it makes sense to ask if you're well-calibrated on your 50% predictions is less clear. If you randomize which answer counts as "yes", you end up with a binomial distribution on correct outcomes independent of what the actual outcomes are. The only way that calibration makes sense is maybe to detect whether you have a bias towards assigning too much probability to the positive answer to questions (which could of course be corrected by randomly deciding whether or not to negate the question wording before assigning a probability).
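A quick simulation of that point (pure Python; the 80% below is an arbitrary stand-in for however the world actually is):

```python
import random

random.seed(0)
N = 10_000
# Suppose the "positive" phrasings are actually true 80% of the time,
# i.e. 50% was a badly miscalibrated credence for every one of them:
truths = [random.random() < 0.8 for _ in range(N)]

# Now randomize which side of each question counts as "yes":
relabeled = [t if random.random() < 0.5 else not t for t in truths]

# The fraction of "yes" outcomes is ~0.5 regardless of the 0.8 above:
print(sum(relabeled) / N)
```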

Expand full comment
Michael Sullivan's avatar

Sometimes Scott uses a series of predictions to simulate a range (for example, he said that the number of covid deaths would be less than 100k/300k/3M at 10%/50%/90%). Those are clearly meaningful. Like, he's clearly more correct with those predictions if deaths are 550k than if they're 2.5M, despite them being false/false/true in both cases.

Expand full comment
DaneelsSoul's avatar

Read what I wrote (unless you were trying to agree with me here). Making predictions of 50% can be meaningful. Seeing whether or not your 50% predictions are well calibrated (i.e. checking that roughly half of them come true) is not.

Expand full comment
Garrett's avatar

On the contrary. If you predict "obvious" things like the sun coming up tomorrow at 50%, it means that you are underconfident and you are improperly calibrated. A perfect predictor would have exactly 50% of those things come true.

Expand full comment
DaneelsSoul's avatar

But this only applies if the obvious things you answer are all obviously true things. If half of them are obviously false things you end up calibrated. If you say that there's a 50% chance the sun will rise tomorrow and a 50% chance we will see a large scale alien invasion tomorrow, you end up well calibrated.

Expand full comment
Michael Sullivan's avatar

Relax, my dude. I was expanding on your comment, not arguing with you.

Expand full comment
Lincoln Shade's avatar

Ah, you're right. I stand corrected!

Expand full comment
Luca Petrolati's avatar

As you say, by 'randomizing which outcome counts as "yes"' (or as "heads" if you think about each prediction as a fair coin flip), you get a binomial distribution on the number of "yes". But then you do get a way to calibrate: just use the binomial to compute the probability of getting the number of "yes" that happened. With N sufficiently large, it should be easy to distinguish between a fair and a biased coin (calibration).

Expand full comment
DaneelsSoul's avatar

So... you are worried that this totally theoretical coin which determines which outcomes count as "yes" is biased? Why not just use an unbiased coin for this totally theoretical exercise?

Expand full comment
Luca Petrolati's avatar

No, sorry, I was just wrong :/ If you label using a fair coin, you will get back just a "fair" outcome.

Expand full comment
DangerouslyUnstable's avatar

I think the actually correct thing is to always anchor 50% claims at exactly 50% correct. Because the two opposite framings are equivalent, swapping them doesn't actually change the prediction you made, but it _does_ change the "percent correct". And if you actually plotted this showing the full 100% range, it would not be possible to have a line that is uniformly over or under; it has to have a varying slope _that goes exactly through the 50% point_. Mathematically, if you are underconfident in the 90% predictions, then you _must_ be exactly equivalently overconfident in the 10% predictions. So I think Scott should continue to make 50% predictions, because those predictions are _not_ any other possible percent, but he should forgo calculating a percentage and always call the 50% category exactly 50%, because that's mathematically what the line has to do.

Expand full comment
DangerouslyUnstable's avatar

Another way of making this clearer: for every 50% prediction, you are, definitionally, making the opposing 50% prediction as well. So if he actually posted all the 50% predictions he is making (the one he records _and_ its logically necessary companion), then every single year the 50% category would necessarily be exactly 50% correct.
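You can check this mechanically (a toy sketch; the 0.9 below is an arbitrary stand-in for however the world turns out):

```python
import random

random.seed(0)
# However biased the world is (0.9 here), pair each 50% claim with
# its logically necessary companion claim about the negation:
outcomes = [random.random() < 0.9 for _ in range(1000)]
pairs = [(o, not o) for o in outcomes]
correct = sum(a + b for a, b in pairs)  # exactly one of each pair is true
print(correct / (2 * len(outcomes)))    # always exactly 0.5
```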

Expand full comment
Jonathan Weinstein's avatar

I upvote this comment. I can't think of a coherent argument for why it would mean anything at all to say you got 2/9 of your 50% predictions correct. Here's a closely related argument: on the set of events I ("I" for indifferent) to which you assign 50%, assuming independence (and Scott doesn't seem to have stated any correlations in his predictions), you have predicted that all subsets of I are equally likely to be realized. So every realized subset gives you an equal rating as a predictor.

Expand full comment
Scipio A.'s avatar

The 50% predictions should have the most variance from the line of perfect correctness.

If your estimates are good, the variance at each other percentage should be progressively closer to the line -- the "predictive power" of a prediction is its difference from randomness, i.e. the difference from 50% that you assign to it. The long-term estimate of the variance should be symmetric about the 50% point; this is a consequence of a 90% prediction of A being a 10% likelihood of not(A).

Expand full comment
Luca Petrolati's avatar

No, it's not. The problem is the same as flipping a coin you think fair N times, where N is the number of 50/50 predictions you made. When N is sufficiently big, you expect N/2 heads and N/2 tails. If this doesn't happen, the correct conclusion is that the coin wasn't fair, or equivalently that assigning the same probability to heads and tails was wrong from the beginning.

Expand full comment
DangerouslyUnstable's avatar

If you say "I think that this coin has a 50% chance of heads", then you are _also_ saying "I think that this coin has a 50% chance of tails", _by logical necessity_. If you wrote all of those out, for literally any number of flips, half of them would be correct, and half of them would be incorrect. If you write out all the logically necessary predictions you are making, then it's not possible to have any other outcome than exactly 50%. This is the same reason why he only shows the plot from 50-100. For every 90% prediction, he is, by logical necessity, making the companion 10% prediction, and these two _must_ add up to 100%. So you don't need to show the other half of the graph, you know it mathematically. The 50% bin is just the one where not only do you know the other half by knowing one half, but that it has to necessarily be exactly 50%. Which is why some people argue that it isn't useful. I say it's still useful because he is _not_ putting those predictions in a bin _other_ than 50%.

Expand full comment
DangerouslyUnstable's avatar

To go back to your tails example, the reason some people argue against the value of the 50% bin is that, even if the hypothetical coin is extremely biased, so that, in 1000 flips, it shows heads 90% of the time, your 50% calibrations will still be "perfectly calibrated": because of the symmetry, you can't detect when you are wrong. If this _wasn't_ the case, if you _could_ get useful calibration information from the 50% bin, then no one would argue that it isn't useful. The whole argument stems from the fact that, by mathematical necessity, it _has_ to always be "perfectly calibrated".

Expand full comment
Luca Petrolati's avatar

yeah, after some discussion and upon further reflection, now I know why I was wrong. My idea was that by randomly labeling (e.g. by flipping a fair coin) with H and T the outcomes of each 50% prediction, you could calibrate by checking you got N/2 H and N/2 T (in the long run). But the labeling operation automatically gives that. In fact, you could also do it for 90/10 predictions and get the same 50/50 sequence of H and T.

Expand full comment
ZumBeispiel's avatar

Ever considered changing it from "50%" to "51%", just to break the symmetry and get rid of this never-ending discussion?

Expand full comment
aleh's avatar

This would definitely help.

A 50% prediction can mean:

(A) "I very slightly lean towards one direction of another (perhaps implied by the phrasing)". These can be calibrated, and calibration is interesting.

OR

(B) "I didn't even bother thinking about the question; I effectively just tossed a coin". This is calibrated as 50% but is entirely uninteresting.

OR

(C) "I gave this some serious thought, but the arguments either way seemed

equally balanced". This is an objective fact, and is perhaps interesting, but it's about Scott's thought processes. If he says this is the case, well, presumably he's not lying, but in any case there's further to be gained by checking how the event actually turned out. Calibration is meaningless.

IMO, part of the confusion is that a Scott 50% prediction could be any one of these. It would be better if he used a 51% prediction for class "A"; then we could all pragmatically assume that there are none of class "B", and that everything remaining is an uncalibratable statement of class "C".

Expand full comment
Daniel Reeves's avatar

Sounds smart! Or even "50.1%" / "49.9%" to really emphasize that your true Bayesian stance is "utter ignorance" and you're just forcing yourself to tip one way or the other to keep the analysis cleaner and avoid the weird special case where there's no such thing as being right or wrong.

Expand full comment
Kenny Easwaran's avatar

And as usual, I'll say that this neverending debate about 50% is just one more piece of evidence for the fact that a scoring rule is a better way to evaluate than calibration.

https://en.wikipedia.org/wiki/Scoring_rule

Calibration is useful if you take someone else's predictions as your only input when making your own predictions. But it's not a good way to evaluate someone's predictions, because it gives no measure of how discriminating the person is. Someone who makes 10 predictions of 90%, 9 of which are true, and 10 predictions of 70%, 7 of which are true, looks just as good as someone who made all 20 of them at 80%, even though the former is more discriminating. Someone who predicts all 435 House seats at 90% for the incumbent, and someone who predicts all the safe seats as 100% for the incumbent and the swing seats at 50%, may look equally good, even though the latter is more discriminating.
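Running that first example through the Brier score shows the separation (a sketch using the numbers above):

```python
def brier(preds):  # preds: list of (stated probability, outcome as 0 or 1)
    return sum((p - o) ** 2 for p, o in preds) / len(preds)

discriminating = ([(0.9, 1)] * 9 + [(0.9, 0)]         # 9 of 10 true at 90%
                  + [(0.7, 1)] * 7 + [(0.7, 0)] * 3)  # 7 of 10 true at 70%
lumped = [(0.8, 1)] * 16 + [(0.8, 0)] * 4             # 16 of 20 true at 80%

print(brier(discriminating))  # 0.150
print(brier(lumped))          # 0.160 -- worse, despite identical calibration
```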

Expand full comment
Ben's avatar

A forecast can imply two pieces of information. One is how confident the forecaster is. The other is how confident you should be.

Insofar as you are simply interested in Scott's state of mind on any given question, a 50% forecast merely means he doesn't know. That's a relatively uninteresting piece of information, in general. Most people don't know most things most of the time.

By contrast, if you see Scott as a credible forecaster on a given question, perhaps having insider information or better judgment than you, a 50% forecast gives you an anchor point or a direction in which to adjust your own forecasts.

So whether or not a 50% prediction is meaningful seems to depend on whether or not you're including the forecaster in your "forecasting league" of predictions you allow to alter your own forecast.

This is an important choice. Basing your own forecast on those of others is often a wise choice. For example, I am better off basing my prediction of tomorrow's weather on professional weather forecasters than my own intuitions. However, if I'm making decisions based on those forecasts, I'm allowing others to control my choices. If I trust their judgment, that's fine.

But making a forecast is sometimes free. If people are "spamming" forecasts, and others update their forecasts and make decisions based on that, then making forecasts can become a tool of social control. This is what we see when certain public figures seem to alter their forecasts based on what behaviors they want to see people adopt.

So choosing who to include in your "accurate forecasting" league, and on what sorts of questions, can be a key decision. If you were 70% confident X will happen, and Scott predicts 50%, do you have a policy of adjusting your forecasted probability downward just because you trust him? If so, his prediction is meaningful to you as a forecast.

If not, his prediction isn't meaningful - but it wouldn't have been no matter what he predicted, since if you don't adjust toward his prediction at one probability, why would you adjust at a different probability? In this case, Scott's predictions are just telling you about his state of mind.

Expand full comment
Oligopsony's avatar

Intellectually uninteresting but clearly phrased question: how much does the desire to see your predictions be accurate influence the ones you have control over?

Intellectually interesting but vaguely worded question: something something desire to see predictions be accurate something something general model of brain something something actions one has control over

Expand full comment
Pontifex Minimus 🏴󠁧󠁢󠁳󠁣󠁴󠁿's avatar

Are you asking whether people only make predictions about things they know they are sure about?

Expand full comment
Evesh U. Dumbledork's avatar

I think he's wondering whether Scott might be tempted to manipulate his relationships to end up with the right number of break ups just to feel calibrated.

Expand full comment
Yosarian2's avatar

I think it's more like "Does predicting that you will think the balance of evidence supports the Tara Reade accusation affect how you analyze that evidence going forward, partly because you want your prediction to be correct", to pick an example (I don't want to get into a debate about that here, but it surprised me that he came to that conclusion.)

Expand full comment
Kenny Easwaran's avatar

Some philosophers say there is something distinctive about the deliberative attitude that makes it incompatible with the predictive attitude: https://philpapers.org/rec/LIURAJ, https://philpapers.org/rec/KATQAP

Others say the opposite: https://authors.library.caltech.edu/73964/1/Hajek-2015-Deliberation-Welcomes-Prediction-final.pdf

Regardless of whether one can reason in both modes at the same time about the same action, the fact that past-you made a prediction, and present-you is in charge of making it true or false, can definitely create some moral hazard, if any sort of internet points are on the line.

Expand full comment
JR11's avatar

So uh... when can we purchase the Unsong revision? Asking for a friend

Expand full comment
Zechariah Rosenthal's avatar

I, too, am asking for a friend... specifically, for all of my friends whom I will successfully force to read it finally, er, I mean buy gift copies for!

Expand full comment
Scott Alexander's avatar

I sent it to a publisher and they asked me to send it to their preferred editor for external editing. Then the NYT thing happened and I haven't gotten the energy to do something which means I might have to edit it all over again. So the short answer is...not for a while.

Expand full comment
Evesh U. Dumbledork's avatar

50% predictions don't help to see if you are calibrated, but other than that sure they mean something, assuming 50% results from a genuinely best attempt.

Expand full comment
Rob Lachenrock's avatar

I think they do to some extent - just not in an over/underconfident sense. But you do know whether the things you predicted at 50% occurred near 50% of the time vs. far away from it in either direction.

Expand full comment
Kenny Easwaran's avatar

If all your predictions are naturally thought of as binary events, where "yes" and "no" were antecedently equally plausible, then a prediction of 50% doesn't indicate overconfidence or underconfidence. But someone who makes a prediction every year about the March Madness tournament and always puts the last seed at 50% is massively overconfident, and someone who makes a prediction about the weather in Los Angeles every day and always puts "sunny" at 50% is massively underconfident.

Calibration is a crude way to measure how good someone's predictions are, that depends very heavily on how you group them. Proper scoring rules have a lot of features that often make them better: https://en.wikipedia.org/wiki/Scoring_rule

Expand full comment
Evesh U. Dumbledork's avatar

Well, it wouldn't be about the calibration of your beliefs, but about how you phrase things. It can be fixed by tossing a coin to decide whether to phrase it "50% sure X will happen" or "50% sure X will not happen".

Expand full comment
Kevin's avatar

What biohacking projects did you try?

Expand full comment
Timothy Johnson's avatar

So Scott believes that the balance of evidence supports Tara Reade's accusation of sexual assault? I haven't followed the case closely, but I wasn't aware of any convincing evidence that it was credible.

Expand full comment
Scott Alexander's avatar

I thought so, based on her family saying she'd told them at the time, and the tape of her mother calling into Larry King (?) at the time saying a Congressman had sexually harassed her daughter.

Although in theory she could have been lying at the time, given that she never revealed the Congressman's name until now, it wasn't a very good hit job, and I can't think of any other reason to lie.

Expand full comment
Nicholas Decker's avatar

My read on what was going on: she was compensating for the loss of prestige from losing the job (for being bad at it) by constructing a narrative in which she was the victim, and magnifying small things like serving drinks at an event into something larger. Of course it's never going to be proved one way or the other, but I really don't think Biden did it.

Expand full comment
mark robbins's avatar

It really depends on what you mean by "gives her account credence."

This account is 100% made up: "She said Biden pushed her against a wall on the Capitol grounds, kissed her, and then digitally penetrated her — all against her will."

Biden touching people in ways they considered inappropriate and his staff reacting negatively to complaints is true. He has changed that behavior by all accounts.

Expand full comment
Aftagley's avatar

Yeah; this always looked like a motte-and-bailey-esque accusation to me. When trying to defend Reade's accusations, people would talk about unwanted shoulder rubs and stuff that Biden has been credibly accused of doing, but when trying to attack Biden they'd talk about the supposed sexual assault.

At the current moment, my guess of the probabilities around Tara Read would shake out something like:

1. Odds that Tara Reade had some kind of negative but non-criminal experience with Biden (shoulder rub, head sniff, weird comment): 66%

2. Odds that Tara Reade quit because of that experience: 10%

3. Odds that Tara Reade was fired because of factors unrelated to that experience: 90%

4. Odds that the full scope of the accusations against Biden are true: 5%

Referring back specifically to Scott asking why she'd lie: this Politico article details that she apparently has a lifetime history of telling bizarre lies that don't really seem to make sense.

https://www.politico.com/news/2020/05/15/tara-reade-left-trail-of-aggrieved-acquaintances-260771

Expand full comment
Dan L's avatar

The ambiguity of the question was called out at the time, and the hostile reception benf got still annoys me. There's quite a lot of ink to be spilled debating the circumstantial evidence, but the lack of focus on the object-level disagreement is depressing.

https://slatestarcodex.com/2020/04/29/predictions-for-2020/#comment-890221

Expand full comment
DABM's avatar

I missed that. His question was reasonable and the attacks were bad. More proof that the commentators here aren't always the founts of unbiased reasonableness that they take themselves to be. (Although basically no one is, and certainly not me...)

Expand full comment
Kenny Easwaran's avatar

This all looks like a good clarification to me, though I would still quibble about 3 and the definition of "unrelated to that experience".

Expand full comment
Aftagley's avatar

ok - let me specify. Coworkers at the time remember that she was struggling with her job performance and was let go because of those difficulties. She claimed at the time that she was fired because of a medical issue that wasn't being properly accommodated.

I'm saying there is a 90% chance that one of the two above claims is true and that she was fired either due to job performance or a medical issue (or both). I'm also claiming that this negative job performance and/or medical issue did not stem from any negative experience she had with Biden.

Expand full comment
Shockwell's avatar

Interesting. My own guesses would be something like:

1. Reade's accusation is completely true, and Biden is lying: 40%

2. Biden's denial is completely true, and Reade is lying: 30%

3. Something did happen between the two but not exactly what Reade claims. Biden is lying when he claims to have no memory of the event: 25%

4. Something totally bizarre not covered by the previous three possibilities: 5%

Expand full comment
SimulatedKnave's avatar

This is, I think, a good summary of why she's full of it to at least some significant extent.

Expand full comment
Jason Hicks's avatar

The transcript of the Larry King call doesn't suggest that the senator harassed her, quite the contrary. It was just presented that way via the Intercept headline and how some people talked about it.

CALLER: Yes, hello. I’m wondering what a staffer would do besides go to the press in Washington? My daughter has just left there, after working for a prominent senator, and could not get through with her problems at all, and the only thing she could have done was go to the press, and she chose not to do it out of respect for him.

KING: In other words, she had a story to tell but, out of respect for the person she worked for, she didn’t tell it?

CALLER: That’s true.

And the family didn't originally say she had told them at the time. "Moulton [her brother] initially said he only heard her account of the assault this spring" and later texted ABC to change his story (https://abcnews.go.com/Politics/womens-event-biden-navigates-lingering-sexual-assault-allegation/story?id=70403703). Nathan Robinson, publisher of Current Affairs, tweeted at the time that he had told the brother he needed to...retell his story (the tweets were deleted, I saw them, easy to find via google).

Expand full comment
DABM's avatar

It's an interesting question how to weigh it against the Kavanaugh accusation, because (as far as I know, people can correct me if I'm wrong) on the one hand, Ford didn't make the accusation until much later, whereas, as Scott says (and I'd forgotten), Reade does seem to have said something at the time. On the other hand, there is testimony from people who've interacted with Reade (like her landlord) that she is a fluent and convincing liar who consistently lies for personal gain, whereas I've never seen that about Ford. So it's a test of whether you weigh someone's general level of honesty more or less heavily than whether they made the allegation close to the time.

Expand full comment
Kenny Easwaran's avatar

I think it also matters how plausible you think the details of the accusations are. In general, I would think that allegations of certain kinds of sexual assault are extremely plausible when made about a drunk law student at a party.

Expand full comment
DABM's avatar

You're right of course, and it's therefore not a perfect test of the thing I mentioned. (Kavanaugh was a high school student for Ford's allegation though, not a law student yet.)

Expand full comment
Wency's avatar

The doubts cast by Reade's acquaintances do seem to be more extensive. I don't recall seeing any of Ford's personal acquaintances attack her character as Reade's character has been attacked, except for an ex-boyfriend (which isn't exactly the most reliable sort of witness when it comes to disparaging a person's character).

Though we at least have cause to doubt Ford's relationship with the truth because she was caught, in real-time, making some weird...statements...about air travel and doors to her house. We also have Ford's high school friend, who initially tried to back her, eventually conceding that the story made no sense to her and she couldn't support it. There have been reports that friends and family of Ford were privately apologizing to Kavanaugh's family.

In comparing the two, it's probably worth considering which party has the greater cultural/economic/media power, and which the lesser. Character witnesses in support of the weaker power might have more to fear from coming forward.

Expand full comment
Shockwell's avatar

It's one of those situations with enough ambiguity that it's very easy to (maybe inevitable that you will) believe what you want.

There's also the question of what would count as "true" here. It could, for example, be the case that Reade was assaulted or harassed but not "digitally penetrated" (i.e., raped), and she is either misremembering or exaggerating.

Expand full comment
Sam Harsimony's avatar

Would it be possible to see Scott's calibration aggregated over all years? Or maybe some error bars? I always find myself wondering if the deviations are due to small sample size rather than miscalibration overall.

Expand full comment
ZachH's avatar

So basically 2020 is the Lizardman's Constant of years. Makes sense.

Expand full comment
Maksym Taran's avatar

Wait, so does 58 mean that Scott's been sitting on a revised Unsong since the start of the year? Or did it get published somewhere and I just didn't notice?

Expand full comment
Code Walrus's avatar

"At the beginning of every year, I make predictions. At the end of every year, I score them (this year I’m very late). Here are 2014, 2015, 2016, 2017, 2018, and 2019. And here are the predictions I made for 2020."

Wait, hold up, you made all these Coronavirus predictions at the *beginning* of 2020? When it was still only in Wuhan? I feel like I'm missing something here.

Expand full comment
Mo Nastri's avatar

These are from 29 April 2020. I suppose Scott has a pretty broad-minded view of "beginning of the year": https://slatestarcodex.com/2020/04/29/predictions-for-2020/

Expand full comment
Mo Nastri's avatar

Sorry, that was uncharitable of me. 2020 was the one time Scott forgot to make predictions at the "real" "beginning of the year".

In 2019 for instance he did it on January 25: https://slatestarcodex.com/2019/01/25/predictions-for-2019/

In 2018, on February 6: https://slatestarcodex.com/2018/02/06/predictions-for-2018/

Expand full comment
Adam's avatar

This was the same question that immediately popped into my mind. These couldn't possibly have been predictions from the beginning of the year. I don't pay enough attention to politics to have known who Tara Reade was and needed to look that up, and best I can tell, she made her accusations public in April, so that wasn't beginning-of-year either.

Expand full comment
Ron's avatar

She made her accusations public years ago. Early 2000s, I think, but don't quote me on that. In non-mainstream American media (especially the right-aligned ones), her story was all the rage in late 2019. It absolutely could've been a January prediction. It wasn't, but it could've been.

Expand full comment
Robert Stadler's avatar

Doesn't a 50% prediction mean that you would have predicted the inverse at 50% as well? How did you decide which side of that is "right" and which side is "wrong?"

Expand full comment
Vampyricon's avatar

Yeah I'm having trouble with that.

Expand full comment
tempo's avatar

For an in depth discussion, check any previous predictions post

Expand full comment
Cole Terlesky's avatar

Well, I think the specific determination of right or wrong is based on his original prediction. But the 50% predictions are weird cuz they don't say the same thing as the rest of the predictions. They don't say whether he was underconfident or overconfident. The aggregate score of the 50% guesses would just say how good he is at predicting whether something has a 50/50 chance of occurring.

So the worse he is with the 50/50 scores the worse he is at setting odds for his predictions. But n of 7 isn't much to go on. I'd rather see more or none at all.

Expand full comment
Paul Goodman's avatar

50%s might not be super meaningful in terms of calibration but they do have value when comparing his predictions to other people predicting the same event.

Expand full comment
Aristides's avatar

We have this debate every year. I suggest 51% will be more meaningful, since Scott will have to think carefully about which one is slightly more likely than the other, and if he is wrong, there'll be some useful information. I predict with 80% confidence that Scott will ignore this point, and continue to do 50% predictions the next time he makes predictions.

Expand full comment
[redacted]'s avatar

Are we still going to get to read [redacted]? Particularly those [redacted] that you were relatively confident you would end up writing...

Won't lie that I got kind of excited when I saw those were in blue; it's nice to hear that you have ideas you consider significant enough to predict in this way for future blogposts.

Expand full comment
Matthew Talamini's avatar

My mind treats a low prediction (10%, 20%, 30%) that comes true as a success, rather than a failure, even though Scott's math treats, for instance, a 10% prediction of X as a 90% prediction of not-X.

I think this is because Scott picked the statements. Imagine if he predicted, with 10% certainty, something so unlikely that most of us aren't even thinking about it. For instance, that the events described in the Book of Revelation would happen, exactly as written. Then imagine that it actually happened. Making that particular 10% prediction, rather than any other, would make him seem amazingly smart, next to all the rest of us who weren't even talking about it! We wouldn't care that he had (technically) made a 90% prediction that it wouldn't happen. He wouldn't be sad that his 90% not-the-eschaton-this-year prediction failed -- he'd feel vindicated.

But I'm wrong to think of it that way, because the topics he's predicting are things we're all talking about all the time. He's not adding any new information just by bringing up that these things are possibilities. I ought to mentally flip low-probability predictions into high-probability predictions of the opposite, the way he does for the graph.

Expand full comment
tempo's avatar

in terms of calibration it is. things he predicts 10% of the time should only happen 10% of the time. in terms of performing better than some baseline, or other predictor, then yes your example makes sense. everyone else would be penalized more than Scott for a less than 10% prediction

Expand full comment
Michael Watts's avatar

> Imagine if he predicted, with 10% certainty, something so unlikely that most of us aren't even thinking about it. For instance, that the events described in the Book of Revelation would happen, exactly as written. Then imagine that it actually happened. Making that particular 10% prediction, rather than any other, would make him seem amazingly smart, next to all the rest of us who weren't even talking about it! We wouldn't care that he had (technically) made a 90% prediction that it wouldn't happen. He wouldn't be sad that his 90% not-the-eschaton-this-year prediction failed -- he'd feel vindicated.

> But I'm wrong to think of it that way

Huh? You're not wrong to think of it that way. In the situation you describe, everyone else ascribes a 0% chance of the event occurring, and Scott says it instead has a 10% chance of occurring. If the Book of Revelation then actually happens, Scott really does come out looking like a genius.

But most situations have a more than 0% chance of occurring, which changes the effect. If he predicts that the value of the US dollar will not fall by more than 98% with 10% certainty, and then it doesn't fall by more than 98%, he looks like a moron. That level of confidence was much too low.

Expand full comment
Kenny Easwaran's avatar

Evaluating either the calibration or the score (https://en.wikipedia.org/wiki/Scoring_rule) of someone's predictions doesn't depend on judging any prediction to be a "success" or "failure" - it just depends on evaluating the truth value and the credence of the prediction.

Expand full comment
Douglas Knight's avatar

Black Swan:

The problem with calibration is that it only makes sense if your predictions are independent. If a black swan appears and affects everything, they are highly correlated and you will probably be overconfident that year. But, yes, that's OK if you average over other years when you were underconfident. But covid wasn't a black swan: you knew about it before making the predictions. It should have been obvious that it messed everything up. But you can still have the problem that the predictions were all correlated for other reasons - in particular, that they depended on the single variable of the strength of lockdowns.
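A toy simulation of the correlation problem (all numbers invented): twenty 80% predictions per year that all hinge on one shared variable average out fine across years, while any single year can look wildly miscalibrated.

```python
import random

random.seed(0)
yearly_rates = []
for _ in range(1000):
    # One shared variable (e.g. strength of lockdowns) drives everything:
    shock = random.random() < 0.5
    hit_rate = 0.95 if shock else 0.65  # averages out to the stated 80%
    outcomes = [random.random() < hit_rate for _ in range(20)]
    yearly_rates.append(sum(outcomes) / 20)

print(sum(yearly_rates) / len(yearly_rates))  # ~0.80: calibrated on average
print(min(yearly_rates), max(yearly_rates))   # but single years swing widely
```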

Expand full comment
Scipio A.'s avatar

The extent of covid was a black swan, even if covid happened already.

Expand full comment
SufficientlyAnonymous's avatar

I'm not entirely sure that's true. My impression is that a black swan is truly unpredictable, not just highly uncertain. By the time Scott was making these predictions, it's true that the extent of the lockdowns was very uncertain, but my sense is that since they were unprecedented, it should have been possible to guess that there was a non-negligible chance they'd last much longer than expected.

That said, if I'd made predictions I would almost certainly have underestimated the length of the lockdowns, though in retrospect I think that was overconfidence in how fast things would return to "normal".

I think Douglas Knight's point about correlation does have a lot of explanatory power, though, since for a lot of these predictions, if Scott had had lower confidence and then the lockdown had been shorter, he would have been dramatically underconfident instead.

Expand full comment
Scipio A.'s avatar

I mean, if you're defining it as "truly unpredictable" in a strict sense, the COVID-19 pandemic was predictable. (in fact, it was *certain* to some extent after early news that it was actually happening came out in Oct/Nov/Dec 2019 -- it is named after the year in which it started after all).

Predictable because it's happened before with varying lesser degrees of impact: SARS, MERS, etc. Various plagues in the Middle Ages.

Correlated predictions will definitely skew your calibration curve, for sure. Strongly agree with that.

Expand full comment
Dirichlet-to-Neumann's avatar

It was certainly not certain in December, considering that the previous coronavirus pandemic (SARS) killed a couple thousand people.

Expand full comment
Sniffnoy's avatar

As I pointed out last year: Yes, of course 50% predictions are meaningful, it's just assessing them for *calibration* in this way that's meaningless. Assess them for something other than calibration and that they're meaningful will be obvious.

Expand full comment
Andrew's avatar

Surprised you rated the Tara Reade evidence like you did. I personally also rated it highly till I found information about her Russian links that implied a covert ops angle. I think that came out well in advance of the election as well. Did that influence your rating at all?

Expand full comment
Purely functional's avatar

I think your conversion 5%->95% etc. is flawed. Say you believe an event happens with 20% probability, but it actually happens 40% of the time; then you are underconfident in that prediction. But you guessed the negation would happen with 80% probability and it actually happens 60% of the time, so you are overconfident in the negation. That means that once you combine your data you can no longer talk about over/underconfidence, but only about distance from the green line, because an original data point has a different "direction of improvement" than a flipped data point. (Also, then 50% predictions are suddenly a lot less mysterious; they were only confusing because we assumed that you can flip the direction of your prediction at will, which you can't.)

Expand full comment
Kenny Easwaran's avatar

I think that merging 80% predictions and the negations of 20% predictions suggests that one should merge 50% predictions and the negations of 50% predictions, and thus automatically get perfect calibration on those.

For the reasons you say, it might be better (if one insists on using calibration, rather than something else) to keep the high and low values separate.

Expand full comment
cubecumbered's avatar

Viewing a 5% prediction as a low confidence prediction seems to me to totally misunderstand prediction. If I say "X has a 1% chance of happening", I'm not saying "I'm not confident it'll happen", I'm saying "I'm extremely confident it won't happen". If one is making predictions well, a 5% chance for and a 95% chance against should require the same internal "confidence", because they both imply knowing something is quite unlikely.

I agree in the neighborhood of like 40/60 there's probably some psychological tic where you say 40 if you're lowish confidence and 60 if you're highish confidence, but that seems like the sort of psychological tic that this prediction exercise is specifically trying to unlearn.

Expand full comment
Purely functional's avatar

> If one is making predictions well, a 5% chance for and a 95% chance against should require the same internal "confidence", because they both imply knowing something is quite unlikely.

Yes, I agree. But what if your predictions of this kind turned out to be false? Were you over- or underconfident in the likelihood of the events? It depends on whether you are considering the original statement or the negation!

Expand full comment
cubecumbered's avatar

Hmmm yeah fair, I guess this hinges on what we really mean by "over/under confident". Certainly saying 10% when it was 20% was under*estimating*, but to me it feels over*confident*. I think I have an intuition here that the farther you get from 50%, the more information you're claiming to have, and the more confident you must be. I guess that assumes 50% is the default though? Maybe this depends on what you consider the default/prior prediction before bringing information into it? It doesn't feel terribly principled for 50% to be the default, but it does seem intuitively right to me...

Expand full comment
Purely functional's avatar

Read the following sentence from the post again:

> For the first time, I was consistently overconfident (below the green line of perfect calibration) in every bin (except 70%)

Here, Scott uses "overconfident" in the way you define "over-estimating". I agree that saying "overconfident" when you get farther away from the line of perfect calibration would be a good default too, but it does not seem to be as Scott uses it (and what would "under-confident" mean?).

Expand full comment
cubecumbered's avatar

But that's after converting 20%s into 80%s so that underestimated 20%s look like overestimated 80%s. I'm pretty sure he's using "overconfident" to mean "predicted too far from 50%" and "underconfident" to mean "predicted too close to 50%", implicitly assuming that it takes confidence to make a prediction away from 50%. You're right that that somewhat depends on whether he holds the prediction or its negation in his head, but my guess is part of being a good predictor is holding both in your head and really realizing what you're predicting about.

Expand full comment
Purely functional's avatar

Hmm, I re-read that section and it does seem plausible that Scott is interpreting confidence the way you say he does. It certainly makes the section on Black Swans more sensible if you assume that Black Swans draw predictions back to the "neutral" 50%.

Expand full comment
Rachael's avatar

Disappointing that Substack doesn't allow strikethrough, as you've used it for expressive effect in some of your previous posts. Please add it to the list of things you're asking Substack to implement, if you haven't already.

Expand full comment
Joel Long's avatar

In this scoring system, I'm not certain "overconfident" and "underconfident" are different.

If you're 40% confident in each of some set of things happening, and 30% of them do, you were "overconfident". But if you'd stated each of these inverted (a 60% chance of each thing not happening), then 70% of them did happen, and you were "underconfident". For the exact same predictions and results.

There isn't over or under confidence, just accuracy.

This also resolves the 50% issue: direction is arbitrary anyway, it's the distance from consistent accuracy that matters, for 50% and every other number.

Expand full comment
Andrew Flicker's avatar

Perhaps this makes sense if you're considering "confidence" the difference of predicted-probability from the neutral 50/50 prior? In that case, 5% is a VERY confident prediction something will not happen, and 95% is a VERY confident prediction that it will.

Expand full comment
Kenny Easwaran's avatar

That only makes sense for events where 50/50 is a natural neutral prior. Whenever you're choosing one winner from a group of many candidates (like when you predict an economic growth rate, or a winner of a presidential primary), it's clear that 50/50 is not a neutral prior. I claim that it's actually very rare that 50/50 is a "neutral prior", though that's more theoretically contested.

Expand full comment
Andrew Flicker's avatar

I don't disagree- just pointing out the way in which I think "overconfidence" is being meant, in practice, in these posts.

Expand full comment
Bucky's avatar

"35. UK, EU extend “transition” trade deal: 80%"

I think this one should be false - the UK and EU signed a new trade deal and did not extend the "transition" phase beyond 31st Dec. Concretely, this is when the UK stopped being part of the EU free market and customs union.

Expand full comment
Simon's avatar

Yeah, a deal was made, but it certainly didn't extend the transition deal.

Expand full comment
a shrek's avatar

I'm not mathy enough to weigh in on whether 50% predictions are meaningful for calibration, but I do have the unrelated concern that they're kind of... unnatural? I feel like if I were doing a project like this I would use them either extremely sparingly or not at all. Obviously there are cases where it's clearly correct, like predicting a literal coin flip, or predicting election results in a country you've barely heard of between two candidates you know nothing about, but I don't think it's appropriate when we're talking about any reasonably complex real-life scenario that you're even somewhat well-informed about.

I think my objection is that 50% seems suspiciously precise to me, in the same way that a 71.18522% prediction would seem suspiciously precise. Because it implies that you're holding both possible outcomes as being not only close but exactly equal. Like, when you said Trump had a 50% chance of being re-elected, you're saying that all the evidence you have in favour of his re-election just so happens to be exactly as convincing as all the evidence against? Really? Isn't that kind of a huge coincidence?

And you might reply, "well, predicting 80% would mean claiming that the evidence in favour is exactly four times as convincing as the evidence against, and isn't that a big coincidence too?" But I don't think that's right. If you predict something at 80% you've told us which outcome you think is more likely to happen, and we understand implicitly that the exact percentage is at best a rough indication of how confident you are that it will happen. 50% is unique in that you're declaring that you're completely unwilling to order the possible outcomes by likelihood, even at the lowest possible levels of confidence! It feels more like skipping the question rather than registering a genuine prediction. I think even in cases where I was really really unsure what would happen, my thoughts would be better represented by an ugly-looking prediction like 51% or 49.5% rather than an artificially clean-looking 50%.

I'd be interested to hear if other people's intuitions about this accord with mine. Also interested to hear whether I'm just being an idiot, because while I stand by the above reasoning, it did feel more solid in my head than it looks now I've typed it out.

Expand full comment
Majromax's avatar

50/50 predictions are perfectly meaningful and natural. However, they're only _useful_ when they're applied to an activity where you'd expect the odds to be something else.

For example, if I predict that the Maple Leafs have a 50/50 chance of winning the Stanley Cup, that would in fact be a very bold prediction since there are 31 teams in the league.

However, the 'calibration' trick is that it's not possible to evaluate a 50/50 prediction in isolation. If you take any set of verifiable binary statements and negate each one on a coin flip, you'll have a set of 50/50 predictions where each has a 50% chance of being true. You instead need to evaluate these statements against another benchmark, such as another person making predictions, or 'common knowledge' (such as bookmaker's odds, in the above case of sports betting).

Expand full comment
Thiago Ribeiro's avatar

[In Urdu] Long live Brazil! Hold high the sacred banner of national liberation!

Expand full comment
nonesuch's avatar

Why Urdu for Brazil? What nation(s) need to be liberated here? And from what? Please explain yourself.

Expand full comment
Deiseach's avatar

Looking up "India and Brazil", it seems that India sent 2 million doses of vaccine: https://www.youtube.com/watch?v=ZSvw617Zw6o

Expand full comment
Thiago Ribeiro's avatar

https://www.bbc.com/news/amp/world-latin-america-56288548

https://amp.theguardian.com/world/2021/mar/30/brazil-military-chiefs-resign-bolsonaro-fires-defense-minister

Make no mistake whatsoever: as President Nixon once pointed out, as Brazil goes, so goes (the rest of) Latin America. If Brazil falls to chaos, we will be facing a geopolitical tragedy without precedent since the fall of Rome. It is time for Mr. Biden to send Brazil the stockpiles of vaccines he knows pretty well America has no intention of using now.

Expand full comment
Evan Þ's avatar

Your conclusion doesn't follow.

If Biden believes the doses in the stockpile are effective vaccines, he should pressure the FDA to approve them so they can be used on Americans. If Biden doesn't believe they are effective vaccines, he definitely shouldn't send these possibly-dangerous drugs to foreign countries.

Now, perhaps there're other considerations here (say, if they're working vaccines but not as good as the other vaccines America will shortly be flooded with, or if geopolitics tells us a small investment of vaccines now will pay huge dividends in the future.) But we shouldn't be treating FDA approval like an immovable rock.

Expand full comment
Thiago Ribeiro's avatar

1) The FDA is (or should be) an independent organization, like the Fed. A president is not a dictator ruling by ukases. It is criminal to withhold vaccines which could be used to save thousands upon thousands of lives every single day. Brazil is the only South American country to have fought alongside America in WW II, and it's paid back with genocide.

2) Most

Expand full comment
Thiago Ribeiro's avatar

2) Most Americans don't want a vaccine. It is criminal to force them to be vaccinated while thousands of Brazilians die because America's government has cornered the market.

Expand full comment
Gabriel Conroy's avatar

I'm a little new here (well, not really, but I've commented only few times and don't usually read the predictions posts), so I may have missed something, but....

....when did you make the coronavirus predictions? At the beginning of 2020, or was it after March or so, or later? To even know that hydroxychloroquine would be an issue (regardless of whether it turns out to be effective or not) would be prescient indeed if the predictions were made at the beginning of 2020.

Expand full comment
Kenny Easwaran's avatar

It was April, and I believe late April.

Expand full comment
Gabriel Conroy's avatar

Thanks!

Expand full comment
Adam's avatar

I think you're hurting yourself a bit trying to fit everything into the paradigm of binary classification. Some of these predictions are naturally multiclass and some of them are really regression problems, notably the "how many people will die of Covid in the U.S." question. It's not really continuous, obviously, as half a person can't die and the number is also bounded between 0 and <population of the United States>, but I don't see how squeezing it into an ad hoc multiclass turned into several binaries helps. Just predict the actual value and give 95% confidence interval error bars and score yourself using some standard loss function for regression. Level of confidence in this case isn't calibrated by assigning some percentage value to the point estimate, but by how tightly you bound your confidence interval.
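For instance, here's a sketch of the standard interval (Winkler) score for grading a 95% interval; the two intervals below are illustrative, not Scott's actual bounds:

```python
def interval_score(lower, upper, actual, alpha=0.05):
    """Interval (Winkler) score for a central (1 - alpha) interval:
    the width, plus a 2/alpha-scaled penalty for missing. Lower is better."""
    score = upper - lower
    if actual < lower:
        score += (2 / alpha) * (lower - actual)
    elif actual > upper:
        score += (2 / alpha) * (actual - upper)
    return score

# Two hypothetical 95% intervals for 2020 US covid deaths (roughly 350k):
print(interval_score(100_000, 3_000_000, 350_000))  # wide but covered: 2,900,000
print(interval_score(200_000, 330_000, 350_000))    # tight but missed:   930,000
```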

Expand full comment
Michael Watts's avatar

> It's not really continuous, obviously, as half a person can't die and the number is also bounded between 0 and <population of the United States>, but I don't see how squeezing it into an adhoc multiclass turned into several binaries helps.

The several binaries are a _better_ way of describing a probability distribution than a point estimate is. For the point estimate to make any sense, you need to assume you know the exact shape of the distribution. In contrast, the series of binary predictions is a description of the shape of the distribution.

Expand full comment
Adam's avatar

Surely, here you do know the shape of the distribution. Why on earth would deaths from any disease in a single year not be Poisson distributed?

As it stands, his implicit PMF of P(X < 100k) = 0.1, P(100k < X < 300k) = 0.4, P(300k < X < 3m) = 0.4, P(X < 3m) = 0.1 is already approximating a Poisson distribution, but with extremely coarse granularity, to the point of being much less useful than it could have been with more bins.

Expand full comment
Adam's avatar

Sorry last bin should obviously have been X > 3m.

Expand full comment
Michael Watts's avatar

> Why on earth would deaths from any disease in a single year not be Poisson distributed?

Well, for one thing, we already know that the rate of infection varies systematically over time.

Expand full comment
Kenny Easwaran's avatar

I want likes back for comments like this.

Expand full comment
Thiago Ribeiro's avatar

[In Punjabi] Eliminate the entire ruling clique. Throw off the chains of oppression. Raise high the banner of international solidarity.

Expand full comment
nonesuch's avatar

Revolution in Punjabi. You're PRing for Google translate?

Expand full comment
Thiago Ribeiro's avatar

No. I am holding high the banner of international solidarity and national emancipation struggle against Brazil's perfidious ruling clique.

Expand full comment
nonesuch's avatar

Why do you hide the banner behind Urdu and Punjabi?

Expand full comment
Thiago Ribeiro's avatar

I am invoking the international solidarity.

Expand full comment
nonesuch's avatar

You are filtering for the curious and/or bored (and a few polyglots, plus mostly Pakistanis and some Indian Muslims). They will join the revolution for the lulz (or to train their language skills), until the next new meme pops up (or the revolution ceases to be merely verbal).

Expand full comment
Thiago Ribeiro's avatar

Not all. I am calling the world's peace-loving masses to rise up against the Brazilian ruling clique.

Expand full comment
GoneAnon's avatar

"8. NYC widely considered worst-hit US city: 90%"

I'm not so sure about this one - or at least, I'd sure love to see some polling data on it. Depends on your definition of "widely considered" to be sure, but I'd bet at least 1/3 of Americans believe that Florida and/or Texas were "worst-hit" (because they didn't do what the experts demanded, therefore they *must* have worse outcomes), and that California would get quite a few votes too (because length/severity of lockdown *must* surely correlate with severity of infection)...

Expand full comment
Kenny Easwaran's avatar

Note that he said "city". I don't think many people believe that Houston or Dallas or Austin or Miami were hit worse than New York. In January of this year some people might reasonably have thought that Los Angeles was, but I think that's just an artifact of testing being better in the summer and winter waves than in last spring's wave - and in any case, the sustained high of New York over the past few months (comparable to Los Angeles during the summer wave, while Los Angeles is now down in the territory that San Francisco and Seattle have spent most of the pandemic at) restores New York to its position.

Though I guess, as to your main point, "widely considered" does raise some interestingly strange possibilities about what people think.

Expand full comment
Scott Alexander's avatar

I originally judged this false, but Zvi thought it was true and ran a Twitter poll, which confirmed it was "widely considered" - https://twitter.com/TheZvi/status/1363338180137779201

Expand full comment
GoneAnon's avatar

I highly doubt that Zvi's twitter followers are representative of the American public, but since your prediction was sufficiently vague and "well informed rationalists" could have been the demographic you had in mind, that's fine.

Expand full comment
GoneAnon's avatar

Oh, and it also looks like "not necessarily true" (the combination of false, neutral, and no comment) actually holds a majority!

Expand full comment
MathsV's avatar

What character do you play in D&D, Scott?

Expand full comment
Kenny's avatar

He answered in the AMA post I think?

Expand full comment
Akiyama's avatar

Regarding number 35: the Brexit transition period (during which time the UK remained part of the EU single market and customs union) was not extended, instead the UK and EU agreed a new trade deal. So that prediction should be in blue.

Expand full comment
Will's avatar

Even a well-calibrated predictor will have situations of appearing to be systematically off, just because of small sample size. We should put some confidence intervals on that graph to see if you were overconfident to a statistically significant degree.

Expand full comment
Eharding's avatar

"US has highest death toll as per expert guesses of real numbers: 70%"

Shouldn't this be in blue?

Expand full comment
Scott Alexander's avatar

I copy-pasted the resolution of these from the last time I worked with them for simplicity and consistency - I agree that changing that one would have been reasonable.

Expand full comment
Bob Winslow's avatar

There is a reason “Acts of God” exists in all legal contracts. So we know now that you are human...

Expand full comment
Firanx's avatar

36. Kim Jong-Un alive and in power: 60%

What? Why? North Korea has gone without a violent transition of power for longer than the USSR existed. Nothing Un does suggests he's significantly worse at maintaining power internally than his father and grandfather. Nor is he doing particularly badly internationally. I would understand such a prediction made in 2017, when Trump promised fire and fury (and even then, it would have given Trump way too much credit). ...Perhaps it was indeed made earlier than 2019?

Expand full comment
Melvin's avatar

That's sometimes the most fascinating thing about these prediction threads: they're a reminder of the things that people were talking about a year ago but which have subsequently vanished from the news for one reason or another. Tara Reade and the possible instability of Kim Jong Un's regime are two of these things.

Expand full comment
Bogdan Butnaru's avatar

Well, Kim is obese, and IIRC a heavy smoker, and he did disappear for a while at some point, rumored to be ill. It’s plausible that he might die of more-or-less natural causes, although that would only cover part of that 40% in one year, given that he’s less than 40.

Expand full comment
Tara's avatar

Out of curiosity, why did you predict you'd get a Surface Book 3, and why didn't you get one?

Expand full comment
mb706's avatar

In expectation we missed out on at least one [redacted] being published last year.

Expand full comment
ragnarrahl's avatar

I'm interested in your analysis of the balance of evidence favoring Tara Reade as accuser, as someone with no particular opinion on Tara Reade, a general skepticism of accusations, and the impression that the vast, vast majority of those normally politically inclined to believe all accusers make an explicit exception for Reade.

Expand full comment
a walrus's avatar

It seems to me the 50% problem is less about the percent itself and more about self-selecting the questions, which makes grading systems make less sense in general.

I think 50% answers make more sense in cases where a third party supplies a list of questions to fill out and you compare answers to others.

Expand full comment
Sandro's avatar

Of course 50% predictions can be meaningful: 1) it depends on the error bars, and 2) it depends on what other people *expect* the outcome to be.

For instance if everyone is expecting event X to have a 90% chance of occurring and you give it 50% chance, you're signaling your belief that the data or the argument is weaker than the consensus.

Expand full comment
Mactuary's avatar

Scott's scored 577 predictions since 2014. Here are the results:

50% Level: 43% correct (32 of 74)

60% Level: 60% correct (68 of 113)

70% Level: 73% correct (72 of 98)

80% Level: 80% correct (101 of 127)

90% Level: 93% correct (100 of 107)

95% Level: 91% correct (41 of 45)

99% Level: 100% correct (13 of 13)
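For error bars on these (per Will's point above about small samples), one scipy call per bin does it; a sketch:

```python
from scipy.stats import binomtest

# (stated level, number correct, total) from the tallies above
bins = [(0.50, 32, 74), (0.60, 68, 113), (0.70, 72, 98), (0.80, 101, 127),
        (0.90, 100, 107), (0.95, 41, 45), (0.99, 13, 13)]

for level, correct, total in bins:
    result = binomtest(correct, total, level)
    ci = result.proportion_ci(confidence_level=0.95)
    print(f"{level:.0%}: {correct}/{total} = {correct / total:.0%}, "
          f"95% CI [{ci.low:.0%}, {ci.high:.0%}], p = {result.pvalue:.2f}")
```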

Expand full comment
TitaniumDragon's avatar

50% predictions are clearly useful. You should get 50% of your 50% predictions correct; if you get 75% of them correct, you're underconfident, and if you get 25% correct, you're overconfident.

A single 50% prediction isn't very useful but a large number of them aggregated together clearly is.

Expand full comment